A crate that safely exposes arch intrinsics via #[cfg()].
safe_arch lets you safely use CPU intrinsics: those things in the
core::arch modules. It works purely via #[cfg()] and
compile time CPU feature declaration. If you want to check for a feature at
runtime and then call an intrinsic or use a fallback path based on that then
this crate is sadly not for you.
SIMD register types are “newtype’d” so that better trait impls can be given
to them, but the inner value is a pub field so feel free to just grab it
out if you need to. Trait impls of the newtypes include: Default (zeroed),
From/Into of appropriate data types, and appropriate operator
overloading.
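To make the newtype idea concrete, here is a minimal sketch of the pattern using only core::arch. It is not this crate's actual code: the name Lanes4 and the helper add_lanes are hypothetical, and the sketch assumes an x86_64 build (where sse is part of the baseline), with a plain scalar fallback elsewhere.

```rust
/// Hypothetical newtype over the raw 128-bit register type.
/// The inner value is a `pub` field, so it can be taken out directly.
#[cfg(target_arch = "x86_64")]
#[derive(Clone, Copy)]
#[repr(transparent)]
pub struct Lanes4(pub core::arch::x86_64::__m128);

#[cfg(target_arch = "x86_64")]
#[allow(unused_unsafe)]
mod lanes4_impls {
    use super::Lanes4;
    use core::arch::x86_64::{__m128, _mm_add_ps, _mm_setzero_ps};
    use core::ops::Add;

    impl Default for Lanes4 {
        /// Default is the all-zero register.
        fn default() -> Self {
            // SAFETY: `sse` is always available on x86_64.
            Lanes4(unsafe { _mm_setzero_ps() })
        }
    }

    impl From<[f32; 4]> for Lanes4 {
        fn from(arr: [f32; 4]) -> Self {
            // SAFETY: both types are 16 bytes and every bit pattern is valid.
            Lanes4(unsafe { core::mem::transmute::<[f32; 4], __m128>(arr) })
        }
    }

    impl From<Lanes4> for [f32; 4] {
        fn from(v: Lanes4) -> Self {
            // SAFETY: same size, every bit pattern is valid.
            unsafe { core::mem::transmute::<__m128, [f32; 4]>(v.0) }
        }
    }

    impl Add for Lanes4 {
        type Output = Lanes4;
        /// Operator overloading maps straight onto the lanewise intrinsic.
        fn add(self, rhs: Lanes4) -> Lanes4 {
            // SAFETY: `sse` is always available on x86_64.
            Lanes4(unsafe { _mm_add_ps(self.0, rhs.0) })
        }
    }
}

/// Lanewise add through the newtype on x86_64.
#[cfg(target_arch = "x86_64")]
pub fn add_lanes(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    (Lanes4::from(a) + Lanes4::from(b)).into()
}

/// Scalar fallback so the sketch still builds on other targets.
#[cfg(not(target_arch = "x86_64"))]
pub fn add_lanes(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    [a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3]]
}
```

The point of the wrapper is that the trait impls live on a type the crate owns, while the raw register is still one field access away.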
- Most intrinsics (like addition and multiplication) are totally safe to use as long as the CPU feature is available. In this case, what you get is 1:1 with the actual intrinsic.
- Some intrinsics take a pointer of an assumed minimum alignment and validity span. For these, the safe_arch function takes a reference of an appropriate type to uphold safety.
  - Try the bytemuck crate (and turn on the bytemuck feature of this crate) if you want help safely casting between reference types.
- Some intrinsics are not safe unless you’re very careful about how you use them, such as the streaming operations requiring you to use them in combination with an appropriate memory fence. Those operations aren’t exposed here.
- Some intrinsics mess with the processor state, such as changing the floating point flags, saving and loading special register state, and so on. LLVM doesn’t really support you messing with that within a high level language, so those operations aren’t exposed here. Use assembly or something if you want to do that.
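The "take a reference instead of a pointer" point can be sketched roughly like this. The type Aligned16 and the function load_via_reference are hypothetical names for illustration, not items from this crate; the idea is only that a reference to a suitably aligned type already proves the alignment and validity that the raw-pointer intrinsic assumes.

```rust
/// Hypothetical wrapper whose type guarantees 16-byte alignment.
#[derive(Clone, Copy, Debug, PartialEq)]
#[repr(C, align(16))]
pub struct Aligned16(pub [f32; 4]);

/// Safe wrapper: the `&Aligned16` argument guarantees a valid,
/// 16-byte-aligned span of four `f32` values.
#[cfg(target_arch = "x86_64")]
#[allow(unused_unsafe)]
pub fn load_via_reference(src: &Aligned16) -> [f32; 4] {
    use core::arch::x86_64::{_mm_load_ps, _mm_storeu_ps};
    let mut out = [0.0_f32; 4];
    // SAFETY: `src` is a live reference to 16 aligned bytes, which is what
    // the aligned load requires; `out` is a valid 16-byte destination for
    // the unaligned store.
    unsafe {
        let reg = _mm_load_ps(src.0.as_ptr());
        _mm_storeu_ps(out.as_mut_ptr(), reg);
    }
    out
}

/// Fallback so the sketch still builds on other targets.
#[cfg(not(target_arch = "x86_64"))]
pub fn load_via_reference(src: &Aligned16) -> [f32; 4] {
    src.0
}
```

A caller never writes unsafe here: the borrow checker and the type's alignment attribute carry the obligations the pointer version leaves to the programmer.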
Naming Conventions
The safe_arch crate does not simply use the “official” names for each
intrinsic, because the official names are generally poor. Instead, the
operations have been given better names that hopefully make things easier
to understand when you’re reading the code.
For a full explanation of the naming used, see the Naming Conventions page.
Current Support
- x86 / x86_64 (Intel, AMD, etc)
  - 128-bit: sse, sse2, sse3, ssse3, sse4.1, sse4.2
  - 256-bit: avx, avx2
  - Other: adx, aes, bmi1, bmi2, fma, lzcnt, pclmulqdq, popcnt, rdrand, rdseed
Compile Time CPU Target Features
At the time of me writing this, Rust enables the sse and sse2 CPU
features by default for all i686 (x86) and x86_64 builds. Those CPU
features are built into the design of x86_64, and you’d need a super old
x86 CPU for it to not support at least sse and sse2, so they’re a safe
bet for the language to enable all the time. In fact, because the standard
library is compiled with them enabled, simply trying to disable those
features would actually cause ABI issues and fill your program with UB
(link).
If you want additional CPU features available at compile time you’ll have to
enable them with an additional arg to rustc. For a feature named name
you pass -C target-feature=+name, such as -C target-feature=+sse3 for
sse3.
You can alternately enable all target features of the current CPU with -C target-cpu=native. This is primarily of use if you’re building a program
you’ll only run on your own system.
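If you're not sure which features a given set of rustc flags actually turned on, a small check like the one below can help. This is just a sketch: compiled_features is a hypothetical helper, and the list of names is a sample rather than everything rustc knows about.

```rust
/// Lists which of a few x86 target features were enabled when this code
/// was compiled. Each `cfg!` resolves to a constant at compile time.
pub fn compiled_features() -> Vec<&'static str> {
    let mut enabled = Vec::new();
    if cfg!(target_feature = "sse2") {
        enabled.push("sse2");
    }
    if cfg!(target_feature = "sse3") {
        enabled.push("sse3");
    }
    if cfg!(target_feature = "ssse3") {
        enabled.push("ssse3");
    }
    if cfg!(target_feature = "sse4.1") {
        enabled.push("sse4.1");
    }
    if cfg!(target_feature = "sse4.2") {
        enabled.push("sse4.2");
    }
    if cfg!(target_feature = "avx") {
        enabled.push("avx");
    }
    if cfg!(target_feature = "avx2") {
        enabled.push("avx2");
    }
    enabled
}
```

Building once with no extra flags and once with -C target-feature=+sse3 (or -C target-cpu=native) and printing the result should show the difference.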
It’s sometimes hard to know if your target platform will support a given
feature set, but the Steam Hardware Survey is generally
taken as a guide to what you can expect people to have available. If you
click “Other Settings” it’ll expand into a list of CPU target features and
how common they are. These days, it seems that sse3 can be safely assumed,
and ssse3, sse4.1, and sse4.2 are pretty safe bets as well. The stuff
above 128-bit isn’t as common yet, give it another few years.
Please note that executing a program on a CPU that doesn’t support the target features it was compiled for is Undefined Behavior.
Currently, Rust doesn’t actually support an easy way for you to check that a
feature enabled at compile time is actually available at runtime. There is
the “feature_detected” family of macros, but if you
enable a feature they will evaluate to a constant true instead of actually
deferring the check for the feature to runtime. This means that, if you
did want a check at the start of your program, to confirm that all the
assumed features are present and error out when the assumptions don’t hold,
you can’t use that macro. You gotta use CPUID and check manually. rip.
Hopefully we can make that process easier in a future version of this crate.
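In the meantime, a manual check might look something like the sketch below. It is not part of this crate: CpuReport, query_cpu, and missing_features are hypothetical names, and the sketch only covers x86_64 and a handful of SSE-family bits from CPUID leaf 1. AVX is left out on purpose, since confirming it properly also requires checking that the OS saves the wider register state.

```rust
/// What the running CPU reports for a few SSE-family features.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct CpuReport {
    pub sse2: bool,
    pub sse3: bool,
    pub ssse3: bool,
    pub sse4_1: bool,
    pub sse4_2: bool,
}

/// Reads CPUID leaf 1 and decodes the feature bits by hand.
#[cfg(target_arch = "x86_64")]
#[allow(unused_unsafe)]
pub fn query_cpu() -> Option<CpuReport> {
    // SAFETY: the CPUID instruction is always available on x86_64.
    let leaf1 = unsafe { core::arch::x86_64::__cpuid(1) };
    let bit = |reg: u32, n: u32| (reg >> n) & 1 == 1;
    Some(CpuReport {
        sse2: bit(leaf1.edx, 26),
        sse3: bit(leaf1.ecx, 0),
        ssse3: bit(leaf1.ecx, 9),
        sse4_1: bit(leaf1.ecx, 19),
        sse4_2: bit(leaf1.ecx, 20),
    })
}

/// Other targets have no CPUID, so there is nothing to report.
#[cfg(not(target_arch = "x86_64"))]
pub fn query_cpu() -> Option<CpuReport> {
    None
}

/// Names of features that were assumed at compile time but that the
/// running CPU does not report. Empty means the assumptions hold.
pub fn missing_features() -> Vec<&'static str> {
    let mut missing = Vec::new();
    if let Some(cpu) = query_cpu() {
        if cfg!(target_feature = "sse2") && !cpu.sse2 {
            missing.push("sse2");
        }
        if cfg!(target_feature = "sse3") && !cpu.sse3 {
            missing.push("sse3");
        }
        if cfg!(target_feature = "ssse3") && !cpu.ssse3 {
            missing.push("ssse3");
        }
        if cfg!(target_feature = "sse4.1") && !cpu.sse4_1 {
            missing.push("sse4.1");
        }
        if cfg!(target_feature = "sse4.2") && !cpu.sse4_2 {
            missing.push("sse4.2");
        }
    }
    missing
}
```

A program could call missing_features() first thing in main and exit with an error message if the list is non-empty. Strictly speaking, any code that already ran before the check was still compiled with the assumed features, so the earlier the check happens the better.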
A Note On Working With Cfg
There’s two main ways to use cfg:
- Via an attribute placed on an item, block, or expression:
#[cfg(debug_assertions)] println!("hello");
- Via a macro used within an expression position:
if cfg!(debug_assertions) { println!("hello"); }
The difference might seem small but it’s actually very important:
- The attribute form will include code or not before deciding if all the items named and so forth really exist or not. This means that code that is configured via attribute can safely name things that don’t always exist as long as the things they name do exist whenever that code is configured into the build.
- The macro form will include the configured code no matter what, and then the macro resolves to a constant true or false and the compiler uses dead code elimination to cut out the path not taken.
This crate uses cfg via the attribute, so the functions it exposes don’t
exist at all when the appropriate CPU target features aren’t enabled.
Accordingly, if you plan to call this crate or not depending on what
features are enabled in the build you’ll also need to control your use of
this crate via cfg attribute, not cfg macro.
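Here's a small sketch of what that looks like in practice. The function names are hypothetical stand-ins: fast_path plays the role of a function from this crate that only exists when sse4.1 is enabled at compile time.

```rust
/// Stand-in for a function that only exists when the feature is enabled.
#[cfg(target_feature = "sse4.1")]
fn fast_path() -> &'static str {
    "sse4.1"
}

/// Attribute form: this whole item, including its call to `fast_path`,
/// is removed from the build when the feature is off.
#[cfg(target_feature = "sse4.1")]
pub fn pick_path() -> &'static str {
    fast_path()
}

/// The fallback is only compiled in when the feature is off.
#[cfg(not(target_feature = "sse4.1"))]
pub fn pick_path() -> &'static str {
    "fallback"
}

// By contrast, the macro form below would NOT compile when the feature is
// off, because `fast_path` is still named in the code and does not exist:
//
// pub fn pick_path_broken() -> &'static str {
//     if cfg!(target_feature = "sse4.1") { fast_path() } else { "fallback" }
// }
```

Two cfg'd definitions of the same function name is the usual way to give callers one stable entry point regardless of which features the build has.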
Modules
- An explanation of the crate’s naming conventions.
Macros
- ?
- Blends the i32 lanes in $a and $b into a single value.
- Blends the i16 lanes according to the immediate mask.
- Blends the i16 lanes according to the immediate value.
- Blends the i32 lanes according to the immediate value.
- Blends the lanes according to the immediate mask.
- Blends the lanes according to the immediate mask.
- Blends the f32 lanes according to the immediate mask.
- Blends the f64 lanes according to the immediate mask.
- Shifts all bits in the entire register left by a number of bytes.
- Shifts each u128 lane left by a number of bytes.
- Shifts all bits in the entire register right by a number of bytes.
- Shifts each u128 lane right by a number of bytes.
- Compare f32 lanes according to the operation specified, mask output.
- Compare f32 lanes according to the operation specified, mask output.
- Compare f64 lanes according to the operation specified, mask output.
- Compare f64 lanes according to the operation specified, mask output.
- Compare f32 lanes according to the operation specified, mask output.
- Compare f64 lanes according to the operation specified, mask output.
- Counts $a as the high bytes and $b as the low bytes then performs a byte shift to the right by the immediate value.
- Works like combined_byte_shr_imm_m128i, but twice as wide.
- Turns a comparison operator token to the correct constant value.
- Performs a dot product of two m128 registers.
- Performs a dot product of two m128d registers.
- This works like dot_product_m128, but twice as wide.
- Gets the f32 lane requested. Returns as an i32 bit pattern.
- Gets the i8 lane requested. Only the lowest 4 bits are considered.
- Gets an i8 value out of an m256i, returns as i32.
- Gets an i16 value out of an m128i, returns as i32.
- Gets an i16 value out of an m256i, returns as i32.
- Extracts an i32 lane from m256i
- Gets the i32 lane requested. Only the lowest 2 bits are considered.
- Extracts an i64 lane from m256i
- Gets the i64 lane requested. Only the lowest bit is considered.
- Extracts an m128 from m256
- Extracts an m128d from m256d
- Extracts an m128i from m256i
- Gets an m128i value out of an m256i.
- Inserts a lane from $b into $a, optionally at a new position.
- Inserts a new value for the i64 lane specified.
- Inserts an i8 to m256i
- Inserts the low 16 bits of an i32 value into an m128i.
- Inserts an i16 to m256i
- Inserts a new value for the i32 lane specified.
- Inserts an i32 to m256i
- Inserts a new value for the i64 lane specified.
- Inserts an i64 to m256i
- Inserts an m128 to m256
- Inserts an m128d to m256d
- Inserts an m128i to an m256i at the high or low position.
- Slowly inserts an m128i to m256i.
- Performs a “carryless” multiplication of two i64 values.
- Computes eight u16 “sum of absolute difference” values according to the bytes selected.
- Computes eight u16 “sum of absolute difference” values according to the bytes selected.
- Rounds each lane in the style specified.
- Rounds $b low as specified, other lanes use $a.
- Rounds each lane in the style specified.
- Rounds $b low as specified, keeps $a high.
- Rounds each lane in the style specified.
- Rounds each lane in the style specified.
- Shifts all u16 lanes left by an immediate.
- Shifts all u16 lanes left by an immediate.
- Shifts all u32 lanes left by an immediate.
- Shifts all u32 lanes left by an immediate.
- Shifts both u64 lanes left by an immediate.
- Shifts all u64 lanes left by an immediate.
- Shifts all i16 lanes right by an immediate.
- Shifts all i16 lanes left by an immediate.
- Shifts all i32 lanes right by an immediate.
- Shifts all i32 lanes left by an immediate.
- Shifts all u16 lanes right by an immediate.
- Shifts all u16 lanes right by an immediate.
- Shifts all u32 lanes right by an immediate.
- Shifts all u32 lanes right by an immediate.
- Shifts both u64 lanes right by an immediate.
- Shifts all u64 lanes right by an immediate.
- Shuffle the f32 lanes from $a and $b together using an immediate control value.
- Shuffle the f32 lanes from $a and $b together using an immediate control value.
- Shuffle the f64 lanes from $a and $b together using an immediate control value.
- Shuffle the f64 lanes from $a and $b together using an immediate control value.
- Shuffle 128 bits of floating point data at a time from $a and $b using an immediate control value.
- Shuffle 128 bits of floating point data at a time from $a and $b using an immediate control value.
- Slowly swizzle 128 bits of integer data from $a and $b using an immediate control value.
- Shuffle 128 bits of integer data from $a and $b using an immediate control value.
- Shuffle the f32 lanes from $a using an immediate control value.
- Shuffle the i32 lanes in $a using an immediate control value.
- Shuffle the f32 lanes in $a using an immediate control value.
- Shuffle the f64 lanes in $a using an immediate control value.
- Shuffle the f64 lanes from $a using an immediate control value.
- Shuffle the f64 lanes from $a and $b together using an immediate control value.
- Shuffle the high i16 lanes in $a using an immediate control value.
- Shuffle the high i16 lanes in $a using an immediate control value.
- Shuffle the low i16 lanes in $a using an immediate control value.
- Shuffle the low i16 lanes in $a using an immediate control value.
- Shuffle the i32 lanes in $a using an immediate control value.
- Shuffle the f64 lanes in $a using an immediate control value.
- Looks for $needle in $haystack and gives the index of either the first or last match.
- Looks for $needle in $haystack and gives the mask of where the matches were.
Structs
- The data for a 128-bit SSE register of four f32 lanes.
- The data for a 128-bit SSE register of two f64 values.
- The data for a 128-bit SSE register of integer data.
- The data for a 256-bit AVX register of eight f32 lanes.
- The data for a 256-bit AVX register of four f64 values.
- The data for a 256-bit AVX register of integer data.
Functions
- Lanewise absolute value with lanes as i8.
- Absolute value of i8 lanes.
- Lanewise absolute value with lanes as i16.
- Absolute value of i16 lanes.
- Lanewise absolute value with lanes as i32.
- Absolute value of i32 lanes.
- Add two u32 with a carry value.
- Add two u64 with a carry value.
- Add horizontal pairs of i16 values, pack the outputs as a then b.
- Horizontal a + b with lanes as i16.
- Add horizontal pairs of i32 values, pack the outputs as a then b.
- Horizontal a + b with lanes as i32.
- Add each lane horizontally, pack the outputs as a then b.
- Add each lane horizontally, pack the outputs as a then b.
- Add adjacent f32 lanes.
- Add adjacent f64 lanes.
- Add horizontal pairs of i16 values, saturating, pack the outputs as a then b.
- Horizontal saturating a + b with lanes as i16.
- Lanewise a + b with lanes as i8.
- Lanewise a + b with lanes as i8.
- Lanewise a + b with lanes as i16.
- Lanewise a + b with lanes as i16.
- Lanewise a + b with lanes as i32.
- Lanewise a + b with lanes as i32.
- Lanewise a + b with lanes as i64.
- Lanewise a + b with lanes as i64.
- Lanewise a + b.
- Low lane a + b, other lanes unchanged.
- Lanewise a + b.
- Lowest lane a + b, high lane unchanged.
- Lanewise a + b with f32 lanes.
- Lanewise a + b with f64 lanes.
- Lanewise saturating a + b with lanes as i8.
- Lanewise saturating a + b with lanes as i8.
- Lanewise saturating a + b with lanes as i16.
- Lanewise saturating a + b with lanes as i16.
- Lanewise saturating a + b with lanes as u8.
- Lanewise saturating a + b with lanes as u8.
- Lanewise saturating a + b with lanes as u16.
- Lanewise saturating a + b with lanes as u16.
- Alternately, from the top, add a lane and then subtract a lane.
- Add the high lane and subtract the low lane.
- Alternately, from the top, add f32 then sub f32.
- Alternately, from the top, add f64 then sub f64.
- “Perform the last round of AES decryption flow on a using the round_key.”
- “Perform one round of AES decryption flow on a using the round_key.”
- “Perform the last round of AES encryption flow on a using the round_key.”
- “Perform one round of AES encryption flow on a using the round_key.”
- “Perform the InvMixColumns transform on a.”
- Lanewise average of the u8 values.
- Average u8 lanes.
- Lanewise average of the u16 values.
- Average u16 lanes.
- Extract a span of bits from the u32, control value style.
- Extract a span of bits from the u64, control value style.
- Extract a span of bits from the u32, start and len style.
- Extract a span of bits from the u64, start and len style.
- Gets the mask of all bits up to and including the lowest set bit in a u32.
- Gets the mask of all bits up to and including the lowest set bit in a u64.
- Resets (clears) the lowest set bit.
- Resets (clears) the lowest set bit.
- Gets the value of the lowest set bit in a u32.
- Gets the value of the lowest set bit in a u64.
- Zero out all high bits in a u32 starting at the index given.
- Zero out all high bits in a u64 starting at the index given.
- Bitwise a & b.
- Bitwise a & b.
- Bitwise a & b.
- Bitwise a & b.
- Bitwise a & b.
- Bitwise a & b.
- Bitwise (!a) & b.
- Bitwise (!a) & b.
- Bitwise (!a) & b.
- Bitwise (!a) & b.
- Bitwise (!a) & b.
- Bitwise (!a) & b.
- Bitwise (!a) & b for u32
- Bitwise (!a) & b for u64
- Bitwise a | b.
- Bitwise a | b.
- Bitwise a | b.
- Bitwise a | b.
- Bitwise a | b.
- Bitwise a | b
- Bitwise a ^ b.
- Bitwise a ^ b.
- Bitwise a ^ b.
- Bitwise a ^ b.
- Bitwise a ^ b.
- Bitwise a ^ b.
- Blend the i8 lanes according to a runtime varying mask.
- Blend i8 lanes according to a runtime varying mask.
- Blend the lanes according to a runtime varying mask.
- Blend the lanes according to a runtime varying mask.
- Blend the lanes according to a runtime varying mask.
- Blend the lanes according to a runtime varying mask.
- Swap the bytes of the given 32-bit value.
- Swap the bytes of the given 64-bit value.
- Bit-preserving cast to m128 from m128d
- Bit-preserving cast to m128 from m128i
- Bit-preserving cast to m128 from m256.
- Bit-preserving cast to m128d from m128
- Bit-preserving cast to m128d from m128i
- Bit-preserving cast to m128d from m256d.
- Bit-preserving cast to m128i from m128
- Bit-preserving cast to m128i from m128d
- Bit-preserving cast to m128i from m256i.
- Bit-preserving cast to m256 from m256d.
- Bit-preserving cast to m256 from m256i.
- Bit-preserving cast to m256i from m256.
- Bit-preserving cast to m256d from m256i.
- Bit-preserving cast to m256i from m256.
- Bit-preserving cast to m256i from m256d.
- Round each lane to a whole number, towards positive infinity
- Round the low lane of b toward positive infinity, other lanes a.
- Round each lane to a whole number, towards positive infinity
- Round the low lane of b toward positive infinity, high lane is a.
- Round f32 lanes towards positive infinity.
- Round f64 lanes towards positive infinity.
- Low lane equality.
- Low lane f64 equal to.
- Lanewise a == b with lanes as i8.
- Compare i8 lanes for equality, mask output.
- Lanewise a == b with lanes as i16.
- Compare i16 lanes for equality, mask output.
- Lanewise a == b with lanes as i32.
- Compare i32 lanes for equality, mask output.
- Lanewise a == b with lanes as i64.
- Compare i64 lanes for equality, mask output.
- Lanewise a == b.
- Low lane a == b, other lanes unchanged.
- Lanewise a == b, mask output.
- Low lane a == b, other lanes unchanged.
- Low lane greater than or equal to.
- Low lane f64 greater than or equal to.
- Lanewise a >= b.
- Low lane a >= b, other lanes unchanged.
- Lanewise a >= b.
- Low lane a >= b, other lanes unchanged.
- Low lane greater than.
- Low lane f64 greater than.
- Lanewise a > b with lanes as i8.
- Compare i8 lanes for a > b, mask output.
- Lanewise a > b with lanes as i16.
- Compare i16 lanes for a > b, mask output.
- Lanewise a > b with lanes as i32.
- Compare i32 lanes for a > b, mask output.
- Lanewise a > b with lanes as i64.
- Compare i64 lanes for a > b, mask output.
- Lanewise a > b.
- Low lane a > b, other lanes unchanged.
- Lanewise a > b.
- Low lane a > b, other lanes unchanged.
- Low lane less than or equal to.
- Low lane f64 less than or equal to.
- Lanewise a <= b.
- Low lane a <= b, other lanes unchanged.
- Lanewise a <= b.
- Low lane a <= b, other lanes unchanged.
- Low lane less than.
- Low lane f64 less than.
- Lanewise a < b with lanes as i8.
- Lanewise a < b with lanes as i16.
- Lanewise a < b with lanes as i32.
- Lanewise a < b.
- Low lane a < b, other lanes unchanged.
- Lanewise a < b.
- Low lane a < b, other lane unchanged.
- Low lane not equal to.
- Low lane f64 less than.
- Lanewise a != b.
- Low lane a != b, other lanes unchanged.
- Lanewise a != b.
- Low lane a != b, other lane unchanged.
- Lanewise !(a >= b).
- Low lane !(a >= b), other lanes unchanged.
- Lanewise !(a >= b).
- Low lane !(a >= b), other lane unchanged.
- Lanewise !(a > b).
- Low lane !(a > b), other lanes unchanged.
- Lanewise !(a > b).
- Low lane !(a > b), other lane unchanged.
- Lanewise !(a <= b).
- Low lane !(a <= b), other lanes unchanged.
- Lanewise !(a <= b).
- Low lane !(a <= b), other lane unchanged.
- Lanewise !(a < b).
- Low lane !(a < b), other lanes unchanged.
- Lanewise !(a < b).
- Low lane !(a < b), other lane unchanged.
- Lanewise (!a.is_nan()) & (!b.is_nan()).
- Low lane (!a.is_nan()) & (!b.is_nan()), other lanes unchanged.
- Lanewise (!a.is_nan()) & (!b.is_nan()).
- Low lane (!a.is_nan()) & (!b.is_nan()), other lane unchanged.
- Lanewise a.is_nan() | b.is_nan().
- Low lane a.is_nan() | b.is_nan(), other lanes unchanged.
- Lanewise a.is_nan() | b.is_nan().
- Low lane a.is_nan() | b.is_nan(), other lane unchanged.
- Convert i32 to f32 and replace the low lane of the input.
- Convert i32 to f64 and replace the low lane of the input.
- Convert i64 to f64 and replace the low lane of the input.
- Converts the lower f32 to f64 and replace the low lane of the input
- Converts the low f64 to f32 and replaces the low lane of the input.
- Convert the lowest f32 lane to a single f32.
- Convert the lowest f64 lane to a single f64.
- Convert the lower two i64 lanes to two i32 lanes.
- Convert the lower eight i8 lanes to eight i16 lanes.
- Convert i8 values to i16 values.
- Convert lower 4 u8 values to i16 values.
- Convert lower 8 u8 values to i16 values.
- Convert u8 values to i16 values.
- Convert the lowest i32 lane to a single i32.
- Convert the lower four i8 lanes to four i32 lanes.
- Convert the lower four i16 lanes to four i32 lanes.
- Rounds the f32 lanes to i32 lanes.
- Rounds the two f64 lanes to the low two i32 lanes.
- Convert f64 lanes to be i32 lanes.
- Convert i16 values to i32 values.
- Convert the lower 8 i8 values to i32 values.
- Convert f32 lanes to be i32 lanes.
- Convert u16 values to i32 values.
- Convert the lower two i8 lanes to two i64 lanes.
- Convert the lower two i32 lanes to two i64 lanes.
- Convert i32 values to i64 values.
- Convert the lower 4 i8 values to i64 values.
- Convert i16 values to i64 values.
- Convert u16 values to i64 values.
- Convert u32 values to i64 values.
- Rounds the four i32 lanes to four f32 lanes.
- Rounds the two f64 lanes to the low two f32 lanes.
- Convert f64 lanes to be f32 lanes.
- Rounds the lower two i32 lanes to two f64 lanes.
- Rounds the two f64 lanes to the low two f32 lanes.
- Convert i32 lanes to be f32 lanes.
- Convert i32 lanes to be f64 lanes.
- Convert f32 lanes to be f64 lanes.
- Convert the lower eight u8 lanes to eight u16 lanes.
- Convert the lower four u8 lanes to four u32 lanes.
- Convert the lower four u16 lanes to four u32 lanes.
- Convert the lower two u8 lanes to two u64 lanes.
- Convert the lower two u16 lanes to two u64 lanes.
- Convert the lower two u32 lanes to two u64 lanes.
- Convert f64 lanes to i32 lanes with truncation.
- Convert f32 lanes to i32 lanes with truncation.
- Copy the low i64 lane to a new register, upper bits 0.
- Copies the a value and replaces the low lane with the low b value.
- Accumulates the u8 into a running CRC32 value.
- Accumulates the u16 into a running CRC32 value.
- Accumulates the u32 into a running CRC32 value.
- Accumulates the u64 into a running CRC32 value.
- Lanewise a / b.
- Low lane a / b, other lanes unchanged.
- Lanewise a / b.
- Lowest lane a / b, high lane unchanged.
- Lanewise a / b with f32.
- Lanewise a / b with f64.
- Duplicate the odd lanes to the even lanes.
- Duplicate the even-indexed lanes to the odd lanes.
- Copy the low lane of the input to both lanes of the output.
- Duplicate the odd lanes to the even lanes.
- Duplicate the odd-indexed lanes to the even lanes.
- Duplicate the odd-indexed lanes to the even lanes.
- Round each lane to a whole number, towards negative infinity
- Round the low lane of b toward negative infinity, other lanes a.
- Round each lane to a whole number, towards negative infinity
- Round the low lane of b toward negative infinity, high lane is a.
- Round f32 lanes towards negative infinity.
- Round f64 lanes towards negative infinity.
- Lanewise fused (a * b) + c
- Low lane fused (a * b) + c, other lanes unchanged
- Lanewise fused (a * b) + c
- Low lane fused (a * b) + c, other lanes unchanged
- Lanewise fused (a * b) + c
- Lanewise fused (a * b) + c
- Lanewise fused (a * b) addsub c (adds odd lanes and subtracts even lanes)
- Lanewise fused (a * b) addsub c (adds odd lanes and subtracts even lanes)
- Lanewise fused (a * b) addsub c (adds odd lanes and subtracts even lanes)
- Lanewise fused (a * b) addsub c (adds odd lanes and subtracts even lanes)
- Lanewise fused -(a * b) + c
- Low lane -(a * b) + c, other lanes unchanged.
- Lanewise fused -(a * b) + c
- Low lane -(a * b) + c, other lanes unchanged.
- Lanewise fused -(a * b) + c
- Lanewise fused -(a * b) + c
- Lanewise fused -(a * b) - c
- Low lane fused -(a * b) - c, other lanes unchanged.
- Lanewise fused -(a * b) - c
- Low lane fused -(a * b) - c, other lanes unchanged.
- Lanewise fused -(a * b) - c
- Lanewise fused -(a * b) - c
- Lanewise fused (a * b) - c
- Low lane fused (a * b) - c, other lanes unchanged.
- Lanewise fused (a * b) - c
- Low lane fused (a * b) - c, other lanes unchanged.
- Lanewise fused (a * b) - c
- Lanewise fused (a * b) - c
- Lanewise fused (a * b) subadd c (subtracts odd lanes and adds even lanes)
- Lanewise fused (a * b) subadd c (subtracts odd lanes and adds even lanes)
- Lanewise fused (a * b) subadd c (subtracts odd lanes and adds even lanes)
- Lanewise fused (a * b) subadd c (subtracts odd lanes and adds even lanes)
- Gets the low lane as an individual f32 value.
- Gets the lower lane as an f64 value.
- Converts the low lane to i32 and extracts as an individual value.
- Converts the lower lane to an i32 value.
- Converts the lower lane to an i32 value.
- Converts the lower lane to an i64 value.
- Converts the lower lane to an i64 value.
- Count the leading zeroes in a u32.
- Count the leading zeroes in a u64.
- Loads the f32 reference into the low lane of the register.
- Loads the f32 reference into all lanes of a register.
- Load an f32 and splat it to all lanes of an m256d
- Loads the reference into the low lane of the register.
- Loads the f64 reference into all lanes of a register.
- Load an f64 and splat it to all lanes of an m256d
- Loads the low i64 into a register.
- Loads the reference into a register.
- Load an m128 and splat it to the lower and upper half of an m256
- Loads the reference into a register.
- Load an m128d and splat it to the lower and upper half of an m256d
- Loads the reference into a register.
- Load data from memory into a register.
- Load data from memory into a register.
- Load data from memory into a register.
- Loads the reference given and zeroes any i32 lanes not in the mask.
- Loads the reference given and zeroes any i32 lanes not in the mask.
- Loads the reference given and zeroes any i64 lanes not in the mask.
- Loads the reference given and zeroes any i64 lanes not in the mask.
- Load data from memory into a register according to a mask.
- Load data from memory into a register according to a mask.
- Load data from memory into a register according to a mask.
- Load data from memory into a register according to a mask.
- Loads the reference into a register, replacing the high lane.
- Loads the reference into a register, replacing the low lane.
- Loads the reference into a register with reversed order.
- Loads the reference into a register with reversed order.
- Load data from memory into a register.
- Load data from memory into a register.
- Load data from memory into a register.
- Loads the reference into a register.
- Loads the reference into a register.
- Loads the reference into a register.
- Load data from memory into a register.
- Load data from memory into a register.
- Load data from memory into a register.
- Lanewise max(a, b) with lanes as i8.
- Lanewise max(a, b) with lanes as i8.
- Lanewise max(a, b) with lanes as i16.
- Lanewise max(a, b) with lanes as i16.
- Lanewise max(a, b) with lanes as i32.
- Lanewise max(a, b) with lanes as i32.
- Lanewise max(a, b).
- Low lane max(a, b), other lanes unchanged.
- Lanewise max(a, b).
- Low lane max(a, b), other lanes unchanged.
- Lanewise max(a, b).
- Lanewise max(a, b).
- Lanewise max(a, b) with lanes as u8.
- Lanewise max(a, b) with lanes as u8.
- Lanewise max(a, b) with lanes as u16.
- Lanewise max(a, b) with lanes as u16.
- Lanewise max(a, b) with lanes as u32.
- Lanewise max(a, b) with lanes as u32.
- Lanewise min(a, b) with lanes as i8.
- Lanewise min(a, b) with lanes as i8.
- Lanewise min(a, b) with lanes as i16.
- Lanewise min(a, b) with lanes as i16.
- Lanewise min(a, b) with lanes as i32.
- Lanewise min(a, b) with lanes as i32.
- Lanewise min(a, b).
- Low lane min(a, b), other lanes unchanged.
- Lanewise min(a, b).
- Low lane min(a, b), other lanes unchanged.
- Lanewise min(a, b).
- Lanewise min(a, b).
- Min u16 value, position, and other lanes zeroed.
- Lanewise min(a, b) with lanes as u8.
- Lanewise min(a, b) with lanes as u8.
- Lanewise min(a, b) with lanes as u16.
- Lanewise min(a, b) with lanes as u16.
- Lanewise min(a, b) with lanes as u32.
- Lanewise min(a, b) with lanes as u32.
- Move the high lanes of b to the low lanes of a, other lanes unchanged.
- Move the low lanes of b to the high lanes of a, other lanes unchanged.
- Move the low lane of b to a, other lanes unchanged.
- Gathers the i8 sign bit of each lane.
- Gathers the sign bit of each lane.
- Gathers the sign bit of each lane.
- Collects the sign bit of each lane into a 4-bit value.
- Collects the sign bit of each lane into a 4-bit value.
- Create an i32 mask of each sign bit in the i8 lanes.
- Multiply two u32, outputting the low bits and storing the high bits in the reference.
- Multiply two u64, outputting the low bits and storing the high bits in the reference.
- Multiply i16 lanes producing i32 values, horizontal add pairs of i32 values to produce the final output.
- Multiply i16 lanes producing i32 values, horizontal add pairs of i32 values to produce the final output.
- Lanewise a * b with lanes as i16, keep the high bits of the i32 intermediates.
- Multiply the i16 lanes and keep the high half of each 32-bit output.
- Lanewise a * b with lanes as i16, keep the low bits of the i32 intermediates.
- Multiply the i16 lanes and keep the low half of each 32-bit output.
- Multiply i16 lanes into i32 intermediates, keep the high 18 bits, round by adding 1, right shift by 1.
- Multiply i16 lanes into i32 intermediates, keep the high 18 bits, round by adding 1, right shift by 1.
- Lanewise a * b with lanes as i32, keep the low bits of the i64 intermediates.
- Multiply the i32 lanes and keep the low half of each 64-bit output.
- Multiply the lower i32 within each i64 lane, i64 output.
- Lanewise a * b.
- Low lane a * b, other lanes unchanged.
- Lanewise a * b.
- Lowest lane a * b, high lane unchanged.
- Lanewise a * b with f32 lanes.
- Lanewise a * b with f64 lanes.
- This is dumb and weird.
- This is dumb and weird.
- Lanewise a * b with lanes as u16, keep the high bits of the u32 intermediates.
- Multiply the u16 lanes and keep the high half of each 32-bit output.
- Multiply the lower u32 within each u64 lane, u64 output.
- Multiplies the odd i32 lanes and gives the widened (i64) results.
- Multiplies the odd u32 lanes and gives the widened (u64) results.
- Saturating convert i16 to i8, and pack the values.
- Saturating convert i16 to i8, and pack the values.
- Saturating convert i16 to u8, and pack the values.
- Saturating convert i16 to u8, and pack the values.
- Saturating convert i32 to i16, and pack the values.
- Saturating convert i32 to i16, and pack the values.
- Saturating convert i32 to u16, and pack the values.
- Saturating convert i32 to u16, and pack the values.
- Count the number of bits set within an i32
- Count the number of bits set within an i64
- Deposit contiguous low bits from a u32 according to a mask.
- Deposit contiguous low bits from a u64 according to a mask.
- Extract bits from a u32 according to a mask.
- Extract bits from a u64 according to a mask.
- Try to obtain a random u16 from the hardware RNG.
- Try to obtain a random u32 from the hardware RNG.
- Try to obtain a random u64 from the hardware RNG.
- Try to obtain a random u16 from the hardware RNG.
- Try to obtain a random u32 from the hardware RNG.
- Try to obtain a random u64 from the hardware RNG.
- Reads the CPU’s timestamp counter value.
- Reads the CPU’s timestamp counter value and stores the processor signature.
- Lanewise 1.0 / a approximation.
- Low lane 1.0 / a approximation, other lanes unchanged.
- Reciprocal of f32 lanes.
- Lanewise 1.0 / sqrt(a) approximation.
- Low lane 1.0 / sqrt(a) approximation, other lanes unchanged.
- Reciprocal of f32 lanes.
- Sets the args into an m128i, first arg is the high lane.
- Set i8 args into an m256i lane.
- Sets the args into an m128i, first arg is the high lane.
- Set i16 args into an m256i lane.
- Sets the args into an m128i, first arg is the high lane.
- Set an i32 as the low 32-bit lane of an m128i, other lanes blank.
- Set i32 args into an m256i lane.
- Sets the args into an m128i, first arg is the high lane.
- Set an i64 as the low 64-bit lane of an m128i, other lanes blank.
- Sets the args into an m128, first arg is the high lane.
- Sets the args into an m128, first arg is the high lane.
- Sets the args into an m128d, first arg is the high lane.
- Set m128d args into an m256d.
- Sets the args into the low lane of a m128d.
- Set m128i args into an m256i.
- Set f32 args into an m256 lane.
- Set f64 args into an m256d lane.
- Sets the args into an m128i, first arg is the low lane.
- Set i8 args into an m256i lane.
- Sets the args into an m128i, first arg is the low lane.
- Set i16 args into an m256i lane.
- Sets the args into an m128i, first arg is the low lane.
- Set i32 args into an m256i lane.
- Sets the args into an m128, first arg is the low lane.
- Sets the args into an m128d, first arg is the low lane.
- Set m128d args into an m256d.
- Set m128i args into an m256i.
- Set f32 args into an m256 lane.
- Set f64 args into an m256d lane.
- Splats the i8 to all lanes of the m128i.
- Sets the lowest i8 lane of an m128i as all lanes of an m256i.
- Splat an i8 arg into an m256i lane.
- Splats the i16 to all lanes of the m128i.
- Sets the lowest i16 lane of an m128i as all lanes of an m256i.
- Splat an i16 arg into an m256i lane.
- Splats the i32 to all lanes of the m128i.
- Sets the lowest i32 lane of an m128i as all lanes of an m256i.
- Splat an i32 arg into an m256i lane.
- Splats the i64 to both lanes of the m128i.
- Sets the lowest i64 lane of an m128i as all lanes of an m256i.
- Splats the value to all lanes.
- Sets the lowest lane of an m128 as all lanes of an m256.
- Splats the args into both lanes of the m128d.
- Sets the lowest lane of an m128d as all lanes of an m256d.
- Splat an f32 arg into an m256 lane.
- Splat an f64 arg into an m256d lane.
- Shift all u16 lanes to the left by the count in the lower u64 lane.
- Lanewise u16 shift left by the lower u64 lane of count.
- Shift all u32 lanes to the left by the count in the lower u64 lane.
- Shift all u32 lanes left by the lower u64 lane of count.
- Shift all u64 lanes to the left by the count in the lower u64 lane.
- Shift all u64 lanes left by the lower u64 lane of count.
- Shift u32 values to the left by count bits.
- Lanewise u32 shift left by the matching i32 lane in count.
- Shift u64 values to the left by count bits.
- Lanewise u64 shift left by the matching u64 lane in count.
- Shift each i16 lane to the right by the count in the lower i64 lane.
- Lanewise i16 shift right by the lower i64 lane of count.
- Shift each i32 lane to the right by the count in the lower i64 lane.
- Lanewise i32 shift right by the lower i64 lane of count.
- Shift each u16 lane to the right by the count in the lower u64 lane.
- Lanewise u16 shift right by the lower u64 lane of count.
- Shift each u32 lane to the right by the count in the lower u64 lane.
- Lanewise u32 shift right by the lower u64 lane of count.
- Shift each u64 lane to the right by the count in the lower u64 lane.
- Lanewise u64 shift right by the lower u64 lane of count.
- Shift i32 values to the right by count bits.
- Lanewise i32 shift right by the matching i32 lane in count.
- Shift
u32 values to the right by count bits.
- Lanewise u32 shift right by the matching u32 lane in count.
- Shift u64 values to the right by count bits.
- Lanewise u64 shift right by the matching i64 lane in count.
- Shuffle
f32 values in a using i32 values in v.
- Shuffle f32 values in a using i32 values in v.
- Shuffle f64 lanes in a using bit 1 of the i64 lanes in v.
- Shuffle f64 lanes in a using bit 1 of the i64 lanes in v.
- Shuffle i8 lanes in a using i8 values in v.
- Shuffle i8 lanes in a using i8 values in v.
- Shuffle f32 lanes in a using i32 values in v.
- Shuffle i32 lanes in a using i32 values in v.
- Applies the sign of i8 values in b to the values in a.
- Lanewise a * signum(b) with lanes as i8.
- Applies the sign of i16 values in b to the values in a.
- Lanewise a * signum(b) with lanes as i16.
- Applies the sign of i32 values in b to the values in a.
- Lanewise a * signum(b) with lanes as i32.
- Splat the lowest 8-bit lane across the entire 128 bits.
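The "a * signum(b)" entries above can be read as the following scalar sketch for sixteen i8 lanes. The name `apply_sign_i8` is hypothetical, and the arrays stand in for the register type. This is an illustrative model of the instruction's behaviour, not the crate's code.

```rust
/// Scalar model of "apply the sign of b to a" on sixteen i8 lanes.
/// Hypothetical helper for illustration only.
fn apply_sign_i8(a: [i8; 16], b: [i8; 16]) -> [i8; 16] {
    let mut out = [0i8; 16];
    for i in 0..16 {
        out[i] = match b[i].signum() {
            1 => a[i],                // positive b: lane passes through
            0 => 0,                   // zero b: lane becomes zero
            _ => a[i].wrapping_neg(), // negative b: lane is negated (wrapping)
        };
    }
    out
}
```

The negation wraps, so a lane holding `i8::MIN` stays `i8::MIN` when the matching lane of `b` is negative.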
- Splat the lowest 16-bit lane across the entire 128 bits.
- Splat the lowest 32-bit lane across the entire 128 bits.
- Splat the lowest 64-bit lane across the entire 128 bits.
- Splat the lowest f32 across all four lanes.
- Splat the lower f64 across both lanes of m128d.
- Splat the 128-bits across 256-bits.
- Lanewise sqrt(a).
- Low lane sqrt(a), other lanes unchanged.
- Lanewise sqrt(a).
- Low lane sqrt(b), upper lane is unchanged from a.
- Lanewise sqrt on f64 lanes.
- Lanewise sqrt on f64 lanes.
- Stores the high lane value to the reference given.
- Stores the value to the reference given.
- Stores the value to the reference given.
- Stores the low lane value to the reference given.
- Stores the value to the reference given.
- Stores the low lane value to the reference given.
- Stores the value to the reference given.
- Store data from a register into memory.
- Store data from a register into memory.
- Store data from a register into memory.
- Stores the i32 masked lanes given to the reference.
- Stores the i32 masked lanes given to the reference.
- Stores the i32 masked lanes given to the reference.
- Stores the i32 masked lanes given to the reference.
- Store data from a register into memory according to a mask.
- Store data from a register into memory according to a mask.
- Store data from a register into memory according to a mask.
- Store data from a register into memory according to a mask.
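As a rough picture of what the masked stores above do, the sketch below models a four-lane i32 masked store in plain Rust. The name `store_masked_i32` and its argument order are assumptions made for this example and may not match the crate's functions. The underlying instructions use the highest bit of each mask lane as the selector.

```rust
/// Scalar model of a masked store over four i32 lanes.
/// Hypothetical helper; name and argument order are assumptions.
fn store_masked_i32(dest: &mut [i32; 4], mask: [i32; 4], src: [i32; 4]) {
    for i in 0..4 {
        // A mask lane selects its lane when its highest bit is set,
        // which for an i32 means the value is negative.
        if mask[i] < 0 {
            dest[i] = src[i];
        }
        // Unselected lanes of `dest` are left untouched.
    }
}
```

With a mask of `[-1, 0, i32::MIN, 1]`, only lanes 0 and 2 are written. Lane 3 is skipped because the low bit alone does not select a lane.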
- Stores the value to the reference given in reverse order.
- Stores the value to the reference given.
- Stores the low lane value to all lanes of the reference given.
- Stores the low lane value to all lanes of the reference given.
- Store data from a register into memory.
- Store data from a register into memory.
- Store data from a register into memory.
- Stores the value to the reference given.
- Stores the value to the reference given.
- Stores the value to the reference given.
- Store data from a register into memory.
- Store data from a register into memory.
- Store data from a register into memory.
- Subtract horizontal pairs of i16 values, pack the outputs as a then b.
- Horizontal a - b with lanes as i16.
- Subtract horizontal pairs of i32 values, pack the outputs as a then b.
- Horizontal a - b with lanes as i32.
- Subtract each lane horizontally, pack the outputs as a then b.
- Subtract each lane horizontally, pack the outputs as a then b.
- Subtract adjacent f32 lanes.
- Subtract adjacent f64 lanes.
- Subtract horizontal pairs of i16 values, saturating, pack the outputs as a then b.
- Horizontal saturating a - b with lanes as i16.
- Lanewise a - b with lanes as i8.
- Lanewise a - b with lanes as i8.
- Lanewise a - b with lanes as i16.
- Lanewise a - b with lanes as i16.
- Lanewise a - b with lanes as i32.
- Lanewise a - b with lanes as i32.
- Lanewise a - b with lanes as i64.
- Lanewise a - b with lanes as i64.
- Lanewise a - b.
- Low lane a - b, other lanes unchanged.
- Lanewise a - b.
- Lowest lane a - b, high lane unchanged.
- Lanewise a - b with f32 lanes.
- Lanewise a - b with f64 lanes.
- Lanewise saturating a - b with lanes as i8.
- Lanewise saturating a - b with lanes as i8.
- Lanewise saturating a - b with lanes as i16.
- Lanewise saturating a - b with lanes as i16.
- Lanewise saturating a - b with lanes as u8.
- Lanewise saturating a - b with lanes as u8.
- Lanewise saturating a - b with lanes as u16.
- Lanewise saturating a - b with lanes as u16.
- Compute “sum of u8 absolute differences”.
- Compute “sum of u8 absolute differences”.
- Tests if all bits are 1.
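The "sum of u8 absolute differences" entries above can be sketched for the 128-bit case as follows. The name `sum_of_abs_diff_u8` is hypothetical, and the array types stand in for the register. The model assumes the usual behaviour of the underlying instruction, which produces one sum per group of eight bytes and places it in the matching 64-bit lane.

```rust
/// Scalar model of "sum of u8 absolute differences" for 128 bits.
/// Hypothetical helper for illustration only.
fn sum_of_abs_diff_u8(a: [u8; 16], b: [u8; 16]) -> [u64; 2] {
    let mut out = [0u64; 2];
    for i in 0..16 {
        // Bytes 0..8 accumulate into lane 0, bytes 8..16 into lane 1.
        out[i / 8] += u64::from(a[i].abs_diff(b[i]));
    }
    out
}
```

Each sum is at most 8 * 255 = 2040, so it always fits in the low 16 bits of its 64-bit lane.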
- Returns if all masked bits are 0, that is, (a & mask) as u128 == 0.
- Returns if, among the masked bits, there’s both 0s and 1s.
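The three bit tests above can be modelled on a plain u128, which is a convenient way to reason about them. The function names below are hypothetical; they follow the descriptions in this list and are not the crate's API.

```rust
/// Scalar models of the 128-bit test operations, using u128 in place of
/// the register type. All names here are hypothetical.
fn all_ones(a: u128) -> bool {
    a == u128::MAX
}

/// True when every bit selected by `mask` is 0 in `a`.
fn masked_all_zeroes(a: u128, mask: u128) -> bool {
    (a & mask) == 0
}

/// True when the bits selected by `mask` include at least one 1 and at
/// least one 0 in `a`.
fn masked_mixed(a: u128, mask: u128) -> bool {
    (a & mask) != 0 && (!a & mask) != 0
}
```

For `a = 0b1010`, the mask `0b0101` selects only zero bits, the mask `0b0110` selects a mix, and the mask `0b1010` selects only one bits.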
- Counts the number of trailing zero bits in a u32.
- Counts the number of trailing zero bits in a u64.
- Transpose four m128 as if they were a 4x4 matrix.
- Truncate the f32 lanes to i32 lanes.
- Truncate the f64 lanes to the lower i32 lanes (upper i32 lanes 0).
- Truncate the lower lane into an i32.
- Truncate the lower lane into an i64.
- Unpack and interleave the high lanes.
- Unpack and interleave the high lanes.
- Unpack and interleave high i8 lanes of a and b.
- Unpack and interleave high i8 lanes of a and b.
- Unpack and interleave high i16 lanes of a and b.
- Unpack and interleave high i16 lanes of a and b.
- Unpack and interleave high i32 lanes of a and b.
- Unpack and interleave high i32 lanes of a and b.
- Unpack and interleave high i64 lanes of a and b.
- Unpack and interleave high i64 lanes of a and b.
- Unpack and interleave high lanes of a and b.
- Unpack and interleave high lanes of a and b.
- Unpack and interleave the high lanes.
- Unpack and interleave the high lanes.
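For readers unfamiliar with "unpack and interleave", here is a scalar sketch for sixteen i8 lanes in a 128-bit register, with index 0 as the lowest lane. The names `unpack_low_i8` and `unpack_high_i8` are hypothetical. The 256-bit variants are assumed to apply the same pattern within each 128-bit half rather than across the whole register.

```rust
/// Scalar model: interleave the low eight i8 lanes of `a` and `b`.
/// Hypothetical helper for illustration only.
fn unpack_low_i8(a: [i8; 16], b: [i8; 16]) -> [i8; 16] {
    let mut out = [0i8; 16];
    for i in 0..8 {
        out[2 * i] = a[i];     // even output lanes come from a
        out[2 * i + 1] = b[i]; // odd output lanes come from b
    }
    out
}

/// Scalar model: interleave the high eight i8 lanes of `a` and `b`.
fn unpack_high_i8(a: [i8; 16], b: [i8; 16]) -> [i8; 16] {
    let mut out = [0i8; 16];
    for i in 0..8 {
        out[2 * i] = a[8 + i];
        out[2 * i + 1] = b[8 + i];
    }
    out
}
```

In both cases half of each input is discarded: the low form reads only lanes 0 to 7 and the high form reads only lanes 8 to 15.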
- Unpack and interleave low i8 lanes of a and b.
- Unpack and interleave low i8 lanes of a and b.
- Unpack and interleave low i16 lanes of a and b.
- Unpack and interleave low i16 lanes of a and b.
- Unpack and interleave low i32 lanes of a and b.
- Unpack and interleave low i32 lanes of a and b.
- Unpack and interleave low i64 lanes of a and b.
- Unpack and interleave low i64 lanes of a and b.
- Unpack and interleave low lanes of a and b.
- Unpack and interleave low lanes of a and b.
- Zero extend an m128 to m256.
- Zero extend an m128d to m256d.
- Zero extend an m128i to m256i.
- All lanes zero.
- Both lanes zero.
- All lanes zero.
- A zeroed m256.
- A zeroed m256d.
- A zeroed m256i.