-
Notifications
You must be signed in to change notification settings - Fork 417
add avx512 pshufb #4764
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
add avx512 pshufb #4764
Conversation
|
Thank you for contributing to Miri! A reviewer will take a look at your PR, typically within a week or two. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! Since this just generalizes existing operations, this seems reasonable. I may become a bit hesitant if we start to add avx512-exclusive operations...
| /// <https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_shuffle_epi8> | ||
| /// <https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_shuffle_epi8> | ||
| /// <https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm512_shuffle_epi8> | ||
| fn pshufb<'tcx>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add an explanation of what the types here are? Are these all u8 vectors?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It has currently been changed to:
/// Shuffles bytes from `left` using `right` as pattern.
///
/// `left` and `right` are both vectors of type `len` x i8. Only bits 0, 1, 2, 3 and 7 of each byte of
/// `right` matter; if bit 7 of each byte of `right` is set, the value of `dest` at the corresponding
/// byte will be set to 0.
///
/// Each 128-bit block is shuffled independently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should also say what the other 4 bits do then. I guess it's something like this:
The first four bytes of right at index i indicate which of the left values from the same 16-element block is used for index i in dest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now it's
/// Shuffles bytes from `left` using `right` as pattern. Each 16-byte block is shuffled independently.
///
/// `left` and `right` are both vectors of type `len` x i8.
///
/// If the highest bit of a byte in `right` is not set, the corresponding byte in `dest` is taken from
/// same 16-byte block of `left` at the position indicated by the lowest 4 bits of this byte in `right`.
/// If the highest bit of a byte in `right` is set, the corresponding byte in `dest` is set to `0`.
src/shims/x86/mod.rs
Outdated
|
|
||
| let res = if right & 0x80 == 0 { | ||
| // Shuffle each 128-bit (16-byte) block independently. | ||
| let j = u64::from(right % 16).strict_add(i & !15); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the i & !15 here the same as i / 16? If yes, think that would be more clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. It's i / 16 * 16.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, right.
| let j = u64::from(right % 16).strict_add(i & !15); | |
| let block_start = i & !15; // round down to previous multiple of 16 | |
| let j = block_start.strict_add((right % 16).into()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you check this on real hardware?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I've tested intrinsics-x86-avx512.rs on real hardware. I also ran miri on my real cases.
|
Reminder, once the PR becomes ready for a review, use |
|
@rustbot ready |
| let b = _mm512_set_epi8(-1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, | ||
| -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, | ||
| -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, | ||
| -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using some indices other than 1 also seems like a good idea. In particular, please ensure the "wrap-around" is tested by also checking 127.
Why does index 1 read "14" for the first block? It seems to be indexing from the right...?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does index 1 read "14" for the first block?
_mm512_set_epi8 sets the elements of a vector in reverse order. To set the elements in forward order, _mm512_setr_epi8 should be used.
|
@rustbot author |
|
@rustbot ready |
Used to implement the _mm512_shuffle_epi8 intrinsic.