Skip to content

Conversation

@usamoi
Copy link

@usamoi usamoi commented Dec 14, 2025

Used to implement the _mm512_shuffle_epi8 intrinsic.

@rustbot
Copy link
Collaborator

rustbot commented Dec 14, 2025

Thank you for contributing to Miri! A reviewer will take a look at your PR, typically within a week or two.
Please remember to not force-push to the PR branch except when you need to rebase due to a conflict or when the reviewer asks you for it.

@rustbot rustbot added the S-waiting-on-review Status: Waiting for a review to complete label Dec 14, 2025
Copy link
Member

@RalfJung RalfJung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! Since this just generalizes existing operations, this seems reasonable. I may become a bit hesitant if we start to add avx512-exclusive operations...

View changes since this review

/// <https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_shuffle_epi8>
/// <https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_shuffle_epi8>
/// <https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm512_shuffle_epi8>
fn pshufb<'tcx>(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add an explanation of what the types here are? Are these all u8 vectors?

Copy link
Author

@usamoi usamoi Dec 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It has currently been changed to:

/// Shuffles bytes from `left` using `right` as pattern.
///
/// `left` and `right` are both vectors of type `len` x i8. Only bits 0, 1, 2, 3 and 7 of each byte of
/// `right` matter; if bit 7 of each byte of `right` is set, the value of `dest` at the corresponding
/// byte will be set to 0.
///
/// Each 128-bit block is shuffled independently.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should also say what the other 4 bits do then. I guess it's something like this:
The first four bytes of right at index i indicate which of the left values from the same 16-element block is used for index i in dest.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now it's

/// Shuffles bytes from `left` using `right` as pattern. Each 16-byte block is shuffled independently.
///
/// `left` and `right` are both vectors of type `len` x i8.
///
/// If the highest bit of a byte in `right` is not set, the corresponding byte in `dest` is taken from
/// same 16-byte block of `left` at the position indicated by the lowest 4 bits of this byte in `right`.
/// If the highest bit of a byte in `right` is set, the corresponding byte in `dest` is set to `0`.


let res = if right & 0x80 == 0 {
// Shuffle each 128-bit (16-byte) block independently.
let j = u64::from(right % 16).strict_add(i & !15);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the i & !15 here the same as i / 16? If yes, think that would be more clear.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. It's i / 16 * 16.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right.

Suggested change
let j = u64::from(right % 16).strict_add(i & !15);
let block_start = i & !15; // round down to previous multiple of 16
let j = block_start.strict_add((right % 16).into());

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you check this on real hardware?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I've tested intrinsics-x86-avx512.rs on real hardware. I also ran miri on my real cases.

@rustbot rustbot removed the S-waiting-on-review Status: Waiting for a review to complete label Dec 14, 2025
@rustbot
Copy link
Collaborator

rustbot commented Dec 14, 2025

Reminder, once the PR becomes ready for a review, use @rustbot ready.

@rustbot rustbot added the S-waiting-on-author Status: Waiting for the PR author to address review comments label Dec 14, 2025
@usamoi
Copy link
Author

usamoi commented Dec 15, 2025

@rustbot ready

@rustbot rustbot added S-waiting-on-review Status: Waiting for a review to complete and removed S-waiting-on-author Status: Waiting for the PR author to address review comments labels Dec 15, 2025
Comment on lines 155 to 158
let b = _mm512_set_epi8(-1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
-1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
-1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
-1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using some indices other than 1 also seems like a good idea. In particular, please ensure the "wrap-around" is tested by also checking 127.

Why does index 1 read "14" for the first block? It seems to be indexing from the right...?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does index 1 read "14" for the first block?

_mm512_set_epi8 sets the elements of a vector in reverse order. To set the elements in forward order, _mm512_setr_epi8 should be used.

@RalfJung
Copy link
Member

@rustbot author

@rustbot rustbot added S-waiting-on-author Status: Waiting for the PR author to address review comments and removed S-waiting-on-review Status: Waiting for a review to complete labels Dec 15, 2025
@usamoi
Copy link
Author

usamoi commented Dec 15, 2025

@rustbot ready

@rustbot rustbot added S-waiting-on-review Status: Waiting for a review to complete and removed S-waiting-on-author Status: Waiting for the PR author to address review comments labels Dec 15, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

S-waiting-on-review Status: Waiting for a review to complete

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants