Description
I'm trying to get into SIMD by implementing a trivial operation: XOR unmasking of a byte stream, as required by the WebSocket specification. The implementation with x86 intrinsics is actually very straightforward, but I have a hard time wrapping my head around expressing it in terms of Faster's iterator API.
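For reference, the operation itself is tiny when written one byte at a time: RFC 6455 (section 5.3) says byte `i` of the payload is XORed with byte `i % 4` of the masking key. A minimal scalar sketch follows; `unmask_scalar` is a hypothetical name, and any SIMD version should produce exactly the same output.

```rust
/// Unmask a WebSocket payload in place, one byte at a time.
/// Per RFC 6455, byte i is XORed with mask[i % 4].
pub fn unmask_scalar(payload: &mut [u8], mask: [u8; 4]) {
    for (i, byte) in payload.iter_mut().enumerate() {
        *byte ^= mask[i % 4];
    }
}
```

With the masked "Hello" payload from RFC 6455's examples (key `37 fa 21 3d`, masked bytes `7f 9f 4d 51 58`), this recovers the ASCII bytes of `Hello`. Since XOR is its own inverse, the same function also applies the mask.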
The part I'm having trouble with is getting an input `[u8; 4]` to repeat across a SIMD vector of `u8`. I have looked at:
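One way to sidestep the lane-type question is to build the repeated pattern as a plain byte array whose length equals the vector's lane count, and only then load it. The sketch below covers just the array part. `cycle_mask` is a hypothetical helper, not part of Faster, and a length of 16 assumes 128-bit vectors of `u8`.

```rust
/// Repeat a 4-byte mask across an N-byte array:
/// [m0, m1, m2, m3, m0, m1, m2, m3, ...].
/// N would be the number of u8 lanes in the target vector (e.g. 16).
pub fn cycle_mask<const N: usize>(mask: [u8; 4]) -> [u8; N] {
    let mut out = [0u8; N];
    for (i, b) in out.iter_mut().enumerate() {
        *b = mask[i % 4];
    }
    out
}
```

For example, `let m: [u8; 16] = cycle_mask([1, 2, 3, 4]);` gives `[1, 2, 3, 4]` repeated four times. Because the array already has the vector's length, loading it involves no length mismatch; which Faster function to load it with is left open here.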
- `load()`, which does accept `&[u8]` as input, but its behavior in case of a length mismatch is completely undocumented. It's also not obvious what the `offset` parameter does.
- Casting the input `[u8; 4]` to `u32`, calling `vecs::u32s()` and then downcasting repeatedly to get a SIMD vector of `u8`. However, `Downcast` does not seem to do what I want at all.
- Getting a SIMD vector of length 4 with an arbitrary element type, loading the `[u8; 4]` into it (the lengths now match, so it should work), then downcasting repeatedly until I get a vector of `u8` of arbitrary length. Except there seems to be no way to request a SIMD vector of length 4 with an arbitrary element type.
- After over an hour of head-scratching I noticed that `From<u32x4>` is implemented for `u8x16`, so I could replace `Downcast` with it in the second approach and probably get the correct result. However, I have no idea how such conversions interact with host endianness.
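On the endianness question in the last point: if the `u32` is built with `u32::from_ne_bytes(mask)`, its in-memory bytes are the mask bytes in their original order on any host. A bit-for-bit reinterpretation of four such lanes as sixteen `u8` lanes therefore yields the repeated mask. The sketch below models this with plain arrays. `mask_as_u32_lanes` and `lanes_to_bytes` are hypothetical helpers. Treating Faster's `From<u32x4> for u8x16` as such a bit-for-bit cast in host memory order is an assumption that is worth checking against its implementation.

```rust
/// Build four u32 lanes whose in-memory bytes are the mask, in order.
/// `from_ne_bytes` (native endian) preserves the byte order in memory
/// on both little- and big-endian hosts.
pub fn mask_as_u32_lanes(mask: [u8; 4]) -> [u32; 4] {
    [u32::from_ne_bytes(mask); 4]
}

/// Reinterpret four u32 lanes as 16 bytes in host memory order,
/// modelling what a bit-for-bit u32x4 -> u8x16 cast would produce.
pub fn lanes_to_bytes(lanes: [u32; 4]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for (chunk, lane) in out.chunks_exact_mut(4).zip(lanes.iter()) {
        chunk.copy_from_slice(&lane.to_ne_bytes());
    }
    out
}
```

By contrast, building the lane with `u32::from_le_bytes(mask)`, or with a shift-and-or expression written with one byte order in mind, gives the right pattern only on hosts of that endianness.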
I actually expected this to be a trivial task. I guess for someone familiar with SIMD it is, but for the likes of me, a snippet in the `examples/` folder that loads a `[u8; 4]` into a vector would go a long way. Or perhaps even a convenience function in the API that handles endianness properly, to make it harder to mess up.
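As a rough illustration of what such a convenience function might look like, here is a sketch that does not use Faster at all. `unmask_wide` is a hypothetical name, and 64-bit integers stand in for SIMD registers ("SIMD within a register"). The structure is the one a real SIMD version would follow with wider chunks: build the repeated mask once, XOR full-width chunks, then finish the leftover bytes one at a time.

```rust
/// Unmask a WebSocket payload 8 bytes at a time with plain u64 XOR,
/// then handle the remaining bytes individually.
pub fn unmask_wide(payload: &mut [u8], mask: [u8; 4]) {
    // 8 is a multiple of 4, so every 8-byte chunk starts at mask phase 0
    // and uses the same repeated pattern.
    let mut wide_bytes = [0u8; 8];
    for (i, b) in wide_bytes.iter_mut().enumerate() {
        *b = mask[i % 4];
    }
    // Native byte order keeps the pattern's in-memory order on any host.
    let wide = u64::from_ne_bytes(wide_bytes);

    let mut chunks = payload.chunks_exact_mut(8);
    for chunk in &mut chunks {
        let mut buf = [0u8; 8];
        buf.copy_from_slice(chunk);
        let word = u64::from_ne_bytes(buf) ^ wide;
        chunk.copy_from_slice(&word.to_ne_bytes());
    }
    // The tail starts at a multiple of 8, so its mask phase is 0 again.
    for (i, byte) in chunks.into_remainder().iter_mut().enumerate() {
        *byte ^= mask[i % 4];
    }
}
```

Because both the mask pattern and the payload words go through native-endian conversions, the byte order cancels out, and the function gives the same result as the byte-at-a-time loop on any host.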