-
Notifications
You must be signed in to change notification settings - Fork 97
Remove Simd: Index
#498
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Remove Simd: Index
#498
Conversation
78e6c6b to
5be2886
Compare
|
Oh, I thought we already added |
5be2886 to
9cdcec4
Compare
|
@workingjubilee I was surprised too, TBH :P I was expecting to say "oh, don't use |
|
Looks good to me. I think the |
a35167d to
72cc10d
Compare
They are, but helpfully they have fallback mir so they'll get working implementation for free, and that way so long as they've been in nightly long enough we don't need to worry about things like going and implementing them in cg_clif.
I'm surprised it doesn't, for what seems like a simple case, but overall I think discouraging address versions of things is good anyway -- we'll still have |
72cc10d to
3bbbc4f
Compare
| /// `idx` must be in-bounds (`idx < N`) | ||
| #[inline] | ||
| unsafe fn get_inner(self, idx: usize) -> T { | ||
| // FIXME: This is a workaround for a CI failure in #498 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No idea what the problem was, but LLVM was very unhappy for some reason
rustc-LLVM ERROR: Cannot select: 0x7ff0c871e2a0: v16i8 = setcc 0x7ff0c871ee00, 0x7ff0ca14b000, setne:ch
0x7ff0c871ee00: v8i16,ch = CopyFromReg 0x7ff0c871e8c0:1, Register:v8i16 %21
0x7ff0ca14b000: v8i16 = xor 0x7ff0c8720a80, 0x7ff0c871db60
0x7ff0c8720a80: v8i16,ch = CopyFromReg 0x7ff0c871e070:1, Register:v8i16 %11
0x7ff0c871db60: v8i16 = splat_vector Constant:i32<-1>
|
Thanks for following up on our Discord conversation! Is there any chance that |
imo |
|
I couldn't reach the discord for the discussion but we also use the indexing a bit. However, it's usually in a scalar context where we are preparing or inspecting a data structure before or after any hot loop work. Messing with individual elements at a time always seemed to be for convenience not performance to me, but switching to get/set makes sense if this is a footgun. Happy to benchmark this branch on Apple M4 if you need more data with a relevant function. |
both passing by value and by |
I would rather we not rely on this optimization pass, as I have been looking into compiler improvements which remove this requirement. |
by that do you mean changing the function call ABI to pass values in registers, or changing rustc to not store temporary values in |
|
Yes, both? Fucking hell. |
|
I don't see how anything that is "In My Opinion" matters anyways unless you have an opinion in the exact shape of certain optimization passes OR are willing to write an optimization pass in the shape of your opinions. We should be trying to get results not speculation. |
Nice! I wasn't expecting rustc to have a major overhaul in how it represents variables and stuff, since afaik it currently just generates an alloca for everything and relies on llvm to convert to ssa form. In that case, we should probably pass by value wherever we can. |
|
Ah. We actually do represent scalars "directly" in LLVMIR. I was primarily looking to make the function-crossing of vectors be represented directly in LLVMIR (where ABI compatible). That was my main goal, as it enables optimizations on its own. Adjusting things to represent vectors directly in the intra-function IR is also possible, easier than it might sound, and kind of almost required once you have vectors pass directly. Basically just a lot of things have not been thoroughly examined and are only a few small touches away. The main thing is just that previous attempts to improve things have not been thoughtful in how they adjusted, more like blind stabs in the dark. They wind up requiring reverts too often. I'm trying to at least get a candle, maybe a few, shining on the issue, so that I can make something stickier. |
(Inspired by this conversation on Discord https://discord.com/channels/273534239310479360/592856094527848449/1452407229843247197)
This PR removes the
Simd<_, _>: IndexandSimd<_, _>: IndexMuttrait implementations, and instead addsgetandsetmethods.IMHO this is a good idea as
v[1] = 2;looks innocuous but is actually surprisingly bad for codegen, as it forces the vector to be in-memory in order to return the&T-- and there's no way to return aTfromIndex. One can alwaysv.as_mut_array()[1] = 2;if you want (that still works) but that extra speed bump is, I posit, a good thing to help push people to better methods.What are those better methods? Well, this also adds
Simd::setandSimd::getusing thesimd_{insert,extract}_dynintrinsics internally. Thus instead oflet x = v[i];, you'd writelet x = v.get(i);. Doing that solves the optimization problem that the person was asking about on discord, and seeing that it did fix it (and that there's no use ofsimd_extract_dynin the current library) is why I went and made this PR 🙂Discussion topics:
Index(Mut)be left there for a bit, to help smooth the transition? Can we even markimpls as deprecated?set/getfor shortness, but maybe something domain-specific would be better. I considered usinginsert/extract, but https://doc.rust-lang.org/std/simd/struct.Simd.html#method.extract already exists. (It could beextract_vectoror something, though.)setbe&mut selforself? I noticed that https://doc.rust-lang.org/std/simd/struct.Mask.html#method.set is&mut self, but using&mut selfalso forces it through memory in codegen, albeit in a way much easier for the optimizer to remove.