Aarch64 - update EqDbl to use SIMD/FP instructions #7914
base: master
@mxw - Hi Max, can you take a look at this one? Thanks.
I just have the one comment. I'll leave it to one of @cmuellner, @dave-estes, @jim-saxman, and @apinski-cavium to vet the meat of the PR.
hphp/runtime/vm/jit/vasm-arm.cpp
Outdated
@@ -1545,7 +1533,7 @@ void lower(const VLS& e, movtdb& i, Vlabel b, size_t z) {
lower_impl(e.unit, b, z, [&] (Vout& v) {
auto d = v.makeReg();
v << copy{i.s, d};
v << movtqb{d, i.d};
v << copy{d, i.d};
What's the purpose of copying through a tmp reg?
@mxw - The code generator wouldn't accept it without the extra copy. Something to do with changing register classes I suspect.
But the register widths are propagated through copies, and they already don't match since this is a truncation operation...
Can you paste the exact error?
@mxw - Looks like I didn't capture that in my daily log. I'll have to try and recreate.
@mxw - I was unable to recreate the issue. I'll reduce the copy and retest.
@mxw has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Hi @swalk-cavium, this patch doesn't create any more unit test failures, and it passed the entire OSS performance test suite.
@swalk-cavium updated the pull request - view changes - changes since last import
@mxw - MOP results are the same with only 1 copy, so I updated the pull request.
@mxw has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Before I accept this, I'd like someone else to give a second opinion on the change (not just on test results) to confirm it doesn't affect the behavior of the instruction in any way.
@mxw - Hi Max, Here's how I tested the sequences before incorporating in hhvm |
@mxw - Here's the table I used for the unordered comparison case. SNaN - signalling NaN, |
This is a slick optimization. It's not super-intuitive, but a few comments will sort that out quickly.
auto d = v.makeReg();
v << copy{i.s, d};
v << movtqb{d, i.d};
v << copy{i.s, i.d};
Is it safe to do this if we end up reverting f9ecdf1? Might want to split this change out.
@dave-estes - Hi Dave, I'm not sure. If your new version goes in first, I'll have to retest.
hphp/vixl/a64/assembler-a64.cc
Outdated
@@ -1352,6 +1352,22 @@ void Assembler::frintz(const FPRegister& fd,
}


void Assembler::fcmeq(const FPRegister& fd,
I think it would be more readable if this were named Assembler::fcmeqz().
@@ -909,29 +909,17 @@ void Vgen::emit(const unpcklpd& i) {
///////////////////////////////////////////////////////////////////////////////

void Vgen::emit(const cmpsd& i) {
I think renaming the 2-argument fcmeq (see next comment) will help, but I also think a line or two of comments will make this clearer.
@mxw - I think I might have to update this to check the hardware capabilities, similar to what we did for the LSE atomics. I just noticed that at least one of the forms says ARMv8.2.
@mxw - Um, disregard the previous comment. SIMD instructions are required in the |
@swalk-cavium updated the pull request - view changes - changes since last import
@mxw has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@swalk-cavium - Cool, thanks for this change. I'll land it as soon as internal tests pass.
@mxw has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@mxw - Hi Max, any word on your testing infrastructure issue? Can this one land?
This change updates the EqDbl and NeqDbl sequences to use different instructions.
It reduces the number of instructions needed and removes the dependency on the
status register.
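For readers unfamiliar with the SIMD/FP compare, here is a rough C++ model of the behavior the new sequences rely on. It is a sketch, not the code emitted by this change: `fcmeq_model`, `eq_dbl`, and `neq_dbl` are hypothetical names, and the way the mask is reduced to a boolean is an assumption made for illustration. The one architectural fact it leans on is that a scalar FCMEQ writes all ones to the destination when the operands compare equal and zero otherwise, including when either operand is a NaN, so no condition flags are involved.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical model of scalar FCMEQ on doubles: all ones when the operands
// compare equal, zero otherwise (an unordered compare, i.e. any NaN operand,
// also produces zero). The result lands in a register, not in the flags.
inline std::uint64_t fcmeq_model(double a, double b) {
  return a == b ? ~std::uint64_t{0} : std::uint64_t{0};
}

// EqDbl sketch (assumption): keep the low bit of the mask.
inline bool eq_dbl(double a, double b) {
  return (fcmeq_model(a, b) & 1) != 0;
}

// NeqDbl sketch (assumption): invert the mask first, so an unordered
// comparison reports "not equal".
inline bool neq_dbl(double a, double b) {
  return (~fcmeq_model(a, b) & 1) != 0;
}
```

With this model, `eq_dbl(NaN, NaN)` is false and `neq_dbl(NaN, NaN)` is true, which matches IEEE 754 equality semantics for unordered operands.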
Before
After
This also updates the vixl disassembler to recognize two new instruction formats:
Advanced SIMD Scalar Three Same and Advanced SIMD Scalar Two-Register Misc.
Wherever an enumerator name clashed with an existing definition, a prefix was added
(see STS and STM, respectively).
The regression suite was run with 6 option sets. No additional failures were observed.