WIP: Implement vectorized l2m on PPC #7909

midronij · 2025-08-26T19:17:34Z

Implement PPC codegen for l2m (Long to Mask) on P8+. This operation takes eight one-byte elements of a boolean array (read from memory with a single doubleword load) and converts them into a ShortVector mask with the corresponding boolean values.
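
For reference, the conversion described above can be modelled in scalar C++. This is only a sketch under assumptions not stated in the PR: each boolean byte is 0 or 1, byte 0 is the least significant byte of the input, and a true lane is represented as all ones (0xFFFF). The name `l2mReference` is hypothetical and does not exist in OMR.

```cpp
#include <array>
#include <cstdint>

// Scalar model of l2m for eight 16-bit lanes (hypothetical helper, not OMR code).
// Assumes byte i of `packed` (byte 0 = least significant) holds boolean i as 0 or 1.
std::array<uint16_t, 8> l2mReference(uint64_t packed)
{
    std::array<uint16_t, 8> lanes{};
    for (int i = 0; i < 8; ++i) {
        // Widen the i-th byte to a halfword.
        uint16_t b = static_cast<uint16_t>((packed >> (8 * i)) & 0xFF);
        // Subtract from zero: 0 stays 0x0000, 1 becomes 0xFFFF.
        lanes[i] = static_cast<uint16_t>(0 - b);
    }
    return lanes;
}
```

The widen-then-negate structure is one plausible way to build the mask; the vector sequence in the PR may order or implement these steps differently.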

midronij · 2025-09-24T14:48:11Z

@gita-omr @zl-wang could you please review when you have a chance?

zl-wang · 2025-09-24T18:19:11Z

compiler/p/codegen/OMRTreeEvaluator.cpp

+
+    // move to VRF
+    generateTrg1Src1Instruction(cg, TR::InstOpCode::mtvsrd, node, dstReg, srcReg);
+


Why not require up front that the incoming doubleword is already in the right byte order for both LE and BE? That can be done very cheaply, without the heavy lifting below: the ld and ldbrx instructions come to mind for BE and LE respectively.
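
To illustrate the byte-order point, here is a portable sketch showing that a byte-reversed little-endian load yields the same register value as a plain big-endian load of the same eight bytes. The function names are hypothetical; `byteReverse` stands in for the effect of a byte-reversing load such as ldbrx and is not how the evaluator is written.

```cpp
#include <cstdint>

// Value a big-endian doubleword load would produce: p[0] is the most significant byte.
uint64_t loadBigEndian(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; ++i)
        v = (v << 8) | p[i];
    return v;
}

// Value a little-endian doubleword load would produce: p[0] is the least significant byte.
uint64_t loadLittleEndian(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; --i)
        v = (v << 8) | p[i];
    return v;
}

// Reverse the eight bytes of a doubleword.
uint64_t byteReverse(uint64_t v)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; ++i) {
        r = (r << 8) | (v & 0xFF);
        v >>= 8;
    }
    return r;
}
```

With the reversal folded into the load, the rest of the sequence can treat the register contents identically on both endiannesses.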

This is a very good point. The *2m and m2* opcodes are a bit tricky. Essentially, they are needed for reading/storing a mask from/to a boolean array. We could consider combining them with the actual load/store from/to the array, but I am sure a few details would need to be worked out.

fyi: @ehsankianifar @BradleyWood

Of course, we can also apply the optimization above if the load is available as the child and has a reference count of 1 (as we often do).

@zl-wang please let us know what you think about treating it as an optimization (see above)?

Yes, I agree with the conclusion that this is essentially a codegen optimization: peeking into the child trees to decide which best-performing instructions to generate.

@zl-wang @gita-omr I've implemented this optimization for the case where the refcount of the child lload node is 1, and it seems to work without any issues. However, as I understand it, the fix is not quite as simple for the case where refcount is greater than 1, which is less likely to occur but certainly still something we need to take into account.

Earlier we discussed the possibility of essentially un-commoning the lload node, and simply using ldbrx to get the input boolean array without setting the register, but as I understand it, this is a somewhat risky approach to take, and there isn't any past precedent for it anywhere else in the codebase. As well, since there is a way to reverse byte order in a register on P9 and higher (using the xxbrd instruction), the un-commoning approach would only really be relevant to P8 and below.

Since the conditions in which the un-commoning approach would actually be used (refcount > 1, P8 and lower) seem pretty narrow, is this something we want to pursue? Or alternatively, in the interest of getting these changes merged while still avoiding that cumbersome multi-instruction sequence to manually reverse the element order of the boolean array, maybe it's something we want to add in a different PR?

I agree that the l2m opcode (and similar ones) is always used with the corresponding scalar load. It's very unlikely that the load is commoned but l2m is not. So it's a very rare situation that, even if addressed, is better handled in a separate PR.
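
The cases discussed in this thread can be summarized as a small decision function. This is a sketch only: `ByteOrderFix`, `L2mContext`, and `chooseByteOrderFix` are hypothetical names, and the assumption that big-endian targets need no reversal is inferred from the ld/ldbrx suggestion above rather than taken from the PR's code.

```cpp
// Hypothetical summary of the byte-order strategies discussed above (not OMR code).
enum class ByteOrderFix {
    None,              // big-endian: a plain ld is assumed to be in the right order already
    FoldIntoLoad,      // little-endian, child load not yet evaluated: use ldbrx
    ReverseInRegister, // little-endian, value already in a register, P9+: use xxbrd
    ManualSequence     // little-endian, value already in a register, P8: multi-instruction reversal
};

struct L2mContext {
    bool littleEndian;
    bool childIsUnevaluatedLoad; // child lload has refcount 1 and no register yet
    bool hasP9;                  // target is P9 or later
};

ByteOrderFix chooseByteOrderFix(const L2mContext &c)
{
    if (!c.littleEndian)
        return ByteOrderFix::None;
    if (c.childIsUnevaluatedLoad)
        return ByteOrderFix::FoldIntoLoad;
    if (c.hasP9)
        return ByteOrderFix::ReverseInRegister;
    return ByteOrderFix::ManualSequence;
}
```

Only the last case (commoned load on P8) falls back to the manual sequence, which is the narrow situation proposed for a separate PR.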

compiler/p/codegen/OMRTreeEvaluator.cpp

Signed-off-by: midronij <[email protected]>

zl-wang

otherwise, it looks good to me.

zl-wang · 2025-10-16T16:00:38Z

compiler/p/codegen/OMRTreeEvaluator.cpp

+    // Case (1)
+    if (cg->comp()->target().cpu.isLittleEndian() && child->getReferenceCount() == 1 && child->getRegister() == NULL) {
+        srcReg = cg->allocateRegister();
+        TR::LoadStoreHandler::generateLoadNodeSequence(cg, srcReg, child, TR::InstOpCode::ldbrx, 8, true);


Don't you need to test whether the child node is definitely a memory (load or store) operation? Could it already be an lregload, for example?

zl-wang · 2025-10-16T17:10:44Z

compiler/p/codegen/OMRTreeEvaluator.cpp

+    generateTrg1Src2Instruction(cg, TR::InstOpCode::vsubuhm, node, dstReg, tmpReg, dstReg);
+
+    cg->stopUsingRegister(tmpReg);
+    cg->decReferenceCount(child);


Since srcReg is possibly not set on the child node, decReferenceCount on the child node doesn't manage its liveness, so you might need to do that here.
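
The liveness concern can be illustrated with a toy model. Everything here (`ToyNode`, `ToyCodeGen`) is hypothetical and deliberately simplified; it assumes that dropping the last reference to a node releases only a register recorded on that node, and it is not OMR's actual implementation.

```cpp
#include <set>

// Toy node: reg < 0 means no register has been recorded on the node.
struct ToyNode {
    int refCount;
    int reg;
};

// Toy code generator tracking which registers are live.
struct ToyCodeGen {
    std::set<int> live;
    int next = 0;

    int allocateRegister()
    {
        live.insert(next);
        return next++;
    }

    void stopUsingRegister(int r) { live.erase(r); }

    // Releases the node's register only if one was recorded on the node.
    void decReferenceCount(ToyNode &n)
    {
        if (--n.refCount == 0 && n.reg >= 0)
            stopUsingRegister(n.reg);
    }
};
```

In this model, a register allocated on the side for the folded load stays live after `decReferenceCount` unless `stopUsingRegister` is called on it explicitly.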

0xdaryl added comp:compiler arch:power labels Sep 5, 2025

midronij force-pushed the l2m branch from 909ac85 to ad59316 Compare September 6, 2025 22:21

midronij changed the title ~~WIP: Implement PPC codegen for l2m~~ Implement PPC codegen for l2m Sep 9, 2025

midronij force-pushed the l2m branch 2 times, most recently from 1107bfb to 58fb965 Compare September 9, 2025 19:59

midronij changed the title ~~Implement PPC codegen for l2m~~ WIP: Implement PPC codegen for l2m Sep 12, 2025

midronij changed the title ~~WIP: Implement PPC codegen for l2m~~ WIP: Implement vectorized l2m on PPC Sep 18, 2025

midronij force-pushed the l2m branch from 58fb965 to 67acc6e Compare September 23, 2025 18:36

midronij changed the title ~~WIP: Implement vectorized l2m on PPC~~ Implement vectorized l2m on PPC Sep 23, 2025

zl-wang suggested changes Sep 24, 2025

midronij force-pushed the l2m branch 2 times, most recently from b6f49ad to ad5dc2b Compare September 26, 2025 14:23

midronij force-pushed the l2m branch 3 times, most recently from 84e7eb9 to 9560324 Compare October 9, 2025 18:52

gita-omr reviewed Oct 11, 2025

midronij force-pushed the l2m branch 2 times, most recently from 32aff9c to 50233d1 Compare October 15, 2025 14:21

Implement PPC codegen for l2m

a1a0239

Signed-off-by: midronij <[email protected]>

midronij force-pushed the l2m branch from 50233d1 to a1a0239 Compare October 15, 2025 14:28

gita-omr approved these changes Oct 15, 2025

zl-wang reviewed Oct 16, 2025

zl-wang suggested changes Oct 16, 2025

midronij changed the title ~~Implement vectorized l2m on PPC~~ WIP: Implement vectorized l2m on PPC Oct 16, 2025


4 participants
