Z: Implement vector bit compress and expand evaluators#8276
Z: Implement vector bit compress and expand evaluators#8276ehsankianifar wants to merge 1 commit into
Conversation
|
@r30shah could you please review changes in this PR? Thanks |
| { | ||
| return TR::TreeEvaluator::unImpOpEvaluator(node, cg); | ||
| /** | ||
| * Vector Compress Bits Evaluator |
There was a problem hiding this comment.
@ehsankianifar - Can we add doxygen styled comments for this evaluator so that the details you have here (2260-2267) can go there. I may have missed recommending that in other PRs, something just occurred to me. But that would be ideal place.
There was a problem hiding this comment.
Thanks, I moved the comments to the doxygen document
3f23e7a to
715f13b
Compare
|
To explore using existing GPR instructions to accelerate this operation in newer hardware (z17+), I created this issue: |
r30shah
left a comment
There was a problem hiding this comment.
@ehsankianifar - Some nitpicks.
| * \brief | ||
| * Compresses the lane-wise values of the source vector based on a bit mask. | ||
| * | ||
| * \details |
There was a problem hiding this comment.
I do think we can still improve the readability of the comment. Looking at the opcode properties, it clearly states that the implementation is equivalent to the Java's Integer/Long.compress. So I would simplify the comment. The usage of MSB here kind of throws me. I thought it was the way opcodes needs specifically. We could include the example of how it looks on one element in the lane-mask pair.
| // set. | ||
| generateVRSaInstruction(cg, TR::InstOpCode::VERLL, node, sourceReg, sourceReg, generateS390MemoryReference(1, cg), | ||
| elementSizeMask); | ||
| generateVRReInstruction(cg, TR::InstOpCode::VSEL, node, resultReg, sourceReg, resultReg, scratchReg, 0, 0); |
There was a problem hiding this comment.
Add comment -
// Step 3: Conditionally copy source MSB to result if mask bit is set
// VSEL: result = (scratchReg[bit0] == 1) ? workingSource : resultReg
| TR_ASSERT_FATAL_WITH_NODE(node, node->getDataType().getVectorLength() == TR::VectorLength128, | ||
| "Only 128-bit vectors are supported %s", node->getDataType().toString()); | ||
| const uint8_t elementSizeMask = static_cast<uint8_t>(getVectorElementSizeMask(node)); | ||
| uint32_t elementBitNum = getVectorElementSize(node) * 8; |
There was a problem hiding this comment.
We are not changing this variable - Can we use const?
Also to improve readability, I would use const uint32_t msbPosition = elementBitNum - 1; and use it in the instruction - this improves readability of the code.
| cg->evaluate(maskChild), 0, 0); | ||
| cg->decReferenceCount(maskChild); | ||
| } | ||
|
|
There was a problem hiding this comment.
TBH - this is the complex implementation but may be the only way we can achieve this operation ? We will have loop iteration that is controlled by the element size in number of bits. So worst case it executes 64 loop iteration for Long. Also in each iteration we have 5 vector operations. I do understand that we want this operation to work on all platform, but I really think that you should consider using bit extract and deposit instruction where it is available.
Also
There was a problem hiding this comment.
I am already working on that but I prefer to have it in a different PR so I can test it properly on the new hardware.
| * Expands the lane-wise values of the source vector based on a bit mask. | ||
| * | ||
| * \details | ||
| * The bits from the source are distributed to positions where the mask has set bits. |
There was a problem hiding this comment.
Similar to compressbits, comment should be made simpler for expand.
| * \return | ||
| * TR::Register with the compressed values. | ||
| */ | ||
| TR::Register *OMR::Z::TreeEvaluator::vexpandbitsEvaluator(TR::Node *node, TR::CodeGenerator *cg) |
There was a problem hiding this comment.
Same feedback as compressBits for comments.
e86bf28 to
b18646e
Compare
|
@r30shah I addressed all the comments. could you please take another look? thanks. |
r30shah
left a comment
There was a problem hiding this comment.
Last nitpicks, but overall looks good.
| * Compresses the lane-wise values of the source vector based on a bit mask. | ||
| * | ||
| * \details | ||
| * Performs lanewise compression similar to integer/long.compress() |
There was a problem hiding this comment.
Nit-pick Integer/Long.compress.
| * Expands the lane-wise values of the source vector based on a bit mask. | ||
| * | ||
| * \details | ||
| * Performs lanewise expansion similar to scalar integer/long.expand() |
There was a problem hiding this comment.
Nit - Integer/Long.expand()
Implement vcompressbitsEvaluator, vmcompressbitsEvaluator, vexpandbitsEvaluator, and vmexpandbitsEvaluator on IBM Z Platform. signed-off-by: Ehsan Kiani Far <ehsan.kianifar@gmail.com>
b18646e to
cc7c116
Compare
|
Thanks @r30shah for your review. I addressed the last change requests. Let me know if it looks good to you. |
Implement vcompressbitsEvaluator, vmcompressbitsEvaluator, vexpandbitsEvaluator, and vmexpandbitsEvaluator on IBM Z Platform.