[AIE] Enable HL by default in AIEOuterLoopPipeliner#1006
Conversation
There was a problem hiding this comment.
I think we heavily rely on the pre-condition that it is a downcounting loop. Perhaps mention that in a comment.
There was a problem hiding this comment.
nit: could this underflow?
There was a problem hiding this comment.
We check min iter count. To underflow we need to have wrong metadata here, not our fault.
There was a problem hiding this comment.
I'm wondering why we don't use 'prepare-for-pipelining' in the regular hardware loop creation. The only reason I can find is that at this point we are actually pipelining the outer loop. Is there a path that can stop outer loop pipelining for a loop with the pragma, and if so, can we predict that in hardware loop creation itself?
There was a problem hiding this comment.
Yes, I plan to transfer the responsibility to the original hardware loop pass in the future.
There was a problem hiding this comment.
Nit: you could cache the anchors needed for replacement in a match structure that you return.
There was a problem hiding this comment.
It make sense.
3717eb9 to
73e6dd0
Compare
| // canonical downcounting icmp pattern that can be replaced by | ||
| // @llvm.loop.decrement.reg, or std::nullopt otherwise. | ||
| std::optional<DowncountingInfo> | ||
| getDowncountingInfo(const LoopStructure &LS) const; |
There was a problem hiding this comment.
Oh, danke sch"on!
| } | ||
| // Determine the induction step. If the counter is an add with a constant | ||
| // operand, extract the step directly (handles any step magnitude). | ||
| // Otherwise, infer a unit step from the comparison predicate. |
There was a problem hiding this comment.
I don't see the 'otherwise' path. Moreover, that sounds like another implicit precondition.
There was a problem hiding this comment.
Please, see the last commit, we removed any implicit preconditions. Not we have canAdjustLoopBound that parses everything that is necessary just once, or rejects before any transformation.
| unsigned LimitIdx; // index of Limit in Cmp's operand list | ||
| int64_t Step; // induction step (from add const, or +=1 inferred) | ||
| BinaryOperator *Counter; // add instr feeding the icmp (null for plain PHI) | ||
| PHINode *OldIV; // counting PHI feeding Counter (null if not found) |
There was a problem hiding this comment.
Checking this.
| // getDowncountingInfo) read from LS.LatchBound without re-scanning. | ||
| bool AIEOuterLoopPipeliner::canAdjustLoopBound(LoopStructure &LS) const { | ||
| if (LS.LatchBound) | ||
| return true; // Already computed. |
There was a problem hiding this comment.
CHECK: We have no way to see whether we had a total rejection earlier, but I assume we wouldn't be here. But then I wonder whether we could assert that it has been computed.
JNZD is easier to fit in a schedule, as we are now scheduling epilogues in top-down to the end.
…t step Two correctness issues in the outer loop pipeliner: * getDowncountingInfo(): accepted any negative step before emitting @llvm.loop.decrement.reg, which always decrements by 1. A loop with step -2 would be converted to a hardware loop that runs twice as many iterations as intended. Fix: replace isNegative() with isMinusOne(). JNZD hardware loop conversion remains restricted to step -1 since @llvm.loop.decrement.reg decrements by exactly 1. * adjustLoopBound(): used isNegative() only to determine direction, then unconditionally adjusted the icmp limit by ±1. For a loop with step -k, the limit must be adjusted by k (not 1) to correctly remove one iteration; adjusting by 1 for step -2 can produce the wrong exit condition or an infinite loop. Fix: generalize to the unified formula NewLimit = Limit - Step. The step is extracted from the constant operand of the add instruction feeding the icmp counter (any magnitude). For plain-PHI counters the step is inferred from the predicate (SLT/ULT → +1, SGT/UGT → -1). EQ/NE without a visible constant step returns nullptr. This correctly handles increment and decrement loops with arbitrary constant steps. .
Three functions independently scanned the latch icmp pattern:
adjustLoopBound(), getDowncountingInfo(), and the inline checks that
were previously spread across isProfitableToRotate(). Each repeated the
same work: find the conditional branch, extract the icmp, identify the
loop-invariant limit and non-invariant counter, determine the induction
step, and look up the counting PHI.
Introduce LatchBoundInfo to hold all components discovered from the latch
exit condition (icmp, limit, limit operand index, step, counter add,
counting PHI), and add a LatchBound field to LoopStructure to cache it.
Add canAdjustLoopBound() which performs the single authoritative scan,
populates LS.LatchBound, and returns true if the bound is adjustable
(step != 0). It is called from isProfitableToRotate() before any IR
modification; subsequent functions reuse the cached result:
* adjustLoopBound(): reads {Cmp, Limit, LimitIdx, Step} from LatchBound
and emits CreateSub(Limit, Step) — no scanning.
* getDowncountingInfo(): reads LatchBound and returns DowncountingInfo
if Step == -1, Counter, and OldIV are all present — no scanning.
The latch icmp pattern-matching work is now done exactly once.
73e6dd0 to
9f2fbe3
Compare
JNZD is easier to fit in a schedule, as we are now scheduling epilogues in top-down to the end.