Skip to content

[AIE] Enable HL by default in AIEOuterLoopPipeliner#1006

Merged
andcarminati merged 3 commits into
aie-publicfrom
andreu.outerloop.jnzd
May 21, 2026
Merged

[AIE] Enable HL by default in AIEOuterLoopPipeliner#1006
andcarminati merged 3 commits into
aie-publicfrom
andreu.outerloop.jnzd

Conversation

@andcarminati
Copy link
Copy Markdown
Collaborator

JNZD is easier to fit in a schedule, as we are now scheduling epilogues in top-down to the end.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we heavily rely on the pre-condition that it is a downcounting loop. Perhaps mention that in a comment.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: could this underflow?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We check min iter count. To underflow we need to have wrong metadata here, not our fault.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering why we don't use 'prepare-for-pipelining' in the regular hardware loop creation. The only reason I can find is that at this point we are actually pipelining the outer loop. Is there a path that can stop outer loop pipelining for a loop with the pragma, and if so, can we predict that in hardware loop creation itself?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I plan to transfer the responsibility to the original hardware loop pass in the future.

Copy link
Copy Markdown
Collaborator

@martien-de-jong martien-de-jong May 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: you could cache the anchors needed for replacement in a match structure that you return.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It make sense.

@andcarminati andcarminati force-pushed the andreu.outerloop.jnzd branch 2 times, most recently from 3717eb9 to 73e6dd0 Compare May 21, 2026 10:23
// canonical downcounting icmp pattern that can be replaced by
// @llvm.loop.decrement.reg, or std::nullopt otherwise.
std::optional<DowncountingInfo>
getDowncountingInfo(const LoopStructure &LS) const;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, danke sch"on!

}
// Determine the induction step. If the counter is an add with a constant
// operand, extract the step directly (handles any step magnitude).
// Otherwise, infer a unit step from the comparison predicate.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the 'otherwise' path. Moreover, that sounds like another implicit precondition.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, see the last commit, we removed any implicit preconditions. Not we have canAdjustLoopBound that parses everything that is necessary just once, or rejects before any transformation.

unsigned LimitIdx; // index of Limit in Cmp's operand list
int64_t Step; // induction step (from add const, or +=1 inferred)
BinaryOperator *Counter; // add instr feeding the icmp (null for plain PHI)
PHINode *OldIV; // counting PHI feeding Counter (null if not found)
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking this.

// getDowncountingInfo) read from LS.LatchBound without re-scanning.
bool AIEOuterLoopPipeliner::canAdjustLoopBound(LoopStructure &LS) const {
if (LS.LatchBound)
return true; // Already computed.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CHECK: We have no way to see whether we had a total rejection earlier, but I assume we wouldn't be here. But then I wonder whether we could assert that it has been computed.

JNZD is easier to fit in a schedule, as we are now scheduling epilogues
in top-down to the end.
…t step

Two correctness issues in the outer loop pipeliner:

* getDowncountingInfo(): accepted any negative step before emitting
  @llvm.loop.decrement.reg, which always decrements by 1. A loop with
  step -2 would be converted to a hardware loop that runs twice as many
  iterations as intended.

  Fix: replace isNegative() with isMinusOne(). JNZD hardware loop
  conversion remains restricted to step -1 since @llvm.loop.decrement.reg
  decrements by exactly 1.

* adjustLoopBound(): used isNegative() only to determine direction, then
  unconditionally adjusted the icmp limit by ±1. For a loop with step -k,
  the limit must be adjusted by k (not 1) to correctly remove one
  iteration; adjusting by 1 for step -2 can produce the wrong exit
  condition or an infinite loop.

  Fix: generalize to the unified formula NewLimit = Limit - Step. The
  step is extracted from the constant operand of the add instruction
  feeding the icmp counter (any magnitude). For plain-PHI counters the
  step is inferred from the predicate (SLT/ULT → +1, SGT/UGT → -1).
  EQ/NE without a visible constant step returns nullptr. This correctly
  handles increment and decrement loops with arbitrary constant steps.
.
Three functions independently scanned the latch icmp pattern:
adjustLoopBound(), getDowncountingInfo(), and the inline checks that
were previously spread across isProfitableToRotate(). Each repeated the
same work: find the conditional branch, extract the icmp, identify the
loop-invariant limit and non-invariant counter, determine the induction
step, and look up the counting PHI.

Introduce LatchBoundInfo to hold all components discovered from the latch
exit condition (icmp, limit, limit operand index, step, counter add,
counting PHI), and add a LatchBound field to LoopStructure to cache it.

Add canAdjustLoopBound() which performs the single authoritative scan,
populates LS.LatchBound, and returns true if the bound is adjustable
(step != 0). It is called from isProfitableToRotate() before any IR
modification; subsequent functions reuse the cached result:

* adjustLoopBound(): reads {Cmp, Limit, LimitIdx, Step} from LatchBound
  and emits CreateSub(Limit, Step) — no scanning.

* getDowncountingInfo(): reads LatchBound and returns DowncountingInfo
  if Step == -1, Counter, and OldIV are all present — no scanning.

The latch icmp pattern-matching work is now done exactly once.
@andcarminati andcarminati enabled auto-merge (rebase) May 21, 2026 12:37
@andcarminati andcarminati merged commit 9e603b7 into aie-public May 21, 2026
7 checks passed
@andcarminati andcarminati deleted the andreu.outerloop.jnzd branch May 21, 2026 12:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants