Skip to content

Add BLAS trsv derivative rule#2828

Open
jlperla wants to merge 1 commit into
EnzymeAD:mainfrom
jlperla:trsv-rule
Open

Add BLAS trsv derivative rule#2828
jlperla wants to merge 1 commit into
EnzymeAD:mainfrom
jlperla:trsv-rule

Conversation

@jlperla
Copy link
Copy Markdown

@jlperla jlperla commented May 16, 2026

Adds the def trsv rule for the real BLAS triangular vector solve, with both forward and reverse mode derivatives. Split out of #2825 per @wsmoses's request to land per-rule PRs with minimal infrastructure churn.

What's added

  • def trsv in BlasDerivatives.td, following the same reverse-mode shape as trtrs (single RHS vector instead of matrix). Forward mode is a 5-call Seq using copy/trmv/axpy/axpy/trsv.
  • def Dep : MagicInst; (1 line) plus a 6-line passthrough in rev_call_arg. The forward rule needs to gate two axpy calls and one copy on Shadow $A activity, but Shadow $A is not a direct BLAS argument of those calls. Dep wraps an arg so if_rule_condition_inner sees the activity dependency without emitting the shadow as a real BLAS argument. No emit_dag handler is needed — Dep only appears nested inside BlasCall args, and if_rule_condition_inner already recurses through arbitrary DagInits transparently.
  • trsv joins trmv/trtrs in the n=3 byRef-suffix arity table (three trailing Fortran string-length args for uplo/trans/diag) and in the reversed activeArg iteration order (reverse mode must solve the x cotangent before forming dA, identical dependency shape to trtrs).

What's intentionally not here

  • No changes to enzyme/Enzyme/Utils.cpp or Utils.h.
  • No changes to trsm, potrf, potrs, or the uplo_to_*/side_to_* matchers.
  • No mat_ld / MatAdd / side_square machinery — those belong to follow-up PRs for trsm/potrs.

Tests

  • New: test/Enzyme/ForwardMode/blas/trsv_f.ll, test/Enzyme/ReverseMode/blas/trsv_f.ll
  • Locally validated against LLVM 16: ninja LLVMEnzyme-16, lit -sv test/Enzyme (1002 passed, 10 expected-fail; no regressions).

Follow-ups will add trsm, potrs, and forward-mode rules for potrf/etc. as separate PRs.

Adds the `def trsv` rule for the real triangular vector solve, with both
forward and reverse mode derivatives. Mirrors the structure of `trtrs`.

Outside the new `def`, the only required tablegen additions are:

- `def Dep : MagicInst;` with a 6-line passthrough in `rev_call_arg`.
  The forward rule needs to gate two `axpy` and one `copy` call on
  `Shadow $A` being active, but `Shadow $A` is not a direct BLAS arg of
  those calls. `Dep` wraps an arg so activity analysis sees the
  dependency without it being emitted as a real BLAS argument.
- `trsv` joins `trmv`/`trtrs` in the n=3 byRef-suffix arity table
  (three trailing string-length args for uplo/trans/diag).
- `trsv` joins `trtrs` in the reversed activeArg iteration order, since
  reverse mode needs the solved x cotangent before forming dA.

No changes to Utils.cpp/Utils.h or to other BLAS rules.
def Concat : MagicInst;

def ShadowNoInc : MagicInst;
def Dep : MagicInst;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what the heck is dep?

]
,
(Seq<["tmp", "vector", "n"], [], 0>
(BlasCall<"copy"> $n, (Dep (Shadow $A), $x), use<"tmp">, ConstantInt<1>),
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think dep should be necessary here at all [and you should delete the concept?]

@jlperla
Copy link
Copy Markdown
Author

jlperla commented May 16, 2026

Thanks for the look. To answer the direct question: Dep wraps a sub-expression so the activity-guard pass (if_rule_condition_inner) sees the wrapped Shadow when computing the if (d_X) guard for a BlasCall, while rev_call_arg passes through to the actual emitted argument (the 2nd Dep arg). It is purely a no-op at the BLAS call level — it does not affect the emitted argument, only the guard.

I tried removing it and the codegen breaks. The forward rule's Seq has a copy step that populates a scratch buffer with $x (primal), then trmv writes (Shadow $A) * x into the scratch, then two axpys consume it. Without Dep, the copy and the first axpy have no Shadow anywhere in their args, so:

  • copy $n, $x, use<"tmp">, ConstantInt<1> generates literally if () { ... } (empty condition, build error at BlasDerivatives.inc:17493).
  • Even if I patched the codegen to emit if (true) on empty guards, axpy $n, -1, use<"tmp">, ..., (Shadow $x) would be gated only on active_x. With active_x=true, active_A=false, the trmv step (correctly gated on active_A) is skipped, but the axpy runs and subtracts the unwritten/primal-x value from dx. Wrong math.

The forward JVP for trsv is dx ← trsv(A, dx - dA·x). The intermediate steps fundamentally depend on dA being active, but dA is not a direct argument of the steps that consume it via the scratch buffer — that's exactly the mismatch Dep is meant to bridge. I don't see another mechanism in the current tablegen that handles this case (none of Lookup, First, ShadowNoInc, Concat inject names into the activity-seen set), and trtrs has no precedent here since it has no forward rule.

A few options I can take, whichever you prefer:

  1. Keep Dep as-is in this PR (3 wraps in the forward rule + 6-line rev_call_arg passthrough). Genuinely the smallest mechanism I can find for the forward triangular-solve activity gating, and it stays opt-in (no other rule touches it).
  2. Drop the forward rule from this PR and land reverse-only now, matching trtrs exactly. The forward rule (and the Dep discussion) can go in a separate follow-up where a cleaner mechanism can be designed.
  3. Move the forward rule to hand-written C++ (out of tablegen entirely), if that's preferred for the "complex" cases.

I'm happy to do (2) immediately if you want me to unblock the simple reverse part — let me know. Or if there's a mechanism I missed that does the activity-gating without a new MagicInst, I'd appreciate the pointer and I'll refactor.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants