Restore auto-diff link gating to the final codegen link#11779
Restore auto-diff link gating to the final codegen link#11779jvepsalainen-nv wants to merge 1 commit into
Conversation
…ion from shader-slang#9808) Motivation ---------- PR shader-slang#9808 (auto-diff overhaul) caused a broad compile-time perf regression between releases 2026.5 and 2026.7, visible in the compile-perf suite (tools/compile-perf). The "per-compile floor" workloads regressed sharply: e.g. the `minimal` (empty shader) `linkIR` rose ~10-13x and `linkAndOptimizeIR` rose similarly, and every compile pays this regardless of whether it uses auto-diff. Root cause ---------- shader-slang#9808 removed the `useAutodiff` gating that the IR linker previously used to avoid pulling auto-diff artifacts into programs that do not differentiate: * `shouldDeepCloneWitnessTable` used to return `useAutodiff` for the `IDifferentiable` family; shader-slang#9808 made it return `true` unconditionally and also added the new `IForwardDifferentiable` / `IBackwardDifferentiable` / `IBwdCallable` interfaces to the always-deep-clone set. As a result every program deep-clones the differentiable-interface witness tables and all their entries (`Differential`, `dzero`, `dadd`, fwd/bwd methods). * The new `cloneAnnotations` step clones module-scope `IRAnnotation`s for every cloned inst. Every `AnnotationKind` is differentiability-related (DifferentialType/Zero/Add/PairType, Forward/BackwardDerivative, ...), so this links a differentiable builtin's derivative associations into programs that never use them. This dead auto-diff IR is then carried through specialize / simplifyIR / DCE before finally being eliminated, inflating link and optimization time. Fix --- Restore the `useAutodiff` gating, but apply it ONLY to the final per-target code-generation link (`linkIR`). A new `IRSharedSpecContext::isFinalCodegenLink` flag is set true only there. The flag is required because the same clone paths run during `prelinkIR` and module precompilation, whose output module must stay complete and self-consistent (it may be serialized -- e.g. the core module). Gating those paths corrupts the serialized core module and breaks auto-diff code generation; gating only the throw-away per-target link is safe because unused symbols are dropped on demand / by later DCE. * `shouldDeepCloneWitnessTable`: the differentiable-interface decision is now made BEFORE the generic `[HLSLExport]`/`[KeepAlive]` rule. shader-slang#9808 marks these witness tables `[HLSLExport]` (for cross-module auto-diff), which previously forced a deep clone regardless of gating. For the final codegen link of a non-differentiating program we now defer the entries and clone only those actually referenced. * `cloneAnnotations`: skip cloning annotations for the final codegen link when auto-diff is not in use. Both gates fall back to the original (always-clone) behavior during prelink / precompilation and whenever the program uses auto-diff, so auto-diff semantics are unchanged. Validation ---------- * minimal `linkAndOptimizeIR`: 0.79ms vs the pre-shader-slang#9808 parent's 1.87ms (the regressed value was ~15ms); `linkIR` back to ~0.1ms. * Auto-diff correctness: the interpreter (slangi) auto-diff tests under tests/autodiff/ produce correct numeric derivatives, and auto-diff HLSL / SPIR-V code generation no longer crashes, with the core module regenerated by the patched compiler. Scope ----- This addresses the per-compile-floor / `linkIR` regression class. A separate, larger codegen-side regression remains in `simplifyIR` for shaders that use differentiable numeric builtins (`sin`/`sqrt`/...): shader-slang#9808 wove differentiability into the numeric type hierarchy (`interface IFloat : IArithmetic, IDifferentiable`, plus the new differentiable interfaces), so a concrete non-auto-diff shader transitively links `float`'s entire differentiable-trait conformance closure. That is tracked separately.
📝 WalkthroughWalkthroughThe linker now marks the final code-generation link in shared context and uses that flag to skip module-scope auto-diff annotation cloning and adjust witness-table deep cloning when autodiff is disabled. ChangesFinal codegen linking
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: d20ea141-a9d7-426b-87fd-702d97c73572
📒 Files selected for processing (1)
source/slang/slang-ir-link.cpp
| case KnownBuiltinDeclName::IDifferentiable: | ||
| case KnownBuiltinDeclName::IDifferentiablePtr: | ||
| case KnownBuiltinDeclName::IForwardDifferentiable: | ||
| case KnownBuiltinDeclName::IBackwardDifferentiable: | ||
| case KnownBuiltinDeclName::IBwdCallable: | ||
| return !context->getShared()->isFinalCodegenLink || | ||
| context->getShared()->useAutodiff; |
There was a problem hiding this comment.
🎯 Functional Correctness | 🔵 Trivial
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find autodiff-related builtin names and decoration sites that should be represented
# in shouldDeepCloneWitnessTable's KnownBuiltinDeclName switch.
rg -n -C2 'KnownBuiltinDeclName::(I.*Differentiable|I.*Callable|.*Differential|.*Diff)' source/slang prelude source/standard-modules
# Find where COM decorations and known-builtin decorations are attached so overlap can be ruled out.
rg -n -C3 '(kIROp_ComInterfaceDecoration|addComInterfaceDecoration|IRComInterfaceDecoration|IRKnownBuiltinDecoration|addKnownBuiltinDecoration)' source/slang prelude source/standard-modules
# Find export/keepalive attachment for witness tables to confirm all exported autodiff witness
# conformance types are covered by the local switch.
rg -n -C3 '(HLSLExportDecoration|addHLSLExportDecoration|KeepAliveDecoration|addKeepAliveDecoration)' source/slang prelude source/standard-modulesRepository: shader-slang/slang
Length of output: 50374
🏁 Script executed:
#!/bin/bash
# List all KnownBuiltinDeclName enum values to check for completeness
rg -A 1 "enum KnownBuiltinDeclName" source/slang/slang-ir-insts.h | head -100Repository: shader-slang/slang
Length of output: 156
🏁 Script executed:
#!/bin/bash
# Check the exact logic flow in shouldDeepCloneWitnessTable to see if NullDifferential
# handling at lines 2337-2339 is separate from the function being reviewed
sed -n '720,770p' source/slang/slang-ir-link.cppRepository: shader-slang/slang
Length of output: 1927
🏁 Script executed:
#!/bin/bash
# Find the KnownBuiltinDeclName enum definition
find source/slang -name "*.h" -exec rg -l "enum KnownBuiltinDeclName" {} \; | head -5Repository: shader-slang/slang
Length of output: 156
🏁 Script executed:
#!/bin/bash
# Once we find the file, list all enum values
rg -B 5 -A 80 "enum KnownBuiltinDeclName" source/slang/slang-ir-insts.h | head -100Repository: shader-slang/slang
Length of output: 156
🏁 Script executed:
#!/bin/bash
# Search for KnownBuiltinDeclName definition more broadly
rg "KnownBuiltinDeclName" source/slang/slang-ir-insts.h | head -20Repository: shader-slang/slang
Length of output: 401
🏁 Script executed:
#!/bin/bash
# Search in all .h files for the enum
rg -n "enum.*KnownBuiltinDeclName|KnownBuiltinDeclName.*{" source/slang/*.h | head -30Repository: shader-slang/slang
Length of output: 320
🏁 Script executed:
#!/bin/bash
# Check where KnownBuiltinDeclName is used to infer its definition
rg -n "struct IRKnownBuiltinDecoration" source/slang/slang-ir-insts.h -A 20Repository: shader-slang/slang
Length of output: 1044
🏁 Script executed:
#!/bin/bash
# Search for KnownBuiltinDeclName definition across all files
rg -n "enum class KnownBuiltinDeclName|typedef.*KnownBuiltinDeclName|KnownBuiltinDeclName =" source/ | head -20Repository: shader-slang/slang
Length of output: 246
🏁 Script executed:
#!/bin/bash
# Check slang-ir-util.cpp where the string conversion happens - it likely has the full enum list
sed -n '2100,2200p' source/slang/slang-ir-util.cppRepository: shader-slang/slang
Length of output: 3705
🏁 Script executed:
#!/bin/bash
# Read the full enum definition from slang-ast-support-types.h
sed -n '220,280p' source/slang/slang-ast-support-types.hRepository: shader-slang/slang
Length of output: 1932
Verify the autodiff conformance list: NullDifferential is intentionally excluded and handled separately.
The switch at lines 733-739 intentionally covers only autodiff interface conformance types (IDifferentiable, IDifferentiablePtr, IForwardDifferentiable, IBackwardDifferentiable, IBwdCallable). NullDifferential—a concrete struct type, not an interface—is deliberately excluded from this switch and handled in a separate code path at lines 2337-2339, where it is unconditionally cloned with cloneAndKeepAlive. This distinction is correct: interface witness tables need linkage-mode-dependent cloning, while NullDifferential always requires cloning regardless of link mode.
COM interface decorations (kIROp_ComInterfaceDecoration) and autodiff KnownBuiltinDecoration are mutually exclusive based on their attachment sites (COM on user/interface types, KnownBuiltin on stdlib meta types), so the ordering in shouldDeepCloneWitnessTable does not create a gap.
Sources: Coding guidelines, Path instructions
There was a problem hiding this comment.
Verdict: 🔵 1 question, 1 gap — no correctness bug found, but two subtle issues worth resolving before merge.
The PR restores useAutodiff gating that PR #9808 removed, applied only during the final per-target linkIR via a new IRSharedSpecContext::isFinalCodegenLink flag. IR-correctness trace confirms the optimization is sound today: IDifferentiable requirements use IRBuiltinRequirementKey, so witness-table entries are still cloned eagerly via the isBuiltinReqEntry path even when shouldDeepCloneWitnessTable returns false, and all IRAnnotation consumers outside the auto-diff passes are themselves gated by useAutodiff. The two findings are (1) an intentional reorder that subordinates [HLSLExport]/[KeepAlive] to the new gate — benign today but fragile to future spec changes — and (2) the absence of any regression test that would catch a re-introduction of the original perf regression.
Changes Overview
Auto-diff link-time gating (source/slang/slang-ir-link.cpp)
- Adds
IRSharedSpecContext::isFinalCodegenLink, set totrueonly insidelinkIR(CodeGenContext*)(not inprelinkIR/ module precompilation, whose output must stay serializable as the core module). cloneAnnotationsearly-returns whenisFinalCodegenLink && !useAutodiff. Author claims (and reviewers verified atslang-type-system-shared.h) that everyAnnotationKindis auto-diff-related.shouldDeepCloneWitnessTableis reorganized: theIDifferentiable/IDifferentiablePtr/IForwardDifferentiable/IBackwardDifferentiable/IBwdCallablecheck now runs first (before[HLSLExport]/[KeepAlive]on the table) and returns!isFinalCodegenLink || useAutodiffinstead of unconditionaltrue. The old branch in the second decoration loop is removed.
Findings (2 total)
| Severity | Location | Finding |
|---|---|---|
| 🔵 Question | source/slang/slang-ir-link.cpp:738 |
Reorder subordinates explicit [HLSLExport]/[KeepAlive] to the auto-diff gate for differentiable-interface witness tables — intentional? |
| 🟡 Gap | source/slang/slang-ir-link.cpp:229 |
No regression test for the new gating behavior (PR adds 0 test files); future refactors could silently re-introduce the #9808 regression. |
| case KnownBuiltinDeclName::IForwardDifferentiable: | ||
| case KnownBuiltinDeclName::IBackwardDifferentiable: | ||
| case KnownBuiltinDeclName::IBwdCallable: | ||
| return !context->getShared()->isFinalCodegenLink || |
There was a problem hiding this comment.
🔵 Question: Is the priority inversion of [HLSLExport] / [KeepAlive] over the differentiable-interface gate intentional?
The new differentiable-interface check runs before the table-decoration scan at table->getDecorations() (which still returns true unconditionally for kIROp_HLSLExportDecoration and kIROp_KeepAliveDecoration). For a witness table that is both on a differentiable interface and carries an explicit [HLSLExport] or [KeepAlive] decoration, the new ordering returns !isFinalCodegenLink || useAutodiff and never reaches the explicit keep-alive directive — so during a non-auto-diff final codegen link the explicit user/compiler-supplied "must preserve" directive is silently overridden.
This appears benign today: IDifferentiable family requirements are all keyed by IRBuiltinRequirementKey, so the witness-table entries are still eagerly cloned via the isBuiltinReqEntry branch in cloneWitnessTable even when shouldDeepCloneWitnessTable returns false. But:
- The PR description (Proposed solution, second bullet) explicitly says "the differentiable-interface decision is now made BEFORE the generic
[HLSLExport]/[KeepAlive]rule" — so the reorder is intentional, but the description doesn't explain why an explicit[KeepAlive]should be subordinate to the auto-diff gate. - If a future
IDifferentiablerequirement is added that is not keyed byIRBuiltinRequirementKey(e.g. a user-overridable default-method requirement), the deferred-entry path would diverge from the deep-clone path, and the explicit[HLSLExport]/[KeepAlive]semantics would be lost on the witness table for a non-auto-diff link.
Question: Either (a) move the [HLSLExport]/[KeepAlive] table-decoration scan above the differentiable-interface check so explicit keep-alive directives win even on differentiable-interface witnesses, or (b) document in the new comment why subordinating [KeepAlive] to the gate is correct (the "all requirements are built-in keys" invariant) and SLANG_ASSERT that invariant at the gate site so a future spec change cannot silently regress it.
| // not in use; this restores the pre-PR-#9808 behavior where derivative info | ||
| // was never linked into non-differentiating modules. We must not skip them | ||
| // during prelink / precompilation, whose output module stays live (and may be | ||
| // serialized, e.g. the core module) and must keep its annotations. |
There was a problem hiding this comment.
🟡 Gap: No regression test locks in the new gating behavior.
The PR's correctness claim is that, during the final per-target codegen link of a non-auto-diff program, it is safe to (a) skip cloneAnnotations entirely and (b) skip deep-cloning differentiable-interface witness tables. Both are new silent behavior changes in IR link / specialization passes. No .slang test was added, so:
- A future refactor of
doesModuleUseAutodiff, of the set ofKnownBuiltinDeclNamevalues gated here, or of the wiring that flipsisFinalCodegenLinkonlinkIRvs.prelinkIR/ module precompilation can silently re-introduce the original PR Refactor auto-diff implementation. #9808 regression with zero CI failure. - The existing
tests/autodiff/suite covers the positive (auto-diff in use) path, so it catches "we wrongly skipped something auto-diff needed." It does not catch "we wrongly clone auto-diff artifacts into non-auto-diff programs" or "we wrongly skip an[HLSLExport]/[KeepAlive]witness table."
Suggestion: Add a CPU/slangi regression test under tests/autodiff/ (or tests/compile/) that:
- Compiles a non-auto-diff shader calling a builtin with an
IDifferentiablewitness (sin,cos,sqrt), with//TEST:SIMPLE(filecheck=CHECK):-target hlsl -dump-irandCHECK-NOT:againstIDifferentiable_Witness/__fwd_diff/ annotation-id markers — locking in the optimization. - Pair with a positive case using
fwd_diff(sin)+CHECK:for the witness, locking in that the gate does not prune when auto-diff is in use. - Optionally a test that defines
[HLSLExport] struct Foo : IDifferentiableand verifies the witness survives a non-auto-diff link (locks the reorder behavior — see related comment on line 738).
The PR description says #11780 tracks a separate simplifyIR regression — that's distinct from a regression test for this PR's correctness claim, which is still missing.
Fixes #11781
Motivation
PR #9808 (auto-diff overhaul) caused a broad compile-time perf regression
between releases 2026.5 and 2026.7, visible in the compile-perf suite
(tools/compile-perf). The "per-compile floor" workloads regressed sharply:
e.g. the
minimal(empty shader)linkIRrose ~10-13x andlinkAndOptimizeIRrose similarly, and every compile pays this regardless of whether it uses
auto-diff.
Root cause
#9808 removed the
useAutodiffgating that the IR linker previously used toavoid pulling auto-diff artifacts into programs that do not differentiate:
shouldDeepCloneWitnessTableused to returnuseAutodifffor theIDifferentiablefamily; Refactor auto-diff implementation. #9808 made it returntrueunconditionally andalso added the new
IForwardDifferentiable/IBackwardDifferentiable/IBwdCallableinterfaces to the always-deep-clone set. As a result everyprogram deep-clones the differentiable-interface witness tables and all
their entries (
Differential,dzero,dadd, fwd/bwd methods).The new
cloneAnnotationsstep clones module-scopeIRAnnotations forevery cloned inst. Every
AnnotationKindis differentiability-related(DifferentialType/Zero/Add/PairType, Forward/BackwardDerivative, ...), so
this links a differentiable builtin's derivative associations into programs
that never use them.
This dead auto-diff IR is then carried through specialize / simplifyIR / DCE
before finally being eliminated, inflating link and optimization time.
Fix
Restore the
useAutodiffgating, but apply it ONLY to the final per-targetcode-generation link (
linkIR). A newIRSharedSpecContext::isFinalCodegenLinkflag is set true only there. The flag is required because the same clone paths
run during
prelinkIRand module precompilation, whose output module must staycomplete and self-consistent (it may be serialized -- e.g. the core module).
Gating those paths corrupts the serialized core module and breaks auto-diff
code generation; gating only the throw-away per-target link is safe because
unused symbols are dropped on demand / by later DCE.
shouldDeepCloneWitnessTable: the differentiable-interface decision is nowmade BEFORE the generic
[HLSLExport]/[KeepAlive]rule. Refactor auto-diff implementation. #9808 marks thesewitness tables
[HLSLExport](for cross-module auto-diff), which previouslyforced a deep clone regardless of gating. For the final codegen link of a
non-differentiating program we now defer the entries and clone only those
actually referenced.
cloneAnnotations: skip cloning annotations for the final codegen link whenauto-diff is not in use.
Both gates fall back to the original (always-clone) behavior during prelink /
precompilation and whenever the program uses auto-diff, so auto-diff semantics
are unchanged.
Validation
linkAndOptimizeIR: 0.79ms vs the pre-Refactor auto-diff implementation. #9808 parent's 1.87ms(the regressed value was ~15ms);
linkIRback to ~0.1ms.tests/autodiff/ produce correct numeric derivatives, and auto-diff HLSL /
SPIR-V code generation no longer crashes, with the core module regenerated
by the patched compiler.
Scope
This addresses the per-compile-floor /
linkIRregression class. A separate,larger codegen-side regression remains in
simplifyIRfor shaders that usedifferentiable numeric builtins (
sin/sqrt/...): #9808 wove differentiabilityinto the numeric type hierarchy (
interface IFloat : IArithmetic, IDifferentiable,plus the new differentiable interfaces), so a concrete non-auto-diff shader
transitively links
float's entire differentiable-trait conformance closure.That is tracked separately.
The remaining
simplifyIRcodegen-side regression described under Scope is tracked in issue #11780. Related broad perf-regression tracking: #11474.