Skip to content

Restore auto-diff link gating to the final codegen link#11779

Draft
jvepsalainen-nv wants to merge 1 commit into
shader-slang:masterfrom
jvepsalainen-nv:fix-9808-link-perf-regression
Draft

Restore auto-diff link gating to the final codegen link#11779
jvepsalainen-nv wants to merge 1 commit into
shader-slang:masterfrom
jvepsalainen-nv:fix-9808-link-perf-regression

Conversation

@jvepsalainen-nv

@jvepsalainen-nv jvepsalainen-nv commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

Fixes #11781

Motivation

PR #9808 (auto-diff overhaul) caused a broad compile-time perf regression
between releases 2026.5 and 2026.7, visible in the compile-perf suite
(tools/compile-perf). The "per-compile floor" workloads regressed sharply:
e.g. the minimal (empty shader) linkIR rose ~10-13x and linkAndOptimizeIR
rose similarly, and every compile pays this regardless of whether it uses
auto-diff.

Root cause

#9808 removed the useAutodiff gating that the IR linker previously used to
avoid pulling auto-diff artifacts into programs that do not differentiate:

  • shouldDeepCloneWitnessTable used to return useAutodiff for the
    IDifferentiable family; Refactor auto-diff implementation. #9808 made it return true unconditionally and
    also added the new IForwardDifferentiable / IBackwardDifferentiable /
    IBwdCallable interfaces to the always-deep-clone set. As a result every
    program deep-clones the differentiable-interface witness tables and all
    their entries (Differential, dzero, dadd, fwd/bwd methods).

  • The new cloneAnnotations step clones module-scope IRAnnotations for
    every cloned inst. Every AnnotationKind is differentiability-related
    (DifferentialType/Zero/Add/PairType, Forward/BackwardDerivative, ...), so
    this links a differentiable builtin's derivative associations into programs
    that never use them.

This dead auto-diff IR is then carried through specialize / simplifyIR / DCE
before finally being eliminated, inflating link and optimization time.

Fix

Restore the useAutodiff gating, but apply it ONLY to the final per-target
code-generation link (linkIR). A new IRSharedSpecContext::isFinalCodegenLink
flag is set true only there. The flag is required because the same clone paths
run during prelinkIR and module precompilation, whose output module must stay
complete and self-consistent (it may be serialized -- e.g. the core module).
Gating those paths corrupts the serialized core module and breaks auto-diff
code generation; gating only the throw-away per-target link is safe because
unused symbols are dropped on demand / by later DCE.

  • shouldDeepCloneWitnessTable: the differentiable-interface decision is now
    made BEFORE the generic [HLSLExport]/[KeepAlive] rule. Refactor auto-diff implementation. #9808 marks these
    witness tables [HLSLExport] (for cross-module auto-diff), which previously
    forced a deep clone regardless of gating. For the final codegen link of a
    non-differentiating program we now defer the entries and clone only those
    actually referenced.

  • cloneAnnotations: skip cloning annotations for the final codegen link when
    auto-diff is not in use.

Both gates fall back to the original (always-clone) behavior during prelink /
precompilation and whenever the program uses auto-diff, so auto-diff semantics
are unchanged.

Validation

  • minimal linkAndOptimizeIR: 0.79ms vs the pre-Refactor auto-diff implementation. #9808 parent's 1.87ms
    (the regressed value was ~15ms); linkIR back to ~0.1ms.
  • Auto-diff correctness: the interpreter (slangi) auto-diff tests under
    tests/autodiff/ produce correct numeric derivatives, and auto-diff HLSL /
    SPIR-V code generation no longer crashes, with the core module regenerated
    by the patched compiler.

Scope

This addresses the per-compile-floor / linkIR regression class. A separate,
larger codegen-side regression remains in simplifyIR for shaders that use
differentiable numeric builtins (sin/sqrt/...): #9808 wove differentiability
into the numeric type hierarchy (interface IFloat : IArithmetic, IDifferentiable,
plus the new differentiable interfaces), so a concrete non-auto-diff shader
transitively links float's entire differentiable-trait conformance closure.
That is tracked separately.


The remaining simplifyIR codegen-side regression described under Scope is tracked in issue #11780. Related broad perf-regression tracking: #11474.

…ion from shader-slang#9808)

Motivation
----------
PR shader-slang#9808 (auto-diff overhaul) caused a broad compile-time perf regression
between releases 2026.5 and 2026.7, visible in the compile-perf suite
(tools/compile-perf). The "per-compile floor" workloads regressed sharply:
e.g. the `minimal` (empty shader) `linkIR` rose ~10-13x and `linkAndOptimizeIR`
rose similarly, and every compile pays this regardless of whether it uses
auto-diff.

Root cause
----------
shader-slang#9808 removed the `useAutodiff` gating that the IR linker previously used to
avoid pulling auto-diff artifacts into programs that do not differentiate:

  * `shouldDeepCloneWitnessTable` used to return `useAutodiff` for the
    `IDifferentiable` family; shader-slang#9808 made it return `true` unconditionally and
    also added the new `IForwardDifferentiable` / `IBackwardDifferentiable` /
    `IBwdCallable` interfaces to the always-deep-clone set. As a result every
    program deep-clones the differentiable-interface witness tables and all
    their entries (`Differential`, `dzero`, `dadd`, fwd/bwd methods).

  * The new `cloneAnnotations` step clones module-scope `IRAnnotation`s for
    every cloned inst. Every `AnnotationKind` is differentiability-related
    (DifferentialType/Zero/Add/PairType, Forward/BackwardDerivative, ...), so
    this links a differentiable builtin's derivative associations into programs
    that never use them.

This dead auto-diff IR is then carried through specialize / simplifyIR / DCE
before finally being eliminated, inflating link and optimization time.

Fix
---
Restore the `useAutodiff` gating, but apply it ONLY to the final per-target
code-generation link (`linkIR`). A new `IRSharedSpecContext::isFinalCodegenLink`
flag is set true only there. The flag is required because the same clone paths
run during `prelinkIR` and module precompilation, whose output module must stay
complete and self-consistent (it may be serialized -- e.g. the core module).
Gating those paths corrupts the serialized core module and breaks auto-diff
code generation; gating only the throw-away per-target link is safe because
unused symbols are dropped on demand / by later DCE.

  * `shouldDeepCloneWitnessTable`: the differentiable-interface decision is now
    made BEFORE the generic `[HLSLExport]`/`[KeepAlive]` rule. shader-slang#9808 marks these
    witness tables `[HLSLExport]` (for cross-module auto-diff), which previously
    forced a deep clone regardless of gating. For the final codegen link of a
    non-differentiating program we now defer the entries and clone only those
    actually referenced.

  * `cloneAnnotations`: skip cloning annotations for the final codegen link when
    auto-diff is not in use.

Both gates fall back to the original (always-clone) behavior during prelink /
precompilation and whenever the program uses auto-diff, so auto-diff semantics
are unchanged.

Validation
----------
  * minimal `linkAndOptimizeIR`: 0.79ms vs the pre-shader-slang#9808 parent's 1.87ms
    (the regressed value was ~15ms); `linkIR` back to ~0.1ms.
  * Auto-diff correctness: the interpreter (slangi) auto-diff tests under
    tests/autodiff/ produce correct numeric derivatives, and auto-diff HLSL /
    SPIR-V code generation no longer crashes, with the core module regenerated
    by the patched compiler.

Scope
-----
This addresses the per-compile-floor / `linkIR` regression class. A separate,
larger codegen-side regression remains in `simplifyIR` for shaders that use
differentiable numeric builtins (`sin`/`sqrt`/...): shader-slang#9808 wove differentiability
into the numeric type hierarchy (`interface IFloat : IArithmetic, IDifferentiable`,
plus the new differentiable interfaces), so a concrete non-auto-diff shader
transitively links `float`'s entire differentiable-trait conformance closure.
That is tracked separately.
@coderabbitai

coderabbitai Bot commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

📝 Walkthrough

Walkthrough

The linker now marks the final code-generation link in shared context and uses that flag to skip module-scope auto-diff annotation cloning and adjust witness-table deep cloning when autodiff is disabled.

Changes

Final codegen linking

Layer / File(s) Summary
Flag plumbing
source/slang/slang-ir-link.cpp
IRSharedSpecContext stores isFinalCodegenLink, and linkIR sets it on the final code-generation link path.
Annotation cloning gate
source/slang/slang-ir-link.cpp
cloneAnnotations skips module-scope IRAnnotation cloning when final codegen linking runs without autodiff.
Witness-table cloning gate
source/slang/slang-ir-link.cpp
shouldDeepCloneWitnessTable changes differentiability-related deep-clone decisions to depend on the final-link and autodiff state, while kIROp_ComInterfaceDecoration still triggers deep cloning.

Suggested reviewers

  • bmillsNV
  • jkwak-work
  • kaizhangNV
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Title check ✅ Passed The title clearly matches the main change: restoring auto-diff gating specifically for the final codegen link.
Description check ✅ Passed The description clearly matches the linker gating changes and performance fix described in the patch.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: d20ea141-a9d7-426b-87fd-702d97c73572

📥 Commits

Reviewing files that changed from the base of the PR and between b1bdd88 and 1d6c26a.

📒 Files selected for processing (1)
  • source/slang/slang-ir-link.cpp

Comment on lines +733 to +739
case KnownBuiltinDeclName::IDifferentiable:
case KnownBuiltinDeclName::IDifferentiablePtr:
case KnownBuiltinDeclName::IForwardDifferentiable:
case KnownBuiltinDeclName::IBackwardDifferentiable:
case KnownBuiltinDeclName::IBwdCallable:
return !context->getShared()->isFinalCodegenLink ||
context->getShared()->useAutodiff;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🔵 Trivial

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find autodiff-related builtin names and decoration sites that should be represented
# in shouldDeepCloneWitnessTable's KnownBuiltinDeclName switch.
rg -n -C2 'KnownBuiltinDeclName::(I.*Differentiable|I.*Callable|.*Differential|.*Diff)' source/slang prelude source/standard-modules

# Find where COM decorations and known-builtin decorations are attached so overlap can be ruled out.
rg -n -C3 '(kIROp_ComInterfaceDecoration|addComInterfaceDecoration|IRComInterfaceDecoration|IRKnownBuiltinDecoration|addKnownBuiltinDecoration)' source/slang prelude source/standard-modules

# Find export/keepalive attachment for witness tables to confirm all exported autodiff witness
# conformance types are covered by the local switch.
rg -n -C3 '(HLSLExportDecoration|addHLSLExportDecoration|KeepAliveDecoration|addKeepAliveDecoration)' source/slang prelude source/standard-modules

Repository: shader-slang/slang

Length of output: 50374


🏁 Script executed:

#!/bin/bash
# List all KnownBuiltinDeclName enum values to check for completeness
rg -A 1 "enum KnownBuiltinDeclName" source/slang/slang-ir-insts.h | head -100

Repository: shader-slang/slang

Length of output: 156


🏁 Script executed:

#!/bin/bash
# Check the exact logic flow in shouldDeepCloneWitnessTable to see if NullDifferential
# handling at lines 2337-2339 is separate from the function being reviewed
sed -n '720,770p' source/slang/slang-ir-link.cpp

Repository: shader-slang/slang

Length of output: 1927


🏁 Script executed:

#!/bin/bash
# Find the KnownBuiltinDeclName enum definition
find source/slang -name "*.h" -exec rg -l "enum KnownBuiltinDeclName" {} \; | head -5

Repository: shader-slang/slang

Length of output: 156


🏁 Script executed:

#!/bin/bash
# Once we find the file, list all enum values
rg -B 5 -A 80 "enum KnownBuiltinDeclName" source/slang/slang-ir-insts.h | head -100

Repository: shader-slang/slang

Length of output: 156


🏁 Script executed:

#!/bin/bash
# Search for KnownBuiltinDeclName definition more broadly
rg "KnownBuiltinDeclName" source/slang/slang-ir-insts.h | head -20

Repository: shader-slang/slang

Length of output: 401


🏁 Script executed:

#!/bin/bash
# Search in all .h files for the enum
rg -n "enum.*KnownBuiltinDeclName|KnownBuiltinDeclName.*{" source/slang/*.h | head -30

Repository: shader-slang/slang

Length of output: 320


🏁 Script executed:

#!/bin/bash
# Check where KnownBuiltinDeclName is used to infer its definition
rg -n "struct IRKnownBuiltinDecoration" source/slang/slang-ir-insts.h -A 20

Repository: shader-slang/slang

Length of output: 1044


🏁 Script executed:

#!/bin/bash
# Search for KnownBuiltinDeclName definition across all files
rg -n "enum class KnownBuiltinDeclName|typedef.*KnownBuiltinDeclName|KnownBuiltinDeclName =" source/ | head -20

Repository: shader-slang/slang

Length of output: 246


🏁 Script executed:

#!/bin/bash
# Check slang-ir-util.cpp where the string conversion happens - it likely has the full enum list
sed -n '2100,2200p' source/slang/slang-ir-util.cpp

Repository: shader-slang/slang

Length of output: 3705


🏁 Script executed:

#!/bin/bash
# Read the full enum definition from slang-ast-support-types.h
sed -n '220,280p' source/slang/slang-ast-support-types.h

Repository: shader-slang/slang

Length of output: 1932


Verify the autodiff conformance list: NullDifferential is intentionally excluded and handled separately.

The switch at lines 733-739 intentionally covers only autodiff interface conformance types (IDifferentiable, IDifferentiablePtr, IForwardDifferentiable, IBackwardDifferentiable, IBwdCallable). NullDifferential—a concrete struct type, not an interface—is deliberately excluded from this switch and handled in a separate code path at lines 2337-2339, where it is unconditionally cloned with cloneAndKeepAlive. This distinction is correct: interface witness tables need linkage-mode-dependent cloning, while NullDifferential always requires cloning regardless of link mode.

COM interface decorations (kIROp_ComInterfaceDecoration) and autodiff KnownBuiltinDecoration are mutually exclusive based on their attachment sites (COM on user/interface types, KnownBuiltin on stdlib meta types), so the ordering in shouldDeepCloneWitnessTable does not create a gap.

Sources: Coding guidelines, Path instructions

@jvepsalainen-nv jvepsalainen-nv self-assigned this Jun 26, 2026
@jvepsalainen-nv jvepsalainen-nv changed the title Restore auto-diff link gating to the final codegen link (perf regression from #9808) Restore auto-diff link gating to the final codegen link Jun 26, 2026
@jvepsalainen-nv jvepsalainen-nv marked this pull request as draft June 26, 2026 11:50

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verdict: 🔵 1 question, 1 gap — no correctness bug found, but two subtle issues worth resolving before merge.

The PR restores useAutodiff gating that PR #9808 removed, applied only during the final per-target linkIR via a new IRSharedSpecContext::isFinalCodegenLink flag. IR-correctness trace confirms the optimization is sound today: IDifferentiable requirements use IRBuiltinRequirementKey, so witness-table entries are still cloned eagerly via the isBuiltinReqEntry path even when shouldDeepCloneWitnessTable returns false, and all IRAnnotation consumers outside the auto-diff passes are themselves gated by useAutodiff. The two findings are (1) an intentional reorder that subordinates [HLSLExport]/[KeepAlive] to the new gate — benign today but fragile to future spec changes — and (2) the absence of any regression test that would catch a re-introduction of the original perf regression.

Changes Overview

Auto-diff link-time gating (source/slang/slang-ir-link.cpp)

  • Adds IRSharedSpecContext::isFinalCodegenLink, set to true only inside linkIR(CodeGenContext*) (not in prelinkIR / module precompilation, whose output must stay serializable as the core module).
  • cloneAnnotations early-returns when isFinalCodegenLink && !useAutodiff. Author claims (and reviewers verified at slang-type-system-shared.h) that every AnnotationKind is auto-diff-related.
  • shouldDeepCloneWitnessTable is reorganized: the IDifferentiable / IDifferentiablePtr / IForwardDifferentiable / IBackwardDifferentiable / IBwdCallable check now runs first (before [HLSLExport] / [KeepAlive] on the table) and returns !isFinalCodegenLink || useAutodiff instead of unconditional true. The old branch in the second decoration loop is removed.
Findings (2 total)
Severity Location Finding
🔵 Question source/slang/slang-ir-link.cpp:738 Reorder subordinates explicit [HLSLExport]/[KeepAlive] to the auto-diff gate for differentiable-interface witness tables — intentional?
🟡 Gap source/slang/slang-ir-link.cpp:229 No regression test for the new gating behavior (PR adds 0 test files); future refactors could silently re-introduce the #9808 regression.

case KnownBuiltinDeclName::IForwardDifferentiable:
case KnownBuiltinDeclName::IBackwardDifferentiable:
case KnownBuiltinDeclName::IBwdCallable:
return !context->getShared()->isFinalCodegenLink ||

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 Question: Is the priority inversion of [HLSLExport] / [KeepAlive] over the differentiable-interface gate intentional?

The new differentiable-interface check runs before the table-decoration scan at table->getDecorations() (which still returns true unconditionally for kIROp_HLSLExportDecoration and kIROp_KeepAliveDecoration). For a witness table that is both on a differentiable interface and carries an explicit [HLSLExport] or [KeepAlive] decoration, the new ordering returns !isFinalCodegenLink || useAutodiff and never reaches the explicit keep-alive directive — so during a non-auto-diff final codegen link the explicit user/compiler-supplied "must preserve" directive is silently overridden.

This appears benign today: IDifferentiable family requirements are all keyed by IRBuiltinRequirementKey, so the witness-table entries are still eagerly cloned via the isBuiltinReqEntry branch in cloneWitnessTable even when shouldDeepCloneWitnessTable returns false. But:

  1. The PR description (Proposed solution, second bullet) explicitly says "the differentiable-interface decision is now made BEFORE the generic [HLSLExport]/[KeepAlive] rule" — so the reorder is intentional, but the description doesn't explain why an explicit [KeepAlive] should be subordinate to the auto-diff gate.
  2. If a future IDifferentiable requirement is added that is not keyed by IRBuiltinRequirementKey (e.g. a user-overridable default-method requirement), the deferred-entry path would diverge from the deep-clone path, and the explicit [HLSLExport]/[KeepAlive] semantics would be lost on the witness table for a non-auto-diff link.

Question: Either (a) move the [HLSLExport]/[KeepAlive] table-decoration scan above the differentiable-interface check so explicit keep-alive directives win even on differentiable-interface witnesses, or (b) document in the new comment why subordinating [KeepAlive] to the gate is correct (the "all requirements are built-in keys" invariant) and SLANG_ASSERT that invariant at the gate site so a future spec change cannot silently regress it.

// not in use; this restores the pre-PR-#9808 behavior where derivative info
// was never linked into non-differentiating modules. We must not skip them
// during prelink / precompilation, whose output module stays live (and may be
// serialized, e.g. the core module) and must keep its annotations.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Gap: No regression test locks in the new gating behavior.

The PR's correctness claim is that, during the final per-target codegen link of a non-auto-diff program, it is safe to (a) skip cloneAnnotations entirely and (b) skip deep-cloning differentiable-interface witness tables. Both are new silent behavior changes in IR link / specialization passes. No .slang test was added, so:

  • A future refactor of doesModuleUseAutodiff, of the set of KnownBuiltinDeclName values gated here, or of the wiring that flips isFinalCodegenLink on linkIR vs. prelinkIR / module precompilation can silently re-introduce the original PR Refactor auto-diff implementation. #9808 regression with zero CI failure.
  • The existing tests/autodiff/ suite covers the positive (auto-diff in use) path, so it catches "we wrongly skipped something auto-diff needed." It does not catch "we wrongly clone auto-diff artifacts into non-auto-diff programs" or "we wrongly skip an [HLSLExport]/[KeepAlive] witness table."

Suggestion: Add a CPU/slangi regression test under tests/autodiff/ (or tests/compile/) that:

  1. Compiles a non-auto-diff shader calling a builtin with an IDifferentiable witness (sin, cos, sqrt), with //TEST:SIMPLE(filecheck=CHECK):-target hlsl -dump-ir and CHECK-NOT: against IDifferentiable_Witness / __fwd_diff / annotation-id markers — locking in the optimization.
  2. Pair with a positive case using fwd_diff(sin) + CHECK: for the witness, locking in that the gate does not prune when auto-diff is in use.
  3. Optionally a test that defines [HLSLExport] struct Foo : IDifferentiable and verifies the witness survives a non-auto-diff link (locks the reorder behavior — see related comment on line 738).

The PR description says #11780 tracks a separate simplifyIR regression — that's distinct from a regression test for this PR's correctness claim, which is still missing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr: non-breaking PRs without breaking changes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Per-compile linkIR / linkAndOptimizeIR regression since the auto-diff refactor (auto-diff artifacts linked into every program)

2 participants