Skip to content
Draft
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
78 changes: 62 additions & 16 deletions source/slang/slang-ir-link.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -60,6 +60,15 @@ struct IRSharedSpecContext

bool useAutodiff = false;

// True only for the final code-generation link (`linkIR`), where the linked
// module is a throw-away copy built for one target and unused symbols are
// dropped on demand / by later DCE. It is false for `prelinkIR` and module
// precompilation, which mutate a module that must remain complete and
// self-consistent (it may be serialized, e.g. the core module). Auto-diff
// link-time pruning (see `cloneAnnotations` / `shouldDeepCloneWitnessTable`)
// is only safe in the former case.
bool isFinalCodegenLink = false;

IRBuilder builderStorage;

// The "global" specialization environment.
Expand Down Expand Up @@ -206,6 +215,21 @@ static void cloneAnnotations(IRSpecContextBase* context, IRInst* clonedInst, IRI
{
SLANG_UNUSED(clonedInst);

// `IRAnnotation`s exclusively carry auto-diff trait associations: a target's
// derivative functions and differential type/zero/add/pair witnesses (every
// `AnnotationKind` is differentiability-related). A program that does not use
// auto-diff never reads them, so cloning them here only bloats the linked
// module — pulling every differentiable builtin's derivative methods (e.g.
// `sin`/`cos`/`sqrt`) and `float`'s differential machinery through every
// downstream pass (specialize / simplifyIR / DCE) before they are finally
// eliminated as dead. Skip them for the final code-gen link when auto-diff is
// not in use; this restores the pre-PR-#9808 behavior where derivative info
// was never linked into non-differentiating modules. We must not skip them
// during prelink / precompilation, whose output module stays live (and may be
// serialized, e.g. the core module) and must keep its annotations.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Gap: No regression test locks in the new gating behavior.

The PR's correctness claim is that, during the final per-target codegen link of a non-auto-diff program, it is safe to (a) skip cloneAnnotations entirely and (b) skip deep-cloning differentiable-interface witness tables. Both are new silent behavior changes in IR link / specialization passes. No .slang test was added, so:

  • A future refactor of doesModuleUseAutodiff, of the set of KnownBuiltinDeclName values gated here, or of the wiring that flips isFinalCodegenLink on linkIR vs. prelinkIR / module precompilation can silently re-introduce the original PR Refactor auto-diff implementation. #9808 regression with zero CI failure.
  • The existing tests/autodiff/ suite covers the positive (auto-diff in use) path, so it catches "we wrongly skipped something auto-diff needed." It does not catch "we wrongly clone auto-diff artifacts into non-auto-diff programs" or "we wrongly skip an [HLSLExport]/[KeepAlive] witness table."

Suggestion: Add a CPU/slangi regression test under tests/autodiff/ (or tests/compile/) that:

  1. Compiles a non-auto-diff shader calling a builtin with an IDifferentiable witness (sin, cos, sqrt), with //TEST:SIMPLE(filecheck=CHECK):-target hlsl -dump-ir and CHECK-NOT: against IDifferentiable_Witness / __fwd_diff / annotation-id markers — locking in the optimization.
  2. Pair with a positive case using fwd_diff(sin) + CHECK: for the witness, locking in that the gate does not prune when auto-diff is in use.
  3. Optionally a test that defines [HLSLExport] struct Foo : IDifferentiable and verifies the witness survives a non-auto-diff link (locks the reorder behavior — see related comment on line 738).

The PR description says #11780 tracks a separate simplifyIR regression — that's distinct from a regression test for this PR's correctness claim, which is still missing.

if (context->getShared()->isFinalCodegenLink && !context->getShared()->useAutodiff)
return;

// Local annotations will be cloned normally as part of cloning their parent function/generic
// body. For module-scope annotations, we need to look them up since they won't get
// automatically pulled in.
Expand Down Expand Up @@ -686,7 +710,39 @@ IRGlobalGenericParam* cloneGlobalGenericParamImpl(

bool shouldDeepCloneWitnessTable(IRSpecContextBase* context, IRWitnessTable* table)
{
SLANG_UNUSED(context);
auto conformanceType = getResolvedInstForDecorations(table->getConformanceType());

// Differentiable-interface witness tables are decided first, *before* the
// generic `HLSLExport` / `KeepAlive` rule below. PR #9808 marks these tables
// (`IDifferentiable`, `IForwardDifferentiable`, ...) and their entries
// `[HLSLExport]` so they can be linked across modules for auto-diff, but that
// export would otherwise force a deep clone into *every* program. A shader
// that references a differentiable builtin (e.g. `sin`) would then drag the
// table's `fwd_diff` / `bwd_diff` / `dadd` / `dzero` entries — and the whole
// derivative-function closure they reference — through every downstream pass,
// even with no auto-diff in use (the PR #9808 codegen regression). For the
// final code-gen link we instead defer the entries and clone only those that
// are actually referenced, unless the program uses auto-diff. Prelink /
// precompilation must keep the output module complete, so it still deep-clones.
for (auto decor : conformanceType->getDecorations())
{
if (auto knownBuiltin = as<IRKnownBuiltinDecoration>(decor))
{
switch (knownBuiltin->getName())
{
case KnownBuiltinDeclName::IDifferentiable:
case KnownBuiltinDeclName::IDifferentiablePtr:
case KnownBuiltinDeclName::IForwardDifferentiable:
case KnownBuiltinDeclName::IBackwardDifferentiable:
case KnownBuiltinDeclName::IBwdCallable:
return !context->getShared()->isFinalCodegenLink ||

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 Question: Is the priority inversion of [HLSLExport] / [KeepAlive] over the differentiable-interface gate intentional?

The new differentiable-interface check runs before the table-decoration scan at table->getDecorations() (which still returns true unconditionally for kIROp_HLSLExportDecoration and kIROp_KeepAliveDecoration). For a witness table that is both on a differentiable interface and carries an explicit [HLSLExport] or [KeepAlive] decoration, the new ordering returns !isFinalCodegenLink || useAutodiff and never reaches the explicit keep-alive directive — so during a non-auto-diff final codegen link the explicit user/compiler-supplied "must preserve" directive is silently overridden.

This appears benign today: IDifferentiable family requirements are all keyed by IRBuiltinRequirementKey, so the witness-table entries are still eagerly cloned via the isBuiltinReqEntry branch in cloneWitnessTable even when shouldDeepCloneWitnessTable returns false. But:

  1. The PR description (Proposed solution, second bullet) explicitly says "the differentiable-interface decision is now made BEFORE the generic [HLSLExport]/[KeepAlive] rule" — so the reorder is intentional, but the description doesn't explain why an explicit [KeepAlive] should be subordinate to the auto-diff gate.
  2. If a future IDifferentiable requirement is added that is not keyed by IRBuiltinRequirementKey (e.g. a user-overridable default-method requirement), the deferred-entry path would diverge from the deep-clone path, and the explicit [HLSLExport]/[KeepAlive] semantics would be lost on the witness table for a non-auto-diff link.

Question: Either (a) move the [HLSLExport]/[KeepAlive] table-decoration scan above the differentiable-interface check so explicit keep-alive directives win even on differentiable-interface witnesses, or (b) document in the new comment why subordinating [KeepAlive] to the gate is correct (the "all requirements are built-in keys" invariant) and SLANG_ASSERT that invariant at the gate site so a future spec change cannot silently regress it.

context->getShared()->useAutodiff;
Comment on lines +733 to +739

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🔵 Trivial

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find autodiff-related builtin names and decoration sites that should be represented
# in shouldDeepCloneWitnessTable's KnownBuiltinDeclName switch.
rg -n -C2 'KnownBuiltinDeclName::(I.*Differentiable|I.*Callable|.*Differential|.*Diff)' source/slang prelude source/standard-modules

# Find where COM decorations and known-builtin decorations are attached so overlap can be ruled out.
rg -n -C3 '(kIROp_ComInterfaceDecoration|addComInterfaceDecoration|IRComInterfaceDecoration|IRKnownBuiltinDecoration|addKnownBuiltinDecoration)' source/slang prelude source/standard-modules

# Find export/keepalive attachment for witness tables to confirm all exported autodiff witness
# conformance types are covered by the local switch.
rg -n -C3 '(HLSLExportDecoration|addHLSLExportDecoration|KeepAliveDecoration|addKeepAliveDecoration)' source/slang prelude source/standard-modules

Repository: shader-slang/slang

Length of output: 50374


🏁 Script executed:

#!/bin/bash
# List all KnownBuiltinDeclName enum values to check for completeness
rg -A 1 "enum KnownBuiltinDeclName" source/slang/slang-ir-insts.h | head -100

Repository: shader-slang/slang

Length of output: 156


🏁 Script executed:

#!/bin/bash
# Check the exact logic flow in shouldDeepCloneWitnessTable to see if NullDifferential
# handling at lines 2337-2339 is separate from the function being reviewed
sed -n '720,770p' source/slang/slang-ir-link.cpp

Repository: shader-slang/slang

Length of output: 1927


🏁 Script executed:

#!/bin/bash
# Find the KnownBuiltinDeclName enum definition
find source/slang -name "*.h" -exec rg -l "enum KnownBuiltinDeclName" {} \; | head -5

Repository: shader-slang/slang

Length of output: 156


🏁 Script executed:

#!/bin/bash
# Once we find the file, list all enum values
rg -B 5 -A 80 "enum KnownBuiltinDeclName" source/slang/slang-ir-insts.h | head -100

Repository: shader-slang/slang

Length of output: 156


🏁 Script executed:

#!/bin/bash
# Search for KnownBuiltinDeclName definition more broadly
rg "KnownBuiltinDeclName" source/slang/slang-ir-insts.h | head -20

Repository: shader-slang/slang

Length of output: 401


🏁 Script executed:

#!/bin/bash
# Search in all .h files for the enum
rg -n "enum.*KnownBuiltinDeclName|KnownBuiltinDeclName.*{" source/slang/*.h | head -30

Repository: shader-slang/slang

Length of output: 320


🏁 Script executed:

#!/bin/bash
# Check where KnownBuiltinDeclName is used to infer its definition
rg -n "struct IRKnownBuiltinDecoration" source/slang/slang-ir-insts.h -A 20

Repository: shader-slang/slang

Length of output: 1044


🏁 Script executed:

#!/bin/bash
# Search for KnownBuiltinDeclName definition across all files
rg -n "enum class KnownBuiltinDeclName|typedef.*KnownBuiltinDeclName|KnownBuiltinDeclName =" source/ | head -20

Repository: shader-slang/slang

Length of output: 246


🏁 Script executed:

#!/bin/bash
# Check slang-ir-util.cpp where the string conversion happens - it likely has the full enum list
sed -n '2100,2200p' source/slang/slang-ir-util.cpp

Repository: shader-slang/slang

Length of output: 3705


🏁 Script executed:

#!/bin/bash
# Read the full enum definition from slang-ast-support-types.h
sed -n '220,280p' source/slang/slang-ast-support-types.h

Repository: shader-slang/slang

Length of output: 1932


Verify the autodiff conformance list: NullDifferential is intentionally excluded and handled separately.

The switch at lines 733-739 intentionally covers only autodiff interface conformance types (IDifferentiable, IDifferentiablePtr, IForwardDifferentiable, IBackwardDifferentiable, IBwdCallable). NullDifferential—a concrete struct type, not an interface—is deliberately excluded from this switch and handled in a separate code path at lines 2337-2339, where it is unconditionally cloned with cloneAndKeepAlive. This distinction is correct: interface witness tables need linkage-mode-dependent cloning, while NullDifferential always requires cloning regardless of link mode.

COM interface decorations (kIROp_ComInterfaceDecoration) and autodiff KnownBuiltinDecoration are mutually exclusive based on their attachment sites (COM on user/interface types, KnownBuiltin on stdlib meta types), so the ordering in shouldDeepCloneWitnessTable does not create a gap.

Sources: Coding guidelines, Path instructions

default:
break;
}
}
}

for (auto decor : table->getDecorations())
{
switch (decor->getOp())
Expand All @@ -697,27 +753,12 @@ bool shouldDeepCloneWitnessTable(IRSpecContextBase* context, IRWitnessTable* tab
}
}

auto conformanceType = getResolvedInstForDecorations(table->getConformanceType());
for (auto decor : conformanceType->getDecorations())
{
switch (decor->getOp())
{
case kIROp_ComInterfaceDecoration:
return true;
case kIROp_KnownBuiltinDecoration:
{
auto name = as<IRKnownBuiltinDecoration>(decor)->getName();
switch (name)
{
case KnownBuiltinDeclName::IDifferentiable:
case KnownBuiltinDeclName::IDifferentiablePtr:
case KnownBuiltinDeclName::IForwardDifferentiable:
case KnownBuiltinDeclName::IBackwardDifferentiable:
case KnownBuiltinDeclName::IBwdCallable:
return true;
}
break;
}
default:
break;
}
Expand Down Expand Up @@ -2140,6 +2181,11 @@ LinkedIR linkIR(CodeGenContext* codeGenContext)
for (auto irModule : irModules)
irModule->_ensureLinkingInfo();

// This is the final per-target code-generation link, so it is safe to prune
// auto-diff artifacts that the program never uses (see `cloneAnnotations` /
// `shouldDeepCloneWitnessTable`).
sharedContext->isFinalCodegenLink = true;

// Check if any user module uses auto-diff, if so we will need to link
// additional witnesses and decorations.
for (IRModule* irModule : userModules)
Expand Down
Loading