-
Notifications
You must be signed in to change notification settings - Fork 462
Restore auto-diff link gating to the final codegen link #11779
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -60,6 +60,15 @@ struct IRSharedSpecContext | |
|
|
||
| bool useAutodiff = false; | ||
|
|
||
| // True only for the final code-generation link (`linkIR`), where the linked | ||
| // module is a throw-away copy built for one target and unused symbols are | ||
| // dropped on demand / by later DCE. It is false for `prelinkIR` and module | ||
| // precompilation, which mutate a module that must remain complete and | ||
| // self-consistent (it may be serialized, e.g. the core module). Auto-diff | ||
| // link-time pruning (see `cloneAnnotations` / `shouldDeepCloneWitnessTable`) | ||
| // is only safe in the former case. | ||
| bool isFinalCodegenLink = false; | ||
|
|
||
| IRBuilder builderStorage; | ||
|
|
||
| // The "global" specialization environment. | ||
|
|
@@ -206,6 +215,21 @@ static void cloneAnnotations(IRSpecContextBase* context, IRInst* clonedInst, IRI | |
| { | ||
| SLANG_UNUSED(clonedInst); | ||
|
|
||
| // `IRAnnotation`s exclusively carry auto-diff trait associations: a target's | ||
| // derivative functions and differential type/zero/add/pair witnesses (every | ||
| // `AnnotationKind` is differentiability-related). A program that does not use | ||
| // auto-diff never reads them, so cloning them here only bloats the linked | ||
| // module — pulling every differentiable builtin's derivative methods (e.g. | ||
| // `sin`/`cos`/`sqrt`) and `float`'s differential machinery through every | ||
| // downstream pass (specialize / simplifyIR / DCE) before they are finally | ||
| // eliminated as dead. Skip them for the final code-gen link when auto-diff is | ||
| // not in use; this restores the pre-PR-#9808 behavior where derivative info | ||
| // was never linked into non-differentiating modules. We must not skip them | ||
| // during prelink / precompilation, whose output module stays live (and may be | ||
| // serialized, e.g. the core module) and must keep its annotations. | ||
| if (context->getShared()->isFinalCodegenLink && !context->getShared()->useAutodiff) | ||
| return; | ||
|
|
||
| // Local annotations will be cloned normally as part of cloning their parent function/generic | ||
| // body. For module-scope annotations, we need to look them up since they won't get | ||
| // automatically pulled in. | ||
|
|
@@ -686,7 +710,39 @@ IRGlobalGenericParam* cloneGlobalGenericParamImpl( | |
|
|
||
| bool shouldDeepCloneWitnessTable(IRSpecContextBase* context, IRWitnessTable* table) | ||
| { | ||
| SLANG_UNUSED(context); | ||
| auto conformanceType = getResolvedInstForDecorations(table->getConformanceType()); | ||
|
|
||
| // Differentiable-interface witness tables are decided first, *before* the | ||
| // generic `HLSLExport` / `KeepAlive` rule below. PR #9808 marks these tables | ||
| // (`IDifferentiable`, `IForwardDifferentiable`, ...) and their entries | ||
| // `[HLSLExport]` so they can be linked across modules for auto-diff, but that | ||
| // export would otherwise force a deep clone into *every* program. A shader | ||
| // that references a differentiable builtin (e.g. `sin`) would then drag the | ||
| // table's `fwd_diff` / `bwd_diff` / `dadd` / `dzero` entries — and the whole | ||
| // derivative-function closure they reference — through every downstream pass, | ||
| // even with no auto-diff in use (the PR #9808 codegen regression). For the | ||
| // final code-gen link we instead defer the entries and clone only those that | ||
| // are actually referenced, unless the program uses auto-diff. Prelink / | ||
| // precompilation must keep the output module complete, so it still deep-clones. | ||
| for (auto decor : conformanceType->getDecorations()) | ||
| { | ||
| if (auto knownBuiltin = as<IRKnownBuiltinDecoration>(decor)) | ||
| { | ||
| switch (knownBuiltin->getName()) | ||
| { | ||
| case KnownBuiltinDeclName::IDifferentiable: | ||
| case KnownBuiltinDeclName::IDifferentiablePtr: | ||
| case KnownBuiltinDeclName::IForwardDifferentiable: | ||
| case KnownBuiltinDeclName::IBackwardDifferentiable: | ||
| case KnownBuiltinDeclName::IBwdCallable: | ||
| return !context->getShared()->isFinalCodegenLink || | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🔵 Question: Is the priority inversion of The new differentiable-interface check runs before the table-decoration scan at This appears benign today:
Question: Either (a) move the |
||
| context->getShared()->useAutodiff; | ||
|
Comment on lines
+733
to
+739
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🎯 Functional Correctness | 🔵 Trivial 🧩 Analysis chain🏁 Script executed: #!/bin/bash
# Find autodiff-related builtin names and decoration sites that should be represented
# in shouldDeepCloneWitnessTable's KnownBuiltinDeclName switch.
rg -n -C2 'KnownBuiltinDeclName::(I.*Differentiable|I.*Callable|.*Differential|.*Diff)' source/slang prelude source/standard-modules
# Find where COM decorations and known-builtin decorations are attached so overlap can be ruled out.
rg -n -C3 '(kIROp_ComInterfaceDecoration|addComInterfaceDecoration|IRComInterfaceDecoration|IRKnownBuiltinDecoration|addKnownBuiltinDecoration)' source/slang prelude source/standard-modules
# Find export/keepalive attachment for witness tables to confirm all exported autodiff witness
# conformance types are covered by the local switch.
rg -n -C3 '(HLSLExportDecoration|addHLSLExportDecoration|KeepAliveDecoration|addKeepAliveDecoration)' source/slang prelude source/standard-modulesRepository: shader-slang/slang Length of output: 50374 🏁 Script executed: #!/bin/bash
# List all KnownBuiltinDeclName enum values to check for completeness
rg -A 1 "enum KnownBuiltinDeclName" source/slang/slang-ir-insts.h | head -100Repository: shader-slang/slang Length of output: 156 🏁 Script executed: #!/bin/bash
# Check the exact logic flow in shouldDeepCloneWitnessTable to see if NullDifferential
# handling at lines 2337-2339 is separate from the function being reviewed
sed -n '720,770p' source/slang/slang-ir-link.cppRepository: shader-slang/slang Length of output: 1927 🏁 Script executed: #!/bin/bash
# Find the KnownBuiltinDeclName enum definition
find source/slang -name "*.h" -exec rg -l "enum KnownBuiltinDeclName" {} \; | head -5Repository: shader-slang/slang Length of output: 156 🏁 Script executed: #!/bin/bash
# Once we find the file, list all enum values
rg -B 5 -A 80 "enum KnownBuiltinDeclName" source/slang/slang-ir-insts.h | head -100Repository: shader-slang/slang Length of output: 156 🏁 Script executed: #!/bin/bash
# Search for KnownBuiltinDeclName definition more broadly
rg "KnownBuiltinDeclName" source/slang/slang-ir-insts.h | head -20Repository: shader-slang/slang Length of output: 401 🏁 Script executed: #!/bin/bash
# Search in all .h files for the enum
rg -n "enum.*KnownBuiltinDeclName|KnownBuiltinDeclName.*{" source/slang/*.h | head -30Repository: shader-slang/slang Length of output: 320 🏁 Script executed: #!/bin/bash
# Check where KnownBuiltinDeclName is used to infer its definition
rg -n "struct IRKnownBuiltinDecoration" source/slang/slang-ir-insts.h -A 20Repository: shader-slang/slang Length of output: 1044 🏁 Script executed: #!/bin/bash
# Search for KnownBuiltinDeclName definition across all files
rg -n "enum class KnownBuiltinDeclName|typedef.*KnownBuiltinDeclName|KnownBuiltinDeclName =" source/ | head -20Repository: shader-slang/slang Length of output: 246 🏁 Script executed: #!/bin/bash
# Check slang-ir-util.cpp where the string conversion happens - it likely has the full enum list
sed -n '2100,2200p' source/slang/slang-ir-util.cppRepository: shader-slang/slang Length of output: 3705 🏁 Script executed: #!/bin/bash
# Read the full enum definition from slang-ast-support-types.h
sed -n '220,280p' source/slang/slang-ast-support-types.hRepository: shader-slang/slang Length of output: 1932 Verify the autodiff conformance list: The switch at lines 733-739 intentionally covers only autodiff interface conformance types ( COM interface decorations ( Sources: Coding guidelines, Path instructions |
||
| default: | ||
| break; | ||
| } | ||
| } | ||
| } | ||
|
|
||
| for (auto decor : table->getDecorations()) | ||
| { | ||
| switch (decor->getOp()) | ||
|
|
@@ -697,27 +753,12 @@ bool shouldDeepCloneWitnessTable(IRSpecContextBase* context, IRWitnessTable* tab | |
| } | ||
| } | ||
|
|
||
| auto conformanceType = getResolvedInstForDecorations(table->getConformanceType()); | ||
| for (auto decor : conformanceType->getDecorations()) | ||
| { | ||
| switch (decor->getOp()) | ||
| { | ||
| case kIROp_ComInterfaceDecoration: | ||
| return true; | ||
| case kIROp_KnownBuiltinDecoration: | ||
| { | ||
| auto name = as<IRKnownBuiltinDecoration>(decor)->getName(); | ||
| switch (name) | ||
| { | ||
| case KnownBuiltinDeclName::IDifferentiable: | ||
| case KnownBuiltinDeclName::IDifferentiablePtr: | ||
| case KnownBuiltinDeclName::IForwardDifferentiable: | ||
| case KnownBuiltinDeclName::IBackwardDifferentiable: | ||
| case KnownBuiltinDeclName::IBwdCallable: | ||
| return true; | ||
| } | ||
| break; | ||
| } | ||
| default: | ||
| break; | ||
| } | ||
|
|
@@ -2140,6 +2181,11 @@ LinkedIR linkIR(CodeGenContext* codeGenContext) | |
| for (auto irModule : irModules) | ||
| irModule->_ensureLinkingInfo(); | ||
|
|
||
| // This is the final per-target code-generation link, so it is safe to prune | ||
| // auto-diff artifacts that the program never uses (see `cloneAnnotations` / | ||
| // `shouldDeepCloneWitnessTable`). | ||
| sharedContext->isFinalCodegenLink = true; | ||
|
|
||
| // Check if any user module uses auto-diff, if so we will need to link | ||
| // additional witnesses and decorations. | ||
| for (IRModule* irModule : userModules) | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🟡 Gap: No regression test locks in the new gating behavior.
The PR's correctness claim is that, during the final per-target codegen link of a non-auto-diff program, it is safe to (a) skip
cloneAnnotationsentirely and (b) skip deep-cloning differentiable-interface witness tables. Both are new silent behavior changes in IR link / specialization passes. No.slangtest was added, so:doesModuleUseAutodiff, of the set ofKnownBuiltinDeclNamevalues gated here, or of the wiring that flipsisFinalCodegenLinkonlinkIRvs.prelinkIR/ module precompilation can silently re-introduce the original PR Refactor auto-diff implementation. #9808 regression with zero CI failure.tests/autodiff/suite covers the positive (auto-diff in use) path, so it catches "we wrongly skipped something auto-diff needed." It does not catch "we wrongly clone auto-diff artifacts into non-auto-diff programs" or "we wrongly skip an[HLSLExport]/[KeepAlive]witness table."Suggestion: Add a CPU/
slangiregression test undertests/autodiff/(ortests/compile/) that:IDifferentiablewitness (sin,cos,sqrt), with//TEST:SIMPLE(filecheck=CHECK):-target hlsl -dump-irandCHECK-NOT:againstIDifferentiable_Witness/__fwd_diff/ annotation-id markers — locking in the optimization.fwd_diff(sin)+CHECK:for the witness, locking in that the gate does not prune when auto-diff is in use.[HLSLExport] struct Foo : IDifferentiableand verifies the witness survives a non-auto-diff link (locks the reorder behavior — see related comment on line 738).The PR description says #11780 tracks a separate
simplifyIRregression — that's distinct from a regression test for this PR's correctness claim, which is still missing.