From 1d6c26ad1a4f4e30dacd65c12fed8f22e36a01d9 Mon Sep 17 00:00:00 2001
From: jvepsalainen
Date: Fri, 26 Jun 2026 10:32:29 +0000
Subject: [PATCH] Restore auto-diff link gating to the final codegen link (perf regression from #9808)

Motivation
----------
PR #9808 (auto-diff overhaul) caused a broad compile-time perf regression
between releases 2026.5 and 2026.7, visible in the compile-perf suite
(tools/compile-perf). The "per-compile floor" workloads regressed sharply:
e.g. the `minimal` (empty shader) `linkIR` rose ~10-13x and
`linkAndOptimizeIR` rose similarly, and every compile pays this regardless
of whether it uses auto-diff.

Root cause
----------
#9808 removed the `useAutodiff` gating that the IR linker previously used to
avoid pulling auto-diff artifacts into programs that do not differentiate:

* `shouldDeepCloneWitnessTable` used to return `useAutodiff` for the
  `IDifferentiable` family; #9808 made it return `true` unconditionally and
  also added the new `IForwardDifferentiable` / `IBackwardDifferentiable` /
  `IBwdCallable` interfaces to the always-deep-clone set. As a result every
  program deep-clones the differentiable-interface witness tables and all
  their entries (`Differential`, `dzero`, `dadd`, fwd/bwd methods).
* The new `cloneAnnotations` step clones module-scope `IRAnnotation`s for
  every cloned inst. Every `AnnotationKind` is differentiability-related
  (DifferentialType/Zero/Add/PairType, Forward/BackwardDerivative, ...), so
  this links a differentiable builtin's derivative associations into
  programs that never use them.

This dead auto-diff IR is then carried through specialize / simplifyIR / DCE
before finally being eliminated, inflating link and optimization time.

Fix
---
Restore the `useAutodiff` gating, but apply it ONLY to the final per-target
code-generation link (`linkIR`). A new
`IRSharedSpecContext::isFinalCodegenLink` flag is set true only there.
The flag is required because the same clone paths run during `prelinkIR` and
module precompilation, whose output module must stay complete and
self-consistent (it may be serialized -- e.g. the core module). Gating those
paths corrupts the serialized core module and breaks auto-diff code
generation; gating only the throw-away per-target link is safe because
unused symbols are dropped on demand / by later DCE.

* `shouldDeepCloneWitnessTable`: the differentiable-interface decision is
  now made BEFORE the generic `[HLSLExport]`/`[KeepAlive]` rule. #9808 marks
  these witness tables `[HLSLExport]` (for cross-module auto-diff), which
  previously forced a deep clone regardless of gating. For the final codegen
  link of a non-differentiating program we now defer the entries and clone
  only those actually referenced.
* `cloneAnnotations`: skip cloning annotations for the final codegen link
  when auto-diff is not in use.

Both gates fall back to the original (always-clone) behavior during prelink /
precompilation and whenever the program uses auto-diff, so auto-diff
semantics are unchanged.

Validation
----------
* minimal `linkAndOptimizeIR`: 0.79ms vs the pre-#9808 parent's 1.87ms (the
  regressed value was ~15ms); `linkIR` back to ~0.1ms.
* Auto-diff correctness: the interpreter (slangi) auto-diff tests under
  tests/autodiff/ produce correct numeric derivatives, and auto-diff HLSL /
  SPIR-V code generation no longer crashes, with the core module regenerated
  by the patched compiler.

Scope
-----
This addresses the per-compile-floor / `linkIR` regression class. A separate,
larger codegen-side regression remains in `simplifyIR` for shaders that use
differentiable numeric builtins (`sin`/`sqrt`/...): #9808 wove
differentiability into the numeric type hierarchy
(`interface IFloat : IArithmetic, IDifferentiable`, plus the new
differentiable interfaces), so a concrete non-auto-diff shader transitively
links `float`'s entire differentiable-trait conformance closure. That is
tracked separately.
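
For illustration only (this is not part of the patch), the rule shared by
both gates under Fix can be written as a single predicate. The function name
`shouldEagerlyCloneAutodiffArtifacts` is hypothetical; only the two parameter
names mirror the flags on `IRSharedSpecContext`. A minimal sketch:

```cpp
// Toy model of the gating decision; not code from slang-ir-link.cpp.
//
//   isFinalCodegenLink  useAutodiff  ->  clone eagerly?
//   false               false            yes (prelink / precompilation)
//   false               true             yes (prelink / precompilation)
//   true                false            no  (prune: nothing differentiates)
//   true                true             yes (program uses auto-diff)
constexpr bool shouldEagerlyCloneAutodiffArtifacts(
    bool isFinalCodegenLink,
    bool useAutodiff)
{
    // Only the throw-away per-target link may skip auto-diff artifacts,
    // and only when no user module differentiates.
    return !isFinalCodegenLink || useAutodiff;
}
```

In the patch itself `shouldDeepCloneWitnessTable` returns this expression
directly, while `cloneAnnotations` returns early on its negation.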
---
 source/slang/slang-ir-link.cpp | 78 +++++++++++++++++++++++++++-------
 1 file changed, 62 insertions(+), 16 deletions(-)

diff --git a/source/slang/slang-ir-link.cpp b/source/slang/slang-ir-link.cpp
index d2e37dedfbf..e996389df54 100644
--- a/source/slang/slang-ir-link.cpp
+++ b/source/slang/slang-ir-link.cpp
@@ -60,6 +60,15 @@ struct IRSharedSpecContext

     bool useAutodiff = false;

+    // True only for the final code-generation link (`linkIR`), where the linked
+    // module is a throw-away copy built for one target and unused symbols are
+    // dropped on demand / by later DCE. It is false for `prelinkIR` and module
+    // precompilation, which mutate a module that must remain complete and
+    // self-consistent (it may be serialized, e.g. the core module). Auto-diff
+    // link-time pruning (see `cloneAnnotations` / `shouldDeepCloneWitnessTable`)
+    // is only safe in the former case.
+    bool isFinalCodegenLink = false;
+
     IRBuilder builderStorage;

     // The "global" specialization environment.
@@ -206,6 +215,21 @@ static void cloneAnnotations(IRSpecContextBase* context, IRInst* clonedInst, IRI
 {
     SLANG_UNUSED(clonedInst);

+    // `IRAnnotation`s exclusively carry auto-diff trait associations: a target's
+    // derivative functions and differential type/zero/add/pair witnesses (every
+    // `AnnotationKind` is differentiability-related). A program that does not use
+    // auto-diff never reads them, so cloning them here only bloats the linked
+    // module -- pulling every differentiable builtin's derivative methods (e.g.
+    // `sin`/`cos`/`sqrt`) and `float`'s differential machinery through every
+    // downstream pass (specialize / simplifyIR / DCE) before they are finally
+    // eliminated as dead. Skip them for the final code-gen link when auto-diff is
+    // not in use; this restores the pre-PR-#9808 behavior where derivative info
+    // was never linked into non-differentiating modules. We must not skip them
+    // during prelink / precompilation, whose output module stays live (and may be
+    // serialized, e.g. the core module) and must keep its annotations.
+    if (context->getShared()->isFinalCodegenLink && !context->getShared()->useAutodiff)
+        return;
+
     // Local annotations will be cloned normally as part of cloning their parent function/generic
     // body. For module-scope annotations, we need to look them up since they won't get
     // automatically pulled in.
@@ -686,7 +710,39 @@ IRGlobalGenericParam* cloneGlobalGenericParamImpl(

 bool shouldDeepCloneWitnessTable(IRSpecContextBase* context, IRWitnessTable* table)
 {
-    SLANG_UNUSED(context);
+    auto conformanceType = getResolvedInstForDecorations(table->getConformanceType());
+
+    // Differentiable-interface witness tables are decided first, *before* the
+    // generic `HLSLExport` / `KeepAlive` rule below. PR #9808 marks these tables
+    // (`IDifferentiable`, `IForwardDifferentiable`, ...) and their entries
+    // `[HLSLExport]` so they can be linked across modules for auto-diff, but that
+    // export would otherwise force a deep clone into *every* program. A shader
+    // that references a differentiable builtin (e.g. `sin`) would then drag the
+    // table's `fwd_diff` / `bwd_diff` / `dadd` / `dzero` entries -- and the whole
+    // derivative-function closure they reference -- through every downstream pass,
+    // even with no auto-diff in use (the PR #9808 codegen regression). For the
+    // final code-gen link we instead defer the entries and clone only those that
+    // are actually referenced, unless the program uses auto-diff. Prelink /
+    // precompilation must keep the output module complete, so it still deep-clones.
+    for (auto decor : conformanceType->getDecorations())
+    {
+        if (auto knownBuiltin = as<IRKnownBuiltinDecoration>(decor))
+        {
+            switch (knownBuiltin->getName())
+            {
+            case KnownBuiltinDeclName::IDifferentiable:
+            case KnownBuiltinDeclName::IDifferentiablePtr:
+            case KnownBuiltinDeclName::IForwardDifferentiable:
+            case KnownBuiltinDeclName::IBackwardDifferentiable:
+            case KnownBuiltinDeclName::IBwdCallable:
+                return !context->getShared()->isFinalCodegenLink ||
+                       context->getShared()->useAutodiff;
+            default:
+                break;
+            }
+        }
+    }
+
     for (auto decor : table->getDecorations())
     {
         switch (decor->getOp())
@@ -697,27 +753,12 @@ bool shouldDeepCloneWitnessTable(IRSpecContextBase* context, IRWitnessTable* tab
         }
     }

-    auto conformanceType = getResolvedInstForDecorations(table->getConformanceType());
     for (auto decor : conformanceType->getDecorations())
     {
         switch (decor->getOp())
         {
         case kIROp_ComInterfaceDecoration:
             return true;
-        case kIROp_KnownBuiltinDecoration:
-            {
-                auto name = as<IRKnownBuiltinDecoration>(decor)->getName();
-                switch (name)
-                {
-                case KnownBuiltinDeclName::IDifferentiable:
-                case KnownBuiltinDeclName::IDifferentiablePtr:
-                case KnownBuiltinDeclName::IForwardDifferentiable:
-                case KnownBuiltinDeclName::IBackwardDifferentiable:
-                case KnownBuiltinDeclName::IBwdCallable:
-                    return true;
-                }
-                break;
-            }
         default:
             break;
         }
@@ -2140,6 +2181,11 @@ LinkedIR linkIR(CodeGenContext* codeGenContext)
     for (auto irModule : irModules)
         irModule->_ensureLinkingInfo();

+    // This is the final per-target code-generation link, so it is safe to prune
+    // auto-diff artifacts that the program never uses (see `cloneAnnotations` /
+    // `shouldDeepCloneWitnessTable`).
+    sharedContext->isFinalCodegenLink = true;
+
     // Check if any user module uses auto-diff, if so we will need to link
     // additional witnesses and decorations.
     for (IRModule* irModule : userModules)