Hard-code a fast path for builtin scalar/vector/matrix operators#11493
Hard-code a fast path for builtin scalar/vector/matrix operators#11493csyonghe wants to merge 6 commits into
Conversation
Resolve builtin same-type arithmetic (+ - * / %), comparison (== != < > <= >=), bitwise (& | ^ << >>), equality on bool, and unary (- ! ~) operators directly during semantic checking, instead of enumerating the generic `operator OP` candidate set and running overload resolution + generic inference + type coercion. When both operands of a binary operator (or the operand of a unary operator) have the *same* builtin scalar/vector/matrix type, no implicit conversion and no user-defined overload can apply, so the result is exactly the corresponding builtin IR op. convertToBuiltinArithmeticOp recognizes these during checking, gives the OperatorExpr its result type directly, and marks it isLoweredAsBuiltinArithmetic; lowerBuiltinArithmeticOp emits the IR op. The OperatorExpr node is preserved (not rewritten to a call) so form-sensitive analyses still see an operator. Compile-time-constant contexts (array extents, generic value arguments, static const initializers) are left on the normal resolution + folding path, because a symbolic constant result (e.g. N/2) folds to a deferred FuncCallIntVal that needs the resolved operator decl. The constant folder also folds a fast-path node directly by operator name when its operands reduce to concrete integers. The for-loop max-iterations inference is taught to read the comparison operator from a fast-path predicate node, so loops in [Differentiable] functions still infer [MaxIters] (previously failed with E30510). Mixed/promoted operand types, user-defined operator types, && / || (short-circuit), and symbolic-constant operators all fall through to normal resolution, so generated code is byte-identical. Speeds up SemanticChecking by ~60-67% on operator-dense shaders and ~9% on neural.slang.
📝 WalkthroughWalkthroughThis PR adds a builtin-operator fast-path: eligible same-type arithmetic/comparison/bitwise/unary operators are rewritten to an OperatorExpr flagged isLoweredAsBuiltinArithmetic, enabling direct IR lowering and integrating constant folding, link-time IntVal forms, loop-iteration inference, and autodiff handling. ChangesBuiltin arithmetic fast-path
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 1f772875-fdfb-415d-a3bb-c2b36b5ce104
📒 Files selected for processing (7)
source/slang/slang-ast-expr.hsource/slang/slang-check-expr.cppsource/slang/slang-check-impl.hsource/slang/slang-check-stmt.cppsource/slang/slang-lower-to-ir.cpptests/autodiff/diff-loop-builtin-operator.slangtests/language-feature/operator-overload/builtin-operator-fastpath.slang
👮 Files not reviewed due to content moderation or server errors (1)
- source/slang/slang-lower-to-ir.cpp
| // Arithmetic over constants (including a previously fast-pathed builtin op) is constant. | ||
| if (auto opExpr = as<OperatorExpr>(expr)) | ||
| { | ||
| if (opExpr->arguments.getCount() == 2) | ||
| return _isCompileTimeConstantArith(opExpr->arguments[0]) && | ||
| _isCompileTimeConstantArith(opExpr->arguments[1]); | ||
| } | ||
| return false; |
There was a problem hiding this comment.
Unary constant arithmetic is not recognized as compile-time-constant here.
_isCompileTimeConstantArith() treats OperatorExpr as constant only when it has 2 args. Unary forms
(e.g. -N, ~N, !B) return false, so constant expressions that include unary operators can be
misclassified and incorrectly routed through the builtin fast path.
Proposed fix
- if (auto opExpr = as<OperatorExpr>(expr))
- {
- if (opExpr->arguments.getCount() == 2)
- return _isCompileTimeConstantArith(opExpr->arguments[0]) &&
- _isCompileTimeConstantArith(opExpr->arguments[1]);
- }
+ if (auto opExpr = as<OperatorExpr>(expr))
+ {
+ if (opExpr->arguments.getCount() == 1)
+ return _isCompileTimeConstantArith(opExpr->arguments[0]);
+ if (opExpr->arguments.getCount() == 2)
+ return _isCompileTimeConstantArith(opExpr->arguments[0]) &&
+ _isCompileTimeConstantArith(opExpr->arguments[1]);
+ }| // Vector operators. | ||
| int4 va = int4(a, b, a + b, a - b); // (7,3,10,4) | ||
| int4 vb = va * 2 - 1; // (13,5,19,7) | ||
| int4 vs = va + vb; // (20,8,29,11) | ||
| outputBuffer[14] = vs.x + vs.y; // 28 | ||
| bool4 cmp = va < vb; // (true,true,true,true) | ||
| outputBuffer[15] = (cmp.x && cmp.w) ? 1 : 0; // 1 | ||
|
|
There was a problem hiding this comment.
Add matrix-operator assertions to match fast-path feature scope.
Line 36 onward validates vector operators, but there is no matrix operator case even though this PR’s fast-path scope includes builtin matrix operators. Please add at least one runtime-dependent matrix arithmetic assertion (and expected CHECK) so matrix lowering regressions are caught by this test.
As per coding guidelines, “Verify test correctness… Ensure new features have corresponding tests.”
Source: Coding guidelines
Two CI regressions from the operator fast path: - In GLSL-compatibility mode (`-allow-glsl`) several builtin operators have different semantics than in Slang/HLSL: `vec == vec` yields a scalar bool (all-components-equal) rather than `vector<bool,N>`, and `mat * mat` is a matrix product rather than a component-wise multiply. The fast path hard-coded HLSL semantics, breaking tests/glsl-intrinsic/* and tests/glsl/matrix-mul.slang. Disable the fast path when AllowGLSL is set so normal resolution picks the GLSL operator overloads. - Builtin matrix operators inline to a per-element form whose emitted variable naming/structure differs from a single IR matrix op, so fast-pathing matrices was not byte-identical. Matrices are rare and give little benefit, so restrict the fast path to scalar and vector operands. With these, the fast path is byte-identical to normal resolution on all targets.
There was a problem hiding this comment.
Actionable comments posted: 1
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 3fcd91f8-ba1a-4b23-aaa8-7cfbe4ed287d
📒 Files selected for processing (1)
source/slang/slang-check-expr.cpp
| // In GLSL-compatibility mode several builtin operators have different semantics than in | ||
| // Slang/HLSL — e.g. `vec == vec` yields a scalar `bool` (all-components-equal) rather | ||
| // than a `vector<bool,N>`, and `mat * mat` is a matrix product rather than a | ||
| // component-wise multiply. Those differences live in the GLSL module's operator | ||
| // overloads, so let normal resolution pick them rather than hard-coding HLSL semantics. | ||
| if (getOptionSet().getBoolOption(CompilerOptionName::AllowGLSL)) | ||
| return nullptr; |
There was a problem hiding this comment.
Add regression coverage for the new GLSL/matrix bail-outs.
These guards change which operator expressions bypass overload resolution, but the provided fast-path coverage only exercises runtime scalar/vector operators. Please add .slang regressions that prove -allow-glsl still routes cases like vec == vec and mat * mat through the GLSL overloads, and that ordinary matrix operators stay off this fast path; otherwise a later cleanup can silently re-enable the wrong semantics.
As per coding guidelines, "Include tests: Add regression tests as .slang files under tests/."
Also applies to: 4506-4507, 4600-4606
Source: Coding guidelines
There was a problem hiding this comment.
Pull request overview
This PR introduces a compiler fast path for the very common case of builtin scalar/vector operators where both operands already have the same builtin type, avoiding generic operator overload candidate enumeration and inference during semantic checking. The operator remains an OperatorExpr in the checked AST (instead of being rewritten into a call), and IR lowering emits the corresponding IR op directly when the fast path is enabled.
Changes:
- Add
OperatorExpr::isLoweredAsBuiltinArithmeticto mark eligible operators for direct builtin lowering. - Implement
convertToBuiltinArithmeticOp()in semantic checking to recognize same-type builtin operators (with carve-outs for constant-eval contexts and GLSL-compat mode) and assign result types without overload resolution. - Implement
lowerBuiltinArithmeticOp()in IR lowering to emit the matching IR instruction, and update loop max-iteration inference + constant folding to handle fast-pathed operator nodes; add regression tests.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
source/slang/slang-ast-expr.h |
Adds an OperatorExpr flag to mark nodes that should lower via the builtin fast path. |
source/slang/slang-check-impl.h |
Declares the semantic-checking fast-path hook for builtin operators. |
source/slang/slang-check-expr.cpp |
Implements fast-path recognition + typing, and updates constant folding to handle unresolved operator-name callees. |
source/slang/slang-lower-to-ir.cpp |
Lowers marked operator expressions directly to the corresponding IR ops. |
source/slang/slang-check-stmt.cpp |
Updates loop trip-count inference to recognize fast-path predicate nodes. |
tests/language-feature/operator-overload/builtin-operator-fastpath.slang |
Adds a runtime-operand regression test to exercise fast-pathed builtin operators. |
tests/autodiff/diff-loop-builtin-operator.slang |
Adds an autodiff regression test for [Differentiable] loops with fast-pathed comparison predicates. |
| BaseTypeInfo::Flag::FloatingPoint) != 0; | ||
| } | ||
|
|
||
| auto opText = getText(as<VarExpr>(expr->functionExpr)->name); |
| // Emit a builtin floating-point scalar/vector arithmetic operator directly as the | ||
| // corresponding IR arithmetic instruction, bypassing callable resolution/lowering. | ||
| // The operands and result type are already checked; differentiability is handled by | ||
| // autodiff's rules for the emitted IR ops. |
| // Builtin same-type arithmetic/comparison on scalar/vector/matrix operands, | ||
| // recognized during checking (see `convertToBuiltinArithmeticOp`): emit the IR | ||
| // op directly, skipping the resolved-callable lowering path. |
| // If `expr` is an arithmetic (`+ - * / %`) or comparison (`== != < > <= >=`) operator | ||
| // on two operands of the *same* builtin integer/floating-point scalar, vector, or | ||
| // matrix type (comparison: scalar/vector only), with at least one runtime operand and | ||
| // outside a constant-evaluation context, mark it for direct builtin IR lowering and |
| // Fast path: builtin same-type arithmetic/comparison on scalar/vector/matrix operands | ||
| // (`a + b`, `a < b`, etc.) is marked for direct IR lowering and skips generic operator | ||
| // overload resolution. |
| // When set, this builtin same-type arithmetic/comparison operator on scalar/vector/ | ||
| // matrix operands was recognized during checking and given its result type directly, | ||
| // without resolving to a generic `operator OP` candidate. lower-to-ir emits the |
| // (convertToBuiltinArithmeticOp / lowerBuiltinArithmeticOp). All operators below act on | ||
| // runtime values (read from the buffer) so they are genuinely fast-pathed (not folded), | ||
| // exercising arithmetic, comparison, bitwise, equality, and unary operators on scalar and | ||
| // vector operands. |
| else if (opText == "==") | ||
| compareOp = kIROp_Eql; | ||
| else if (opText == "!=") | ||
| compareOp = kIROp_Neq; |
| int a = outputBuffer[0] + 7; // 7 | ||
| int b = outputBuffer[1] + 3; // 3 | ||
|
|
||
| outputBuffer[0] = a + b; // 10 |
…cope by import Revisiting the fast-path exclusions: - Matrices are handled again (arithmetic, bitwise, comparison -> matrix<bool>, and unary). The emitted code is not byte-identical to the old inlined-operator form (variable naming/structure differs) but is semantically identical, which is sufficient. - GLSL operator semantics are now detected by either `-allow-glsl` *or* the `glsl` module being in scope (`isGLSLOperatorScope`). A user can `import glsl;` without `-allow-glsl` and pick up its `operator*` overloads that make `mat * mat` a matrix product; the fast path must defer to normal resolution in that case too. (`vec == vec` -> scalar bool is still keyed on `-allow-glsl`.)
…IntVal Stop excluding constant-evaluation contexts from the builtin-operator fast path. A symbolic constant operator (e.g. `N / 2` for a generic value parameter) no longer needs a resolved operator decl to fold: introduce a decl-free `BuiltinOperationIntVal` (operator identified by a `BuiltinOperationKind` enum plus operand `IntVal`s) that folds, substitutes, resolves, mangles, serializes, and lowers to IR (`visitBuiltinOperationIntVal` -> `emitConstexpr*`). The constant folder builds it for fast-path operator nodes; same-type builtin operators are therefore always fast-pathed, so a decl-based `FuncCallIntVal` never forms for them and there is a single representation (no Val divergence). Both checking gates are removed (the context gate and the both-operands-constant gate); `_isCompileTimeConstantArith` is no longer needed. Fixes found while removing the gates: - IR value lowering had no case for the new IntVal (`visitBuiltinOperationIntVal`). - `_initExprIsRuntimeValue` treated a fast-path operator (unresolved callee) as a runtime value, wrongly rejecting `static const` global initializers (E31226). - AST serialization drops the operator-name callee of a fast-path node, so a cross-module imported initializer can no longer be re-folded; `tryConstantFoldExpr` now recognizes the unresolved-callee form by structure, and `tryConstantFoldDeclRef` falls back to the decl's stored, serializable `val` (which round-trips). Constant-context generics (`vector<T, N/2>`, `N*2+1`, …) and the runtime corpus are byte-identical to before; the cross-module type-traits test passes.
…motion
Extend the builtin-operator fast path to handle binary operators whose
operands have different builtin scalar/vector/matrix types (e.g. `int + float`,
`uint64_t | uint`, `scalar * vector`). Previously only same-type operands were
fast-pathed; mixed-type operands fell back to generic operator overload
resolution.
`getBuiltinArithmeticCommonType` computes the type overload resolution would
converge on, following the usual arithmetic conversions implemented as code
logic rather than a lookup table:
- float beats int; among floats the larger size wins;
- among ints the larger size wins, and on a size tie the unsigned type wins;
- bool promotes to the other operand;
- scalar/vector/matrix shapes broadcast (matching extents required for
vector-vector / matrix-matrix).
This reproduces overload resolution exactly: the common element type is the
wider / no-narrowing one, so the operand coercions never narrow and thus never
emit the "implicit conversion not recommended" warning that would otherwise
break the warning-fatal core module bootstrap.
Only the element type is coerced; each operand keeps its own shape so the IR
stays in the mixed vector/scalar (or matrix/scalar) form that backends
optimize -- e.g. `vector * scalar` remains a two-shape `mul`, which SPIR-V
still lowers to OpVectorTimesScalar rather than a splat + component-wise
multiply. Shape broadcast is handled entirely in lowering/emit, so the common
case (same element type, different shape, such as `v * f`) needs no coerce
call at all.
Codegen for mixed-type operators is byte-identical to the previous
overload-resolution path (verified on HLSL and SPIR-V); validated by the
warning-free core module bootstrap and a full slang-test sweep with no real
regressions.
There was a problem hiding this comment.
Actionable comments posted: 2
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: ee99b04f-59d7-4240-bfc0-abcc77c0d3bd
📒 Files selected for processing (9)
source/slang/slang-ast-val.cppsource/slang/slang-ast-val.hsource/slang/slang-check-conversion.cppsource/slang/slang-check-decl.cppsource/slang/slang-check-expr.cppsource/slang/slang-check-impl.hsource/slang/slang-lower-to-ir.cppsource/slang/slang-mangle.cppsource/slang/slang-type-layout.cpp
👮 Files not reviewed due to content moderation or server errors (1)
- source/slang/slang-lower-to-ir.cpp
| bool SemanticsExprVisitor::isGLSLOperatorScope() | ||
| { | ||
| if (getOptionSet().getBoolOption(CompilerOptionName::AllowGLSL)) | ||
| return true; | ||
| for (auto moduleDecl : getShared()->importedModulesList) | ||
| { | ||
| if (moduleDecl->getName() && getText(moduleDecl->getName()) == "glsl") | ||
| return true; | ||
| } | ||
| return false; |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
rg -n -C3 '\bimportedModulesList\b'Repository: shader-slang/slang
Length of output: 3316
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# 1) Show isGLSLOperatorScope() definition and nearby usage
sed -n '4390,4485p' source/slang/slang-check-expr.cpp
# 2) Show convertToBuiltinArithmeticOp() and where it calls isGLSLOperatorScope()
rg -n 'convertToBuiltinArithmeticOp|isGLSLOperatorScope\(' source/slang/slang-check-expr.cpp -n
# Print the definition + call sites with some context
rg -n -C3 'convertToBuiltinArithmeticOp\(' source/slang/slang-check-expr.cpp
rg -n -C3 'isGLSLOperatorScope\(' source/slang/slang-check-expr.cpp
# 3) Inspect SharedSemanticsContext fields for importedModulesList and glslModuleDecl
sed -n '700,820p' source/slang/slang-check-impl.h
# 4) Inspect where importedModulesList is populated (import handling) and comment about scope
sed -n '15980,16120p' source/slang/slang-check-decl.cpp
sed -n '16480,16540p' source/slang/slang-check-decl.cppRepository: shader-slang/slang
Length of output: 17307
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Find all construction sites / aliases for SharedSemanticsContext
rg -n 'SharedSemanticsContext\s*\(' source/slang/slang-check-*.cpp source/slang/slang-check-*.h source/slang/slang-check*.cpp
# Find where isGLSLOperatorScope is called (it’s only in convertToBuiltinArithmeticOp per earlier rg,
# but check for other uses)
rg -n '\bisGLSLOperatorScope\s*\(' source/slang/slang-check-*.cpp
# Check convertToBuiltinArithmeticOp caller context near line 4500-4520
sed -n '4485,4545p' source/slang/slang-check-expr.cpp
# Check how primary-module vs m_module==nullptr modes affect importedModulesList usage
rg -n 'm_module\s*\)' source/slang/slang-check-decl.cppRepository: shader-slang/slang
Length of output: 3286
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Show SharedSemanticsContext construction at slang-check-impl.h around line 783
sed -n '740,860p' source/slang/slang-check-impl.h
# Also locate other mentions of SharedSemanticsContext creation in check-*.cpp/h
rg -n 'SharedSemanticsContext\s*\(' source/slang/slang-check-*.cpp source/slang/slang-check-*.h
# For each call site, print small surrounding region to understand lifetime parameters (module vs nullptr)
# We'll capture up to 40 lines around each match.
rg -n 'SharedSemanticsContext\s*\(' source/slang/slang-check-*.cpp source/slang/slang-check-*.h | cut -d: -f1,2 | headRepository: shader-slang/slang
Length of output: 4859
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# 1) Find getShared() definition for SemanticsExprVisitor / SemanticsVisitor
rg -n 'getShared\(\)' source/slang/slang-check-*.cpp source/slang/slang-check-*.h
# 2) Find construction/allocation of SharedSemanticsContext
rg -n '(new\s+SharedSemanticsContext|make_shared\s*<\s*SharedSemanticsContext|makeShared\s*<\s*SharedSemanticsContext|SharedSemanticsContext\s*\{)' source/slang/slang-check-*.cpp source/slang/slang-check-*.h
# 3) Also look for places that store SharedSemanticsContext in a member named m_shared or similar
rg -n 'SharedSemanticsContext\*|RefPtr<SharedSemanticsContext>|m_shared' source/slang/slang-check-*.cpp source/slang/slang-check-*.hRepository: shader-slang/slang
Length of output: 8950
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Direct instantiations
rg -n 'new\s+SharedSemanticsContext|make_shared\s*<\s*SharedSemanticsContext|SharedSemanticsContext\s*\(' source/slang/slang-check-*.cpp source/slang/slang-check-*.h
# Also find if SemanticsContext stores a SharedSemanticsContext
rg -n 'class\s+SemanticsContext|struct\s+SemanticsContext|SemanticsContext::' source/slang/slang-check-*.h source/slang/slang-check-*.cpp
# Find SemanticsExprVisitor/SubVisitor constructions that pass SemanticsContext/shared
rg -n 'SemanticsVisitor\s*\(|SemanticsExprVisitor\s*\(|SemanticsContext\s*\(' source/slang/slang-check-expr.cpp source/slang/slang-check-decl.cpp source/slang/slang-check-*.cppRepository: shader-slang/slang
Length of output: 6043
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Inspect SharedSemanticsContext creation sites
sed -n '1,120p' source/slang/slang-check-type.cpp
sed -n '2860,3245p' source/slang/slang-check-shader.cpp
# Also inspect SemanticsContext wrapper around SharedSemanticsContext (to see lifetime passed by reference/pointer)
sed -n '1040,1125p' source/slang/slang-check-impl.h
# Inspect SemanticsVisitor/ExprVisitor creation path for expression checking
# (focus on where a SemanticsExprVisitor gets its shared context)
rg -n 'SemanticsExprVisitor\(' source/slang/slang-check-expr.cpp
rg -n 'SemanticsVisitor\(' source/slang/slang-check-expr.cpp | head -n 50
# Show area around SemanticsExprVisitor construction/usage near the isGLSLOperatorScope call
sed -n '3400,3720p' source/slang/slang-check-expr.cppRepository: shader-slang/slang
Length of output: 20695
Use cached GLSL module marker in isGLSLOperatorScope (avoid per-call scan)
isGLSLOperatorScope() currently iterates getShared()->importedModulesList on the hot operator path; however SharedSemanticsContext already caches the imported GLSL module in glslModuleDecl during importModuleIntoScope(). Replace the list scan with a getShared()->glslModuleDecl != nullptr check (while keeping the AllowGLSL option early return). This check is scoped to the current SharedSemanticsContext (a semantics-checking session), not global compilation state, so the guard won’t be broadened across unrelated sessions—but it can still be made constant-time.
…verage Code review fixes: - BuiltinOperationIntVal::tryFoldImpl now validates argument arity against the operation (unary ops take 1 arg, binary take 2) before reading operands, so a malformed node can neither crash on zero args nor mis-fold a binary op with one. - Harmonize constant shift folding across the two Val representations. A shared `_tryFoldConstantShift` helper (negative count -> symbolic; out-of-range count masked to the operand width) is now used by both FuncCallIntVal::tryFoldImpl and BuiltinOperationIntVal::tryFoldImpl, so the same `x << y` folds to the same constant on whichever path reaches it (and the previous raw-shift UB in FuncCallIntVal is gone). - lowerBuiltinArithmeticOp asserts the fast-path callee is the operator-name VarExpr it relies on (and reads it through a local) instead of an unchecked dereference. - Update stale comments that predated mixed-type support and matrix re-inclusion (ast-expr.h flag, convertToBuiltinArithmeticOp/lowerBuiltinArithmeticOp/ visitInvokeExpr): the fast path covers same- and mixed-type scalar/vector/matrix operands across arithmetic/comparison/bitwise/shift/unary, and folds constants via BuiltinOperationIntVal. Clarify why tryInferLoopMaxIterations maps `==`/`!=`. Test coverage (new .slang regressions; runtime operands so the ops are genuinely fast-pathed): - builtin-operator-fastpath-float: float arithmetic + FRem, float comparison, float matrix arithmetic, vector*scalar and matrix+scalar broadcast, and mixed-type promotion (int+float, half+float, int3+float3). - builtin-operator-fastpath-uint: unsigned modulo and *logical* right shift on a high-bit-set value, unsigned compare, and unsigned vector ops. - builtin-operator-fastpath-bool: bool `==`/`!=` and unary `!` on scalar and vector bool. - builtin-operator-fastpath-const: compile-time-constant folding in static const and generic value-argument contexts, including the shift fold and symbolic `N * 2` that re-evaluates on substitution. - builtin-operator-fastpath-glsl: pins GLSL-scope semantics (mat*mat product, vec==vec scalar bool) so the fast path staying disabled there is enforced. - diff-loop-builtin-operator: add the forward-mode autodiff path alongside reverse.
|
Thanks for the thorough review. Addressed in f4bfd88. Summary of how each item was handled: Code
Test coverageAdded regression tests under
ValidationAll new tests pass on cpu / vk / dx12 / interpreter. A full |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
source/slang/slang-check-expr.cpp (1)
4596-4671:⚠️ Potential issue | 🟠 Major | ⚡ Quick winDefer mixed-type coercions until the fast path is known to apply.
Lines 4624-4638 rewrite
expr->argumentsbefore Lines 4664-4671 verify that the promoted base type is valid for this operator family. For a mixed-type bitwise case likeintVal & floatVal, this path inserts anint -> floatcast, then returnsnullptrbecause bitwise ops require an integer base.visitInvokeExpr()then falls back to normal resolution on the already-mutated(float, float)arguments, so downstream overload selection/diagnostics no longer see the user's original expression.Suggested fix
Type* commonType = getBuiltinArithmeticCommonType(leftArg->type.type, rightArg->type.type); if (!commonType) return nullptr; BaseType commonBase; IntVal *cRows, *cCols; - _getBuiltinNumericShape(commonType, commonBase, cRows, cCols); + if (!_getBuiltinNumericShape(commonType, commonBase, cRows, cCols)) + return nullptr; + auto commonFlags = BaseTypeInfo::getInfo(commonBase).flags; + bool commonIsInteger = (commonFlags & BaseTypeInfo::Flag::Integer) != 0; + bool commonIsFloat = (commonFlags & BaseTypeInfo::Flag::FloatingPoint) != 0; + bool commonIsBool = (commonBase == BaseType::Bool); + bool commonIsEquality = (opText == "==" || opText == "!="); + bool commonEligible = isBitwise + ? commonIsInteger + : (commonIsEquality ? (commonIsInteger || commonIsFloat || commonIsBool) + : (commonIsInteger || commonIsFloat)); + if (!commonEligible) + return nullptr; Type* commonElementType = m_astBuilder->getBuiltinType(commonBase);Then keep the existing coercion/write-back block below that guard, or stage the coerced operands in locals and only assign them back to
expr->argumentsafter the fast path is definitely accepted.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 5c065588-decc-4c23-b987-1947df98d98c
📒 Files selected for processing (12)
source/slang/slang-ast-expr.hsource/slang/slang-ast-val.cppsource/slang/slang-check-expr.cppsource/slang/slang-check-impl.hsource/slang/slang-check-stmt.cppsource/slang/slang-lower-to-ir.cpptests/autodiff/diff-loop-builtin-operator.slangtests/language-feature/operator-overload/builtin-operator-fastpath-bool.slangtests/language-feature/operator-overload/builtin-operator-fastpath-const.slangtests/language-feature/operator-overload/builtin-operator-fastpath-float.slangtests/language-feature/operator-overload/builtin-operator-fastpath-glsl.slangtests/language-feature/operator-overload/builtin-operator-fastpath-uint.slang
👮 Files not reviewed due to content moderation or server errors (1)
- source/slang/slang-lower-to-ir.cpp
There was a problem hiding this comment.
Verdict: 🟡 Has issues — 0 bug(s), 3 gap(s)
The PR adds a builtin-operator fast path that bypasses generic operator overload resolution for builtin scalar/vector/matrix operands, including a new decl-free BuiltinOperationIntVal for compile-time-constant folding of fast-path operators. Findings concern a silent fallback in tryConstantFoldDeclRef that masks unrelated fold failures, a default: case in getBuiltinOperationOpText that silently returns "?" (collides via the new KB mangling), and a misleading copy-pasted docstring above getBuiltinArithmeticCommonType.
Changes Overview
Fast-path AST + checking (slang-ast-expr.h, slang-check-expr.cpp, slang-check-impl.h)
- New
OperatorExpr::isLoweredAsBuiltinArithmeticflag;convertToBuiltinArithmeticOprecognizes builtin scalar/vector/matrix operators of the same or mixed builtin type and assigns the result type without enumerating any operator candidate.getBuiltinArithmeticCommonTypereproduces the usual arithmetic conversions, coercing only the element type sovector OP scalarkeeps its mixed shape.isGLSLOperatorScope(set by-allow-glslorimport glsl;) disables the fast path so GLSL-specific semantics (vec == vec→ scalar bool,mat * mat→ matrix product) are still owned by theglslmodule's overloads.
Decl-free constant Val (slang-ast-val.{h,cpp}, slang-mangle.cpp, slang-type-layout.cpp, slang-check-conversion.cpp)
- New
BuiltinOperationIntVal(op kind + type + argVals) so a symbolic fast-path operator likeN*2has one canonical representation across substitution boundaries instead of producing aFuncCallIntValin some paths and a fast-pathValin others._tryFoldConstantShiftis shared betweenFuncCallIntVal::tryFoldImplandBuiltinOperationIntVal::tryFoldImplso shift folding is consistent. Mangling uses a newKBprefix; layout/conversion sites that previously special-casedFuncCallIntValare extended.
Folding plumbing (slang-check-expr.cpp, slang-check-decl.cpp, slang-check-stmt.cpp)
tryConstantFoldExprrecognizes an unresolved-calleeVarExprwith an operator name and folds it directly (also handles re-folding of cross-module deserialized initializers).tryConstantFoldDeclRefadds a fallback that returnsdecl->val->substitute(declRef)when the init-expr fold fails._initExprIsRuntimeValuetreats fast-path nodes as pure.tryInferLoopMaxIterationsis extended to read the comparison op from a fast-path predicate node (placed before the resolved-DeclRef branch sinceVarExpris aDeclRefExpr).
IR lowering (slang-lower-to-ir.cpp)
lowerBuiltinArithmeticOpemits the IR op directly (Add/Sub/Mul/Div,IRemvsFRemby element type,Eql/Less/...,BitAnd/...,Lsh/Rsh,Neg/Not/BitNot); mixed-shape operands flow through unchanged.visitBuiltinOperationIntValmirrorsvisitFuncCallIntVal's constexpr-op dispatch keyed on the operator enum.
Tests (tests/language-feature/operator-overload/builtin-operator-fastpath*.slang, tests/autodiff/diff-loop-builtin-operator.slang)
- Per-operator-family coverage on cpu/vk/d3d12 with runtime operands; const-context test exercising symbolic generic-value folding (
getDouble<N>() { return getN<N*2>(); }); GLSL-scope test pinning matrix-product / scalar-bool semantics; differentiable for-loop test exercising fwd_diff and bwd_diff over a fast-pathedi < 8predicate.
Findings (3 total)
| Severity | Location | Finding |
|---|---|---|
| 🟡 Gap | source/slang/slang-ast-val.cpp:2361 |
getBuiltinOperationOpText default: returns "?" — collides via the new KB mangling when a future enumerator is added |
| 🟡 Gap | source/slang/slang-check-expr.cpp:4718-4720 |
Doc comment above getBuiltinArithmeticCommonType describes the unrelated _getBuiltinNumericShape helper |
| 🟡 Gap | source/slang/slang-check-expr.cpp:2820-2823 |
tryConstantFoldDeclRef silently falls back to decl->val when the init-expr fold fails — applies to all callers, not just the deserialized fast-path case the comment justifies |
| case BuiltinOperationKind::Not: | ||
| return toSlice("!"); | ||
| default: | ||
| return toSlice("?"); |
There was a problem hiding this comment.
🟡 Gap: getBuiltinOperationOpText silently returns "?" for unknown enum values
The default branch returns toSlice("?"):
default:
return toSlice("?");This text is consumed by two callers that need a stable identifier — BuiltinOperationIntVal::_toTextOverride (diagnostics) and the new KB mangling case in slang-mangle.cpp (emitNameImpl(context, getBuiltinOperationOpText(...))). Adding a new BuiltinOperationKind value without extending this switch produces:
- A mangled symbol that uses the literal name
?, which collides with any other enum value also missed by an extension. - A
toTextrepresentation that is indistinguishable from a literal?:ternary printed elsewhere.
Per the comment on BuiltinOperationKind itself ("their integer values are part of the serialized/mangled form, so only append, never reorder"), this enum is treated as ABI surface — silent fallback defeats that intent.
Suggested fix:
default:
SLANG_UNEXPECTED("unhandled BuiltinOperationKind in getBuiltinOperationOpText");
UNREACHABLE_RETURN(toSlice(""));| { | ||
| BaseType leftBase, rightBase; | ||
| IntVal *leftRows, *leftCols, *rightRows, *rightCols; |
There was a problem hiding this comment.
🟡 Gap: Doc comment above getBuiltinArithmeticCommonType describes a different function
The comment immediately above getBuiltinArithmeticCommonType is:
// Decompose a builtin numeric type into (base element type, shape). `outRows`/`outCols`
// describe the shape: both null => scalar, rows set & cols null => vector<rows>, both set =>
// matrix<rows,cols>. Returns false if `type` is not a builtin scalar/vector/matrix.
Type* SemanticsExprVisitor::getBuiltinArithmeticCommonType(Type* left, Type* right)This is the docstring of the file-static helper _getBuiltinNumericShape (it appears verbatim above that helper too). getBuiltinArithmeticCommonType does the opposite — it computes a common type for two operands of (possibly) different types. The accurate doc is in slang-check-impl.h where this function is declared.
Suggested fix: Either delete the implementation-site comment (it's redundant with the header) or replace it with a one-line cross-reference like // See declaration in slang-check-impl.h.
| tryConstantFoldExpr(getInitExpr(m_astBuilder, declRef), kind, &newCircularityInfo)) | ||
| return folded; | ||
| return foldFromStoredVal(); | ||
| } |
There was a problem hiding this comment.
🟡 Gap: Silent behavior change in tryConstantFoldDeclRef for non-fast-path callers
The new fallback applies whenever tryConstantFoldExpr of an init expression fails:
if (auto folded =
tryConstantFoldExpr(getInitExpr(m_astBuilder, declRef), kind, &newCircularityInfo))
return folded;
return foldFromStoredVal();Before this PR, a fold failure on the init expression returned nullptr. After this PR, the function returns whatever was previously stored in decl->val (substituted for this declRef). The justification in the diff comment is specific to deserialized fast-path operators ("an imported initializer that contains a builtin fast-path operator loses its operator-name callee through AST serialization"), but the fallback is unconditional — it also kicks in for any static const whose init expression now fails to fold for an unrelated reason (a checking error, a non-foldable subexpression introduced by a future change, etc.).
In those non-fast-path cases, decl->val was populated by an earlier successful fold (varDecl->val = tryConstantFoldExpr(varDecl->initExpr, ConstantFoldingKind::LinkTime, nullptr) at slang-check-decl.cpp:2708), so the fallback effectively returns a stale value instead of surfacing the new fold failure. Without a test that pins this, a regression in tryConstantFoldExpr could be silently masked here.
Suggestion: Gate the second fallback on detecting that the init expression actually contains a fast-path / unresolved-callee operator, so non-fast-path init expressions retain the previous "fold failure → null" behavior. Alternatively, document the expected interaction between decl->val (link-time fold cache) and tryConstantFoldExpr's rerun and add an assertion that they agree when both succeed.
1. Motivation
Semantic checking is a large fraction of front-end compile time, and within it operator overload
resolution dominates for ordinary code. Every
a + b,i < n,x & mask,-v, etc. is, today,resolved like any other call: the checker enumerates the generic
operator OPcandidate set in thecore module (
operator+<T:IArithmetic>,operator+<T:IFloat>, the comparison/bitwise families, theper-base-type
__intrinsic_opoverloads, …), runs generic-argument inference and typeunification for each candidate, ranks them, and then coerces operands. Profiling
neural.slangshowed this candidate fan-out + inference to be the single largest front-end cost; on operator-dense
shaders it is the majority of
SemanticChecking.But for builtin scalar/vector/matrix operands the answer is fixed: it is exactly the corresponding
builtin IR op (possibly after the usual arithmetic conversion). We are paying full
generic-overload-resolution price to rediscover a fixed answer.
This PR hard-codes that case.
2. Proposed solution
During expression checking, before falling into overload resolution, recognize a builtin operator
applied to builtin scalar/vector/matrix operands and resolve it directly:
convertToBuiltinArithmeticOp(inslang-check-expr.cpp) checks the operator and operand types,assigns the
OperatorExprits result type, and marks itisLoweredAsBuiltinArithmetic— withoutenumerating any operator candidate.
lowerBuiltinArithmeticOp(inslang-lower-to-ir.cpp) reads the operator name + operand type andemits the corresponding IR op directly.
The node stays an
OperatorExpr(it is not rewritten into a function/member call), so everyform-sensitive analysis that walks the checked AST — constant folding, auto-diff, for-loop trip-count
inference — still sees an operator.
Coverage (builtin integer / floating-point / bool scalar, vector, or matrix operands):
+ - * / %< > <= >=== !=& | ^ << >>-(negate),~(bitnot),!(logical-not, bool)This covers three operand shapes, each handled without overload resolution:
old path.
int + float,uint64_t | uint,scalar * vector, …) — a hard-codedcommon-type promotion (§4) reproduces what overload resolution + coercion would have selected;
also byte-identical.
static constinitializers — foldvia a decl-free
BuiltinOperationIntVal(§4), so there is never a secondValrepresentation ofe.g.
N*2.Falls through to normal resolution (behavior unchanged):
&&/||(kept on the short-circuit path),user-defined operator types, and GLSL-compatibility scope —
-allow-glslor animport glsl;,where some operators differ (
vec == vec→ scalarbool;mat * mat→ matrix product). Thosesemantics live in the
glslmodule's operator overloads, which still own them.3. Change summary
slang-ast-expr.hOperatorExpr::isLoweredAsBuiltinArithmeticflag (kept as an OperatorExpr; not a call).Valslang-ast-val.h/.cpp,slang-mangle.cpp,slang-lower-to-ir.cppBuiltinOperationIntVal(op kind + type + args), with fold / substitute / resolve / link-time-resolve / mangling / IR-lowering, so symbolic constant operators (N*2) have a single canonical representation.slang-check-expr.cpp,slang-check-impl.hconvertToBuiltinArithmeticOp(binary + unary eligibility for scalar/vector/matrix);getBuiltinArithmeticCommonType(hard-coded usual-arithmetic-conversions, element-only coercion); GLSL-scope gate (isGLSLOperatorScope=AllowGLSLorglslimported);tryConstantFoldExpr/tryConstantFoldDeclReffold fast-path nodes by op kind.slang-lower-to-ir.cpplowerBuiltinArithmeticOpemits the IR op (Add/Sub/…/Eql/Less/BitAnd/Lsh/Neg/Not/BitNot;IRemvsFRemby element type); mixed-shape operands flow through so the SPIR-V backend still foldsvector * scalar→OpVectorTimesScalar.slang-check-decl.cpp_initExprIsRuntimeValuetreats a fast-path node as pure (runtime only if an operand is).slang-check-stmt.cpptryInferLoopMaxIterationsreads the comparison op from a fast-path predicate node, not just a resolved intrinsic DeclRef.tests/language-feature/operator-overload/builtin-operator-fastpath.slang,tests/autodiff/diff-loop-builtin-operator.slang4. How the changes were derived
The hard part is not the fast path itself — it is keeping every consumer of the checked AST working
when the operator's callee is left unresolved. Three areas required care:
(a) Compile-time-constant contexts →
BuiltinOperationIntVal. Array extents, generic valuearguments, and
static constinitializers must keep folding. The original constant folder needed aresolved-callee
DeclRef. Rather than skip the fast path here (which would leave concrete constantsfolding one way and symbolic constants like
N*2folding to a decl-bearingFuncCallIntVal— tworepresentations of the same value that compare unequal under pointer-identity
Val::equals, breakinggeneric specialization), this PR introduces a decl-free
BuiltinOperationIntVal: it stores theoperation kind + type + argument
Vals and re-evaluates on substitution. Concrete-constant operatorsfold directly; symbolic ones produce a single canonical
BuiltinOperationIntVal. So there is exactlyone representation of
N*2in every context.(b) Mixed-type promotion →
getBuiltinArithmeticCommonType. Fora OP bwith different builtintypes, overload resolution would pick a common type via the usual arithmetic conversions. This is
reproduced as ~60 lines of code logic (no table): float beats int; among floats the larger size wins;
among ints the larger size wins and on a size-tie the unsigned wins; bool promotes; scalar/vector/
matrix broadcast. Each branch was checked against the baseline. Only the element type is coerced;
each operand keeps its own shape, so the IR stays in the mixed
vector OP scalarform that backendsoptimize (SPIR-V
OpVectorTimesScalaris preserved) and the result is byte-identical. The commontype is the wider/no-narrowing one, so the element coercions never narrow → no spurious "implicit
conversion not recommended" warnings (which would otherwise break the warning-fatal core-module
bootstrap).
(c) Differentiable-loop trip-count inference.
tryInferLoopMaxIterationsextracts the comparisonop from a
forpredicate (i < N) via the resolved intrinsicDeclRef. With the predicatefast-pathed the callee is the unresolved operator-name
VarExpr, so inference bailed → no[MaxIters]→ reverse-mode auto-diff failed withE30510. Inference now recognizes a fast-pathpredicate node (checked before the resolved-DeclRef case, since
VarExpris aDeclRefExpr).Matrices are included; their fast-pathed codegen is semantically equivalent but not always
byte-identical to the old per-element inlined form (different anonymous-temp naming), which is fine.
GLSL scope is detected by
AllowGLSLor animport glsl;, since a user can pull in the GLSLoperator semantics without the flag.
5. Correctness / validation
masterbuild (verified onhlslandspirv-asm) forsame-type and mixed-type scalar/vector arithmetic, comparison,
IRem/FRem, bitwise/shift, boolequality, unary,
half/double, scalar↔vector/matrix broadcast (incl.OpVectorTimesScalar), andconstant contexts. Matrices are semantically equivalent.
neuralstandard-module bootstrap.slang-testsweep (32 servers): 0 real regressions vs the clean-masterbaseline(the LLVM-JIT-execution failures present in the run are environmental — the baseline fails the same
set identically).
6. Compiler performance
Measured with
-report-detailed-perf-benchmark(min of repeated runs) on operator-dense kernels,against a clean-
masterbuild (verified to lack the fast path — its semantic checking is 5–6×slower).
Front-end (
SemanticChecking) — the phase this PR targets:No per-stage regression — semantic-checking time, measured at each commit:
BuiltinOperationIntVal)Each step improves a category or holds it flat; the constant-folding work added no checking cost, and
the mixed-type machinery did not regress the shared same-type path.
End-to-end improvement scales with how front-end-bound a workload is. On a deliberately
backend-bound kernel (a long straight-line shader where
linkAndOptimizeIRis ~95% of compile time),the 83% front-end win is only ~4% of the wall clock; on front-end-bound code (operator/overload-heavy
shaders, large module front-ends) it is a much larger fraction.
7. Follow-ups
operatordeclarations was attempted and reverted. Gating theper-base-type concrete operator emissions on the existing generic
OverloadRank(10)fallbackbuilds cleanly and is non-GLSL byte-identical, but the full sweep showed real regressions:
enum == literal, genericT : IArithmeticarithmetic, and cooperative-matrix all broke withE30019 "expected an expression of type 'vector<int,1>'". The concrete scalar operators turn outto be the resolution targets that enum→tag-type, generic-
T, and other implicitly-convertibleoperands match against; the generic operators don't replicate that (resolution falls through to a
vector candidate and fails). Making removal work needs deeper overload-resolution/coercion changes,
so it is left as a separate effort.
$initcoercions sofloat x = intVal;skips$initoverload resolution (separate PR; compounds on this one).