Hard-code a fast path for builtin scalar/vector/matrix operators#11493

Open
csyonghe wants to merge 6 commits into master from operator-hardcode

Conversation

@csyonghe
Collaborator

@csyonghe csyonghe commented Jun 6, 2026

1. Motivation

Semantic checking is a large fraction of front-end compile time, and within it operator overload
resolution dominates for ordinary code. Today, every a + b, i < n, x & mask, -v, etc. is
resolved like any other call: the checker enumerates the generic operator OP candidate set in the
core module (operator+<T:IArithmetic>, operator+<T:IFloat>, the comparison/bitwise families, the
per-base-type __intrinsic_op overloads, …), runs generic-argument inference and type unification
for each candidate, ranks them, and then coerces operands. Profiling neural.slang
showed this candidate fan-out + inference to be the single largest front-end cost; on operator-dense
shaders it is the majority of SemanticChecking.

But for builtin scalar/vector/matrix operands the answer is fixed: it is exactly the corresponding
builtin IR op (possibly after the usual arithmetic conversion). We are paying full
generic-overload-resolution price to rediscover a fixed answer.

This PR hard-codes that case.

2. Proposed solution

During expression checking, before falling into overload resolution, recognize a builtin operator
applied to builtin scalar/vector/matrix operands and resolve it directly:

  • convertToBuiltinArithmeticOp (in slang-check-expr.cpp) checks the operator and operand types,
    assigns the OperatorExpr its result type, and marks it isLoweredAsBuiltinArithmetic without
    enumerating any operator candidate.
  • lowerBuiltinArithmeticOp (in slang-lower-to-ir.cpp) reads the operator name + operand type and
    emits the corresponding IR op directly.

The node stays an OperatorExpr (it is not rewritten into a function/member call), so every
form-sensitive analysis that walks the checked AST — constant folding, auto-diff, for-loop trip-count
inference — still sees an operator.
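
To make the two-phase split concrete, here is a minimal, self-contained C++ sketch. It is not the Slang implementation: the `BaseType`, `Expr`, and `OperatorExpr` types and the helpers `tryMarkBuiltinOp` and `lowerBuiltinOp` are hypothetical stand-ins, and only same-type binary operators are modeled. What it illustrates is that checking only assigns a result type and sets the flag, while lowering selects the IR op from the operator name.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-ins for compiler types; not the real Slang AST.
enum class BaseType { Int, Float, Bool, UserStruct };

struct Expr
{
    BaseType type = BaseType::Int;
};

struct OperatorExpr
{
    std::string opName;                  // e.g. "+", "<"
    std::vector<Expr> arguments;
    std::optional<BaseType> resultType;  // filled in by checking
    bool isLoweredAsBuiltinArithmetic = false;
};

// Checking phase: returns true if the fast path applies; false means
// "fall through to ordinary overload resolution".
bool tryMarkBuiltinOp(OperatorExpr& e)
{
    if (e.arguments.size() != 2)
        return false;
    const BaseType l = e.arguments[0].type;
    const BaseType r = e.arguments[1].type;
    // Only same-type int/float operands are modeled in this sketch.
    if (l != r || l == BaseType::UserStruct || l == BaseType::Bool)
        return false;

    if (e.opName == "+" || e.opName == "-" || e.opName == "*")
        e.resultType = l;
    else if (e.opName == "<" || e.opName == ">")
        e.resultType = BaseType::Bool;
    else
        return false;

    e.isLoweredAsBuiltinArithmetic = true;
    return true;
}

// Lowering phase: picks an IR op name from the operator name alone.
std::string lowerBuiltinOp(const OperatorExpr& e)
{
    assert(e.isLoweredAsBuiltinArithmetic && "only valid for fast-path nodes");
    if (e.opName == "+") return "Add";
    if (e.opName == "-") return "Sub";
    if (e.opName == "*") return "Mul";
    if (e.opName == "<") return "Less";
    if (e.opName == ">") return "Greater";
    return "";
}
```

A caller would try `tryMarkBuiltinOp` first and run normal overload resolution whenever it returns false.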

Coverage (builtin integer / floating-point / bool scalar, vector, or matrix operands):

| Family | Operators | Element types |
| --- | --- | --- |
| Arithmetic | + - * / % | integer, floating-point |
| Comparison | < > <= >= | integer, floating-point |
| Equality | == != | integer, floating-point, bool |
| Bitwise / shift | & \| ^ << >> | integer |
| Unary | - (negate), ~ (bitnot), ! (logical-not, bool) | as applicable |

This covers three operand shapes, each handled without overload resolution:

  1. Same builtin type on both sides — the result is the builtin IR op; byte-identical to the
    old path.
  2. Mixed builtin types (int + float, uint64_t | uint, scalar * vector, …) — a hard-coded
    common-type promotion (§4) reproduces what overload resolution + coercion would have selected;
    also byte-identical.
  3. Constant operators in array extents / generic value args / static const initializers — fold
    via a decl-free BuiltinOperationIntVal (§4), so there is never a second Val representation of
    e.g. N*2.

Falls through to normal resolution (behavior unchanged): &&/|| (kept on the short-circuit path),
user-defined operator types, and GLSL-compatibility scope (-allow-glsl or an import glsl;),
where some operators differ (vec == vec → scalar bool; mat * mat → matrix product). Those
semantics live in the glsl module's operator overloads, which still own them.
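
The matrix case shows why the fast path cannot hard-code a single meaning. The sketch below uses plain 2×2 integer arrays rather than any Slang type (the names `Mat2`, `componentWiseMul`, and `matrixProduct` are made up for illustration). It contrasts the component-wise multiply that Slang/HLSL semantics give `mat * mat` with the matrix product used under GLSL semantics.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Mat2 = std::array<std::array<int, 2>, 2>;

// Slang/HLSL-style `a * b` on matrices: multiply matching elements.
Mat2 componentWiseMul(const Mat2& a, const Mat2& b)
{
    Mat2 r{};
    for (std::size_t i = 0; i < 2; ++i)
        for (std::size_t j = 0; j < 2; ++j)
            r[i][j] = a[i][j] * b[i][j];
    return r;
}

// GLSL-style `a * b` on matrices: linear-algebra product.
Mat2 matrixProduct(const Mat2& a, const Mat2& b)
{
    Mat2 r{};
    for (std::size_t i = 0; i < 2; ++i)
        for (std::size_t j = 0; j < 2; ++j)
            for (std::size_t k = 0; k < 2; ++k)
                r[i][j] += a[i][k] * b[k][j];
    return r;
}
```

For the same inputs the two functions generally return different matrices, so an operator that was fast-pathed with one meaning would silently miscompile code written for the other.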

3. Change summary

| Area | Files | What |
| --- | --- | --- |
| AST | slang-ast-expr.h | OperatorExpr::isLoweredAsBuiltinArithmetic flag (kept as an OperatorExpr; not a call). |
| Constant Val | slang-ast-val.h/.cpp, slang-mangle.cpp, slang-lower-to-ir.cpp | New decl-free BuiltinOperationIntVal (op kind + type + args), with fold / substitute / resolve / link-time-resolve / mangling / IR-lowering, so symbolic constant operators (N*2) have a single canonical representation. |
| Semantic check | slang-check-expr.cpp, slang-check-impl.h | convertToBuiltinArithmeticOp (binary + unary eligibility for scalar/vector/matrix); getBuiltinArithmeticCommonType (hard-coded usual-arithmetic-conversions, element-only coercion); GLSL-scope gate (isGLSLOperatorScope = AllowGLSL or glsl imported); tryConstantFoldExpr/tryConstantFoldDeclRef fold fast-path nodes by op kind. |
| IR lowering | slang-lower-to-ir.cpp | lowerBuiltinArithmeticOp emits the IR op (Add/Sub/…/Eql/Less/BitAnd/Lsh/Neg/Not/BitNot; IRem vs FRem by element type); mixed-shape operands flow through so the SPIR-V backend still folds vector * scalar → OpVectorTimesScalar. |
| Decl check | slang-check-decl.cpp | _initExprIsRuntimeValue treats a fast-path node as pure (runtime only if an operand is). |
| Loop inference | slang-check-stmt.cpp | tryInferLoopMaxIterations reads the comparison op from a fast-path predicate node, not just a resolved intrinsic DeclRef. |
| Tests | tests/language-feature/operator-overload/builtin-operator-fastpath.slang, tests/autodiff/diff-loop-builtin-operator.slang | Operator correctness on runtime operands across cpu/vk/dx12; differentiable-loop regression. |

4. How the changes were derived

The hard part is not the fast path itself — it is keeping every consumer of the checked AST working
when the operator's callee is left unresolved. Three areas required care:

(a) Compile-time-constant contexts → BuiltinOperationIntVal. Array extents, generic value
arguments, and static const initializers must keep folding. The original constant folder needed a
resolved-callee DeclRef. Skipping the fast path here would leave concrete constants
folding one way and symbolic constants like N*2 folding to a decl-bearing FuncCallIntVal: two
representations of the same value that compare unequal under pointer-identity Val::equals, breaking
generic specialization. Instead, this PR introduces a decl-free BuiltinOperationIntVal: it stores the
operation kind + type + argument Vals and re-evaluates on substitution. Concrete-constant operators
fold directly; symbolic ones produce a single canonical BuiltinOperationIntVal. So there is exactly
one representation of N*2 in every context.
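
The idea of a decl-free, canonical constant node can be sketched in isolation. The code below is a toy model, not the actual `BuiltinOperationIntVal`: `IntVal`, `ValBuilder`, and `OpKind` are hypothetical names, the result type is omitted, and deduplication is modeled with a simple interning table so that pointer identity coincides with structural equality. It shows concrete operands folding immediately, symbolic operands producing one shared node, and substitution re-evaluating the operation.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <tuple>

enum class OpKind { Add, Mul };

// Hypothetical integer Val: a constant, a generic parameter, or a
// decl-free builtin operation over two other Vals.
struct IntVal
{
    enum class Tag { Const, Param, BuiltinOp };
    Tag tag;
    long long value = 0;       // used by Const
    std::string name;          // used by Param
    OpKind op = OpKind::Add;   // used by BuiltinOp
    const IntVal* lhs = nullptr;
    const IntVal* rhs = nullptr;
    int id = -1;               // assigned when the node is interned
};

class ValBuilder
{
    using Key = std::tuple<int, long long, std::string, int, int, int>;
    std::map<Key, std::unique_ptr<IntVal>> table;

    // Structurally equal nodes share one allocation.
    const IntVal* intern(IntVal v)
    {
        Key k{int(v.tag), v.value, v.name, int(v.op),
              v.lhs ? v.lhs->id : -1, v.rhs ? v.rhs->id : -1};
        auto& slot = table[k];
        if (!slot)
        {
            v.id = int(table.size());
            slot = std::make_unique<IntVal>(v);
        }
        return slot.get();
    }

public:
    const IntVal* constant(long long x) { return intern({IntVal::Tag::Const, x}); }
    const IntVal* param(const std::string& n) { return intern({IntVal::Tag::Param, 0, n}); }

    // Folds when both operands are concrete; otherwise builds the symbolic node.
    const IntVal* builtinOp(OpKind op, const IntVal* l, const IntVal* r)
    {
        if (l->tag == IntVal::Tag::Const && r->tag == IntVal::Tag::Const)
            return constant(op == OpKind::Add ? l->value + r->value
                                              : l->value * r->value);
        return intern({IntVal::Tag::BuiltinOp, 0, "", op, l, r});
    }

    // Replaces parameter `name` with `arg`, re-evaluating operations on the way up.
    const IntVal* substitute(const IntVal* v, const std::string& name, const IntVal* arg)
    {
        switch (v->tag)
        {
        case IntVal::Tag::Const:
            return v;
        case IntVal::Tag::Param:
            return v->name == name ? arg : v;
        case IntVal::Tag::BuiltinOp:
            return builtinOp(v->op,
                             substitute(v->lhs, name, arg),
                             substitute(v->rhs, name, arg));
        }
        assert(false && "unreachable");
        return v;
    }
};
```

Because no declaration is stored, nothing in the node depends on how (or whether) the operator was resolved, which is the property the fast path needs.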

(b) Mixed-type promotion → getBuiltinArithmeticCommonType. For a OP b with different builtin
types, overload resolution would pick a common type via the usual arithmetic conversions. This is
reproduced as ~60 lines of code logic (no table): float beats int; among floats the larger size wins;
among ints the larger size wins and on a size-tie the unsigned wins; bool promotes; scalar/vector/
matrix broadcast. Each branch was checked against the baseline. Only the element type is coerced;
each operand keeps its own shape, so the IR stays in the mixed vector OP scalar form that backends
optimize (SPIR-V OpVectorTimesScalar is preserved) and the result is byte-identical. The common
type is the wider/no-narrowing one, so the element coercions never narrow → no spurious "implicit
conversion not recommended" warnings (which would otherwise break the warning-fatal core-module
bootstrap).
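
The promotion rules listed above are compact enough to sketch directly. This is an illustrative reconstruction from the prose, not the body of `getBuiltinArithmeticCommonType`: the `Kind`, `ElemType`, `Shape`, `commonElementType`, and `broadcastShape` names are hypothetical, and returning "no result" for mismatched shapes (including vector with matrix) is an assumption standing in for "fall back to normal resolution".

```cpp
#include <cassert>
#include <optional>

enum class Kind { Bool, SignedInt, UnsignedInt, Float };

struct ElemType
{
    Kind kind;
    int bits; // width in bits; never consulted for Bool
};

inline bool operator==(const ElemType& a, const ElemType& b)
{
    return a.kind == b.kind && a.bits == b.bits;
}

// Element type both operands are coerced to.
ElemType commonElementType(const ElemType& a, const ElemType& b)
{
    // bool promotes to the other operand's type.
    if (a.kind == Kind::Bool) return b;
    if (b.kind == Kind::Bool) return a;

    // Float beats int, regardless of width.
    const bool aFloat = a.kind == Kind::Float;
    const bool bFloat = b.kind == Kind::Float;
    if (aFloat != bFloat) return aFloat ? a : b;

    // Same category: the larger size wins.
    if (a.bits != b.bits) return a.bits > b.bits ? a : b;

    // Size tie: unsigned wins among ints (floats of equal size are identical).
    return a.kind == Kind::UnsignedInt ? a : b;
}

enum class ShapeKind { Scalar, Vector, Matrix };

struct Shape
{
    ShapeKind kind;
    int rows; // 1 for scalar and vector
    int cols; // element count for a vector, 1 for a scalar
};

inline bool operator==(const Shape& a, const Shape& b)
{
    return a.kind == b.kind && a.rows == b.rows && a.cols == b.cols;
}

// Result shape of `a OP b`. Operands keep their own shapes; only the
// result broadcasts. nullopt means "not handled by this sketch".
std::optional<Shape> broadcastShape(const Shape& a, const Shape& b)
{
    if (a.kind == ShapeKind::Scalar) return b;
    if (b.kind == ShapeKind::Scalar) return a;
    if (a == b) return a; // vector-vector / matrix-matrix need matching extents
    return std::nullopt;
}
```

Keeping the element-type decision separate from the shape decision mirrors the point made above: a `vector * scalar` with equal element types needs no coercion at all.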

(c) Differentiable-loop trip-count inference. tryInferLoopMaxIterations extracts the comparison
op from a for predicate (i < N) via the resolved intrinsic DeclRef. With the predicate
fast-pathed the callee is the unresolved operator-name VarExpr, so inference bailed → no
[MaxIters] → reverse-mode auto-diff failed with E30510. Inference now recognizes a fast-path
predicate node (checked before the resolved-DeclRef case, since VarExpr is a DeclRefExpr).
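
The ordering issue in (c) can be shown with a small model. Everything here is hypothetical (`Predicate`, `CompareOp`, `compareOpFromName`, `getLoopCompareOp`); the real code works on Slang AST nodes and IR opcodes. The sketch only demonstrates the shape of the fix: recognize the fast-path form by operator name first, then fall back to the resolved declaration.

```cpp
#include <cassert>
#include <optional>
#include <string>

enum class CompareOp { Less, Leq, Greater, Geq, Eql, Neq };

// Hypothetical loop predicate: either fast-pathed (the callee is just the
// operator name) or resolved to an intrinsic that carries its op.
struct Predicate
{
    bool isFastPath = false;
    std::string operatorName;             // meaningful when isFastPath
    std::optional<CompareOp> resolvedOp;  // meaningful otherwise
};

std::optional<CompareOp> compareOpFromName(const std::string& opText)
{
    if (opText == "<")  return CompareOp::Less;
    if (opText == "<=") return CompareOp::Leq;
    if (opText == ">")  return CompareOp::Greater;
    if (opText == ">=") return CompareOp::Geq;
    if (opText == "==") return CompareOp::Eql;
    if (opText == "!=") return CompareOp::Neq;
    return std::nullopt; // not a comparison: inference gives up
}

std::optional<CompareOp> getLoopCompareOp(const Predicate& p)
{
    // Test the fast-path form first; a fast-path node has no resolved
    // declaration, so the generic case below would find nothing.
    if (p.isFastPath)
    {
        assert(!p.operatorName.empty());
        return compareOpFromName(p.operatorName);
    }
    return p.resolvedOp;
}
```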

Matrices are included; their fast-pathed codegen is semantically equivalent but not always
byte-identical to the old per-element inlined form (different anonymous-temp naming), which is fine.
GLSL scope is detected by AllowGLSL or an import glsl;, since a user can pull in the GLSL
operator semantics without the flag.

5. Correctness / validation

  • Byte-identical codegen vs a clean-master build (verified on hlsl and spirv-asm) for
    same-type and mixed-type scalar/vector arithmetic, comparison, IRem/FRem, bitwise/shift, bool
    equality, unary, half/double, scalar↔vector/matrix broadcast (incl. OpVectorTimesScalar), and
    constant contexts. Matrices are semantically equivalent.
  • Warning-free core / GLSL / neural standard-module bootstrap.
  • Full slang-test sweep (32 servers): 0 real regressions vs the clean-master baseline
    (the LLVM-JIT-execution failures present in the run are environmental — the baseline fails the same
    set identically).

6. Compiler performance

Measured with -report-detailed-perf-benchmark (min of repeated runs) on operator-dense kernels,
against a clean-master build (verified to lack the fast path — its semantic checking is 5–6×
slower).

Front-end (SemanticChecking) — the phase this PR targets:

| Workload | master | this PR | Δ |
| --- | --- | --- | --- |
| arithmetic-dense, mixed-type operands | 267 ms | 44 ms | −83% |
| arithmetic-dense, same-type operands | 216 ms | 31 ms | −86% |

No per-stage regression — semantic-checking time, measured at each commit:

| Stage | mixed-type | same-type |
| --- | --- | --- |
| master (baseline) | 265 ms | 216 ms |
| Stage 1 — same-type fast path | 148 ms | 32 ms |
| + constant folding (BuiltinOperationIntVal) | 145 ms | 30 ms |
| + mixed-type promotion (this PR) | 44 ms | 31 ms |

Each step improves a category or holds it flat; the constant-folding work added no checking cost, and
the mixed-type machinery did not regress the shared same-type path.

End-to-end improvement scales with how front-end-bound a workload is. On a deliberately
backend-bound kernel (a long straight-line shader where linkAndOptimizeIR is ~95% of compile time),
the 83% front-end win is only ~4% of the wall clock; on front-end-bound code (operator/overload-heavy
shaders, large module front-ends) it is a much larger fraction.
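
A back-of-the-envelope check of the ~4% figure, under two simplifying assumptions that the measurements above do not state: that everything outside linkAndOptimizeIR (about 5% of compile time) is front-end time, and that all of it shrinks by the measured 83%. If the front end is a fraction $f$ of wall-clock time and is reduced by a fraction $r$, the end-to-end saving is $f \cdot r$:

$$
\text{saving} = f \cdot r \approx 0.05 \times 0.83 \approx 0.04
$$

The same formula gives proportionally larger savings as $f$ grows, which is the front-end-bound case described above.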

7. Follow-ups

  • Removing the redundant core operator declarations was attempted and reverted. Gating the
    per-base-type concrete operator emissions on the existing generic OverloadRank(10) fallback
    builds cleanly and is non-GLSL byte-identical, but the full sweep showed real regressions:
    enum == literal, generic T : IArithmetic arithmetic, and cooperative-matrix all broke with
    E30019 "expected an expression of type 'vector<int,1>'". The concrete scalar operators turn out
    to be the resolution targets that enum→tag-type, generic-T, and other implicitly-convertible
    operands match against; the generic operators don't replicate that (resolution falls through to a
    vector candidate and fails). Making removal work needs deeper overload-resolution/coercion changes,
    so it is left as a separate effort.
  • Hard-coding builtin scalar/vector/matrix $init coercions so float x = intVal; skips $init
    overload resolution (separate PR; compounds on this one).

Resolve builtin same-type arithmetic (+ - * / %), comparison
(== != < > <= >=), bitwise (& | ^ << >>), equality on bool, and unary
(- ! ~) operators directly during semantic checking, instead of
enumerating the generic `operator OP` candidate set and running overload
resolution + generic inference + type coercion.

When both operands of a binary operator (or the operand of a unary
operator) have the *same* builtin scalar/vector/matrix type, no implicit
conversion and no user-defined overload can apply, so the result is
exactly the corresponding builtin IR op. convertToBuiltinArithmeticOp
recognizes these during checking, gives the OperatorExpr its result type
directly, and marks it isLoweredAsBuiltinArithmetic; lowerBuiltinArithmeticOp
emits the IR op. The OperatorExpr node is preserved (not rewritten to a
call) so form-sensitive analyses still see an operator.

Compile-time-constant contexts (array extents, generic value arguments,
static const initializers) are left on the normal resolution + folding
path, because a symbolic constant result (e.g. N/2) folds to a deferred
FuncCallIntVal that needs the resolved operator decl. The constant folder
also folds a fast-path node directly by operator name when its operands
reduce to concrete integers.

The for-loop max-iterations inference is taught to read the comparison
operator from a fast-path predicate node, so loops in [Differentiable]
functions still infer [MaxIters] (previously failed with E30510).

Mixed/promoted operand types, user-defined operator types, && / ||
(short-circuit), and symbolic-constant operators all fall through to
normal resolution, so generated code is byte-identical.

Speeds up SemanticChecking by ~60-67% on operator-dense shaders and ~9%
on neural.slang.
@csyonghe csyonghe requested a review from a team as a code owner June 6, 2026 00:48
@csyonghe csyonghe requested review from bmillsNV and Copilot and removed request for a team and Copilot June 6, 2026 00:48
@csyonghe csyonghe added the pr: non-breaking PRs without breaking changes label Jun 6, 2026
@coderabbitai
Contributor

coderabbitai Bot commented Jun 6, 2026

Walkthrough

This PR adds a builtin-operator fast-path: eligible same-type arithmetic/comparison/bitwise/unary operators are rewritten to an OperatorExpr flagged isLoweredAsBuiltinArithmetic, enabling direct IR lowering and integrating constant folding, link-time IntVal forms, loop-iteration inference, and autodiff handling.

Changes

Builtin arithmetic fast-path

| Layer | File(s) | Summary |
| --- | --- | --- |
| AST flag for builtin operator marking | source/slang/slang-ast-expr.h | OperatorExpr gains isLoweredAsBuiltinArithmetic boolean field to mark operators eligible for direct IR lowering instead of overload resolution. |
| Semantic detection and rewriting of builtin operators | source/slang/slang-check-impl.h, source/slang/slang-check-expr.cpp | New convertToBuiltinArithmeticOp helper detects runtime-value builtin operators, validates operand/result type/shape (including broadcasting/common-type logic), rewrites InvokeExpr into an OperatorExpr flagged for lowering, and is invoked early from visitInvokeExpr. |
| Constant-folding & BuiltinOperationIntVal | source/slang/slang-ast-val.h, source/slang/slang-ast-val.cpp, source/slang/slang-check-expr.cpp | Introduce BuiltinOperationIntVal and BuiltinOperationKind, implement folding, text, resolve/substitute/link-time behavior, and update tryConstantFoldExpr/tryConstantFoldDeclRef to prefer fast-path operator names and construct BuiltinOperationIntVal for symbolic cases. |
| Direct IR lowering for builtin operators | source/slang/slang-lower-to-ir.cpp, source/slang/slang-check-expr.cpp | Add visitBuiltinOperationIntVal constexpr lowering, implement lowerBuiltinArithmeticOp(OperatorExpr*), and route OperatorExpr nodes marked isLoweredAsBuiltinArithmetic to this lowering path from visitInvokeExprImpl. |
| Loop iteration inference & init/runtime handling | source/slang/slang-check-stmt.cpp, source/slang/slang-check-decl.cpp | tryInferLoopMaxIterations gains a builtin-arithmetic fast path mapping unresolved operator VarExpr names to IROp for comparisons; _initExprIsRuntimeValue treats builtin-lowered operator invokes as runtime only if any operand is runtime. |
| Linkage, mangling, and type-layout updates | source/slang/slang-mangle.cpp, source/slang/slang-check-conversion.cpp, source/slang/slang-type-layout.cpp | emitVal emits a mangled form for BuiltinOperationIntVal; initializer-list handling recognizes BuiltinOperationIntVal as a link-time integer for element/row counts; GetElementCount treats it as invalid layout size. |
| Regression tests for fast-path and autodiff | tests/language-feature/operator-overload/*, tests/autodiff/diff-loop-builtin-operator.slang | Added tests exercising runtime-dependent builtin operator fast paths across CPU/VK/D3D12 and an autodiff regression verifying loop max-iteration inference for a builtin comparison predicate. |

Suggested reviewers

  • bmillsNV
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 18.75% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and directly summarizes the main change: hard-coding a fast path for builtin scalar/vector/matrix operators. |
| Description check | ✅ Passed | The description is comprehensive and highly related to the changeset, detailing motivation, solution, implementation areas, correctness validation, and performance improvements. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 1f772875-fdfb-415d-a3bb-c2b36b5ce104

📥 Commits

Reviewing files that changed from the base of the PR and between 5230a81 and 2850954.

📒 Files selected for processing (7)
  • source/slang/slang-ast-expr.h
  • source/slang/slang-check-expr.cpp
  • source/slang/slang-check-impl.h
  • source/slang/slang-check-stmt.cpp
  • source/slang/slang-lower-to-ir.cpp
  • tests/autodiff/diff-loop-builtin-operator.slang
  • tests/language-feature/operator-overload/builtin-operator-fastpath.slang
👮 Files not reviewed due to content moderation or server errors (1)
  • source/slang/slang-lower-to-ir.cpp

Comment thread source/slang/slang-check-expr.cpp Outdated
Comment on lines +4448 to +4455
// Arithmetic over constants (including a previously fast-pathed builtin op) is constant.
if (auto opExpr = as<OperatorExpr>(expr))
{
    if (opExpr->arguments.getCount() == 2)
        return _isCompileTimeConstantArith(opExpr->arguments[0]) &&
               _isCompileTimeConstantArith(opExpr->arguments[1]);
}
return false;

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Unary constant arithmetic is not recognized as compile-time-constant here.

_isCompileTimeConstantArith() treats OperatorExpr as constant only when it has 2 args. Unary forms
(e.g. -N, ~N, !B) return false, so constant expressions that include unary operators can be
misclassified and incorrectly routed through the builtin fast path.

Proposed fix
-    if (auto opExpr = as<OperatorExpr>(expr))
-    {
-        if (opExpr->arguments.getCount() == 2)
-            return _isCompileTimeConstantArith(opExpr->arguments[0]) &&
-                   _isCompileTimeConstantArith(opExpr->arguments[1]);
-    }
+    if (auto opExpr = as<OperatorExpr>(expr))
+    {
+        if (opExpr->arguments.getCount() == 1)
+            return _isCompileTimeConstantArith(opExpr->arguments[0]);
+        if (opExpr->arguments.getCount() == 2)
+            return _isCompileTimeConstantArith(opExpr->arguments[0]) &&
+                   _isCompileTimeConstantArith(opExpr->arguments[1]);
+    }

Comment on lines +36 to +43
// Vector operators.
int4 va = int4(a, b, a + b, a - b); // (7,3,10,4)
int4 vb = va * 2 - 1; // (13,5,19,7)
int4 vs = va + vb; // (20,8,29,11)
outputBuffer[14] = vs.x + vs.y; // 28
bool4 cmp = va < vb; // (true,true,true,true)
outputBuffer[15] = (cmp.x && cmp.w) ? 1 : 0; // 1


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add matrix-operator assertions to match fast-path feature scope.

Line 36 onward validates vector operators, but there is no matrix operator case even though this PR’s fast-path scope includes builtin matrix operators. Please add at least one runtime-dependent matrix arithmetic assertion (and expected CHECK) so matrix lowering regressions are caught by this test.

As per coding guidelines, “Verify test correctness… Ensure new features have corresponding tests.”

Source: Coding guidelines

github-actions[bot]

This comment was marked as outdated.

Two CI regressions from the operator fast path:

- In GLSL-compatibility mode (`-allow-glsl`) several builtin operators have
  different semantics than in Slang/HLSL: `vec == vec` yields a scalar bool
  (all-components-equal) rather than `vector<bool,N>`, and `mat * mat` is a
  matrix product rather than a component-wise multiply. The fast path
  hard-coded HLSL semantics, breaking tests/glsl-intrinsic/* and
  tests/glsl/matrix-mul.slang. Disable the fast path when AllowGLSL is set so
  normal resolution picks the GLSL operator overloads.

- Builtin matrix operators inline to a per-element form whose emitted variable
  naming/structure differs from a single IR matrix op, so fast-pathing matrices
  was not byte-identical. Matrices are rare and give little benefit, so restrict
  the fast path to scalar and vector operands.

With these, the fast path is byte-identical to normal resolution on all
targets.
Copilot AI review requested due to automatic review settings June 6, 2026 03:18
@csyonghe csyonghe changed the title from "Hard-code a fast path for builtin scalar/vector/matrix operators" to "Hard-code a fast path for builtin scalar/vector operators" Jun 6, 2026
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 3fcd91f8-ba1a-4b23-aaa8-7cfbe4ed287d

📥 Commits

Reviewing files that changed from the base of the PR and between 2850954 and 63602f5.

📒 Files selected for processing (1)
  • source/slang/slang-check-expr.cpp

Comment on lines +4476 to +4482
// In GLSL-compatibility mode several builtin operators have different semantics than in
// Slang/HLSL — e.g. `vec == vec` yields a scalar `bool` (all-components-equal) rather
// than a `vector<bool,N>`, and `mat * mat` is a matrix product rather than a
// component-wise multiply. Those differences live in the GLSL module's operator
// overloads, so let normal resolution pick them rather than hard-coding HLSL semantics.
if (getOptionSet().getBoolOption(CompilerOptionName::AllowGLSL))
    return nullptr;

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add regression coverage for the new GLSL/matrix bail-outs.

These guards change which operator expressions bypass overload resolution, but the provided fast-path coverage only exercises runtime scalar/vector operators. Please add .slang regressions that prove -allow-glsl still routes cases like vec == vec and mat * mat through the GLSL overloads, and that ordinary matrix operators stay off this fast path; otherwise a later cleanup can silently re-enable the wrong semantics.

As per coding guidelines, "Include tests: Add regression tests as .slang files under tests/."

Also applies to: 4506-4507, 4600-4606

Source: Coding guidelines

Contributor

Copilot AI left a comment


Pull request overview

This PR introduces a compiler fast path for the very common case of builtin scalar/vector operators where both operands already have the same builtin type, avoiding generic operator overload candidate enumeration and inference during semantic checking. The operator remains an OperatorExpr in the checked AST (instead of being rewritten into a call), and IR lowering emits the corresponding IR op directly when the fast path is enabled.

Changes:

  • Add OperatorExpr::isLoweredAsBuiltinArithmetic to mark eligible operators for direct builtin lowering.
  • Implement convertToBuiltinArithmeticOp() in semantic checking to recognize same-type builtin operators (with carve-outs for constant-eval contexts and GLSL-compat mode) and assign result types without overload resolution.
  • Implement lowerBuiltinArithmeticOp() in IR lowering to emit the matching IR instruction, and update loop max-iteration inference + constant folding to handle fast-pathed operator nodes; add regression tests.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| source/slang/slang-ast-expr.h | Adds an OperatorExpr flag to mark nodes that should lower via the builtin fast path. |
| source/slang/slang-check-impl.h | Declares the semantic-checking fast-path hook for builtin operators. |
| source/slang/slang-check-expr.cpp | Implements fast-path recognition + typing, and updates constant folding to handle unresolved operator-name callees. |
| source/slang/slang-lower-to-ir.cpp | Lowers marked operator expressions directly to the corresponding IR ops. |
| source/slang/slang-check-stmt.cpp | Updates loop trip-count inference to recognize fast-path predicate nodes. |
| tests/language-feature/operator-overload/builtin-operator-fastpath.slang | Adds a runtime-operand regression test to exercise fast-pathed builtin operators. |
| tests/autodiff/diff-loop-builtin-operator.slang | Adds an autodiff regression test for [Differentiable] loops with fast-pathed comparison predicates. |

Comment thread source/slang/slang-lower-to-ir.cpp Outdated
BaseTypeInfo::Flag::FloatingPoint) != 0;
}

auto opText = getText(as<VarExpr>(expr->functionExpr)->name);
Comment thread source/slang/slang-lower-to-ir.cpp Outdated
Comment on lines +5131 to +5134
// Emit a builtin floating-point scalar/vector arithmetic operator directly as the
// corresponding IR arithmetic instruction, bypassing callable resolution/lowering.
// The operands and result type are already checked; differentiability is handled by
// autodiff's rules for the emitted IR ops.
Comment thread source/slang/slang-lower-to-ir.cpp Outdated
Comment on lines +5227 to +5229
// Builtin same-type arithmetic/comparison on scalar/vector/matrix operands,
// recognized during checking (see `convertToBuiltinArithmeticOp`): emit the IR
// op directly, skipping the resolved-callable lowering path.
Comment thread source/slang/slang-check-impl.h Outdated
Comment on lines +3660 to +3663
// If `expr` is an arithmetic (`+ - * / %`) or comparison (`== != < > <= >=`) operator
// on two operands of the *same* builtin integer/floating-point scalar, vector, or
// matrix type (comparison: scalar/vector only), with at least one runtime operand and
// outside a constant-evaluation context, mark it for direct builtin IR lowering and
Comment thread source/slang/slang-check-expr.cpp Outdated
Comment on lines +4732 to +4734
// Fast path: builtin same-type arithmetic/comparison on scalar/vector/matrix operands
// (`a + b`, `a < b`, etc.) is marked for direct IR lowering and skips generic operator
// overload resolution.
Comment thread source/slang/slang-ast-expr.h Outdated
Comment on lines +280 to +282
// When set, this builtin same-type arithmetic/comparison operator on scalar/vector/
// matrix operands was recognized during checking and given its result type directly,
// without resolving to a generic `operator OP` candidate. lower-to-ir emits the
Comment on lines +2 to +5
// (convertToBuiltinArithmeticOp / lowerBuiltinArithmeticOp). All operators below act on
// runtime values (read from the buffer) so they are genuinely fast-pathed (not folded),
// exercising arithmetic, comparison, bitwise, equality, and unary operators on scalar and
// vector operands.
Comment on lines +762 to +765
else if (opText == "==")
    compareOp = kIROp_Eql;
else if (opText == "!=")
    compareOp = kIROp_Neq;
int a = outputBuffer[0] + 7; // 7
int b = outputBuffer[1] + 3; // 3

outputBuffer[0] = a + b; // 10
github-actions[bot]

This comment was marked as outdated.

csyonghe added 3 commits June 5, 2026 20:38
…cope by import

Revisiting the fast-path exclusions:

- Matrices are handled again (arithmetic, bitwise, comparison -> matrix<bool>,
  and unary). The emitted code is not byte-identical to the old inlined-operator
  form (variable naming/structure differs) but is semantically identical, which
  is sufficient.

- GLSL operator semantics are now detected by either `-allow-glsl` *or* the
  `glsl` module being in scope (`isGLSLOperatorScope`). A user can `import glsl;`
  without `-allow-glsl` and pick up its `operator*` overloads that make
  `mat * mat` a matrix product; the fast path must defer to normal resolution in
  that case too. (`vec == vec` -> scalar bool is still keyed on `-allow-glsl`.)
…IntVal

Stop excluding constant-evaluation contexts from the builtin-operator fast path.
A symbolic constant operator (e.g. `N / 2` for a generic value parameter) no
longer needs a resolved operator decl to fold: introduce a decl-free
`BuiltinOperationIntVal` (operator identified by a `BuiltinOperationKind` enum
plus operand `IntVal`s) that folds, substitutes, resolves, mangles, serializes,
and lowers to IR (`visitBuiltinOperationIntVal` -> `emitConstexpr*`). The
constant folder builds it for fast-path operator nodes; same-type builtin
operators are therefore always fast-pathed, so a decl-based `FuncCallIntVal`
never forms for them and there is a single representation (no Val divergence).

Both checking gates are removed (the context gate and the both-operands-constant
gate); `_isCompileTimeConstantArith` is no longer needed.

Fixes found while removing the gates:
- IR value lowering had no case for the new IntVal (`visitBuiltinOperationIntVal`).
- `_initExprIsRuntimeValue` treated a fast-path operator (unresolved callee) as a
  runtime value, wrongly rejecting `static const` global initializers (E31226).
- AST serialization drops the operator-name callee of a fast-path node, so a
  cross-module imported initializer can no longer be re-folded; `tryConstantFoldExpr`
  now recognizes the unresolved-callee form by structure, and `tryConstantFoldDeclRef`
  falls back to the decl's stored, serializable `val` (which round-trips).

Constant-context generics (`vector<T, N/2>`, `N*2+1`, …) and the runtime corpus
are byte-identical to before; the cross-module type-traits test passes.
…motion

Extend the builtin-operator fast path to handle binary operators whose
operands have different builtin scalar/vector/matrix types (e.g. `int + float`,
`uint64_t | uint`, `scalar * vector`). Previously only same-type operands were
fast-pathed; mixed-type operands fell back to generic operator overload
resolution.

`getBuiltinArithmeticCommonType` computes the type overload resolution would
converge on, following the usual arithmetic conversions implemented as code
logic rather than a lookup table:
  - float beats int; among floats the larger size wins;
  - among ints the larger size wins, and on a size tie the unsigned type wins;
  - bool promotes to the other operand;
  - scalar/vector/matrix shapes broadcast (matching extents required for
    vector-vector / matrix-matrix).
This reproduces overload resolution exactly: the common element type is the
wider / no-narrowing one, so the operand coercions never narrow and thus never
emit the "implicit conversion not recommended" warning that would otherwise
break the warning-fatal core module bootstrap.

Only the element type is coerced; each operand keeps its own shape so the IR
stays in the mixed vector/scalar (or matrix/scalar) form that backends
optimize -- e.g. `vector * scalar` remains a two-shape `mul`, which SPIR-V
still lowers to OpVectorTimesScalar rather than a splat + component-wise
multiply. Shape broadcast is handled entirely in lowering/emit, so the common
case (same element type, different shape, such as `v * f`) needs no coerce
call at all.
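
The shape side of the rule (scalars broadcast, vector-vector and matrix-matrix need matching extents) might look roughly like the sketch below. `Shape` and `resultShape` are hypothetical names, and rejecting a vector/matrix mix is an assumption of this sketch rather than something the commit message states.

```cpp
#include <optional>

// Hypothetical shape encoding: rows == 0 && cols == 0 is a scalar,
// cols == 0 alone is vector<rows>, otherwise matrix<rows, cols>.
struct Shape
{
    int rows = 0;
    int cols = 0;
};

bool isScalar(Shape s)
{
    return s.rows == 0 && s.cols == 0;
}

// Result shape of a binary operator, or nullopt when the fast path
// should not apply and normal overload resolution takes over.
std::optional<Shape> resultShape(Shape l, Shape r)
{
    if (isScalar(l)) return r; // scalar broadcasts to the other operand
    if (isScalar(r)) return l;
    if (l.rows == r.rows && l.cols == r.cols) return l; // matching extents
    return std::nullopt; // mismatched extents (assumed: also vector/matrix mixes)
}
```

Note that this only computes the result type's shape; the operands themselves keep their own shapes, which is what lets `vector * scalar` stay a two-shape `mul`.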

Codegen for mixed-type operators is byte-identical to the previous
overload-resolution path (verified on HLSL and SPIR-V); validated by the
warning-free core module bootstrap and a full slang-test sweep with no real
regressions.
@csyonghe csyonghe changed the title Hard-code a fast path for builtin scalar/vector operators Hard-code a fast path for builtin scalar/vector/matrix operators Jun 7, 2026
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: ee99b04f-59d7-4240-bfc0-abcc77c0d3bd

📥 Commits

Reviewing files that changed from the base of the PR and between 63602f5 and 6754108.

📒 Files selected for processing (9)
  • source/slang/slang-ast-val.cpp
  • source/slang/slang-ast-val.h
  • source/slang/slang-check-conversion.cpp
  • source/slang/slang-check-decl.cpp
  • source/slang/slang-check-expr.cpp
  • source/slang/slang-check-impl.h
  • source/slang/slang-lower-to-ir.cpp
  • source/slang/slang-mangle.cpp
  • source/slang/slang-type-layout.cpp
👮 Files not reviewed due to content moderation or server errors (1)
  • source/slang/slang-lower-to-ir.cpp

Comment thread source/slang/slang-ast-val.cpp
Comment on lines +4431 to +4440
bool SemanticsExprVisitor::isGLSLOperatorScope()
{
    if (getOptionSet().getBoolOption(CompilerOptionName::AllowGLSL))
        return true;
    for (auto moduleDecl : getShared()->importedModulesList)
    {
        if (moduleDecl->getName() && getText(moduleDecl->getName()) == "glsl")
            return true;
    }
    return false;
Contributor

🧹 Nitpick | 🔵 Trivial

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
rg -n -C3 '\bimportedModulesList\b'

Repository: shader-slang/slang

Length of output: 3316


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# 1) Show isGLSLOperatorScope() definition and nearby usage
sed -n '4390,4485p' source/slang/slang-check-expr.cpp

# 2) Show convertToBuiltinArithmeticOp() and where it calls isGLSLOperatorScope()
rg -n 'convertToBuiltinArithmeticOp|isGLSLOperatorScope\(' source/slang/slang-check-expr.cpp -n

# Print the definition + call sites with some context
rg -n -C3 'convertToBuiltinArithmeticOp\(' source/slang/slang-check-expr.cpp
rg -n -C3 'isGLSLOperatorScope\(' source/slang/slang-check-expr.cpp

# 3) Inspect SharedSemanticsContext fields for importedModulesList and glslModuleDecl
sed -n '700,820p' source/slang/slang-check-impl.h

# 4) Inspect where importedModulesList is populated (import handling) and comment about scope
sed -n '15980,16120p' source/slang/slang-check-decl.cpp
sed -n '16480,16540p' source/slang/slang-check-decl.cpp

Repository: shader-slang/slang

Length of output: 17307


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Find all construction sites / aliases for SharedSemanticsContext
rg -n 'SharedSemanticsContext\s*\(' source/slang/slang-check-*.cpp source/slang/slang-check-*.h source/slang/slang-check*.cpp

# Find where isGLSLOperatorScope is called (it’s only in convertToBuiltinArithmeticOp per earlier rg,
# but check for other uses)
rg -n '\bisGLSLOperatorScope\s*\(' source/slang/slang-check-*.cpp

# Check convertToBuiltinArithmeticOp caller context near line 4500-4520
sed -n '4485,4545p' source/slang/slang-check-expr.cpp

# Check how primary-module vs m_module==nullptr modes affect importedModulesList usage
rg -n 'm_module\s*\)' source/slang/slang-check-decl.cpp

Repository: shader-slang/slang

Length of output: 3286


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Show SharedSemanticsContext construction at slang-check-impl.h around line 783
sed -n '740,860p' source/slang/slang-check-impl.h

# Also locate other mentions of SharedSemanticsContext creation in check-*.cpp/h
rg -n 'SharedSemanticsContext\s*\(' source/slang/slang-check-*.cpp source/slang/slang-check-*.h

# For each call site, print small surrounding region to understand lifetime parameters (module vs nullptr)
# We'll capture up to 40 lines around each match.
rg -n 'SharedSemanticsContext\s*\(' source/slang/slang-check-*.cpp source/slang/slang-check-*.h | cut -d: -f1,2 | head

Repository: shader-slang/slang

Length of output: 4859


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# 1) Find getShared() definition for SemanticsExprVisitor / SemanticsVisitor
rg -n 'getShared\(\)' source/slang/slang-check-*.cpp source/slang/slang-check-*.h

# 2) Find construction/allocation of SharedSemanticsContext
rg -n '(new\s+SharedSemanticsContext|make_shared\s*<\s*SharedSemanticsContext|makeShared\s*<\s*SharedSemanticsContext|SharedSemanticsContext\s*\{)' source/slang/slang-check-*.cpp source/slang/slang-check-*.h

# 3) Also look for places that store SharedSemanticsContext in a member named m_shared or similar
rg -n 'SharedSemanticsContext\*|RefPtr<SharedSemanticsContext>|m_shared' source/slang/slang-check-*.cpp source/slang/slang-check-*.h

Repository: shader-slang/slang

Length of output: 8950


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Direct instantiations
rg -n 'new\s+SharedSemanticsContext|make_shared\s*<\s*SharedSemanticsContext|SharedSemanticsContext\s*\(' source/slang/slang-check-*.cpp source/slang/slang-check-*.h

# Also find if SemanticsContext stores a SharedSemanticsContext
rg -n 'class\s+SemanticsContext|struct\s+SemanticsContext|SemanticsContext::' source/slang/slang-check-*.h source/slang/slang-check-*.cpp

# Find SemanticsExprVisitor/SubVisitor constructions that pass SemanticsContext/shared
rg -n 'SemanticsVisitor\s*\(|SemanticsExprVisitor\s*\(|SemanticsContext\s*\(' source/slang/slang-check-expr.cpp source/slang/slang-check-decl.cpp source/slang/slang-check-*.cpp

Repository: shader-slang/slang

Length of output: 6043


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect SharedSemanticsContext creation sites
sed -n '1,120p' source/slang/slang-check-type.cpp
sed -n '2860,3245p' source/slang/slang-check-shader.cpp

# Also inspect SemanticsContext wrapper around SharedSemanticsContext (to see lifetime passed by reference/pointer)
sed -n '1040,1125p' source/slang/slang-check-impl.h

# Inspect SemanticsVisitor/ExprVisitor creation path for expression checking
# (focus on where a SemanticsExprVisitor gets its shared context)
rg -n 'SemanticsExprVisitor\(' source/slang/slang-check-expr.cpp
rg -n 'SemanticsVisitor\(' source/slang/slang-check-expr.cpp | head -n 50

# Show area around SemanticsExprVisitor construction/usage near the isGLSLOperatorScope call
sed -n '3400,3720p' source/slang/slang-check-expr.cpp

Repository: shader-slang/slang

Length of output: 20695


Use cached GLSL module marker in isGLSLOperatorScope (avoid per-call scan)

isGLSLOperatorScope() currently iterates getShared()->importedModulesList on the hot operator path; however SharedSemanticsContext already caches the imported GLSL module in glslModuleDecl during importModuleIntoScope(). Replace the list scan with a getShared()->glslModuleDecl != nullptr check (while keeping the AllowGLSL option early return). This check is scoped to the current SharedSemanticsContext (a semantics-checking session), not global compilation state, so the guard won’t be broadened across unrelated sessions—but it can still be made constant-time.
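
A minimal standalone sketch of the suggested change, assuming the cache is filled at import time. `ModuleDecl`, `SharedContext`, and `importModule` here are simplified stand-ins, not the real `SharedSemanticsContext` or `importModuleIntoScope()`.

```cpp
#include <string>
#include <vector>

// Simplified stand-ins for the real AST/context types (hypothetical).
struct ModuleDecl
{
    std::string name;
};

struct SharedContext
{
    std::vector<ModuleDecl*> importedModulesList;
    ModuleDecl* glslModuleDecl = nullptr; // cached once, when "glsl" is imported

    void importModule(ModuleDecl* m)
    {
        importedModulesList.push_back(m);
        if (m->name == "glsl")
            glslModuleDecl = m;
    }
};

// Constant-time check on the hot operator path: no list scan, no string compare.
bool isGLSLOperatorScope(const SharedContext& shared, bool allowGLSL)
{
    if (allowGLSL)
        return true;
    return shared.glslModuleDecl != nullptr;
}
```

The string comparison moves to the import site, which runs once per import instead of once per operator expression.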

github-actions[bot]

This comment was marked as outdated.

…verage

Code review fixes:
- BuiltinOperationIntVal::tryFoldImpl now validates argument arity against the
  operation (unary ops take 1 arg, binary take 2) before reading operands, so a
  malformed node can neither crash on zero args nor mis-fold a binary op with one.
- Harmonize constant shift folding across the two Val representations. A shared
  `_tryFoldConstantShift` helper (negative count -> symbolic; out-of-range count
  masked to the operand width) is now used by both FuncCallIntVal::tryFoldImpl and
  BuiltinOperationIntVal::tryFoldImpl, so the same `x << y` folds to the same
  constant on whichever path reaches it (and the previous raw-shift UB in
  FuncCallIntVal is gone).
- lowerBuiltinArithmeticOp asserts the fast-path callee is the operator-name
  VarExpr it relies on (and reads it through a local) instead of an unchecked
  dereference.
- Update stale comments that predated mixed-type support and matrix re-inclusion
  (ast-expr.h flag, convertToBuiltinArithmeticOp/lowerBuiltinArithmeticOp/
  visitInvokeExpr): the fast path covers same- and mixed-type scalar/vector/matrix
  operands across arithmetic/comparison/bitwise/shift/unary, and folds constants
  via BuiltinOperationIntVal. Clarify why tryInferLoopMaxIterations maps `==`/`!=`.
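
The shift-folding policy described above (negative count stays symbolic, out-of-range count is masked to the operand width) can be sketched as follows. This is an assumption-laden illustration, not the actual `_tryFoldConstantShift`: the signature is invented, only left and logical right shifts are modelled, and the width is assumed to be a power of two.

```cpp
#include <cstdint>
#include <optional>

enum class ShiftOp { Left, LogicalRight };

// Hypothetical helper: folds a constant shift on a `widthBits`-wide operand.
// Returns nullopt to mean "do not fold; keep the expression symbolic".
// Assumes widthBits is one of 8, 16, 32, 64.
std::optional<uint64_t> tryFoldConstantShift(
    ShiftOp op, uint64_t value, int64_t count, unsigned widthBits)
{
    if (count < 0)
        return std::nullopt; // negative count: stay symbolic

    const uint64_t mask =
        widthBits == 64 ? ~uint64_t{0} : ((uint64_t{1} << widthBits) - 1);

    // Mask the count to the operand width so the C++ shift below is
    // always in range (no undefined behaviour from a raw shift).
    const unsigned masked =
        static_cast<unsigned>(static_cast<uint64_t>(count) & (widthBits - 1));

    const uint64_t bits = value & mask;
    const uint64_t shifted =
        op == ShiftOp::Left ? (bits << masked) : (bits >> masked);
    return shifted & mask;
}
```

Routing both Val representations through one function of this kind is what makes the same `x << y` fold to the same constant regardless of which path reaches it.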

Test coverage (new .slang regressions; runtime operands so the ops are
genuinely fast-pathed):
- builtin-operator-fastpath-float: float arithmetic + FRem, float comparison,
  float matrix arithmetic, vector*scalar and matrix+scalar broadcast, and
  mixed-type promotion (int+float, half+float, int3+float3).
- builtin-operator-fastpath-uint: unsigned modulo and *logical* right shift on a
  high-bit-set value, unsigned compare, and unsigned vector ops.
- builtin-operator-fastpath-bool: bool `==`/`!=` and unary `!` on scalar and
  vector bool.
- builtin-operator-fastpath-const: compile-time-constant folding in static const
  and generic value-argument contexts, including the shift fold and symbolic
  `N * 2` that re-evaluates on substitution.
- builtin-operator-fastpath-glsl: pins GLSL-scope semantics (mat*mat product,
  vec==vec scalar bool) so the fast path staying disabled there is enforced.
- diff-loop-builtin-operator: add the forward-mode autodiff path alongside reverse.
@csyonghe
Collaborator Author

csyonghe commented Jun 7, 2026

Thanks for the thorough review. Addressed in f4bfd88. Summary of how each item was handled:

Code

  • BuiltinOperationIntVal::tryFoldImpl arity (CodeRabbit): now validates the argument count against the operation's arity (unary ops take 1, binary take 2) before reading operands, so a malformed node can't crash on zero args or mis-fold a binary op with one.
  • Shift folding divergence from FuncCallIntVal (build-bot): extracted a shared _tryFoldConstantShift helper (negative count → stays symbolic; out-of-range count masked to the operand width) and routed both FuncCallIntVal::tryFoldImpl and BuiltinOperationIntVal::tryFoldImpl through it. The same x << y now folds to the same constant on whichever path reaches it, and the previous raw-shift UB in FuncCallIntVal is gone.
  • Unchecked VarExpr deref in lowerBuiltinArithmeticOp (Copilot): the fast-path callee invariant is now asserted and read through a local instead of an unchecked as<VarExpr>(...)->name.
  • _isCompileTimeConstantArith unary case (CodeRabbit): that function was removed in the constant-folding redesign (22d4b7f) — constants no longer gate the fast path; they fold through the decl-free BuiltinOperationIntVal, which handles unary and binary uniformly. So this no longer applies.
  • ==/!= mapping in tryInferLoopMaxIterations (Copilot): kept (so this branch produces the same compareOp set as the resolved-DeclRef branch, which the shared trip-count logic then filters), with a clarifying comment.
  • Stale comments (Copilot, several): these predated two later commits — mixed-type support (6754108) and matrix re-inclusion (674ca50). Updated OperatorExpr::isLoweredAsBuiltinArithmetic, convertToBuiltinArithmeticOp, lowerBuiltinArithmeticOp, and the visitInvokeExpr site to state the actual contract: same- and mixed-type scalar/vector/matrix operands across arithmetic/comparison/bitwise/shift/unary, with constants folded via BuiltinOperationIntVal.
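
As an illustration of the arity check in the first item, here is a minimal sketch of a fold routine that validates the argument count before touching any operand. `OpKind`, `arityOf`, and `tryFold` are hypothetical and cover only a few operators; the real `BuiltinOperationIntVal::tryFoldImpl` operates on Vals, not plain integers.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

enum class OpKind { Neg, BitNot, Add, Mul };

// Number of operands each operation expects.
std::size_t arityOf(OpKind k)
{
    switch (k)
    {
    case OpKind::Neg:
    case OpKind::BitNot:
        return 1;
    case OpKind::Add:
    case OpKind::Mul:
        return 2;
    }
    return 0;
}

// Wrap-around arithmetic via unsigned types, avoiding signed-overflow UB.
static int64_t wrap(uint64_t v) { return static_cast<int64_t>(v); }

// Returns nullopt when the node is malformed or cannot be folded.
std::optional<int64_t> tryFold(OpKind k, const std::vector<int64_t>& args)
{
    // Validate arity first: never read args[0] / args[1] on a malformed node.
    if (args.size() != arityOf(k))
        return std::nullopt;

    switch (k)
    {
    case OpKind::Neg:    return wrap(uint64_t{0} - static_cast<uint64_t>(args[0]));
    case OpKind::BitNot: return ~args[0];
    case OpKind::Add:    return wrap(static_cast<uint64_t>(args[0]) + static_cast<uint64_t>(args[1]));
    case OpKind::Mul:    return wrap(static_cast<uint64_t>(args[0]) * static_cast<uint64_t>(args[1]));
    }
    return std::nullopt;
}
```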

Note on the "matrices are excluded" comments: that was accurate at the first reviewed commit (2850954), but matrices were re-included in 674ca50, so the comments — not the code — were the stale part.

Test coverage

Added regression tests under tests/language-feature/operator-overload/ (all use runtime buffer operands so the operators are genuinely fast-pathed, not folded):

  • -float: float arithmetic + FRem (%), float comparison, float matrix arithmetic, vector*scalar and matrix+scalar broadcast (so SPIR-V OpVectorTimesScalar stays exercised), and mixed-type promotion (int+float, half+float, int3+float3).
  • -uint: unsigned modulo and logical right shift on a high-bit-set value (0xFFFFFFFF >> 1), unsigned compare, and unsigned vector ops — confirming unsigned operands are eligible and round-trip with the right emitter op.
  • -bool: bool ==/!= and unary ! on scalar and vector bool.
  • -const: compile-time-constant folding in static const and generic value-argument contexts, including the shift fold and a symbolic N * 2 that re-evaluates on substitution.
  • -glsl: pins the GLSL-scope semantics (mat*mat product, vec==vec scalar bool) so a future change that re-enabled the fast path under GLSL scope would be caught.
  • diff-loop-builtin-operator: added the forward-mode autodiff path alongside the existing reverse-mode one.

Validation

All new tests pass on cpu / vk / dx12 / interpreter. A full slang-test sweep shows no real regressions — in particular the shared FuncCallIntVal shift change is byte-identical for every in-range shift count.

Contributor

@coderabbitai coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
source/slang/slang-check-expr.cpp (1)

4596-4671: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Defer mixed-type coercions until the fast path is known to apply.

Lines 4624-4638 rewrite expr->arguments before Lines 4664-4671 verify that the promoted base type is valid for this operator family. For a mixed-type bitwise case like intVal & floatVal, this path inserts an int -> float cast, then returns nullptr because bitwise ops require an integer base. visitInvokeExpr() then falls back to normal resolution on the already-mutated (float, float) arguments, so downstream overload selection/diagnostics no longer see the user's original expression.

Suggested fix
         Type* commonType = getBuiltinArithmeticCommonType(leftArg->type.type, rightArg->type.type);
         if (!commonType)
             return nullptr;
         BaseType commonBase;
         IntVal *cRows, *cCols;
-        _getBuiltinNumericShape(commonType, commonBase, cRows, cCols);
+        if (!_getBuiltinNumericShape(commonType, commonBase, cRows, cCols))
+            return nullptr;
+        auto commonFlags = BaseTypeInfo::getInfo(commonBase).flags;
+        bool commonIsInteger = (commonFlags & BaseTypeInfo::Flag::Integer) != 0;
+        bool commonIsFloat = (commonFlags & BaseTypeInfo::Flag::FloatingPoint) != 0;
+        bool commonIsBool = (commonBase == BaseType::Bool);
+        bool commonIsEquality = (opText == "==" || opText == "!=");
+        bool commonEligible = isBitwise
+            ? commonIsInteger
+            : (commonIsEquality ? (commonIsInteger || commonIsFloat || commonIsBool)
+                                : (commonIsInteger || commonIsFloat));
+        if (!commonEligible)
+            return nullptr;
         Type* commonElementType = m_astBuilder->getBuiltinType(commonBase);

Then keep the existing coercion/write-back block below that guard, or stage the coerced operands in locals and only assign them back to expr->arguments after the fast path is definitely accepted.
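
The second option (stage the coerced operands and commit only on acceptance) could be sketched like this. All types here are simplified, hypothetical stand-ins: operands are modelled as text plus a base type, and a coercion as a textual `(float)` prefix.

```cpp
#include <string>
#include <utility>
#include <vector>

enum class Base { Int, Float };

struct Operand
{
    std::string text;
    Base base;
};

// Simplified stand-in for the real OperatorExpr (hypothetical).
struct OperatorExpr
{
    std::string op;
    std::vector<Operand> arguments;
    bool isLoweredAsBuiltinArithmetic = false;
};

bool isBitwise(const std::string& op)
{
    return op == "&" || op == "|" || op == "^";
}

// Returns true when the fast path applies. On rejection, `expr` is left
// exactly as the user wrote it, so fallback resolution sees the original.
bool tryFastPath(OperatorExpr& expr)
{
    if (expr.arguments.size() != 2)
        return false;

    const Base common =
        (expr.arguments[0].base == Base::Float || expr.arguments[1].base == Base::Float)
            ? Base::Float
            : Base::Int;

    // 1. Decide eligibility BEFORE touching the operands.
    if (isBitwise(expr.op) && common != Base::Int)
        return false;

    // 2. Stage the coercions in a local copy.
    std::vector<Operand> staged = expr.arguments;
    for (auto& arg : staged)
    {
        if (arg.base != common) // only reachable when common is Float
        {
            arg.text = "(float)" + arg.text;
            arg.base = common;
        }
    }

    // 3. Commit only once the fast path is definitely accepted.
    expr.arguments = std::move(staged);
    expr.isLoweredAsBuiltinArithmetic = true;
    return true;
}
```

For `intVal & floatVal` this returns false with the arguments untouched, while `intVal + floatVal` is accepted and the integer operand is coerced.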


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 5c065588-decc-4c23-b987-1947df98d98c

📥 Commits

Reviewing files that changed from the base of the PR and between 6754108 and f4bfd88.

📒 Files selected for processing (12)
  • source/slang/slang-ast-expr.h
  • source/slang/slang-ast-val.cpp
  • source/slang/slang-check-expr.cpp
  • source/slang/slang-check-impl.h
  • source/slang/slang-check-stmt.cpp
  • source/slang/slang-lower-to-ir.cpp
  • tests/autodiff/diff-loop-builtin-operator.slang
  • tests/language-feature/operator-overload/builtin-operator-fastpath-bool.slang
  • tests/language-feature/operator-overload/builtin-operator-fastpath-const.slang
  • tests/language-feature/operator-overload/builtin-operator-fastpath-float.slang
  • tests/language-feature/operator-overload/builtin-operator-fastpath-glsl.slang
  • tests/language-feature/operator-overload/builtin-operator-fastpath-uint.slang
👮 Files not reviewed due to content moderation or server errors (1)
  • source/slang/slang-lower-to-ir.cpp

Contributor

@github-actions github-actions Bot left a comment

Verdict: 🟡 Has issues — 0 bug(s), 3 gap(s)

The PR adds a builtin-operator fast path that bypasses generic operator overload resolution for builtin scalar/vector/matrix operands, including a new decl-free BuiltinOperationIntVal for compile-time-constant folding of fast-path operators. Findings concern a silent fallback in tryConstantFoldDeclRef that masks unrelated fold failures, a default: case in getBuiltinOperationOpText that silently returns "?" (collides via the new KB mangling), and a misleading copy-pasted docstring above getBuiltinArithmeticCommonType.

Changes Overview

Fast-path AST + checking (slang-ast-expr.h, slang-check-expr.cpp, slang-check-impl.h)

  • New OperatorExpr::isLoweredAsBuiltinArithmetic flag; convertToBuiltinArithmeticOp recognizes builtin scalar/vector/matrix operators of the same or mixed builtin type and assigns the result type without enumerating any operator candidate. getBuiltinArithmeticCommonType reproduces the usual arithmetic conversions, coercing only the element type so vector OP scalar keeps its mixed shape. isGLSLOperatorScope (set by -allow-glsl or import glsl;) disables the fast path so GLSL-specific semantics (vec == vec → scalar bool, mat * mat → matrix product) are still owned by the glsl module's overloads.

Decl-free constant Val (slang-ast-val.{h,cpp}, slang-mangle.cpp, slang-type-layout.cpp, slang-check-conversion.cpp)

  • New BuiltinOperationIntVal (op kind + type + arg Vals) so a symbolic fast-path operator like N*2 has one canonical representation across substitution boundaries instead of producing a FuncCallIntVal in some paths and a fast-path Val in others. _tryFoldConstantShift is shared between FuncCallIntVal::tryFoldImpl and BuiltinOperationIntVal::tryFoldImpl so shift folding is consistent. Mangling uses a new KB prefix; layout/conversion sites that previously special-cased FuncCallIntVal are extended.
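
To make the "one canonical representation across substitution boundaries" idea concrete, here is a toy sketch of a symbolic value that stays symbolic while a generic parameter is unknown and folds once it is substituted. It is not the real `BuiltinOperationIntVal`; `Val`, `makeMul`, and `substitute` are hypothetical, and only multiplication is modelled.

```cpp
#include <cstdint>
#include <memory>
#include <string>
#include <unordered_map>

struct Val;
using ValPtr = std::shared_ptr<const Val>;

// Toy value node: a constant, a generic parameter, or a multiplication.
struct Val
{
    enum class Tag { Const, Param, Mul };
    Tag tag = Tag::Const;
    int64_t value = 0; // used by Const
    std::string name;  // used by Param
    ValPtr lhs, rhs;   // used by Mul
};

ValPtr makeConst(int64_t v)
{
    auto p = std::make_shared<Val>();
    p->tag = Val::Tag::Const;
    p->value = v;
    return p;
}

ValPtr makeParam(std::string n)
{
    auto p = std::make_shared<Val>();
    p->tag = Val::Tag::Param;
    p->name = std::move(n);
    return p;
}

// Folds immediately when both operands are constants, else stays symbolic.
ValPtr makeMul(ValPtr a, ValPtr b)
{
    if (a->tag == Val::Tag::Const && b->tag == Val::Tag::Const)
    {
        // Multiply through unsigned to avoid signed-overflow UB.
        return makeConst(static_cast<int64_t>(
            static_cast<uint64_t>(a->value) * static_cast<uint64_t>(b->value)));
    }
    auto p = std::make_shared<Val>();
    p->tag = Val::Tag::Mul;
    p->lhs = std::move(a);
    p->rhs = std::move(b);
    return p;
}

// Substitution rebuilds through makeMul, so folding re-runs automatically.
ValPtr substitute(const ValPtr& v, const std::unordered_map<std::string, int64_t>& env)
{
    switch (v->tag)
    {
    case Val::Tag::Const:
        return v;
    case Val::Tag::Param:
    {
        auto it = env.find(v->name);
        return it == env.end() ? v : makeConst(it->second);
    }
    case Val::Tag::Mul:
        return makeMul(substitute(v->lhs, env), substitute(v->rhs, env));
    }
    return v;
}
```

In this model `N * 2` is a `Mul` node until `N` is bound, at which point substitution yields a plain constant.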

Folding plumbing (slang-check-expr.cpp, slang-check-decl.cpp, slang-check-stmt.cpp)

  • tryConstantFoldExpr recognizes an unresolved-callee VarExpr with an operator name and folds it directly (also handles re-folding of cross-module deserialized initializers). tryConstantFoldDeclRef adds a fallback that returns decl->val->substitute(declRef) when the init-expr fold fails. _initExprIsRuntimeValue treats fast-path nodes as pure. tryInferLoopMaxIterations is extended to read the comparison op from a fast-path predicate node (placed before the resolved-DeclRef branch since VarExpr is a DeclRefExpr).

IR lowering (slang-lower-to-ir.cpp)

  • lowerBuiltinArithmeticOp emits the IR op directly (Add/Sub/Mul/Div, IRem vs FRem by element type, Eql/Less/..., BitAnd/..., Lsh/Rsh, Neg/Not/BitNot); mixed-shape operands flow through unchanged. visitBuiltinOperationIntVal mirrors visitFuncCallIntVal's constexpr-op dispatch keyed on the operator enum.

Tests (tests/language-feature/operator-overload/builtin-operator-fastpath*.slang, tests/autodiff/diff-loop-builtin-operator.slang)

  • Per-operator-family coverage on cpu/vk/d3d12 with runtime operands; const-context test exercising symbolic generic-value folding (getDouble<N>() { return getN<N*2>(); }); GLSL-scope test pinning matrix-product / scalar-bool semantics; differentiable for-loop test exercising fwd_diff and bwd_diff over a fast-pathed i < 8 predicate.
Findings (3 total)

| Severity | Location | Finding |
| --- | --- | --- |
| 🟡 Gap | source/slang/slang-ast-val.cpp:2361 | getBuiltinOperationOpText default: returns "?", which collides via the new KB mangling when a future enumerator is added |
| 🟡 Gap | source/slang/slang-check-expr.cpp:4718-4720 | Doc comment above getBuiltinArithmeticCommonType describes the unrelated _getBuiltinNumericShape helper |
| 🟡 Gap | source/slang/slang-check-expr.cpp:2820-2823 | tryConstantFoldDeclRef silently falls back to decl->val when the init-expr fold fails; this applies to all callers, not just the deserialized fast-path case the comment justifies |

case BuiltinOperationKind::Not:
return toSlice("!");
default:
return toSlice("?");
Contributor

🟡 Gap: getBuiltinOperationOpText silently returns "?" for unknown enum values

The default branch returns toSlice("?"):

default:
    return toSlice("?");

This text is consumed by two callers that need a stable identifier — BuiltinOperationIntVal::_toTextOverride (diagnostics) and the new KB mangling case in slang-mangle.cpp (emitNameImpl(context, getBuiltinOperationOpText(...))). Adding a new BuiltinOperationKind value without extending this switch produces:

  1. A mangled symbol that uses the literal name ?, which collides with any other enum value also missed by an extension.
  2. A toText representation that is indistinguishable from a literal ?: ternary printed elsewhere.

Per the comment on BuiltinOperationKind itself ("their integer values are part of the serialized/mangled form, so only append, never reorder"), this enum is treated as ABI surface — silent fallback defeats that intent.

Suggested fix:

default:
    SLANG_UNEXPECTED("unhandled BuiltinOperationKind in getBuiltinOperationOpText");
    UNREACHABLE_RETURN(toSlice(""));
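
Outside the Slang code base, the same "fail loudly instead of returning a placeholder" pattern can be sketched with standard C++ only. The enum below is a hypothetical three-value stand-in for `BuiltinOperationKind`, and the exception replaces `SLANG_UNEXPECTED` purely for illustration.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical, shortened stand-in for BuiltinOperationKind.
enum class BuiltinOpKind { Add, Sub, Not };

std::string opText(BuiltinOpKind kind)
{
    // No `default:` label: with -Wswitch (or equivalent) the compiler can
    // flag any enumerator that is added later but not handled here.
    switch (kind)
    {
    case BuiltinOpKind::Add: return "+";
    case BuiltinOpKind::Sub: return "-";
    case BuiltinOpKind::Not: return "!";
    }
    // Reached only for a value outside the handled set: fail loudly rather
    // than hand a shared placeholder string to the mangler.
    throw std::logic_error("unhandled BuiltinOpKind in opText");
}
```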

Comment on lines +4718 to +4720
{
BaseType leftBase, rightBase;
IntVal *leftRows, *leftCols, *rightRows, *rightCols;
Contributor

🟡 Gap: Doc comment above getBuiltinArithmeticCommonType describes a different function

The comment immediately above getBuiltinArithmeticCommonType is:

// Decompose a builtin numeric type into (base element type, shape). `outRows`/`outCols`
// describe the shape: both null => scalar, rows set & cols null => vector<rows>, both set =>
// matrix<rows,cols>. Returns false if `type` is not a builtin scalar/vector/matrix.
Type* SemanticsExprVisitor::getBuiltinArithmeticCommonType(Type* left, Type* right)

This is the docstring of the file-static helper _getBuiltinNumericShape (it appears verbatim above that helper too). getBuiltinArithmeticCommonType does something different: it computes a common type for two operands of possibly different types. The accurate doc is in slang-check-impl.h, where this function is declared.

Suggested fix: Either delete the implementation-site comment (it's redundant with the header) or replace it with a one-line cross-reference like // See declaration in slang-check-impl.h.

Comment on lines +2820 to 2823
tryConstantFoldExpr(getInitExpr(m_astBuilder, declRef), kind, &newCircularityInfo))
return folded;
return foldFromStoredVal();
}
Contributor

🟡 Gap: Silent behavior change in tryConstantFoldDeclRef for non-fast-path callers

The new fallback applies whenever tryConstantFoldExpr of an init expression fails:

if (auto folded =
        tryConstantFoldExpr(getInitExpr(m_astBuilder, declRef), kind, &newCircularityInfo))
    return folded;
return foldFromStoredVal();

Before this PR, a fold failure on the init expression returned nullptr. After this PR, the function returns whatever was previously stored in decl->val (substituted for this declRef). The justification in the diff comment is specific to deserialized fast-path operators ("an imported initializer that contains a builtin fast-path operator loses its operator-name callee through AST serialization"), but the fallback is unconditional — it also kicks in for any static const whose init expression now fails to fold for an unrelated reason (a checking error, a non-foldable subexpression introduced by a future change, etc.).

In those non-fast-path cases, decl->val was populated by an earlier successful fold (varDecl->val = tryConstantFoldExpr(varDecl->initExpr, ConstantFoldingKind::LinkTime, nullptr) at slang-check-decl.cpp:2708), so the fallback effectively returns a stale value instead of surfacing the new fold failure. Without a test that pins this, a regression in tryConstantFoldExpr could be silently masked here.

Suggestion: Gate the second fallback on detecting that the init expression actually contains a fast-path / unresolved-callee operator, so non-fast-path init expressions retain the previous "fold failure → null" behavior. Alternatively, document the expected interaction between decl->val (link-time fold cache) and tryConstantFoldExpr's rerun and add an assertion that they agree when both succeed.
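
A toy model of the gated fallback, to show the intended behavioural difference. Everything here is hypothetical: `InitExpr::hasUnresolvedOperatorCallee` stands for whatever structural test would detect a deserialized fast-path operator, and `foldable` models whether re-folding the init expression succeeds.

```cpp
#include <optional>

// Hypothetical model of an initializer expression.
struct InitExpr
{
    bool hasUnresolvedOperatorCallee;  // e.g. callee lost through serialization
    std::optional<long long> foldable; // result when re-folding succeeds
};

// Hypothetical model of a static const declaration.
struct Decl
{
    InitExpr init;
    std::optional<long long> storedVal; // previously computed fold (decl->val)
};

std::optional<long long> tryConstantFoldDecl(const Decl& d)
{
    // Preferred path: fold the init expression afresh.
    if (d.init.foldable)
        return d.init.foldable;

    // Gated fallback: use the stored value only for the case that motivated
    // it, an init expression carrying an unresolved operator callee.
    if (d.init.hasUnresolvedOperatorCallee)
        return d.storedVal;

    // Any other fold failure still surfaces as "not foldable".
    return std::nullopt;
}
```

Under this gating, an unrelated fold failure keeps the pre-PR "fold failure → null" result instead of returning a possibly stale stored value.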

Labels

pr: non-breaking PRs without breaking changes


2 participants