Issue Description
Compiling code that dispatches an interface with N implementations through a runtime-typed
existential takes time quadratic in N during specialization. The specializeModule phase
(visible via -report-perf-benchmark) grows roughly 4× each time N doubles. This regressed
around release v2026.7 and has not recovered to the v2026.5 baseline.
Tracking benchmark graph: https://shader-slang.org/slang-compile-perf/workloads/dynamic_dispatch.html.
Reproducer Code
An interface with several implementations, selected at runtime so that static specialization
cannot collapse the dispatch. The shape below compiles as-is; the quadratic growth shows up as
the number of implementations increases. The tools/compile-perf suite generates exactly this
shape as its dynamic_dispatch workload, scaled by N.
RWStructuredBuffer<float> outBuf;

[anyValueSize(16)]
interface IShape { float eval(float x); }

struct S0 : IShape { float eval(float x) { return x * 1.0 + sin(x); } }
struct S1 : IShape { float eval(float x) { return x * 2.0 + sin(x * 2.0); } }
struct S2 : IShape { float eval(float x) { return x * 3.0 + sin(x * 3.0); } }
struct S3 : IShape { float eval(float x) { return x * 4.0 + sin(x * 4.0); } }
// ... scale this up to N implementations ...

float dispatch(int id, float x)
{
    IShape s;
    switch (id)
    {
    case 0: s = S0(); break;
    case 1: s = S1(); break;
    case 2: s = S2(); break;
    case 3: s = S3(); break;
    default: s = S0(); break;
    }
    return s.eval(x);
}

[shader("compute")]
[numthreads(1, 1, 1)]
void computeMain(uint3 tid : SV_DispatchThreadID)
{
    float acc = 0.0;
    for (int i = 0; i < 4; ++i)
        acc += dispatch((int(tid.x) + i) % 4, outBuf[0] + float(i));
    outBuf[0] = acc;
}
Command:
slangc dynamic_dispatch.slang -target spirv -emit-spirv-directly -report-perf-benchmark
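To reproduce at larger N without hand-writing the structs, the reproducer shape can be emitted by a small script. The following is a minimal sketch, not the tools/compile-perf generator itself (whose source is not shown here). The helper name `generate_dynamic_dispatch` is hypothetical, and using N as the modulus in `computeMain` is an assumption made so that every case stays reachable.

```python
def generate_dynamic_dispatch(n: int) -> str:
    """Return Slang source shaped like the reproducer, with n implementations.

    Hypothetical helper for illustration; not the tools/compile-perf generator.
    """
    if n < 1:
        raise ValueError("n must be at least 1")

    out = [
        "RWStructuredBuffer<float> outBuf;",
        "[anyValueSize(16)]",
        "interface IShape { float eval(float x); }",
    ]

    # One struct per implementation, following the S0..S3 pattern.
    # Note: S0 is emitted as sin(x * 1.0) rather than sin(x); same value.
    for i in range(n):
        k = float(i + 1)  # renders as 1.0, 2.0, ...
        out.append(
            f"struct S{i} : IShape {{ float eval(float x) "
            f"{{ return x * {k} + sin(x * {k}); }} }}"
        )

    # Runtime switch that picks an implementation by id.
    out += [
        "float dispatch(int id, float x)",
        "{",
        "    IShape s;",
        "    switch (id)",
        "    {",
    ]
    out += [f"    case {i}: s = S{i}(); break;" for i in range(n)]
    out += [
        "    default: s = S0(); break;",
        "    }",
        "    return s.eval(x);",
        "}",
    ]

    # Entry point. Assumption: the modulus tracks n so every case is reachable;
    # the loop count stays at 4 as in the hand-written reproducer.
    out += [
        '[shader("compute")]',
        "[numthreads(1, 1, 1)]",
        "void computeMain(uint3 tid : SV_DispatchThreadID)",
        "{",
        "    float acc = 0.0;",
        "    for (int i = 0; i < 4; ++i)",
        f"        acc += dispatch((int(tid.x) + i) % {n}, outBuf[0] + float(i));",
        "    outBuf[0] = acc;",
        "}",
    ]
    return "\n".join(out) + "\n"
```

Writing the result of `generate_dynamic_dispatch(400)` to dynamic_dispatch.slang and running the command above should give a data point comparable to the N = 400 measurement, subject to the assumptions noted.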
Expected Behavior
specializeModule scales roughly linearly in N (the number of implementations).
Actual Behavior
specializeModule scales quadratically, approaching 4× per doubling of N. Measured (Apple Silicon,
RelWithDebInfo, median of 5 runs):
| N | specializeModule |
| --- | --- |
| 50 | 11 ms |
| 100 | 32 ms |
| 200 | 113 ms |
| 400 | 416 ms |
The same generator's existential_aggregate workload is affected similarly.
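As a quick check on the scaling claim, the measurements above can be turned into per-doubling ratios and a fitted log-log exponent. This is only a sketch over the four data points reported here; the helper names `doubling_ratios` and `loglog_slope` are made up for illustration. A linear phase would give ratios near 2 and an exponent near 1, while a purely quadratic one would give ratios near 4 and an exponent near 2.

```python
import math

# specializeModule timings from the table above: (N, milliseconds).
measurements = [(50, 11.0), (100, 32.0), (200, 113.0), (400, 416.0)]


def doubling_ratios(points):
    """Time ratio between each pair of consecutive measurements."""
    return [t2 / t1 for (_, t1), (_, t2) in zip(points, points[1:])]


def loglog_slope(points):
    """Least-squares slope of log(time) against log(N), i.e. the fitted exponent."""
    xs = [math.log(n) for n, _ in points]
    ys = [math.log(t) for _, t in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


ratios = doubling_ratios(measurements)
slope = loglog_slope(measurements)

print("ratio per doubling:", [round(r, 2) for r in ratios])
print("fitted exponent:", round(slope, 2))
```

On this data the ratios rise with N toward 4 and the fitted exponent lands between 1.5 and 2, which fits a quadratic term that dominates as N grows, on top of lower-order costs that matter more at small N.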
Test Plan
- tools/compile-perf dynamic_dispatch (primary timer specializeModule) and
  existential_aggregate workloads.
- Regression test: tests/language-feature/dynamic-dispatch/many-impls.slang.
A fix is proposed in #11760.