O(N²) compile-time in dynamic-dispatch specialization (specializeModule) #11776

@jvepsalainen-nv

Issue Description

Compiling code that dispatches an interface with N implementations through a runtime-typed
existential takes time quadratic in N during specialization. The specializeModule phase
(visible via -report-perf-benchmark) grows roughly 4× each time N doubles. This regressed
around release v2026.7 and has not recovered to the v2026.5 baseline.

Tracking benchmark graph: https://shader-slang.org/slang-compile-perf/workloads/dynamic_dispatch.html

Reproducer Code

An interface with several implementations selected at runtime (so static specialization cannot
collapse the dispatch). The shape below compiles as-is; the quadratic shows up as the number of
implementations grows (the tools/compile-perf suite generates exactly this as its
dynamic_dispatch workload, scaled by N).

RWStructuredBuffer<float> outBuf;

[anyValueSize(16)]
interface IShape { float eval(float x); }

struct S0 : IShape { float eval(float x) { return x * 1.0 + sin(x); } }
struct S1 : IShape { float eval(float x) { return x * 2.0 + sin(x * 2.0); } }
struct S2 : IShape { float eval(float x) { return x * 3.0 + sin(x * 3.0); } }
struct S3 : IShape { float eval(float x) { return x * 4.0 + sin(x * 4.0); } }
// ... scale this up to N implementations ...

float dispatch(int id, float x)
{
    IShape s;
    switch (id)
    {
    case 0:  s = S0(); break;
    case 1:  s = S1(); break;
    case 2:  s = S2(); break;
    case 3:  s = S3(); break;
    default: s = S0(); break;
    }
    return s.eval(x);
}

[shader("compute")]
[numthreads(1, 1, 1)]
void computeMain(uint3 tid : SV_DispatchThreadID)
{
    float acc = 0.0;
    for (int i = 0; i < 4; ++i)
        acc += dispatch((int(tid.x) + i) % 4, outBuf[0] + float(i));
    outBuf[0] = acc;
}

Command:

slangc dynamic_dispatch.slang -target spirv -emit-spirv-directly -report-perf-benchmark
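
To scale the reproducer past four implementations by hand, a small generator helps. The following is a minimal sketch in Python, not the tools/compile-perf generator itself (its exact output is not shown in this issue). The function name `generate_dynamic_dispatch` is hypothetical, and wrapping the dispatch id with `% N` in `computeMain` is an assumption made so that every case stays reachable.

```python
def generate_dynamic_dispatch(n: int) -> str:
    """Return Slang source shaped like the reproducer, with n IShape implementations.

    Hypothetical helper for illustration; the real compile-perf generator may differ.
    """
    if n < 1:
        raise ValueError("need at least one implementation")

    lines = [
        "RWStructuredBuffer<float> outBuf;",
        "",
        "[anyValueSize(16)]",
        "interface IShape { float eval(float x); }",
        "",
    ]

    # One struct per implementation; each body differs slightly so the
    # implementations are distinct, as in the hand-written S0..S3.
    for i in range(n):
        k = float(i + 1)
        lines.append(
            f"struct S{i} : IShape {{ float eval(float x) "
            f"{{ return x * {k} + sin(x * {k}); }} }}"
        )

    # Runtime-selected existential: one switch case per implementation.
    lines += ["", "float dispatch(int id, float x)", "{", "    IShape s;",
              "    switch (id)", "    {"]
    for i in range(n):
        lines.append(f"    case {i}:  s = S{i}(); break;")
    lines += ["    default: s = S0(); break;", "    }",
              "    return s.eval(x);", "}", ""]

    # Entry point; the modulus is an assumption (the original uses % 4 for N = 4).
    lines += [
        '[shader("compute")]',
        "[numthreads(1, 1, 1)]",
        "void computeMain(uint3 tid : SV_DispatchThreadID)",
        "{",
        "    float acc = 0.0;",
        "    for (int i = 0; i < 4; ++i)",
        f"        acc += dispatch((int(tid.x) + i) % {n}, outBuf[0] + float(i));",
        "    outBuf[0] = acc;",
        "}",
    ]
    return "\n".join(lines) + "\n"
```

Writing the result of `generate_dynamic_dispatch(200)` to dynamic_dispatch.slang and running the command above should give one data point for N = 200.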

Expected Behavior

specializeModule scales roughly linearly in N (the number of implementations).

Actual Behavior

specializeModule scales quadratically, approaching 4× per doubling of N as N grows. Measured
(Apple Silicon, RelWithDebInfo, median of 5 runs):

N      specializeModule
50     11 ms
100    32 ms
200    113 ms
400    416 ms
The same generator's existential_aggregate workload is affected similarly.

Test Plan

  • tools/compile-perf dynamic_dispatch (primary timer specializeModule) and
    existential_aggregate workloads.
  • Regression test tests/language-feature/dynamic-dispatch/many-impls.slang.

A fix is proposed in #11760.
