fused-MoE expert kernel emits empty input shapes (Input Dims absent on pybind built-in), blocking shape-anchored kernel-opt dispatch #731

### Summary

The Triton fused-MoE expert kernel (`invoke_fused_moe_kernel`, the dominant decode-time MoE GPU consumer for Qwen3-30B-A3B and similar models) reaches the torch trace as a **pybind built-in** (`sglang_profiler::fused_moe_triton_kernels_invoke_fused_moe_kernel_427`) whose top-level kernel event carries **no resolvable `Input Dims`**. As a result:

- `{category}_ops.csv` (e.g. `moe_fused_ops.csv`) and `moe_fused_metrics.json::operations[]` carry an empty `Input Dims` / `args`, so the kernel cannot be roofline'd (`efficiency_percent` is null) **and cannot be shape-anchored**.
- The rendered `analysis.md` P-item for this kernel has an empty **Args** column.
- Downstream consumers that require trace-anchored input shapes (e.g. the internal kernel-opt dispatch gate) reject the kernel with an `empty_kernel_shape` error before any optimization harness is built.

This is the **shape-capture** half of the fused-MoE gap. It shares a root cause with the empty-`Input Dims` issues behind #726 / #727. #727 added the perf model and surfaced the dominant kernel as a non-quantifiable P-item when its roofline is unresolved; this issue covers the remaining half, which recovers the operand shapes.

### Root cause / where the dims actually live

TraceLens **does** still capture the operands for this kernel, just not on the dimensionless built-in event. The wrapped invocation is recorded per shape in `perf_report_csvs/ops_unique_args.csv`, keyed by the embedded `invoke_fused_moe_kernel` symbol, with the two grouped-GEMM operand sets:

- gate/up GEMM: `A(num_tokens, H)` x `w1(E, 2*I, H)` -> `C(T, 2*I)`, recorded as `(15360,2048)`, `(128,1536,2048)`, `(122880,1536)` (bf16)
- down GEMM: `A(T, I)` x `w2(E, H, I)` -> `C(num_tokens, topk, H)`, recorded as `(122880,768)`, `(128,2048,768)`, `(15360,8,2048)` (bf16)

(Qwen3-30B-A3B MoE: E=128, top-8, H=2048, I=768, so `T = num_tokens * topk = 15360 * 8 = 122880`; concurrency 64, ISL/OSL 1024.) These match the shapes a hand-written fused-MoE GEAK harness used to reach a validated 1.19x.
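
As a sanity check on the shapes above, the two operand sets can be derived from the model dimensions alone. The following is a minimal sketch, not TraceLens code: `MoEConfig` and `fused_moe_operand_shapes` are hypothetical names introduced here for illustration, and the only inputs are the values quoted in this issue (E=128, top-8, H=2048, I=768, `num_tokens`=15360).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MoEConfig:
    """Hypothetical container for the MoE dimensions quoted in this issue."""
    num_experts: int   # E
    top_k: int         # experts routed per token
    hidden: int        # H
    intermediate: int  # I


def fused_moe_operand_shapes(cfg: MoEConfig, num_tokens: int) -> dict:
    """Derive the grouped-GEMM operand shapes for the gate/up and down GEMMs."""
    # Each token is routed to top_k experts, so the expanded row count is T.
    t = num_tokens * cfg.top_k
    gate_up = {
        "A": (num_tokens, cfg.hidden),
        "w1": (cfg.num_experts, 2 * cfg.intermediate, cfg.hidden),
        "C": (t, 2 * cfg.intermediate),
    }
    down = {
        "A": (t, cfg.intermediate),
        "w2": (cfg.num_experts, cfg.hidden, cfg.intermediate),
        "C": (num_tokens, cfg.top_k, cfg.hidden),
    }
    return {"gate_up": gate_up, "down": down}


# Values taken from this issue (Qwen3-30B-A3B, num_tokens = 15360).
qwen3 = MoEConfig(num_experts=128, top_k=8, hidden=2048, intermediate=768)
shapes = fused_moe_operand_shapes(qwen3, num_tokens=15360)
```

If the derived tuples ever disagree with what `ops_unique_args.csv` records, that would suggest the row belongs to a different batch shape or model config rather than to this kernel invocation.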

### Fix

Recover the fused-MoE expert kernel's operand shapes from `ops_unique_args.csv` and render them into the operation's `args` (the same `format_args` rendering the resolved path uses) when the kernel's own `Input Dims` are empty. Scoped to the `invoke_fused_moe_kernel` op pattern so other kernels are untouched. See PR #727 (extended).
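
The back-fill described above could look roughly like the sketch below. It is an illustration under stated assumptions, not the change in the PR: the column names `name` and `Input Dims`, the sample row name `example::invoke_fused_moe_kernel`, the operation dict layout, and the helpers `load_recovered_dims` / `backfill_args` are all hypothetical, and a plain string join stands in for the real `format_args` rendering.

```python
import csv
import io

FUSED_MOE_SYMBOL = "invoke_fused_moe_kernel"

# Stand-in for perf_report_csvs/ops_unique_args.csv. The column names and the
# row names are assumptions; the dims are the ones quoted in this issue.
SAMPLE_CSV = '''name,Input Dims
example::invoke_fused_moe_kernel,"((15360, 2048), (128, 1536, 2048), (122880, 1536))"
example::invoke_fused_moe_kernel,"((122880, 768), (128, 2048, 768), (15360, 8, 2048))"
aten::mm,"((4, 4), (4, 4))"
'''


def load_recovered_dims(csv_text: str, symbol: str = FUSED_MOE_SYMBOL) -> list:
    """Collect non-empty Input Dims from rows whose name embeds `symbol`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row["Input Dims"]
        for row in reader
        if symbol in row["name"] and row["Input Dims"].strip()
    ]


def backfill_args(operation: dict, recovered_dims: list,
                  symbol: str = FUSED_MOE_SYMBOL) -> dict:
    """Fill `args` only for the fused-MoE op, and only when it is empty."""
    if symbol not in operation.get("name", ""):
        return operation  # other kernels are left untouched
    if operation.get("args") or not recovered_dims:
        return operation  # already resolved, or nothing to recover
    patched = dict(operation)
    # Placeholder rendering; the real path would go through format_args.
    patched["args"] = "; ".join(recovered_dims)
    return patched


dims = load_recovered_dims(SAMPLE_CSV)
moe_op = {"name": "example::invoke_fused_moe_kernel", "args": ""}
patched_op = backfill_args(moe_op, dims)
```

Matching on the embedded symbol rather than the full built-in name keeps the scope narrow while not depending on the numeric suffix of the pybind wrapper.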

### Cross-refs
- #726, #727 (same empty-`Input Dims` root)
- Internal pipeline companion: candidate finalization back-fills the same operand shapes from `ops_unique_args.csv` so the kernel-opt dispatch gate (`_validate_kernel_shape_and_paths`) passes with `shape_provenance=torch_trace`.

