The V1 pipeline never populated `usage` (prompt/completion/total tokens) anywhere along the chain: the decode stage's result dict did not include it and the merged-terminal client branch ignored it, so the API returned `usage=null`. The benchmark client then read `body["usage"]` as `{}` and set `completion_tokens=0`, and `compute_speed_metrics` dropped `tok_per_s_agg`, which made `assert_speed_thresholds` crash with `KeyError: 'tok_per_s_agg'`.
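The failure chain described here can be reproduced in miniature. The sketch below is an illustration only: `sketch_compute_speed_metrics` and `sketch_assert_speed_thresholds` are hypothetical stand-ins, not the repository's real functions, and the field names `completion_tokens` and `engine_time_s` and the `num_requests` key are assumptions made for the example.

```python
from typing import Any


def sketch_compute_speed_metrics(requests: list[dict[str, Any]]) -> dict[str, Any]:
    """Hypothetical stand-in: aggregate per-request stats into a summary."""
    total_tokens = sum(r.get("completion_tokens", 0) for r in requests)
    total_engine_time = sum(r.get("engine_time_s", 0.0) for r in requests)
    summary: dict[str, Any] = {"num_requests": len(requests)}
    # Assumed guard: the aggregate rate is only emitted when both totals are positive.
    if total_engine_time > 0 and total_tokens > 0:
        summary["tok_per_s_agg"] = total_tokens / total_engine_time
    return summary


def sketch_assert_speed_thresholds(summary: dict[str, Any], min_tok_per_s: float) -> None:
    """Hypothetical stand-in: reads the key unconditionally, as the CI check is described to."""
    value = summary["tok_per_s_agg"]  # raises KeyError if the key was dropped
    assert value >= min_tok_per_s, f"{value} tok/s is below {min_tok_per_s}"


# With usage missing, the client records zero completion tokens and the key is dropped.
broken = sketch_compute_speed_metrics([{"completion_tokens": 0, "engine_time_s": 2.0}])
# With usage populated, the key is present and the threshold check can run.
fixed = sketch_compute_speed_metrics([{"completion_tokens": 100, "engine_time_s": 2.0}])
```

Under these assumptions, `broken` has no `tok_per_s_agg` entry, so the threshold check fails with `KeyError` before it ever compares a number, while `fixed` carries a usable rate.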
101
+
102
+
Files touched:
103
+
- `sglang_omni_v1/models/qwen3_omni/stages.py`: `_decode` now sets `result["usage"] = {prompt_tokens, completion_tokens, total_tokens}` from `state.prompt["input_ids"]` and `thinker_out["output_ids"]`.
104
+
- `sglang_omni_v1/client/client.py`: the merged-terminal branch of `_default_result_builder` (`{"decode": ..., "code2wav": ...}`) now also propagates `decode_result["usage"]` into `chunk.usage`. The simple-dict branch already worked.
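As a rough illustration of the two changes listed above, the sketch below builds a usage dict from token ID lists and then looks it up in either result shape. It is not the code from the commit: `build_usage` and `extract_usage` are hypothetical helper names, and counting tokens with `len()` over the ID lists is an assumption about how the counts are derived.

```python
from typing import Any, Optional


def build_usage(input_ids: list[int], output_ids: list[int]) -> dict[str, int]:
    """Hypothetical helper: derive token counts from prompt and output ID lists."""
    prompt_tokens = len(input_ids)
    completion_tokens = len(output_ids)
    return {
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "total_tokens": prompt_tokens + completion_tokens,
    }


def extract_usage(result: dict[str, Any]) -> Optional[dict[str, int]]:
    """Hypothetical helper: find usage in a simple dict or a merged-terminal dict."""
    # Simple-dict shape: usage sits at the top level.
    if "usage" in result:
        return result["usage"]
    # Merged-terminal shape: usage lives under the "decode" stage's result.
    decode_result = result.get("decode")
    if isinstance(decode_result, dict):
        return decode_result.get("usage")
    return None


usage = build_usage([1, 2, 3], [4, 5])
merged = {"decode": {"usage": usage}, "code2wav": {}}
```

Here `extract_usage(merged)` and `extract_usage({"usage": usage})` return the same dict, which is the behavior the client change aims for: both result shapes end up carrying usage to the caller.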
105
+
106
+
Stage 3 verified after this fix: 1 passed in 362s.
99
107
100
-
These surfaced during the run but were **not** fixed (they don't gate stage-1):
108
+
## Known V1 issues outside this PR's reach
101
109
102
-
- **`tok_per_s_agg` missing in V1 benchmark summaries.** `compute_speed_metrics` only adds the key when `total_engine_time > 0 AND total_tokens > 0`. V1's per-request `engine_time_s` and/or `completion_tokens` are not populated, so the key is dropped. CI's `assert_speed_thresholds` reads `summary["tok_per_s_agg"]` unconditionally → `KeyError`. Stage 3 hit this; stages 5/7/9 (and possibly the talker speed paths) are likely to hit it too.
110
+
(none currently — all root causes encountered so far are fixed by Fixes 1–4.)