[CI] Calibrate v1 thresholds for cuda graph at 2026.05.06 #403
zhaochenyang20 wants to merge 5 commits into main
Apply the local worst-of-5 calibration observations so the V1 CI thresholds match the measured H20 reproduction run, and include a lightweight report pointing to the retained raw artifacts.
Co-authored-by: Cursor
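A worst-of-5 calibration reduces each metric's five repeated runs to its least favorable observation, so thresholds are set against the slowest run rather than the average. The sketch below is a hypothetical illustration, not this repository's calibration code. The function names, the metric names and the higher-is-better flags are all assumptions.

```python
# Hypothetical sketch of a worst-of-N reduction. Metric names and the
# direction flags below are illustrative, not the repository's CI schema.


def worst_of_n(samples: list[float], higher_is_better: bool) -> float:
    """Return the least favorable observation across repeated runs."""
    if not samples:
        raise ValueError("need at least one sample")
    # Throughput-style metrics: the worst run is the smallest value.
    # Latency-style metrics: the worst run is the largest value.
    return min(samples) if higher_is_better else max(samples)


def calibrate(
    runs: dict[str, list[float]], higher_is_better: dict[str, bool]
) -> dict[str, float]:
    """Collapse each metric's repeated runs into one worst-of-N value."""
    return {
        name: worst_of_n(values, higher_is_better[name])
        for name, values in runs.items()
    }


if __name__ == "__main__":
    # Five made-up runs per metric, for illustration only.
    runs = {
        "tts_throughput": [410.0, 398.5, 405.2, 401.7, 399.9],
        "tts_latency_s": [1.21, 1.25, 1.19, 1.30, 1.22],
    }
    direction = {"tts_throughput": True, "tts_latency_s": False}
    print(calibrate(runs, direction))
```

With these made-up numbers the script prints `{'tts_throughput': 398.5, 'tts_latency_s': 1.3}`. Taking the worst run rather than the mean trades some tightness for fewer flaky CI failures.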
Skill improvements made:
Qwen3 Omni V1 CUDA Graph Calibration Report
This lightweight report records the second Qwen3 Omni V1 threshold calibration
Accuracy and WER
Speed Worst-of-5
Applied Threshold Policy
Smart apply was used. Automatically tightened speed thresholds were applied,
Notes
Accuracy did not show a broad regression in this run. MMSU text-only was
Performance improved strongly for TTS and several talker/text paths. Video
Qwen3 Omni V1 CUDA Graph Calibration Report
This lightweight report records the Qwen3 Omni V1 threshold calibration after
verifying CUDA Graph replay for thinker/talker decode paths.
Configuration: qwen3-omni-v1
Benchmarks: mmmu, mmmu_talker, mmsu, mmsu_talker, tts, videoamme, videoamme_talker, videomme, videomme_talker
Run directory: .tune-runs/20260506T220900Z_qwen3-omni-v1_cuda-graph_no-docs_r5
Raw artifacts are retained under .tune-runs/ and are not included in git.
cuda graph: True … decode batches.
Accuracy and WER
Speed Worst-of-5
Applied Threshold Policy
Smart apply was used. Automatically tightened speed thresholds were applied,
and user-selected custom/confirmed values were applied for the remaining
interactive metrics. Metrics explicitly kept at the current threshold are not
listed below.
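The smart-apply policy above can be sketched as a small decision procedure: speed metrics whose proposed threshold is stricter than the current one are tightened automatically, while every other metric takes a user decision (confirm the proposal, supply a custom value, or keep the current threshold, in which case it is omitted from the result). This is a hypothetical sketch, not the tool used in this PR. The `Metric` fields, the 10% headroom margin and the `"confirm"`/`"keep"` decision values are assumptions.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical sketch of a "smart apply" threshold policy. The Metric fields,
# the 10% margin and the decision values are illustrative assumptions.


@dataclass
class Metric:
    name: str
    current: float          # threshold currently enforced in CI
    worst_observed: float   # worst-of-N measurement from calibration
    higher_is_better: bool  # True for throughput, False for latency
    is_speed: bool          # only speed metrics are auto-tightened


def propose(m: Metric, margin: float) -> float:
    """Derive a candidate threshold with headroom beyond the worst run."""
    if m.higher_is_better:
        return m.worst_observed * (1.0 - margin)
    return m.worst_observed * (1.0 + margin)


def is_tighter(m: Metric, candidate: float) -> bool:
    """A tighter threshold demands better performance than the current one."""
    if m.higher_is_better:
        return candidate > m.current
    return candidate < m.current


Decision = Union[str, float]  # "confirm", "keep", or a custom numeric value


def smart_apply(
    metrics: list[Metric], decisions: dict[str, Decision], margin: float = 0.10
) -> dict[str, float]:
    """Return only the thresholds that change; kept metrics are omitted."""
    applied: dict[str, float] = {}
    for m in metrics:
        candidate = propose(m, margin)
        if m.is_speed and is_tighter(m, candidate):
            applied[m.name] = candidate  # automatic tightening
            continue
        choice = decisions.get(m.name, "keep")
        if choice == "confirm":
            applied[m.name] = candidate  # user accepted the proposal
        elif not isinstance(choice, str):
            applied[m.name] = float(choice)  # user-supplied custom value
        # "keep", a missing decision, or any other string: threshold unchanged
    return applied


if __name__ == "__main__":
    metrics = [
        Metric("tts_throughput", current=300.0, worst_observed=400.0,
               higher_is_better=True, is_speed=True),
        Metric("videomme_latency_s", current=10.0, worst_observed=12.0,
               higher_is_better=False, is_speed=True),
        Metric("mmsu_accuracy", current=0.60, worst_observed=0.62,
               higher_is_better=True, is_speed=False),
    ]
    print(smart_apply(metrics, {"videomme_latency_s": 14.0, "mmsu_accuracy": "keep"}))
```

In this example only the TTS throughput threshold tightens automatically. The video latency metric would loosen, so it falls through to the user's custom value, and the accuracy metric is kept and therefore not listed.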
Notes
CUDA Graph produced large gains for TTS and several talker/text paths. Video
stages remain mixed because preprocessing, long prefill, audio synthesis, and
ASR can dominate their latency, which limits what decode replay alone can recover.
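Why video stages gain less can be illustrated with Amdahl's law. If CUDA Graph replay speeds up only the decode share of a request, the untouched stages cap the end-to-end speedup. This is a back-of-envelope illustration, not a measurement from this run. The decode shares and the 2x decode speedup below are hypothetical.

```python
# Amdahl's-law illustration: CUDA Graph replay only accelerates the decode
# share of a request's latency. The fractions used below are hypothetical.


def end_to_end_speedup(decode_fraction: float, decode_speedup: float) -> float:
    """Overall speedup when only `decode_fraction` of the time gets faster."""
    if not 0.0 <= decode_fraction <= 1.0:
        raise ValueError("decode_fraction must be in [0, 1]")
    if decode_speedup <= 0.0:
        raise ValueError("decode_speedup must be positive")
    untouched = 1.0 - decode_fraction  # e.g. preprocessing, prefill, synthesis, ASR
    return 1.0 / (untouched + decode_fraction / decode_speedup)


if __name__ == "__main__":
    # Same 2x decode speedup, different decode shares of total latency.
    for share in (0.8, 0.5, 0.2):
        print(f"decode share {share:.0%}: {end_to_end_speedup(share, 2.0):.2f}x")
```

With the same 2x decode speedup, the script prints about 1.67x, 1.33x and 1.11x for decode shares of 80%, 50% and 20%. A decode-heavy path such as TTS therefore benefits far more than a video path dominated by preprocessing and prefill.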