[CI Monitor] Daily Report - 2026-06-28 #118

# Daily Cross-Workflow Summary — 2026-06-28
**Snapshot**: 2026-06-28 02:42 UTC · Only completed runs counted in trends · Auto-updated every 30 min

## TL;DR
🔴 RED · ~18 active clusters · **6 🆕 today (R226–R231)** · ~12 carrying over · **4 review-ready fixes still open (none merged since yesterday)** · scout did **not** run (cron Mon/Thu)
👉 **Today's ask**:
- **R225 is now 2 days unfixed and still breaks DeepSeek-V3.2 MTP on BOTH nightlies.** Its breaker [#29413](https://github.com/sgl-project/sglang/pull/29413) ([`9214b933`](https://github.com/sgl-project/sglang/commit/9214b9338fcb1df5b1a9ab5835aa631539f55556)) is **merged with no revert/guard in flight** ([#29499](https://github.com/sgl-project/sglang/pull/29499) is an adjacent optimization, not a fix). Revert or ROCm-guard it.
- **R222 (ROCm Conv3D CUDA-only) remains the largest pr-test-amd breaker** (~16 job-blocks across 5 runs, still no fix).
- **Land the 4 review-ready fixes that did NOT move yesterday**: [#29376](https://github.com/sgl-project/sglang/pull/29376) (R214), [#27141](https://github.com/sgl-project/sglang/pull/27141)+[#29391](https://github.com/sgl-project/sglang/pull/29391) (R195), [#28889](https://github.com/sgl-project/sglang/pull/28889) (R192), [#27757](https://github.com/sgl-project/sglang/pull/27757) (R2).
- **New today**: a never-passed diffusion `update_weights_from_disk` 500 (R226, both PR workflows), plus 3 latest-run-only 8-GPU pr-test-amd failures (new R227/R228 and carry-over R219) that need a rerun to separate node-flake from regression.
- **Both release-docker workflows ✅ green.**
- **pr-test-amd-rocm720 again ≈0 clean signal** (cron self-cancel + HF-429 cascade).

## Workflow status

| Workflow | Latest run | ✅ | ❌ | Trend (completed real failures) | Δ vs yesterday |
|---|---|---|---|---|---|
| nightly-test-amd | Jun-27 [28297034445](https://github.com/sgl-project/sglang/actions/runs/28297034445) | 0 | ~8 real (rest HF-infra) | 10·10·10·~6·~8 | +R229 NEW |
| nightly-test-amd-rocm720 | Jun-27 [28296988041](https://github.com/sgl-project/sglang/actions/runs/28296988041) | 0 | ~10 real (rest HF-infra) | 12·11·~5·~10 | +R229,R230 NEW |
| release-docker-amd-nightly | Jun-27 (latest) | ✅ | 0 | 0·0·0 | 0 |
| release-docker-amd-rocm720-nightly | Jun-27 (latest) | ✅ | 0 | 0·0·0 | 0 |
| amd-aiter-scout | none (last Jun-25 [28199192232](https://github.com/sgl-project/sglang/actions/runs/28199192232)) | — | — | — | no run (not Mon/Thu) |
| pr-test-amd | rolling Jun-27→28, latest [28306914695](https://github.com/sgl-project/sglang/actions/runs/28306914695) | 0 | R222+R192+R214+R226/run | worsening (+R227/R228/R219 latest) | +R226,R227,R228 NEW |
| pr-test-amd-rocm720 | Jun-27 [28297015673](https://github.com/sgl-project/sglang/actions/runs/28297015673) | 0 | **≈0 clean** (cron-cancel + HF-429) | ≈0 real | +R231 confirmed |

> **Notes**: (1) **amd-aiter-scout** did not run (cron Mon/Thu); R221/R223 carry over **dormant**, no fresh data. (2) Both nightlies' Jun-27 runs are still dominated by HF weight-download hangs / `429` (infra), not code. (3) pr-test-amd-rocm720 run [28297015673](https://github.com/sgl-project/sglang/actions/runs/28297015673) is again ≈0 clean signal: two crons share one `cancel-in-progress` group + an HF-429 fast-fail cascade cancels most downstream jobs.
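
Note (1) can be sanity-checked from the calendar alone. The sketch below assumes the scout schedule fires only on Mondays and Thursdays, as stated above; the helper name `scout_scheduled` is illustrative and not part of the bot.

```python
from datetime import date

# Assumption: the scout cron fires on Monday and Thursday only (per note 1).
# date.weekday() numbers Monday as 0, so Thursday is 3.
SCOUT_WEEKDAYS = {0, 3}

def scout_scheduled(day: date) -> bool:
    """Return True if a Mon/Thu schedule would fire on `day`."""
    return day.weekday() in SCOUT_WEEKDAYS

print(scout_scheduled(date(2026, 6, 25)))  # True: date of the last scout run (a Thursday)
print(scout_scheduled(date(2026, 6, 28)))  # False: today's snapshot date (a Sunday)
```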

## 🆕 NEW clusters today

### R226 · 🆕 · **Diffusion `update_weights_from_disk` → HTTP 500 "Inplace update to inference tensor outside InferenceMode"** — pr-test-amd + pr-test-amd-rocm720
- **Status**: NEW 2026-06-28; **never-passed** (new test file `test_update_weights_from_disk.py`). Appears in multimodal-gen 1-GPU shard 3 across 3 pr-test-amd runs **and** pr-test-amd-rocm720. A secondary fixture bug (perturbed-VAE clone missing `transformer` dir → setup ERRORs on FLUX.2) rides in the same job.
- **Top hypothesis**: `[LOW]` the server-side weight-apply path writes in place to an inference tensor outside `InferenceMode` (a shape mismatch on FLUX.2/Qwen-Image also appears), so every `update_weights_from_disk` request returns 500. **Disconfirming**: never-passed ⇒ could be a brand-new test exercising an unimplemented diffusion path rather than a regression. **In-flight fix**: ❌ none found.
- **Suggested triage**: confirm whether this test was newly registered (never green anywhere) vs. regressed; if new, treat as feature-gap on the diffusion weight-update endpoint and route to the diffusion owners; fix the perturbed-VAE fixture to materialize a `transformer` dir.

| Workflow | Job (shard) | Test File | Test Function | Error | Log |
|---|---|---|---|---|---|
| pr-test-amd | [multimodal-gen-1gpu (3)](https://github.com/sgl-project/sglang/actions/runs/28306914695/job/83864630862) | `test_update_weights_from_disk.py` | `test_update_weights_specific_modules[Qwen-Image]` (+4) | 500 "Inplace update to inference tensor" | [Log](https://github.com/sgl-project/sglang/actions/runs/28306914695/job/83864630862) |
| pr-test-amd | [multimodal-gen-1gpu (3)](https://github.com/sgl-project/sglang/actions/runs/28297966351/job/83841118172) | `test_update_weights_from_disk.py` | `test_update_weights_from_disk_default[Qwen-Image]` (+ FLUX.2 setup ERRORs) | 500 / fixture `No weights dir for transformer` | [Log](https://github.com/sgl-project/sglang/actions/runs/28297966351/job/83841118172) |
| pr-test-amd-rocm720 | [multimodal-gen-1gpu (3)](https://github.com/sgl-project/sglang/actions/runs/28297015673/job/83838610262) | `test_update_weights_from_disk.py` | `TestUpdateWeightsFromDisk.test_update_weights_specific_modules[Qwen-Image]` (+ offload variants) | assert 500 == 200 | [Log](https://github.com/sgl-project/sglang/actions/runs/28297015673/job/83838610262) |
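
The triage question above (newly registered vs. regressed) comes down to the test's pass/fail history. A minimal sketch, assuming a chronological list of per-run outcomes is available; the `classify` helper and its labels are hypothetical and only mirror the status vocabulary used in this report, not the bot's actual logic.

```python
from typing import Sequence

def classify(history: Sequence[bool]) -> str:
    """Label a test from its chronological outcomes (True = pass).

    Hypothetical labels mirroring this report's vocabulary:
    - "never-passed": no green run on record (new test or feature gap)
    - "regressed":    passed at some point, then failed on every later run
    - "flaky":        passes and failures interleave
    - "passing":      every recorded run is green
    """
    if not history:
        return "no-data"
    if not any(history):
        return "never-passed"
    if all(history):
        return "passing"
    last_pass = max(i for i, ok in enumerate(history) if ok)
    first_fail = min(i for i, ok in enumerate(history) if not ok)
    # Every failure comes after the last pass: a clean pass-to-fail transition.
    if first_fail > last_pass:
        return "regressed"
    return "flaky"

print(classify([False, False, False]))       # never-passed (R226-style history)
print(classify([True, True, False, False]))  # regressed
print(classify([True, False, True, False]))  # flaky
```

A "never-passed" result points toward routing the cluster as a feature gap, while "regressed" justifies bisecting the window after the last green run.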

### R229 · 🆕 · **Kimi-K2.6 8-GPU eval TIMEOUT from slow weight load (3300s+ load exhausts 3600s budget)** — both nightlies
- **Status**: NEW 2026-06-28 (startup-bound TIMEOUT, not accuracy); regression since ~Jun 24-25. **Disconfirming vs infra**: the timeout is deterministic (weight load alone nearly exhausts the budget), not a transient 429; it sits on the border between "model too big for budget" and a load-path slowdown.
- **Top hypothesis**: `[LOW]` weight-load wall-clock for Kimi-K2.6 (8-way) now consumes ~92-93% of the per-file 3600s budget (load ~3303-3359s observed), leaving only ~240-300s for the eval itself. **In-flight fix**: ❌ none (rocm720 per-job cited [#24076](https://github.com/sgl-project/sglang/pull/24076)/[#29178](https://github.com/sgl-project/sglang/pull/29178)/[#28905](https://github.com/sgl-project/sglang/pull/28905) as candidates, none confirmed).
- **Suggested triage**: bump the per-file timeout for Kimi-K2.6 OR pre-cache weights on the runner; profile `load_model` to see if a recent loader change slowed it.

| Workflow | Job (shard) | Test File | Test Function | Error | Log |
|---|---|---|---|---|---|
| nightly-test-amd | [nightly-8-gpu-kimi-k26](https://github.com/sgl-project/sglang/actions/runs/28297034445/job/83838643958) | `test_kimi_k26_eval_amd.py` | N/A (eval runner) | TIMEOUT 3600s → exit 255 (load 3303s) | [Log](https://github.com/sgl-project/sglang/actions/runs/28297034445/job/83838643958) |
| nightly-test-amd-rocm720 | [nightly-8-gpu-kimi-k26-rocm720](https://github.com/sgl-project/sglang/actions/runs/28296988041/job/83838515349) | `test_kimi_k26_eval_amd.py` | `test_kimi_k26_gsm8k_accuracy` | TIMEOUT 3600s (load 3359s) | [Log](https://github.com/sgl-project/sglang/actions/runs/28296988041/job/83838515349) |
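
The budget arithmetic behind R229 can be reproduced from the two rows above. This is only a worked calculation on the reported numbers; the variable names are illustrative.

```python
BUDGET_S = 3600  # per-file timeout reported in the table above

# Observed weight-load times from the two failing jobs.
loads = {"nightly-test-amd": 3303, "nightly-test-amd-rocm720": 3359}

for workflow, load_s in loads.items():
    remaining = BUDGET_S - load_s   # seconds left for the eval itself
    share = load_s / BUDGET_S       # fraction of the budget spent loading
    print(f"{workflow}: load uses {share:.1%} of budget, {remaining}s left for eval")
```

The remaining time is 297s and 241s respectively, so any timeout bump has to cover the full eval runtime on top of the load, not just a small margin.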

### R230 · 🆕 · **DeepSeek-V4-Pro server SIGKILL (-9) during 8-way fp8 weight load (MI35x ROCm 7.2)** — nightly-rocm720
- **Status**: NEW 2026-06-28 (first failure for these two files today; flaky). Both offline + online retry exit -9. **In-flight fix**: ❌ none.
- **Top hypothesis**: `[LOW]` OOM / SIGKILL during 8-way fp8 weight load (host or device memory pressure on MI35x). **Disconfirming**: first-seen today ⇒ may be a one-off node memory issue; needs a rerun.
- **Suggested triage**: rerun once; if it recurs, capture dmesg/OOM-killer logs and peak host RAM during load.

| Workflow | Job (shard) | Test File | Test Function | Error | Log |
|---|---|---|---|---|---|
| nightly-test-amd-rocm720 | [nightly-8-gpu-mi35x-deepseek-v4-pro-rocm720](https://github.com/sgl-project/sglang/actions/runs/28296988041/job/83838515346) | `test_deepseek_v4_pro_fp4.py` | `TestDeepseekV4ProFp4.setUpClass` | Server exit -9 (SIGKILL on load) | [Log](https://github.com/sgl-project/sglang/actions/runs/28296988041/job/83838515346) |
| nightly-test-amd-rocm720 | [nightly-8-gpu-mi35x-deepseek-v4-pro-rocm720](https://github.com/sgl-project/sglang/actions/runs/28296988041/job/83838515346) | `test_deepseek_v4_pro_fp4_cp.py` | `TestDeepseekV4ProFp4CPInterleave.setUpClass` | Server exit -9 (offline+online -9) | [Log](https://github.com/sgl-project/sglang/actions/runs/28296988041/job/83838515346) |
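
Exit codes such as -9, -6 and 134 recur throughout this report. The sketch below shows one way to decode them, assuming the harness surfaces child exit status the way Python's `subprocess` does (negative value for death by signal) or the way POSIX shells do (128 + signal number). The `describe_exit` helper is hypothetical.

```python
import signal

def describe_exit(code: int) -> str:
    """Map a process exit status to a signal name where one applies.

    subprocess reports death-by-signal as a negative return code (-N);
    POSIX shells report it as 128 + N.
    """
    if code < 0:
        signum = -code
    elif code > 128:
        signum = code - 128
    else:
        return f"exit {code}"
    try:
        return signal.Signals(signum).name
    except ValueError:
        # Not a named signal on this platform: report the raw status.
        return f"exit {code}"

print(describe_exit(-9))   # SIGKILL
print(describe_exit(-6))   # SIGABRT
print(describe_exit(134))  # SIGABRT (128 + 6)
```

A decoded SIGKILL does not say who sent it; distinguishing the kernel OOM killer from a watchdog still requires the dmesg capture suggested above.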

### R227 / R228 / R231 · 🆕 · pr-test regressions pending a rerun (R227/R228 latest-run-only; R231 seen in a 2nd run)
- **R227** `[LOW]` — **DeepSeek-R1-MXFP4 8-GPU MTP prefill GPU memory access fault → watchdog → server killed**, pr-test-amd latest run only. May share a root with R225 (spec/MTP path). Suggested triage: rerun; if persistent, bisect the MTP/spec-decode window.
- **R228** `[LOW]` — **Qwen3-Coder-Next 8-GPU decode hang → scheduler watchdog 300s → SIGQUIT → connection refused**, pr-test-amd latest run only. Rerun to rule out node flake.
- **R231** `[LOW]` — **torch.compile InductorError (`AssertionError` in post-grad `decompose_triton_kernel_wrapper_functional` / layernorm `forward_hip`)** on ROCm diffusion T2V/denoising, pr-test-amd-rocm720 (was "minor-new" yesterday, **now confirmed in a 2nd run**). Suggested triage: a clean non-colliding rerun; if it persists, it's a real ROCm `torch.compile` gap, not infra.

| ID | Workflow | Job (shard) | Test File | Test Function | Error | Log |
|---|---|---|---|---|---|---|
| R227 | pr-test-amd | [stage-c-8gpu-mi35x (0)](https://github.com/sgl-project/sglang/actions/runs/28306914695/job/83864631092) | `test_deepseek_r1_mxfp4_8gpu.py` | `TestDeepseekR1MXFP4MTP.test_a_gsm8k` | GPU mem fault → watchdog → killed | [Log](https://github.com/sgl-project/sglang/actions/runs/28306914695/job/83864631092) |
| R228 | pr-test-amd | [stage-c-8gpu-mi35x (1)](https://github.com/sgl-project/sglang/actions/runs/28306914695/job/83864631081) | `test_qwen3_coder_next_8gpu.py` | `TestQwen3CoderNext.test_bs_1_speed` | decode hang → SIGQUIT → conn refused | [Log](https://github.com/sgl-project/sglang/actions/runs/28306914695/job/83864631081) |
| R231 | pr-test-amd-rocm720 | [multimodal-gen-2gpu (1)](https://github.com/sgl-project/sglang/actions/runs/28297015673/job/83838610250) | `test_server_2_gpu.py` | `test_diffusion_generation[wan2_2_t2v_a14b_2gpu …]` | InductorError AssertionError | [Log](https://github.com/sgl-project/sglang/actions/runs/28297015673/job/83838610250) |

## Carry-over active clusters (still red)

### R225 · **AssertionError "All of them must not be None" in DSA eager draft-extend (`dsa_backend.py:721`)** — both nightlies, DeepSeek-V3.2 MTP
- **Status**: **2 days persistent** (since Jun-27); 4 jobs across both nightlies (MI35x perf + accuracy MTP). AMD/ROCm only (NVIDIA shielded by the CUDA-graph draft-extend path enabled in the same commit).
- **Top hypothesis**: `[HIGH]` breaker [#29413](https://github.com/sgl-project/sglang/pull/29413) ([`9214b933`](https://github.com/sgl-project/sglang/commit/9214b9338fcb1df5b1a9ab5835aa631539f55556), merged Jun-27 06:53) CUDA-gates the new draft-extend graph consumer (`_is_cuda or _is_musa`) while leaving the AMD **eager** `init_forward_metadata` assert (lines 717-722) requiring the now-nulled CPU seq-len mirror. **Disconfirming**: end-to-end nulling of `extend_*_cpu` inferred, not traced. **In-flight fix**: ❌ **none** — [#29413](https://github.com/sgl-project/sglang/pull/29413) is **merged**; [#29499](https://github.com/sgl-project/sglang/pull/29499) (open) is a DSA replay optimization, NOT a revert/guard.
- **Suggested triage**: revert [`9214b933`](https://github.com/sgl-project/sglang/commit/9214b9338fcb1df5b1a9ab5835aa631539f55556) on a branch + rerun `test_deepseek_v32_mtp_perf_mi35x.py`; if confirmed, derive `extend_*_cpu` from GPU tensors in the eager `is_draft_extend_v2` branch **or** force `needs_cpu_seq_lens=True` on ROCm. Ping the #29413 author.

| Workflow | Job (shard) | Test File | Test Function | Error | Log |
|---|---|---|---|---|---|
| nightly-test-amd | [nightly-perf-8gpu-mi35x-dsv32-mtp](https://github.com/sgl-project/sglang/actions/runs/28297034445/job/83838643977) | `test_deepseek_v32_mtp_perf_mi35x.py` | `test_bench_one_batch` | `dsa_backend.py:721` → exit -9 | [Log](https://github.com/sgl-project/sglang/actions/runs/28297034445/job/83838643977) |
| nightly-test-amd | [nightly-acc-8gpu-mi35x-dsv32-mtp](https://github.com/sgl-project/sglang/actions/runs/28297034445/job/83838644003) | `test_deepseek_v32_mtp_eval_mi35x.py` | `TestDeepseekV32TPMTP.setUpClass` | same | [Log](https://github.com/sgl-project/sglang/actions/runs/28297034445/job/83838644003) |
| nightly-test-amd-rocm720 | [nightly-perf-8gpu-mi35x-dsv32-mtp-rocm720](https://github.com/sgl-project/sglang/actions/runs/28296988041/job/83838515319) | `test_deepseek_v32_mtp_perf_mi35x.py` | `test_bench_one_batch` | same | [Log](https://github.com/sgl-project/sglang/actions/runs/28296988041/job/83838515319) |
| nightly-test-amd-rocm720 | [nightly-acc-8gpu-mi35x-dsv32-mtp-rocm720](https://github.com/sgl-project/sglang/actions/runs/28296988041/job/83838515341) | `test_deepseek_v32_mtp_eval_mi35x.py` | `TestDeepseekV32TPMTP.setUpClass` | same | [Log](https://github.com/sgl-project/sglang/actions/runs/28296988041/job/83838515341) |

### R222 · **ROCm RuntimeError "causal Conv3D cat/pad fusion is only available on CUDA" (Wan/diffusion VAE)** — pr-test-amd (largest) + rocm720
- **Status**: every pr-test-amd run since [#29281](https://github.com/sgl-project/sglang/pull/29281) (merged Jun-26); **~16 job-blocks across 5 runs** ([28297966351](https://github.com/sgl-project/sglang/actions/runs/28297966351), [28289294173](https://github.com/sgl-project/sglang/actions/runs/28289294173), [28282108241](https://github.com/sgl-project/sglang/actions/runs/28282108241), [28273478888](https://github.com/sgl-project/sglang/actions/runs/28273478888), [28306914695](https://github.com/sgl-project/sglang/actions/runs/28306914695)) + rocm720 ([83838610250](https://github.com/sgl-project/sglang/actions/runs/28297015673/job/83838610250) I2V/mova). Hits all Wan2.1/2.2 T2V/I2V + mova variants on 1-GPU and 2-GPU shards.
- **Top hypothesis**: `[MEDIUM]` [#29281](https://github.com/sgl-project/sglang/pull/29281) added a CUDA-only fused causal-Conv3D fast path in WanVAE decode with no ROCm/Triton fallback. **In-flight fix**: ❌ none (no Conv3D-guard PR open; the Conv3D search returned only unrelated diffusion PRs).
- **Suggested triage**: guard the fused path behind `is_cuda` with an eager fallback, or revert [#29281](https://github.com/sgl-project/sglang/pull/29281); rerun `test_server_2_gpu.py::test_diffusion_generation[wan2_2_t2v_a14b_2gpu]`.

Representative rows (all shards share the same top frame): pr-test-amd [2gpu (1)](https://github.com/sgl-project/sglang/actions/runs/28297966351/job/83841118166) `test_server_2_gpu.py::test_diffusion_generation[wan2_2_i2v_a14b_2gpu …]`; [1gpu (0)](https://github.com/sgl-project/sglang/actions/runs/28289294173/job/83818444804) `test_server_1_gpu.py::[wan2_1_t2v_1.3b_teacache_enabled …]`; rocm720 [2gpu (1)](https://github.com/sgl-project/sglang/actions/runs/28297015673/job/83838610250) `[mova_360p_tp2 / wan2_1_i2v_14b_480P/720P_2gpu]`.
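
The "guard with an eager fallback" option suggested for R222 follows a common dispatch pattern, sketched below in plain Python. All names here are hypothetical stand-ins, not sglang functions, and the real fix would live in the WanVAE decode path.

```python
from typing import Callable, TypeVar

T = TypeVar("T")

def pick_impl(fused: Callable[[T], T], eager: Callable[[T], T], is_cuda: bool) -> Callable[[T], T]:
    """Select the fused fast path only on CUDA; otherwise use the eager path."""
    return fused if is_cuda else eager

# Stand-ins for the real kernels (hypothetical).
def fused_cat_pad_conv3d(x):
    raise RuntimeError("causal Conv3D cat/pad fusion is only available on CUDA")

def eager_cat_pad_conv3d(x):
    return x  # placeholder for the unfused reference path

# On ROCm the guard resolves to the eager path, so the RuntimeError is never hit.
conv = pick_impl(fused_cat_pad_conv3d, eager_cat_pad_conv3d, is_cuda=False)
print(conv([1, 2, 3]))  # [1, 2, 3]
```

Resolving the implementation once, rather than branching on every call, keeps the hot path free of platform checks.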

Other carry-over clusters (one row per cluster):

| ID | Cluster | Where (latest) | Status | In-flight fix |
|----|---------|----------------|--------|---------------|
| R192 | FLUX.2 modelopt-FP8 `torch._scaled_mm` HIPBLAS_STATUS_NOT_SUPPORTED | pr-test-amd [2gpu (1)](https://github.com/sgl-project/sglang/actions/runs/28297966351/job/83841118166) (`test_server_2_gpu.py::[flux2_modelopt_fp8_tp2_t2i]`); ~4 runs | never-passed | ✅ **[#28889](https://github.com/sgl-project/sglang/pull/28889) open — land** |
| R214 | `TokenizedGenerateReqInput` missing `input_embeds` (TypeError) | pr-test-amd [stage-b-1gpu (6)](https://github.com/sgl-project/sglang/actions/runs/28306914695/job/83864631031) + rocm720 [stage-b (6)](https://github.com/sgl-project/sglang/actions/runs/28297015673/job/83838610271) (`test_type_based_dispatcher.py`) | recurring since [#29214](https://github.com/sgl-project/sglang/pull/29214) | ✅ **[#29376](https://github.com/sgl-project/sglang/pull/29376) open — unblock & land** |
| R195 | Mamba `extra_buffer needs CUDA/MUSA/NPU (FLA)` on ROCm | nightly [qwen35 83838643968](https://github.com/sgl-project/sglang/actions/runs/28297034445/job/83838643968), [mi35x-qwen35 83838643970](https://github.com/sgl-project/sglang/actions/runs/28297034445/job/83838643970); rocm720 [83838515323](https://github.com/sgl-project/sglang/actions/runs/28296988041/job/83838515323), [83838515348](https://github.com/sgl-project/sglang/actions/runs/28296988041/job/83838515348) | persistent ≥Jun-19 | ✅ **[#27141](https://github.com/sgl-project/sglang/pull/27141)+[#29391](https://github.com/sgl-project/sglang/pull/29391) open — land** |
| R19 | Qwen3-235B-MXFP4 HIP `hipErrorCapturedEvent` capture abort | nightly [83838643963](https://github.com/sgl-project/sglang/actions/runs/28297034445/job/83838643963); rocm720 [83838515337](https://github.com/sgl-project/sglang/actions/runs/28296988041/job/83838515337) | never-passed ≥May-27 | ❌ none (per-job: #27650/#23581 candidates) |
| R2 | Mistral/Mixtral GSM8K below threshold (chat-eval) | rocm720 [83838515256](https://github.com/sgl-project/sglang/actions/runs/28296988041/job/83838515256) (Mistral-7B 0.361) | never-passed ≥Jun-13 | ✅ **[#27757](https://github.com/sgl-project/sglang/pull/27757) open — land** |
| R211 | DeepSeek-R1 HiCache MI35x — GPU mem fault during gsm8k prefill | nightly [83838643952](https://github.com/sgl-project/sglang/actions/runs/28297034445/job/83838643952) | never-passed ≥Jun-20 | ❌ none |
| R196 | VLM DP-encoder mem fault (write to read-only page) | nightly [4-gpu 83838643955](https://github.com/sgl-project/sglang/actions/runs/28297034445/job/83838643955) (`test_encoder_dp.py::test_vlm_mmmu_benchmark`) | flaky/model-dependent | ⚠️ [#18721](https://github.com/sgl-project/sglang/pull/18721) stale |
| R6 | Qwen3-30B-A3B MoE — GPU mem fault (MI35x) | rocm720 [83838515311](https://github.com/sgl-project/sglang/actions/runs/28296988041/job/83838515311) | recurring (4/5; last pass Jun-23) | ❌ none |
| R210 | Qwen3.5 triton-DCP GSM8K 0.556<0.90 | nightly [mi35x-qwen35 83838643970](https://github.com/sgl-project/sglang/actions/runs/28297034445/job/83838643970) (`test_qwen3p5_triton_dcp.py`) | never-passed | ⚠️ [#29230](https://github.com/sgl-project/sglang/pull/29230) DNM |
| R219 | DeepSeek-V3.2 (basic) 8-GPU HSA out-of-resources decode abort | pr-test-amd [stage-c-8gpu (1)](https://github.com/sgl-project/sglang/actions/runs/28306914695/job/83864631084) (`test_deepseek_v32_basic.py::test_a_gsm8k`) | latest-run flake | ❌ none |

<details><summary>Known stable / dormant clusters (no action today) · click to expand</summary>

| ID | Cluster | Where | Status | Fix |
|----|---------|-------|--------|-----|
| R1 | VLM MMMU accuracy below threshold | nightly (today masked by MMMU dataset/429 timeouts) | never-passed ≥Jun-13 | ❌ none |
| R155 | DeepSeek-V3.2 (basic) MI35x GSM8K below threshold | rocm720 (today masked by xet download timeout) | never-passed on rocm720 | ⚠️ partial [#25559](https://github.com/sgl-project/sglang/pull/25559)/[#29050](https://github.com/sgl-project/sglang/pull/29050) |
| R213 | MiniMax-M2.7 GSM8K borderline | nightly | borderline/flaky | ❌ none |
| R220 | Embeddings-API latency threshold | pr-test-amd stage-b-1gpu-large | flake (not seen today) | ❌ none |
| R221 | aiter-caused GPU Hang (exit 134) ROCm 7.2 LoRA | scout only — no run today | dormant | ❌ none |
| R223 | aiter-caused DSV4-Pro-MTP connection-refused | scout only — no run today | dormant | ❌ none |
| R212/R224 | DSV3.2-MTP perf hang / eval borderline | superseded by R225 on MTP jobs | dormant | ❌ none |

</details>

<details><summary>Infrastructure / orchestration noise (not test failures) · click to expand</summary>

- **HF weight-download hangs / `429 Too Many Requests` / Xet `xet_get` stalls**: dominate both nightlies Jun-27 — nightly-amd (DeepSeek-R1-MXFP4-tp2, GLM-5.1-mxfp4, gpt-oss-120b tokenizer filelock, perf-vlm Qwen3-VL-30B 429, MMMU dataset timeout) and nightly-rocm720 (DSR1-mxfp4-tp4, DSV3-0324, gpt-oss-120b, Grok-2, Qwen3-235B, DSV3.2 xet timeout, VLM 429). Partial fix [#23400](https://github.com/sgl-project/sglang/pull/23400) open.
- **pr-test-amd-rocm720 cron self-cancel + HF-429 cascade**: run [28297015673](https://github.com/sgl-project/sglang/actions/runs/28297015673) — two crons share one `cancel-in-progress` group; HF-429 on stage-a/multimodal warmup → fast-fail cancels most downstream jobs. ≈0 clean pytest signal.
- **pr-test-amd diffusion port-5555 `--strict-ports` cascade**: HF-download timeout on first diffusion test leaks scheduler port 5555 → cascades the rest of the 1-GPU shard (cosmos3/wan/lingbot/qwen-image). Many `multimodal-gen-1gpu` rows.
- **ROCm VRAM-not-clear / zombie KFD pre-flight gate**: nightly [glm5-mxfp4 83838644036](https://github.com/sgl-project/sglang/actions/runs/28297034445/job/83838644036), rocm720 [hicache 83838515342](https://github.com/sgl-project/sglang/actions/runs/28296988041/job/83838515342). Node reboot required.
- **mori build / git-clone network fail**: pr-test-amd [83799696814](https://github.com/sgl-project/sglang/actions/runs/28282108241/job/83799696814) (corrupt `libabsl_base.so` invalid ELF).
- **Kimi-K2-MXFP4 BCG watchdog -9** (pr-test-amd [83799697008](https://github.com/sgl-project/sglang/actions/runs/28282108241/job/83799697008)): flaky (1/6) MoE weight-load watchdog timeout.

</details>

## Workflow drill-down (per-workflow view)

<details><summary>nightly-test-amd · Jun-27 [28297034445](https://github.com/sgl-project/sglang/actions/runs/28297034445) · ~8 real (rest HF-infra)</summary>

| Job (shard) | Test File | Test Function | Cluster | Error |
|---|---|---|---|---|
| [nightly-perf-8gpu-mi35x-dsv32-mtp](https://github.com/sgl-project/sglang/actions/runs/28297034445/job/83838643977) | `test_deepseek_v32_mtp_perf_mi35x.py` | `test_bench_one_batch` | R225 | DSA assert → exit -9 |
| [nightly-acc-8gpu-mi35x-dsv32-mtp](https://github.com/sgl-project/sglang/actions/runs/28297034445/job/83838644003) | `test_deepseek_v32_mtp_eval_mi35x.py` | `setUpClass` | R225 | DSA assert |
| [nightly-8-gpu-qwen35](https://github.com/sgl-project/sglang/actions/runs/28297034445/job/83838643968) | `test_qwen35_eval_amd.py` | `setUpClass` | R195 | extra_buffer assert |
| [nightly-8-gpu-mi35x-qwen35](https://github.com/sgl-project/sglang/actions/runs/28297034445/job/83838643970) | `test_qwen35_eval_mi35x.py` | `test_lm_eval` | R195 | extra_buffer assert |
| [nightly-8-gpu-mi35x-qwen35](https://github.com/sgl-project/sglang/actions/runs/28297034445/job/83838643970) | `test_qwen3p5_triton_dcp.py` | `test_a_gsm8k` | R210 | gsm8k 0.556<0.90 |
| [nightly-8-gpu-mi35x-qwen3-235b-mxfp4](https://github.com/sgl-project/sglang/actions/runs/28297034445/job/83838643963) | `test_qwen3_instruct_mxfp4.py` | `setUpClass` | R19 | HIP capture -6 |
| [nightly-8-gpu-mi35x-deepseek-r1-hicache](https://github.com/sgl-project/sglang/actions/runs/28297034445/job/83838643952) | `test_deepseek_r1_hicache_mi35x.py` | `test_gsm8k` | R211 | GPU mem fault |
| [nightly-4-gpu](https://github.com/sgl-project/sglang/actions/runs/28297034445/job/83838643955) | `test_encoder_dp.py` | `test_vlm_mmmu_benchmark` | R196 | write to read-only page -9 |
| [nightly-8-gpu-kimi-k26](https://github.com/sgl-project/sglang/actions/runs/28297034445/job/83838643958) | `test_kimi_k26_eval_amd.py` | N/A | R229🆕 | TIMEOUT 3600s (load 3303s) |
| (dsr1-mxfp4-tp2, glm51-mxfp4, gpt-oss-120b, perf-vlm 429, mmmu, glm5-mxfp4 VRAM gate) | various | — | infra | HF download / 429 / VRAM gate |
</details>

<details><summary>nightly-test-amd-rocm720 · Jun-27 [28296988041](https://github.com/sgl-project/sglang/actions/runs/28296988041) · ~10 real (rest HF-infra)</summary>

| Job (shard) | Test File | Test Function | Cluster | Error |
|---|---|---|---|---|
| [nightly-perf-8gpu-mi35x-dsv32-mtp-rocm720](https://github.com/sgl-project/sglang/actions/runs/28296988041/job/83838515319) | `test_deepseek_v32_mtp_perf_mi35x.py` | `test_bench_one_batch` | R225 | DSA assert |
| [nightly-acc-8gpu-mi35x-dsv32-mtp-rocm720](https://github.com/sgl-project/sglang/actions/runs/28296988041/job/83838515341) | `test_deepseek_v32_mtp_eval_mi35x.py` | `setUpClass` | R225 | DSA assert |
| [nightly-8-gpu-qwen35-rocm720](https://github.com/sgl-project/sglang/actions/runs/28296988041/job/83838515323) | `test_qwen35_eval_amd.py` | `setUpClass` | R195 | extra_buffer assert |
| [nightly-8-gpu-mi35x-qwen35-rocm720](https://github.com/sgl-project/sglang/actions/runs/28296988041/job/83838515348) | `test_qwen35_eval_mi35x.py` | `test_lm_eval` | R195 | extra_buffer assert |
| [nightly-acc-8gpu-mi35x-rocm720](https://github.com/sgl-project/sglang/actions/runs/28296988041/job/83838515311) | `test_qwen3_moe_eval_mi35x.py` | `test_qwen3_moe_accuracy` | R6 | GPU mem fault -6 |
| [nightly-8-gpu-mi35x-qwen3-235b-mxfp4-rocm720](https://github.com/sgl-project/sglang/actions/runs/28296988041/job/83838515337) | `test_qwen3_instruct_mxfp4.py` | `setUpClass` | R19 | hipErrorCapturedEvent |
| [nightly-accuracy-2-gpu-rocm720](https://github.com/sgl-project/sglang/actions/runs/28296988041/job/83838515256) | `test_gsm8k_eval_amd.py` | `test_gsm8k_all_models` (Mistral) | R2 | gsm8k 0.361 |
| [nightly-8-gpu-kimi-k26-rocm720](https://github.com/sgl-project/sglang/actions/runs/28296988041/job/83838515349) | `test_kimi_k26_eval_amd.py` | `test_kimi_k26_gsm8k_accuracy` | R229🆕 | TIMEOUT (load 3359s) |
| [nightly-8-gpu-mi35x-deepseek-v4-pro-rocm720](https://github.com/sgl-project/sglang/actions/runs/28296988041/job/83838515346) | `test_deepseek_v4_pro_fp4.py` / `_cp.py` | `setUpClass` | R230🆕 | server exit -9 (load) |
| (dsr1-mxfp4-tp4, dsv3.1, dsv3.2 xet, grok2, qwen3-235b, gpt-oss, vlm-429, hicache VRAM gate) | various | — | infra | HF download / 429 / VRAM gate |
</details>

<details><summary>pr-test-amd · rolling Jun-27→28 · latest [28306914695](https://github.com/sgl-project/sglang/actions/runs/28306914695)</summary>

| Job (shard) | Test File | Test Function | Cluster | Error |
|---|---|---|---|---|
| [multimodal-gen-2gpu (1)](https://github.com/sgl-project/sglang/actions/runs/28297966351/job/83841118166) ×~16 blocks/5 runs | `test_server_{1,2}_gpu.py` | `test_diffusion_generation[wan2_*]` | R222 | causal Conv3D CUDA-only |
| [multimodal-gen-2gpu (1)](https://github.com/sgl-project/sglang/actions/runs/28297966351/job/83841118166) | `test_server_2_gpu.py` | `[flux2_modelopt_fp8_tp2_t2i]` | R192 | HIPBLAS_STATUS_NOT_SUPPORTED |
| [multimodal-gen-1gpu (3)](https://github.com/sgl-project/sglang/actions/runs/28306914695/job/83864630862) | `test_update_weights_from_disk.py` | `test_update_weights_specific_modules` | R226🆕 | 500 inplace-on-inference tensor |
| [stage-b-1gpu-small (6)](https://github.com/sgl-project/sglang/actions/runs/28306914695/job/83864631031) | `test_type_based_dispatcher.py` | `test_type_dispatcher_e2e_performance` | R214 | TypeError (input_embeds) |
| [stage-c-8gpu-mi35x (0)](https://github.com/sgl-project/sglang/actions/runs/28306914695/job/83864631092) | `test_deepseek_r1_mxfp4_8gpu.py` | `test_a_gsm8k` | R227🆕 | GPU mem fault (MTP) |
| [stage-c-8gpu-mi35x (1)](https://github.com/sgl-project/sglang/actions/runs/28306914695/job/83864631081) | `test_qwen3_coder_next_8gpu.py` | `test_bs_1_speed` | R228🆕 | decode hang → SIGQUIT |
| [stage-c-8gpu (1)](https://github.com/sgl-project/sglang/actions/runs/28306914695/job/83864631084) | `test_deepseek_v32_basic.py` | `test_a_gsm8k` | R219 | HSA out-of-resources |
| (diffusion port-5555 cascades, mori build, kimi-mxfp4 watchdog) | various | — | infra | downloads / cascades |
</details>

<details><summary>pr-test-amd-rocm720 · Jun-27 [28297015673](https://github.com/sgl-project/sglang/actions/runs/28297015673) · ≈0 clean signal (cron-cancel + HF-429)</summary>

Real signal buried under cron `cancel-in-progress` self-cancellation + HF-429 fast-fail cascade: R226 (`test_update_weights_from_disk.py` 500, [1gpu (3) 83838610262](https://github.com/sgl-project/sglang/actions/runs/28297015673/job/83838610262)), R214 (`test_type_based_dispatcher.py`, [stage-b (6) 83838610271](https://github.com/sgl-project/sglang/actions/runs/28297015673/job/83838610271)), R231 (InductorError, [2gpu (1) 83838610250](https://github.com/sgl-project/sglang/actions/runs/28297015673/job/83838610250) + [1gpu (0) 83838610266](https://github.com/sgl-project/sglang/actions/runs/28297015673/job/83838610266)), R222 (causal Conv3D I2V/mova, same 2gpu job). Plus a perf-threshold miss ([stage-b-1gpu-large (0) 83838610248](https://github.com/sgl-project/sglang/actions/runs/28297015673/job/83838610248)) and a `test_start_profile_2` watchdog/CUDA-graph-replay stall ([stage-b (10) 83838610294](https://github.com/sgl-project/sglang/actions/runs/28297015673/job/83838610294)) — both inconclusive without a clean rerun. **Needs a non-colliding rerun for usable signal.**
</details>

## How this report is generated
- Only `status == "completed"` runs counted in trends. Both nightlies' Jun-27 runs treated as completed. **Both release-docker workflows ✅ green; amd-aiter-scout did not run (cron Mon/Thu).**
- **🆕 NEW today**: R226 (diffusion `update_weights` 500), R227 (DSR1-MXFP4 8-GPU MTP mem fault), R228 (Qwen3-Coder-Next 8-GPU hang), R229 (Kimi-K2.6 weight-load timeout, both nightlies), R230 (DSV4-Pro fp8 -9, rocm720), R231 (ROCm `torch.compile` InductorError, confirmed from yesterday's minor-new).
- **Carrying over**: R225 now 2 days unfixed (breaker [#29413](https://github.com/sgl-project/sglang/pull/29413) merged, no revert in flight); R222/R195/R214/R192/R2/R19/R211/R196/R6/R210/R219.
- **In-flight fixes unchanged since yesterday** (none merged): [#29376](https://github.com/sgl-project/sglang/pull/29376) (R214), [#27141](https://github.com/sgl-project/sglang/pull/27141)+[#29391](https://github.com/sgl-project/sglang/pull/29391) (R195), [#28889](https://github.com/sgl-project/sglang/pull/28889) (R192), [#27757](https://github.com/sgl-project/sglang/pull/27757) (R2).
- Confidence: `FACT`/`HIGH`/`MEDIUM`/`LOW`/`SPECULATION`. Bot does NOT assign Priority — engineers decide from cluster size + persistence + fix availability.
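
The `status == "completed"` rule in the first bullet can be sketched as a simple filter. The run records below are made up for illustration; only the `status` and `conclusion` field names are taken from the GitHub Actions workflow-runs REST API, and the bot's real implementation may differ.

```python
from collections import Counter

# Hypothetical run records, not real runs from this report.
runs = [
    {"id": 1, "status": "completed", "conclusion": "failure"},
    {"id": 2, "status": "in_progress", "conclusion": None},
    {"id": 3, "status": "completed", "conclusion": "success"},
    {"id": 4, "status": "completed", "conclusion": "cancelled"},
]

# Only completed runs feed the trend columns; in-progress runs are skipped.
completed = [r for r in runs if r["status"] == "completed"]
by_conclusion = Counter(r["conclusion"] for r in completed)

print(len(completed))       # 3
print(dict(by_conclusion))  # {'failure': 1, 'success': 1, 'cancelled': 1}
```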

---
*Generated by amd-bot using Claude Code CLI · last updated 2026-06-28 02:42 UTC*


---

## CI Monitor — 2026-06-28

**Repo**: [sgl-project/sglang](https://github.com/sgl-project/sglang)

**Monitored Workflows**:
- `nightly-test-amd.yml`
- `nightly-test-amd-rocm720.yml`
- `release-docker-amd-nightly.yml`
- `release-docker-amd-rocm720-nightly.yml`
- `amd-aiter-scout.yml`
- `pr-test-amd.yml`
- `pr-test-amd-rocm720.yml`

*Per-workflow failure reports are appended as comments below; the Daily Cross-Workflow Summary is rendered above this section.*
