[NOT RATE MATCHED] Add NVFP4 WideEP disaggregated DEP8/DEP16/DEP32 recipes for Qwen3.5-397B-A17B #167
xiaoweiw-nv wants to merge 5 commits into
Conversation
xiaoweiw-nv
commented
May 20, 2026
Drops misleading pr-main prefix; name now reflects workload scope
(Qwen3.5 NVFP4 WideEP). Updates setup_script reference in the three
dep4-dep{8,16,32} disagg recipes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
One DEP4 prefill worker sustains ~89K tok/s of input at ISL=OSL=1000; only cc>=2048 in DEP16/DEP32 exceeds that and needs a second prefill worker. Split DEP16/DEP32 into lowcc (1pw, cc=1..1024) and highcc (2pw, cc=2048..N) variants. DEP8 stays single-config with 1 prefill worker (tops out at ~50K tok/s, never saturates one worker).
Saves nodes on low-cc sweeps:
  DEP8:         4 -> 3 nodes
  DEP16-lowcc:  8 -> 5 nodes
  DEP16-highcc: 8 -> 6 nodes
  DEP32-lowcc:  16 -> 9 nodes
  DEP32-highcc: 16 -> 10 nodes
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
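The node counts in the commit message above can be reproduced with a small calculation. This is a minimal sketch, not code from the PR: it assumes 4 GPUs per node (inferred only from the fact that this makes all five figures add up), and the names `node_count` and `prefill_workers_for` are hypothetical.

```python
# Assumption: 4 GPUs per node. This is inferred from the commit message's
# node counts, not stated anywhere in the PR.
GPUS_PER_NODE = 4
PREFILL_DEP = 4  # each prefill worker is DEP4, per the commit message


def prefill_workers_for(decode_dep: int, concurrency: int) -> int:
    """Hypothetical helper: 2 prefill workers only for DEP16/DEP32 at cc >= 2048."""
    if decode_dep in (16, 32) and concurrency >= 2048:
        return 2
    return 1


def node_count(decode_dep: int, prefill_workers: int) -> int:
    """Total nodes = prefill nodes + decode nodes."""
    prefill_nodes = prefill_workers * PREFILL_DEP // GPUS_PER_NODE
    decode_nodes = decode_dep // GPUS_PER_NODE
    return prefill_nodes + decode_nodes


if __name__ == "__main__":
    for dep, cc in [(8, 1024), (16, 1024), (16, 2048), (32, 1024), (32, 2048)]:
        pw = prefill_workers_for(dep, cc)
        print(f"DEP{dep} cc={cc}: {pw} prefill worker(s), {node_count(dep, pw)} nodes")
```

Under the same assumption, the previous figures (4, 8, and 16 nodes) correspond to as many prefill nodes as decode nodes.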
  f.write(f'jetstream {{ store_dir: "{nats_store_dir}" }}\n')
  logger.info("Starting NATS server (max_payload: %dMB)...", max_payload_mb)
- cmd = [binary_path, "-c", nats_config_path]
+ cmd = ["taskset", "-c", "140-143", binary_path, "-c", nats_config_path]  # OMC_CPU_PIN_PATCH_APPLIED
Why do we need to change this, please?
This pins ETCD to dedicated CPU cores; without it, ETCD easily runs into timeouts. ETCD has a heartbeat mechanism: if the CPU is too busy (JIT/warmup) and ETCD misses the heartbeat window, it exits with an error. Pinning ETCD to its own cores prevents this CPU starvation.
cc @ishandhanani does this change look good to you? Can merge if you are OK with the change in src/srtctl/cli/setup_head.py
Another workaround is to set a longer ETCD heartbeat interval, if you would rather not make the code change here.
Can we possibly make this tunable? We should not hardcode it.
Added an etcd_cpu_affinity field; users can specify the CPU affinity with:
infra:
  etcd_cpu_affinity: "140-143"
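For illustration, here is one way a tunable affinity could be applied to a launch command instead of the hardcoded `taskset -c 140-143` prefix. This is a minimal sketch and not the code in `src/srtctl/cli/setup_head.py`: the helper name `with_cpu_affinity`, the CPU-list validation, and the example binary and config names are all assumptions. Only `taskset -c <cpu-list>` and the `etcd_cpu_affinity` field name come from the discussion above.

```python
import re
from typing import List, Optional

# Accepts CPU lists such as "3", "140-143", or "0-3,8,12-15".
_CPU_LIST = re.compile(r"^\d+(-\d+)?(,\d+(-\d+)?)*$")


def with_cpu_affinity(cmd: List[str], cpu_affinity: Optional[str] = None) -> List[str]:
    """Hypothetical helper: prefix cmd with `taskset -c` when an affinity is set.

    With no affinity configured, the command is returned unchanged, so the
    default behavior stays the same as before the patch.
    """
    if not cpu_affinity:
        return list(cmd)
    if not _CPU_LIST.match(cpu_affinity):
        raise ValueError(f"invalid CPU list: {cpu_affinity!r}")
    return ["taskset", "-c", cpu_affinity, *cmd]


if __name__ == "__main__":
    # Placeholder binary and config names, for demonstration only.
    base_cmd = ["nats-server", "-c", "nats.conf"]
    print(with_cpu_affinity(base_cmd, "140-143"))
    print(with_cpu_affinity(base_cmd, None))
```

Leaving the field unset keeps the unpinned command, which avoids baking machine-specific core numbers into the default path.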
Recent runs show prefill warmup completes in <60s with prebuilt-v3 container + persisted flashinfer/deepgemm caches, well below the upstream 1800s default. The 7200s bump never triggered. Remove configs/qwen35-nvfp4-wideep-setup.sh and setup_script: references from the five disagg recipes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #167   +/-   ##
=======================================
  Coverage        ?   65.10%
=======================================
  Files           ?       67
  Lines           ?     8217
  Branches        ?        0
=======================================
  Hits            ?     5350
  Misses          ?     2867
  Partials        ?        0