Commit 475cc1d
[NVIDIA] Update GLM-5 NVFP4 B200 SGLang config (#1011)
* Update GLM-5 NVFP4 B200 SGLang config and benchmark script
Add tp4 ep1 conc-128 search-space entry for both 1k1k and 8k1k configs.
Update benchmark script with new server launch flags: enable-dp-lm-head,
disable-radix-cache, fp8_e4m3 kv-cache, NSA trtllm backends, flashinfer
allreduce fusion, and tuned prefill/memory settings.
Bump GLM-5 NVFP4 B200 tp4 concurrency to 256
* Add perf-changelog entry for GLM-5 NVFP4 B200 SGLang config update
Co-authored-by: Ankur Singh <Ankur-singh@users.noreply.github.com>
* Update GLM-5 NVFP4 B200: tp8 conc=4, tp4 conc=4-256, cuda-graph-max-bs 256
* Remove enable-dp-lm-head option from script
---------
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Ankur Singh <Ankur-singh@users.noreply.github.com>
Co-authored-by: hshrivastava-droid <hshrivastava@nvidia.com>

1 parent 49db200 commit 475cc1d
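To make the final search space described in the commit message concrete (tp8 at conc=4, tp4 at conc=4-256, cuda-graph-max-bs 256), here is a minimal Python sketch. It is not taken from the repository. `SearchPoint`, `conc_sweep`, `search_space`, and `launch_flags` are hypothetical names. The doubling spacing of the concurrency sweep and ep=1 for tp8 are assumptions. The flag spellings are inferred from the commit message wording and should be checked against the SGLang server documentation.

```python
# Hypothetical sketch: none of these names come from the repository.
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchPoint:
    tp: int    # tensor-parallel size
    ep: int    # expert-parallel size
    conc: int  # benchmark client concurrency


def conc_sweep(lo: int, hi: int) -> list[int]:
    """Concurrency values from lo to hi, doubling each step (assumed spacing)."""
    values = []
    c = lo
    while c <= hi:
        values.append(c)
        c *= 2
    return values


def search_space() -> list[SearchPoint]:
    # Final state named in the commit message: tp8 conc=4, tp4 conc=4-256.
    # ep=1 for tp4 is from the message; ep=1 for tp8 is an assumption.
    points = [SearchPoint(tp=8, ep=1, conc=4)]
    points += [SearchPoint(tp=4, ep=1, conc=c) for c in conc_sweep(4, 256)]
    return points


def launch_flags(point: SearchPoint, cuda_graph_max_bs: int = 256) -> list[str]:
    # Flag spellings are assumed from the commit message wording; verify them
    # against the SGLang server-arguments documentation before use.
    return [
        "--tp", str(point.tp),
        "--disable-radix-cache",
        "--kv-cache-dtype", "fp8_e4m3",
        "--cuda-graph-max-bs", str(cuda_graph_max_bs),
    ]


if __name__ == "__main__":
    for p in search_space():
        print(p, " ".join(launch_flags(p)))
```

The sketch leaves out enable-dp-lm-head, which the last bullet of the message says was removed from the script. It also omits the NSA backend, allreduce-fusion, and prefill/memory settings, because the message does not give their values.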
File tree
3 files changed, +27 -13 lines changed
- .github/configs
- benchmarks/single_node
Diff 1 of 3 (changed-line contents were not captured): original lines 1837-1847 become new lines 1837-1849; 2 lines removed, 4 added.
Diff 2 of 3 (changed-line contents were not captured): original lines 33-55 become new lines 33-58; 11 lines removed, 14 added.
Diff 3 of 3 (changed-line contents were not captured): original lines 1307-1309 become new lines 1307-1318; 9 lines added.