
Commit 3639097

update run instructions in perf doc with 1.2 changes

Signed-off-by: Zachary Patel <22306219+zbpatel@users.noreply.github.com>

1 parent 407c2ed

File tree: 1 file changed (+24 -10 lines)


docs/source/developer-guide/perf-overview.md

Lines changed: 24 additions & 10 deletions
@@ -76,15 +76,15 @@ All performance values are measured in **output tokens per second per GPU**.
 ## Table of Contents
 
 - [Deepseek R1 0528](#deepseek-r1-0528)
-- [Deepseek R1 0528 - RTX Configurations](#deepseek-r1-0528-rtx-configurations)
+- [Deepseek R1 0528 - RTX 6000 Pro Blackwell Server Edition](#deepseek-r1-0528-rtx-configurations)
 - [GPT-OSS 120B](#gpt-oss-120b)
 - [GPT-OSS 20B](#gpt-oss-20b)
 - [LLaMA v3.3 70B](#llama-v33-70b)
-- [LLaMA v3.3 70B - RTX Configurations](#llama-v33-70b-rtx-configurations)
+- [LLaMA v3.3 70B - RTX 6000 Pro Blackwell Server Edition](#llama-v33-70b-rtx-configurations)
 - [Qwen3 235B A22B](#qwen3-235b-a22b)
-- [Qwen3 235B A22B - RTX Configurations](#qwen3-235b-a22b-rtx-configurations)
+- [Qwen3 235B A22B - RTX 6000 Pro Blackwell Server Edition](#qwen3-235b-a22b-rtx-configurations)
 - [Qwen3 30B A3B](#qwen3-30b-a3b)
-- [Qwen3 30B A3B - RTX Configurations](#qwen3-30b-a3b-rtx-configurations)
+- [Qwen3 30B A3B - RTX 6000 Pro Blackwell Server Edition](#qwen3-30b-a3b-rtx-configurations)
 
 ---

@@ -105,7 +105,7 @@ All performance values are measured in **output tokens per second per GPU**.
 
 <a id="deepseek-r1-0528-rtx-configurations"></a>
 
-# Deepseek R1 0528 - RTX Configurations (TP/PP)
+# Deepseek R1 0528 - RTX 6000 Pro Blackwell Server Edition (TP/PP)
 
 *Shows Tensor Parallel (TP) and Pipeline Parallel (PP) configurations*

@@ -165,7 +165,7 @@ All performance values are measured in **output tokens per second per GPU**.
 
 <a id="llama-v33-70b-rtx-configurations"></a>
 
-# LLaMA v3.3 70B - RTX Configurations (TP/PP)
+# LLaMA v3.3 70B - RTX 6000 Pro Blackwell Server Edition (TP/PP)
 
 *Shows Tensor Parallel (TP) and Pipeline Parallel (PP) configurations*

@@ -197,7 +197,7 @@ All performance values are measured in **output tokens per second per GPU**.
 
 <a id="qwen3-235b-a22b-rtx-configurations"></a>
 
-# Qwen3 235B A22B - RTX Configurations (TP/PP)
+# Qwen3 235B A22B - RTX 6000 Pro Blackwell Server Edition (TP/PP)
 
 *Shows Tensor Parallel (TP) and Pipeline Parallel (PP) configurations*

@@ -229,7 +229,7 @@ All performance values are measured in **output tokens per second per GPU**.
 
 <a id="qwen3-30b-a3b-rtx-configurations"></a>
 
-# Qwen3 30B A3B - RTX Configurations (TP/PP)
+# Qwen3 30B A3B - RTX 6000 Pro Blackwell Server Edition (TP/PP)
 
 *Shows Tensor Parallel (TP) and Pipeline Parallel (PP) configurations*

@@ -313,7 +313,7 @@ a model name (HuggingFace reference or path to a local model), a [generated data
 
 For dense / non-MoE models:
 ```shell
-trtllm-bench --tp $tp_size --pp $pp_size --model $model_name throughput --dataset $dataset_file --backend pytorch --config $llm_options
+trtllm-bench --tp $tp_size --pp $pp_size --model $model_name throughput --dataset $dataset_file --backend pytorch --config $llm_options --concurrency -1
 ```
 Llama 3.3
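The updated invocation can be sketched with placeholder values; the model, dataset, and parallelism settings below are hypothetical examples for illustration, not values taken from this change:

```shell
# Hypothetical placeholder values -- substitute your own model, dataset, and config.
tp_size=8
pp_size=1
model_name="meta-llama/Llama-3.3-70B-Instruct"
dataset_file="synthetic_dataset.json"
llm_options="llm_options.yml"

# Assemble the benchmark invocation; --concurrency -1 is the flag added in this change.
cmd="trtllm-bench --tp $tp_size --pp $pp_size --model $model_name throughput \
--dataset $dataset_file --backend pytorch --config $llm_options --concurrency -1"
echo "$cmd"
```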

@@ -342,7 +342,7 @@ kv_cache_config:
   dtype: fp8
   # Hopper: use auto
 moe_config:
-  backend: CUTLASS
+  backend: TRTLLM
   # Hopper: use TRITON
 ```
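Assembled from the fragments visible in this hunk, the resulting Blackwell-side `llm_options.yml` for these models would presumably look like the sketch below (per the inline comments, the Hopper variant would swap in `dtype: auto` and `backend: TRITON`); this is reconstructed for illustration, not quoted from the doc:

```yaml
# Sketch of the post-change config (Blackwell settings).
kv_cache_config:
  dtype: fp8        # Hopper: use auto
moe_config:
  backend: TRTLLM   # Hopper: use TRITON
```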

@@ -364,6 +364,20 @@ kv_cache_config:
   dtype: fp8
 ```
 
+Kimi K2:
+
+`llm_options.yml`
+```yaml
+enable_attention_dp: true
+cuda_graph_config:
+  enable_padding: true
+  batch_sizes: [1, 2, 4, 8, 16, 32, 64, 128, 256, 384]
+moe_config:
+  backend: CUTLASS
+kv_cache_config:
+  dtype: auto
+```
+
 Qwen3 MoE, Llama4 Maverick:
 
 `llm_options.yml`
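The `cuda_graph_config` added for Kimi K2 captures CUDA graphs at a fixed set of batch sizes; with `enable_padding`, an incoming batch is presumably padded up to the nearest captured size (an assumed behavior stated here only for illustration, not taken from the diff). A minimal sketch of that selection rule:

```python
import bisect

# Batch sizes from the Kimi K2 llm_options.yml above.
BATCH_SIZES = [1, 2, 4, 8, 16, 32, 64, 128, 256, 384]

def padded_batch_size(n: int) -> int:
    """Return the smallest captured batch size >= n (assumed padding rule).

    Batches larger than the biggest captured size are returned unchanged,
    standing in for whatever eager fallback the runtime actually uses.
    """
    i = bisect.bisect_left(BATCH_SIZES, n)
    return BATCH_SIZES[i] if i < len(BATCH_SIZES) else n

print(padded_batch_size(3))    # 4
print(padded_batch_size(200))  # 256
print(padded_batch_size(500))  # 500
```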
