
Commit 569350e (1 parent 21796cd)

start making updates to perf-overview.md instructions for release 1.2

Signed-off-by: Zachary Patel <22306219+zbpatel@users.noreply.github.com>

1 file changed: +6 -0 lines changed

docs/source/developer-guide/perf-overview.md (6 additions & 0 deletions)
@@ -17,6 +17,8 @@ For DeepSeek R1 performance, please check out our [performance guide](../blogs/B
 
 For more information on benchmarking with `trtllm-bench` see this NVIDIA [blog post](https://developer.nvidia.com/blog/llm-inference-benchmarking-performance-tuning-with-tensorrt-llm/).
 
+For NUMA systems, we recommend consulting the ["CPU Affinity configuration in TensorRT LLM"](../deployment-guide/configuring-cpu-affinity.md) guide to achieve best performance. These options were enabled for relevant tests.
+
 ## Throughput Measurements
 
 The below table shows performance data where a local inference client is fed requests at an high rate / no delay between messages,
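The NUMA guidance added in the hunk above pairs with the `trtllm-bench` workflow the doc already links to. A minimal sketch of running a throughput benchmark pinned to one NUMA node follows; the model name, dataset path, and node index are placeholders, and the exact `trtllm-bench` flags should be checked against `trtllm-bench throughput --help` for your installed release.

```shell
# Sketch only (assumed placeholders): pin both CPU threads and memory
# allocations of the benchmark process to NUMA node 0 so the inference
# client and runtime use node-local memory, as the CPU-affinity guide
# referenced above recommends.
numactl --cpunodebind=0 --membind=0 \
  trtllm-bench --model nvidia/Qwen3-235B-A22B-FP4 \
  throughput --dataset ./synthetic_dataset.jsonl
```

`--cpunodebind` and `--membind` are standard `numactl` options; without them the kernel may schedule threads on one node while pages were allocated on another, which depresses measured throughput.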
@@ -34,14 +36,17 @@ The following GPU variants were used for testing:
 - H100 SXM 80GB (DGX H100)
 - H200 SXM 141GB (DGX H200)
 - B200 180GB (DGX B200)
+- B300 288GB (DGX B300)
 - GB200 192GB (GB200 NVL72)
+- GB300 (GB300 NVL72)
 - RTX 6000 Pro Blackwell Server Edition
 
 Other hardware variants may have different TDP, memory bandwidth, core count, or other features leading to performance differences on these workloads.
 
 ### FP4 Models
 
 ```text
+nvidia/Kimi-K2-Instruct-NVFP4
 nvidia/DeepSeek-R1-0528-NVFP4-v2
 nvidia/Qwen3-235B-A22B-FP4
 nvidia/Qwen3-30B-A3B-FP4
@@ -52,6 +57,7 @@ nvidia/Llama-4-Maverick-17B-128E-Instruct-NVFP4
 ### FP8 Models
 
 ```text
+moonshotai/Kimi-K2-Instruct
 deepseek-ai/DeepSeek-R1-0528
 nvidia/Qwen3-235B-A22B-FP8
 nvidia/Llama-3.3-70B-Instruct-FP8
