feat: H20 support #59


Open

LyleLuo wants to merge 1 commit into ai-dynamo:main from LyleLuo:main

README.md

     |--------|-------------------|--------|
     | h100_sxm | TRTLLM(0.20.0, 1.0.0rc3) | ✅ |
     | h200_sxm | TRTLLM(0.20.0, 1.0.0rc3) | ✅ |
+    | h20_3e | TRTLLM(1.0.0) | ✅ |
+    | A100 | TRTLLM(1.0.0) | ✅ |
     | b200_sxm | TRTLLM(1.0.0rc6) | ✅ |
     | gb200_sxm | TRTLLM(1.0.0rc6) | ✅ |

src/aiconfigurator/systems/data/h20_3e/nccl/2.27.3/nccl_perf.txt

Git LFS file not shown

src/aiconfigurator/systems/data/h20_3e/trtllm/1.0.0/context_attention_perf.txt

Git LFS file not shown

src/aiconfigurator/systems/data/h20_3e/trtllm/1.0.0/context_mla_perf.txt

Git LFS file not shown

src/aiconfigurator/systems/data/h20_3e/trtllm/1.0.0/custom_allreduce_perf.txt

Git LFS file not shown

src/aiconfigurator/systems/data/h20_3e/trtllm/1.0.0/gemm_perf.txt

Git LFS file not shown

src/aiconfigurator/systems/data/h20_3e/trtllm/1.0.0/generation_attention_perf.txt

Git LFS file not shown

src/aiconfigurator/systems/data/h20_3e/trtllm/1.0.0/generation_mla_perf.txt

Git LFS file not shown

src/aiconfigurator/systems/data/h20_3e/trtllm/1.0.0/mla_bmm_perf.txt

Git LFS file not shown

src/aiconfigurator/systems/data/h20_3e/trtllm/1.0.0/moe_perf.txt

Git LFS file not shown

src/aiconfigurator/systems/h20_3e.yaml

@@ -0,0 +1,30 @@
+    # SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+    # SPDX-License-Identifier: Apache-2.0
+    data_dir: data/h20_3e # relative to systems_dir
+    gpu:
+      mem_bw: 4917000000000 # 4917GB/s
+      mem_bw_empirical_scaling_factor: 0.8 # some nonofficial correction based on observations, you should try to modify based on your own observations
+      mem_empirical_constant_latency: 0.000003 # 3us some nonofficial correction based on observations, you should try to modify based on your own observations
+      mem_capacity: 151397597184 # 141GiB
+      float16_tc_flops: 148000000000000 # 148TFLOPS
+      int8_tc_flops: 296000000000000 # 296TFLOPS
+      fp8_tc_flops: 296000000000000 # 296TFLOPS
+      power: 500  # Watt
+      sm_version: 90
+    node:
+      num_gpus_per_node: 8
+      inter_node_bw: 25000000000  # Byte/s per GPU, single direction, 1:1 CX7 per node
+      intra_node_bw: 450000000000  # Byte/s per gpu, single direction
+      pcie_bw: 64000000000  # Byte/s, single direction, pcie 5.0
+      p2p_latency: 0.00001  # 10us some nonofficial correction based on observations, you should try to modify based on your own observations
+    misc:
+      nccl_mem: # some nonofficial correction based on observations, you should try to modify based on your own observations
+: 0
+: 358612992 # 342MB
+: 411041792 # 392MB
+: 411041792 # 392MB
+      other_mem: 3758096384 # increase from 551MB to 3.5GB for safer deployment, this will cover part of the inaccurate mem calc.
+      nccl_version: '2.27.3'
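
To make the `gpu` fields above concrete, here is a minimal roofline-style sketch in Python. It is not aiconfigurator's actual model, which this diff does not show. It assumes the empirical scaling factor multiplies `mem_bw`, that the constant latency is added once per operation, and that an operation takes the larger of its memory time and its compute time. The helper names (`memory_time_s`, `compute_time_s`, `roofline_time_s`, `gemm_cost`) are hypothetical and exist only for this illustration.

```python
# Values copied from the gpu section of h20_3e.yaml in this diff.
H20_3E_GPU = {
    "mem_bw": 4_917_000_000_000,                 # Byte/s
    "mem_bw_empirical_scaling_factor": 0.8,
    "mem_empirical_constant_latency": 0.000003,  # seconds
    "mem_capacity": 151_397_597_184,             # Byte
    "float16_tc_flops": 148_000_000_000_000,     # FLOP/s
    "fp8_tc_flops": 296_000_000_000_000,         # FLOP/s
}


def memory_time_s(num_bytes, gpu):
    """Assumed model: bytes / (scaled bandwidth) + constant latency."""
    effective_bw = gpu["mem_bw"] * gpu["mem_bw_empirical_scaling_factor"]
    return num_bytes / effective_bw + gpu["mem_empirical_constant_latency"]


def compute_time_s(num_flops, gpu, flops_key="float16_tc_flops"):
    """Assumed model: FLOPs / peak tensor-core throughput."""
    return num_flops / gpu[flops_key]


def roofline_time_s(num_bytes, num_flops, gpu, flops_key="float16_tc_flops"):
    """Assumed model: the slower of the memory and compute estimates."""
    return max(memory_time_s(num_bytes, gpu),
               compute_time_s(num_flops, gpu, flops_key))


def gemm_cost(m, n, k, bytes_per_elem=2):
    """Bytes moved and FLOPs for C[m,n] = A[m,k] @ B[k,n], each matrix touched once."""
    num_flops = 2 * m * n * k
    num_bytes = bytes_per_elem * (m * k + k * n + m * n)
    return num_bytes, num_flops


if __name__ == "__main__":
    # A decode-like GEMM (m=1) versus a prefill-like GEMM (m=8192), both in FP16.
    for m in (1, 8192):
        num_bytes, num_flops = gemm_cost(m, 8192, 8192)
        t_mem = memory_time_s(num_bytes, H20_3E_GPU)
        t_cmp = compute_time_s(num_flops, H20_3E_GPU)
        bound = "memory" if t_mem >= t_cmp else "compute"
        print(f"m={m}: mem={t_mem:.3e}s compute={t_cmp:.3e}s -> {bound}-bound")
```

Under these assumptions the `m=1` case comes out memory-bound and the `m=8192` case compute-bound, which is why the two empirical memory corrections matter most for decode-style workloads. The measured `*_perf.txt` tables added in this PR should take precedence over any analytic estimate like this one.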
