[TRTLLM-10244][doc] Add deployment guide for Nemotron 3 Super #12129
nv-guomingz wants to merge 1 commit into NVIDIA:main
Conversation
/bot run
📝 Walkthrough
This pull request adds comprehensive support for Nemotron-3 Super model deployment on TensorRT LLM, including a detailed deployment guide, a curated performance configuration, model documentation, and updates to the supported-models registry.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~15 minutes
Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 6
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/source/deployment-guide/deployment-guide-for-nemotron-3-super-on-trtllm.md`:
- Around line 116-119: The doc uses two different KV-cache keys which is
inconsistent: `kv_cache_free_gpu_memory_fraction` vs
`kv_cache_config.free_gpu_memory_fraction`; pick one canonical name and make the
text and examples match it. Update the descriptive header and the prose line to
use the same key used in the YAML examples (prefer
`kv_cache_config.free_gpu_memory_fraction` if that is the actual schema), and
ensure any recommendations or examples reference that exact symbol so readers
won't copy the wrong key.
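
  For illustration, a minimal extra-LLM-API options fragment using the nested key form (assuming `kv_cache_config.free_gpu_memory_fraction` is the actual schema, as this comment recommends; the fraction value is illustrative):

  ```yaml
  # Hedged sketch: nested key form assumed to be the canonical schema.
  # The guide's prose and examples should both use this exact key.
  kv_cache_config:
    free_gpu_memory_fraction: 0.85  # fraction of free GPU memory reserved for KV cache
  ```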
- Around line 17-19: The second bullet's URL incorrectly points to the NVFP4
repo instead of the BF16 checkpoint; update the href for the list item titled
"NVIDIA-Nemotron-3-Super-120B-A12B-BF16" so it links to the BF16 repository (the
same target as the first bullet) rather than
"NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4", ensuring the three bullets each point
to their correct unique repos.
- Around line 58-97: The recommended YAML (referenced as
nemotron-3-super-throughput.yaml and set into EXTRA_LLM_API_FILE) hard-codes
tensor_parallel_size: 4 and moe_expert_parallel_size: 4 which is not portable to
the smaller NVFP4 minimum hardware (e.g., 1x B200); update the docs near the
trtllm-serve examples to state that the curated YAML assumes a 4-way tensor and
MoE parallel setup and instruct users to either (A) edit tensor_parallel_size
and moe_expert_parallel_size in nemotron-3-super-throughput.yaml to values
compatible with their topology (specifically call out lowering them for 1x B200)
or (B) point to or provide alternative example YAMLs for the 1x B200 and 2x
H100/H200 cases before running trtllm-serve, so users do not blindly copy the
same config for both NVFP4 and BF16 commands.
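
  As a sketch of option (A), the parallelism fields in the curated YAML could be lowered for a single-GPU NVFP4 deployment (values illustrative; the field names are taken from the review comment above, and both must match the actual GPU topology):

  ```yaml
  # nemotron-3-super-throughput.yaml, edited for 1x B200 (illustrative).
  # The curated config assumes a 4-GPU topology; lower both fields
  # together so they match the number of available GPUs.
  tensor_parallel_size: 1        # curated default: 4
  moe_expert_parallel_size: 1    # curated default: 4
  ```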
In `@docs/source/deployment-guide/index.rst`:
- Around line 35-39: The quick-start table row for "Nemotron v3 Super (NVFP4)"
currently lists "GPU: Any" which is inaccurate; update that GPU cell to
"Hopper/Blackwell-class GPUs (sufficient memory)" or similar restrictive wording
so users know they need Hopper/Blackwell-class hardware; ensure the change is
applied to the row containing "Nemotron v3 Super (NVFP4)", the linked config
reference "nemotron-3-super-throughput.yaml", and the example command
"trtllm-serve nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4 --config
${TRTLLM_DIR}/examples/configs/curated/nemotron-3-super-throughput.yaml".
In `@docs/source/models/supported-models.md`:
- Line 26: Replace the collection reference in the supported models table with a
concrete model repo so users can load it directly: update the row that lists
`NemotronHForCausalLM` (currently showing `nvidia/nvidia-nemotron-v3`) to a
specific model ID such as `nvidia/nemotron-3-super` or `nvidia/nemotron-3-nano`
so it matches the other entries and is immediately loadable.
In `@examples/models/core/nemotron/README_nemotron_super_v3.md`:
- Line 71: Update the curl example to use a client-reachable host instead of the
server bind address; replace 'http://0.0.0.0:8000/v1/chat/completions' with
'http://localhost:8000/v1/chat/completions' (or the actual server IP) so the
example line in README_nemotron_super_v3.md is copy-pasteable by clients.
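
  A copy-pasteable sketch of the corrected request, assuming the server from the README is listening on port 8000 with an OpenAI-compatible chat completions endpoint (the model name and payload here are illustrative):

  ```shell
  # Use localhost (or the server's real IP) on the client side;
  # 0.0.0.0 is only meaningful as a server bind address.
  curl -s http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
          "model": "nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4",
          "messages": [{"role": "user", "content": "Hello"}],
          "max_tokens": 32
        }'
  ```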
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 25f037c6-74c1-4545-adc5-e615ae00db05
📒 Files selected for processing (5):
- docs/source/deployment-guide/deployment-guide-for-nemotron-3-super-on-trtllm.md
- docs/source/deployment-guide/index.rst
- docs/source/models/supported-models.md
- examples/configs/curated/nemotron-3-super-throughput.yaml
- examples/models/core/nemotron/README_nemotron_super_v3.md
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Force-pushed from c22f5e5 to 0dbd6b3.
Can platform support be added here? Spark, for instance: which models/networks are supported and which are not? TRT-LLM, for instance, is supported, but vLLM and others are not.
Summary by CodeRabbit
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
- [ ] PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
- [ ] PR follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
- [ ] Test cases are provided for new code paths (see test instructions).
- [ ] Any new dependencies have been scanned for license and vulnerabilities.
- [ ] CODEOWNERS updated if ownership changes.
- [ ] Documentation updated as needed.
- [ ] Update tava architecture diagram if there is a significant design change in PR.
- [ ] The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment `/bot help`.