[TRTLLM-10244][doc] Add deployment guide for Nemotron 3 Super #12129

Open
nv-guomingz wants to merge 1 commit into NVIDIA:main from nv-guomingz:user/guomingz/nemotron_super_doc

Conversation

@nv-guomingz
Collaborator

@nv-guomingz nv-guomingz commented Mar 12, 2026

Summary by CodeRabbit

  • Documentation
    • Added comprehensive deployment guide for Nemotron v3 Super on TensorRT LLM with setup instructions, configuration options, and troubleshooting
    • Updated supported models list to include Nemotron v3 Super variants
    • Added curated performance configuration for optimized Nemotron v3 Super deployment
    • Added usage guide for online and offline inference workflows

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@nv-guomingz nv-guomingz requested a review from tijyojwad March 12, 2026 00:02
@nv-guomingz nv-guomingz requested review from a team as code owners March 12, 2026 00:02
@nv-guomingz nv-guomingz changed the title [TRTLLM-10244][doc] Add deployment guide for Nemotron 3 [TRTLLM-10244][doc] Add deployment guide for Nemotron 3 Super Mar 12, 2026
@nv-guomingz
Collaborator Author

/bot run

@coderabbitai
Contributor

coderabbitai bot commented Mar 12, 2026

📝 Walkthrough

Walkthrough

This pull request adds comprehensive support for Nemotron-3 Super model deployment on TensorRT LLM, including a detailed deployment guide, curated performance configuration, model documentation, and updates to the supported models registry.

Changes

  • Deployment Documentation (docs/source/deployment-guide/deployment-guide-for-nemotron-3-super-on-trtllm.md, docs/source/deployment-guide/index.rst): New deployment guide for Nemotron-3 Super on TensorRT LLM covering prerequisites, GPU requirements, deployment steps, YAML config options, API testing, troubleshooting, and benchmarking. Index updated to reference the new guide and add Nemotron-3 Super to the popular models table.
  • Model Documentation & Supported Models (docs/source/models/supported-models.md, examples/models/core/nemotron/README_nemotron_super_v3.md): Updated supported models documentation to include the Nemotron-3 Super variant alongside Nemotron-3 Nano. New README introduces the Nemotron Super V3 hybrid Mamba-Transformer MoE model with online serving and offline inference examples.
  • Curated Configuration (examples/configs/curated/nemotron-3-super-throughput.yaml): New YAML configuration file defining optimized runtime and performance parameters for Nemotron-3 Super, including batch sizing, tensor parallelism, MoE expert parallelism, and KV cache settings.
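The summary above names the main knobs in the curated file. Assuming the keys follow the usual TensorRT LLM LLM-API YAML schema, a sketch of such a config might look like this (the parallelism values of 4 are taken from the review comments on this PR; the remaining values are illustrative assumptions, not the actual file contents):

```yaml
# Illustrative sketch of nemotron-3-super-throughput.yaml;
# actual values in the PR may differ except where noted.
max_batch_size: 512                 # batch sizing (assumed value)
tensor_parallel_size: 4             # 4-way tensor parallelism (per the review comments)
moe_expert_parallel_size: 4         # 4-way MoE expert parallelism (per the review comments)
kv_cache_config:
  free_gpu_memory_fraction: 0.85    # KV cache sizing (assumed value)
```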

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~15 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Description check (⚠️ Warning): The PR description is incomplete, containing only the template structure with no substantive content in the Description or Test Coverage sections. Resolution: fill in the Description section explaining what changes are being made and why, and provide Test Coverage details on how the documentation updates were validated.

✅ Passed checks (2 passed)

  • Title check (✅ Passed): The PR title accurately reflects the main change: adding deployment documentation for the Nemotron 3 Super model, with a TRTLLM ticket reference.
  • Docstring Coverage (✅ Passed): No functions found in the changed files to evaluate docstring coverage; skipping the check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/source/deployment-guide/deployment-guide-for-nemotron-3-super-on-trtllm.md`:
- Around line 116-119: The doc uses two different KV-cache keys which is
inconsistent: `kv_cache_free_gpu_memory_fraction` vs
`kv_cache_config.free_gpu_memory_fraction`; pick one canonical name and make the
text and examples match it. Update the descriptive header and the prose line to
use the same key used in the YAML examples (prefer
`kv_cache_config.free_gpu_memory_fraction` if that is the actual schema), and
ensure any recommendations or examples reference that exact symbol so readers
won't copy the wrong key.
- Around line 17-19: The second bullet's URL incorrectly points to the NVFP4
repo instead of the BF16 checkpoint; update the href for the list item titled
"NVIDIA-Nemotron-3-Super-120B-A12B-BF16" so it links to the BF16 repository (the
same target as the first bullet) rather than
"NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4", ensuring the three bullets each point
to their correct unique repos.
- Around line 58-97: The recommended YAML (referenced as
nemotron-3-super-throughput.yaml and set into EXTRA_LLM_API_FILE) hard-codes
tensor_parallel_size: 4 and moe_expert_parallel_size: 4 which is not portable to
the smaller NVFP4 minimum hardware (e.g., 1x B200); update the docs near the
trtllm-serve examples to state that the curated YAML assumes a 4-way tensor and
MoE parallel setup and instruct users to either (A) edit tensor_parallel_size
and moe_expert_parallel_size in nemotron-3-super-throughput.yaml to values
compatible with their topology (specifically call out lowering them for 1x B200)
or (B) point to or provide alternative example YAMLs for the 1x B200 and 2x
H100/H200 cases before running trtllm-serve, so users do not blindly copy the
same config for both NVFP4 and BF16 commands.

In `@docs/source/deployment-guide/index.rst`:
- Around line 35-39: The quick-start table row for "Nemotron v3 Super (NVFP4)"
currently lists "GPU: Any" which is inaccurate; update that GPU cell to
"Hopper/Blackwell-class GPUs (sufficient memory)" or similar restrictive wording
so users know they need Hopper/Blackwell-class hardware; ensure the change is
applied to the row containing "Nemotron v3 Super (NVFP4)", the linked config
reference "nemotron-3-super-throughput.yaml", and the example command
"trtllm-serve nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4 --config
${TRTLLM_DIR}/examples/configs/curated/nemotron-3-super-throughput.yaml".

In `@docs/source/models/supported-models.md`:
- Line 26: Replace the collection reference in the supported models table with a
concrete model repo so users can load it directly: update the row that lists
`NemotronHForCausalLM` (currently showing `nvidia/nvidia-nemotron-v3`) to a
specific model ID such as `nvidia/nemotron-3-super` or `nvidia/nemotron-3-nano`
so it matches the other entries and is immediately loadable.

In `@examples/models/core/nemotron/README_nemotron_super_v3.md`:
- Line 71: Update the curl example to use a client-reachable host instead of the
server bind address; replace 'http://0.0.0.0:8000/v1/chat/completions' with
'http://localhost:8000/v1/chat/completions' (or the actual server IP) so the
example line in README_nemotron_super_v3.md is copy-pasteable by clients.
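The suggested fix above can be made concrete. Assuming a server started with trtllm-serve as described in the deployment guide and listening on port 8000 (the port and request body are assumptions; the model ID is taken from the review comments), a client-reachable request would look like:

```
# Query the locally running trtllm-serve OpenAI-compatible endpoint;
# substitute the server's actual hostname/IP for localhost when
# calling from another machine.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4",
        "messages": [{"role": "user", "content": "Hello"}],
        "max_tokens": 32
      }'
```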

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 25f037c6-74c1-4545-adc5-e615ae00db05

📥 Commits

Reviewing files that changed from the base of the PR and between bf7142f and c22f5e5.

📒 Files selected for processing (5)
  • docs/source/deployment-guide/deployment-guide-for-nemotron-3-super-on-trtllm.md
  • docs/source/deployment-guide/index.rst
  • docs/source/models/supported-models.md
  • examples/configs/curated/nemotron-3-super-throughput.yaml
  • examples/models/core/nemotron/README_nemotron_super_v3.md

Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
@nv-guomingz nv-guomingz force-pushed the user/guomingz/nemotron_super_doc branch from c22f5e5 to 0dbd6b3 Compare March 12, 2026 00:38
@mdoenv-git

Can platform support be added here? For instance, Spark, and which models/networks are supported and which are not. TRT-LLM is supported, for instance, but vLLM and others are not.
