Commit c7ed502
Add NVFP4 + QAD to the Nemotron-3-Nano-30B-A3B tutorial
- Add an NVFP4 PTQ -> QAD -> export section to the Nemotron-3-Nano-30B-A3B
tutorial to recover the NVFP4 accuracy drop, and migrate the existing FP8
quantization section to the examples/megatron_bridge quantize.py / export.py
scripts. Add placeholder rows for the NVFP4 / NVFP4+QAD accuracy and NVFP4
vLLM throughput numbers (to be filled in once the experiments land).
- Wrap all tutorial commands in collapsible <details> blocks.
- Reframe the tutorial as NVFP4 + QAD (instead of FP8) in the root README
"Latest News", CHANGELOG, and the pruning / minitron-vs-puzzletron references.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
1 parent 6b73e93 commit c7ed502
8 files changed
Lines changed: 198 additions & 39 deletions
File tree
- examples/pruning
- minitron_vs_puzzletron
- minitron
- NVIDIA-Nemotron-3-Nano-30B-A3B-BF16
- NVIDIA-Nemotron-Nano-9B-v2