Commit 6b73e93

Nemotron Ultra & Super launcher examples (#1609)
### What does this PR do?

Type of change: New example

New launcher example for Nemotron Super with PTQ + export + vLLM smoke test on a small GPQA-style dataset.

### Usage

```python
# Usage:
# source .env-slurm
# cd tools/launcher
# uv run launch.py --yaml examples/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16/megatron_lm_ptq.yaml --yes
```

### Testing

### Before your PR is "*Ready for review*"

Make sure you read and follow [Contributor guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md) and your commits are signed (`git commit -s -S`).

Make sure you read and follow the [Security Best Practices](https://github.com/NVIDIA/Model-Optimizer/blob/main/SECURITY.md#security-coding-practices-for-contributors) (e.g. avoiding hardcoded `trust_remote_code=True`, `torch.load(..., weights_only=False)`, `pickle`, etc.).

- Is this change backward compatible?: ✅ / ❌ / N/A
- If you copied code from any other sources or added a new PIP dependency, did you follow guidance in `CONTRIBUTING.md`: ✅ / ❌ / N/A
- Did you write any new necessary tests?: ✅ / ❌ / N/A
- Did you update [Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?: ✅ / ❌ / N/A
- Did you get Claude approval on this PR?: ✅ / ❌ / N/A

### Additional Information

## Summary by CodeRabbit

* **New Features**
  * Added checkpoint export capability for quantized models to Hugging Face format.
  * Introduced complete quantization pipelines with conditional MMLU evaluation and model export stages.
* **Bug Fixes**
  * Fixed num_shards calculation to prevent invalid minimum values.
* **Documentation**
  * Updated vLLM version requirements for optimal NVFP4 model performance.
  * Enhanced quantization pipeline documentation with improved output paths and conditional execution details.
* **Chores**
  * Updated Megatron-LM module to latest version.
  * Added sample dataset for model evaluation testing.

---------

Signed-off-by: Jennifer Chen <jennifchen@nvidia.com>
1 parent 0081473 commit 6b73e93

9 files changed

Lines changed: 289 additions & 21 deletions

Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
+#!/bin/bash
+
+# SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+SCRIPT_DIR="$(dirname "$(readlink -f "$0")")"
+source ${SCRIPT_DIR}/../../service_utils.sh
+
+util_install_extra_dep
+
+trap 'error_handler $0 $LINENO' ERR # ERROR HANDLER
+###################################################################################################
+
+# Export a quantized MCore checkpoint (saved by quantize.sh) to HF format.
+#
+# Required env: MLM_MODEL_CFG, QUANT_CFG.
+# Optional env:
+#   MLM_MODEL_CKPT   Saved PTQ MCore ckpt path (default: /cicd/megatron-lm/${MLM_MODEL_CFG})
+#   EXPORT_DIR       HF output dir (default: /cicd/export/${MLM_MODEL_CFG}_${QUANT_CFG basename})
+#   HF_MODEL_CKPT    HF source ckpt for tokenizer/config (default: /hf-local/${MLM_MODEL_CFG})
+#   TP, PP, EP, ETP  Parallelism (defaults: 1, 1, 1, 1)
+
+if [[ -z ${MLM_MODEL_CKPT} ]]; then
+    export MLM_MODEL_CKPT="/cicd/megatron-lm/${MLM_MODEL_CFG}"
+fi
+if [[ -z ${EXPORT_DIR} ]]; then
+    # Take basename of QUANT_CFG (strip dirs + .yaml/.yml) so recipe paths
+    # collapse to a flat tag in EXPORT_DIR.
+    _QUANT_CFG_TAG="$(basename "${QUANT_CFG}")"
+    _QUANT_CFG_TAG="${_QUANT_CFG_TAG%.yaml}"
+    _QUANT_CFG_TAG="${_QUANT_CFG_TAG%.yml}"
+    export EXPORT_DIR="/cicd/export/${MLM_MODEL_CFG}_${_QUANT_CFG_TAG}"
+fi
+if [[ -z ${HF_MODEL_CKPT} ]]; then
+    export HF_MODEL_CKPT="/hf-local/${MLM_MODEL_CFG}"
+fi
+export MLM_SKIP_INSTALL=1
+
+EXPORT_EXE="bash modules/Megatron-LM/examples/post_training/modelopt/export.sh"
+
+export MLM_EXTRA_ARGS=${@}
+echo "=== Exporting ${MLM_MODEL_CFG} ${QUANT_CFG} (TP=${TP:-1} PP=${PP:-1} EP=${EP:-1} ETP=${ETP:-1}) ==="
+TP=${TP:-1} PP=${PP:-1} EP=${EP:-1} ETP=${ETP:-1} ${EXPORT_EXE} ${MLM_MODEL_CFG}
+ls ${EXPORT_DIR}
+cat ${EXPORT_DIR}/hf_quant_config.json
+
+###################################################################################################
+
+exit_handler $0
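
The default `EXPORT_DIR` in the script above is built from a flattened `QUANT_CFG` tag (basename, then the `.yaml` and `.yml` suffixes stripped in that order). As a rough illustration only, the same derivation might be sketched in Python as follows; `quant_cfg_tag` and `default_export_dir` are hypothetical helper names, not part of the launcher, and the sketch assumes Python 3.9+ for `str.removesuffix`.

```python
import posixpath


def quant_cfg_tag(quant_cfg: str) -> str:
    """Approximate the shell logic: basename, then drop .yaml, then .yml."""
    # basename(1) ignores a trailing slash, so strip it before splitting.
    tag = posixpath.basename(quant_cfg.rstrip("/"))
    tag = tag.removesuffix(".yaml")  # mirrors ${_QUANT_CFG_TAG%.yaml}
    tag = tag.removesuffix(".yml")   # mirrors ${_QUANT_CFG_TAG%.yml}
    return tag


def default_export_dir(model_cfg: str, quant_cfg: str) -> str:
    """Hypothetical helper: the default EXPORT_DIR when none is set."""
    return f"/cicd/export/{model_cfg}_{quant_cfg_tag(quant_cfg)}"


if __name__ == "__main__":
    print(default_export_dir(
        "nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16",
        "models/Nemotron-3-Super-120B-A12B/super-nvfp4",
    ))
```

With the Super example's `MLM_MODEL_CFG` and `QUANT_CFG`, this yields the same path that the vLLM task passes as `--model`.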

tools/launcher/common/megatron_lm/quantize/quantize.sh

Lines changed: 31 additions & 14 deletions
@@ -15,6 +15,12 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
+# Runs Megatron-LM PTQ quantization. Also runs MMLU + HF export inline unless
+# RUN_MMLU / RUN_EXPORT are set to "false". Larger models that need different
+# parallelism for MMLU/export should set RUN_MMLU=false RUN_EXPORT=false and
+# chain the standalone mmlu/mmlu.sh and export/export.sh scripts as separate
+# pipeline tasks.
+
 SCRIPT_DIR="$(dirname "$(readlink -f "$0")")"
 source ${SCRIPT_DIR}/../../service_utils.sh
 
@@ -26,25 +32,35 @@ trap 'error_handler $0 $LINENO' ERR # ERROR HANDLER
 if [[ -z ${HF_MODEL_CKPT} ]]; then
     export HF_MODEL_CKPT="/hf-local/${MLM_MODEL_CFG}"
 fi
-export MLM_MODEL_SAVE="/scratchspace/megatron-lm/${MLM_MODEL_CFG}"
-export EXPORT_DIR="/scratchspace/export/${MLM_MODEL_CFG}_${QUANT_CFG}"
+# Persist PTQ ckpt + HF export under /cicd ($SLURM_JOB_DIR/cicd) so later
+# experiments can re-use them.
+export MLM_MODEL_SAVE="/cicd/megatron-lm/${MLM_MODEL_CFG}"
+# If QUANT_CFG is a recipe path, collapse to a flat tag (strip dirs + .yaml/.yml).
+_QUANT_CFG_TAG="$(basename "${QUANT_CFG}")"
+_QUANT_CFG_TAG="${_QUANT_CFG_TAG%.yaml}"
+_QUANT_CFG_TAG="${_QUANT_CFG_TAG%.yml}"
+export EXPORT_DIR="/cicd/export/${MLM_MODEL_CFG}_${_QUANT_CFG_TAG}"
 export MLM_SKIP_INSTALL=1
 
 QUANTIZE_EXE="bash modules/Megatron-LM/examples/post_training/modelopt/quantize.sh"
 MMLU_EXE="bash modules/Megatron-LM/examples/post_training/modelopt/mmlu.sh"
-CONVERT_EXE="bash modules/Megatron-LM/examples/post_training/modelopt/convert.sh"
 EXPORT_EXE="bash modules/Megatron-LM/examples/post_training/modelopt/export.sh"
 
+# Step 1: quantize
 export MLM_EXTRA_ARGS=${@}
 TP=${TP:-1} PP=${PP:-1} EP=${EP:-1} ETP=${ETP:-1} ${QUANTIZE_EXE} ${MLM_MODEL_CFG} ${QUANT_CFG}
 
-export MLM_EXTRA_ARGS="--mmlu-dataset ${MMLU_DATASET:-/hf-local/cais/mmlu} --fraction 0.01 --lower-bound ${MMLU_LOWER_BOUND:-0.38} --disable-tqdm"
-TP=${TP:-1} PP=${PP:-1} EP=${EP:-1} ETP=${ETP:-1} MLM_MODEL_CKPT=${MLM_MODEL_SAVE} ${MMLU_EXE} ${MLM_MODEL_CFG}
+# Step 2 (optional): MMLU on the saved PTQ ckpt
+if [[ "${RUN_MMLU:-true}" == "true" ]]; then
+    export MLM_EXTRA_ARGS="--mmlu-dataset ${MMLU_DATASET:-/hf-local/cais/mmlu} --fraction 0.01 --lower-bound ${MMLU_LOWER_BOUND:-0.38} --disable-tqdm"
+    TP=${TP:-1} PP=${PP:-1} EP=${EP:-1} ETP=${ETP:-1} MLM_MODEL_CKPT=${MLM_MODEL_SAVE} ${MMLU_EXE} ${MLM_MODEL_CFG}
+fi
 
-# Export quantized checkpoint to HF format
-# Use largest PP <= total GPUs that divides the model's num_hidden_layers
-TOTAL_GPUS=$(python3 -c "import torch; print(torch.cuda.device_count())" 2>/dev/null || echo ${NUM_GPUS:-1})
-EXPORT_PP=$(python3 -c "
+# Step 3 (optional): export PTQ ckpt to HF format
+# Use largest PP <= total GPUs that divides the model's num_hidden_layers.
+if [[ "${RUN_EXPORT:-true}" == "true" ]]; then
+    TOTAL_GPUS=$(python3 -c "import torch; print(torch.cuda.device_count())" 2>/dev/null || echo ${NUM_GPUS:-1})
+    EXPORT_PP=$(python3 -c "
 import json, os
 cfg = os.path.join('${HF_MODEL_CKPT}', 'config.json')
 n_layers = json.load(open(cfg)).get('num_hidden_layers', 1) if os.path.exists(cfg) else 1
@@ -54,11 +70,12 @@ while pp > 1 and n_layers % pp != 0:
     pp -= 1
 print(pp)
 " 2>/dev/null || echo ${TOTAL_GPUS})
-echo "=== Exporting ${MLM_MODEL_CFG} ${QUANT_CFG} (PP=${EXPORT_PP}, ${TOTAL_GPUS} GPUs) ==="
-export MLM_EXTRA_ARGS=
-TP=1 PP=${EXPORT_PP} EP=1 ETP=1 MLM_MODEL_CKPT=${MLM_MODEL_SAVE} ${EXPORT_EXE} ${MLM_MODEL_CFG}
-ls ${EXPORT_DIR}
-cat ${EXPORT_DIR}/hf_quant_config.json
+    echo "=== Exporting ${MLM_MODEL_CFG} ${QUANT_CFG} (PP=${EXPORT_PP}, ${TOTAL_GPUS} GPUs) ==="
+    export MLM_EXTRA_ARGS=
+    TP=1 PP=${EXPORT_PP} EP=1 ETP=1 MLM_MODEL_CKPT=${MLM_MODEL_SAVE} ${EXPORT_EXE} ${MLM_MODEL_CFG}
+    ls ${EXPORT_DIR}
+    cat ${EXPORT_DIR}/hf_quant_config.json
+fi
 
 ###################################################################################################
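
The inline export step picks the largest pipeline-parallel size that does not exceed the GPU count and divides `num_hidden_layers` evenly. A minimal sketch of that search is below. The starting value of `pp` is an assumption, since the diff context shows only the countdown loop and not the line that initializes it; `largest_export_pp` is a hypothetical name.

```python
def largest_export_pp(n_layers: int, total_gpus: int) -> int:
    """Largest pp <= total_gpus such that n_layers % pp == 0."""
    # Assumption: start the search at the GPU count (not shown in the hunk).
    pp = max(1, total_gpus)
    # Same countdown loop as in the diff context above.
    while pp > 1 and n_layers % pp != 0:
        pp -= 1
    return pp


if __name__ == "__main__":
    for layers, gpus in [(88, 4), (88, 3), (108, 8)]:
        print(layers, gpus, largest_export_pp(layers, gpus))
```

The search always terminates, because `pp = 1` divides any layer count.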

tools/launcher/common/query.py

Lines changed: 1 addition & 1 deletion
@@ -208,7 +208,7 @@ def synthesize(data):
 dataset = load_dataset(args.data, split=args.data_split)
 
 if args.num_shards * 100 > len(dataset):
-    args.num_shards = min(16, len(dataset) // 100)
+    args.num_shards = max(1, min(16, len(dataset) // 100))
 
 if args.save is not None:
     print(f"Create save dir: {args.save}")
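
To see why the `max(1, ...)` guard matters, consider a dataset with fewer than 100 rows, such as the 8-question sample added in this commit: `len(dataset) // 100` is 0, so the old expression set `num_shards` to 0. The sketch below reproduces both versions as standalone functions; the function names are illustrative and do not appear in `query.py`.

```python
def old_num_shards(num_shards: int, dataset_len: int) -> int:
    """Pre-fix behavior: can return 0 for datasets under 100 rows."""
    if num_shards * 100 > dataset_len:
        num_shards = min(16, dataset_len // 100)
    return num_shards


def new_num_shards(num_shards: int, dataset_len: int) -> int:
    """Post-fix behavior: never drops below one shard."""
    if num_shards * 100 > dataset_len:
        num_shards = max(1, min(16, dataset_len // 100))
    return num_shards


if __name__ == "__main__":
    print(old_num_shards(1, 8), new_num_shards(1, 8))
```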
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+{"messages": [{"role": "user", "content": "A particle of mass m moves in a one-dimensional infinite square well of width L. What is the energy of the third excited state (n=4) in units of (h^2 / (8 m L^2))?\n\n(A) 4\n(B) 9\n(C) 16\n(D) 25\n\nReply with the single letter A, B, C, or D, then briefly justify."}]}
+{"messages": [{"role": "user", "content": "In the Diels-Alder reaction between 1,3-butadiene and maleic anhydride, what is the stereochemistry of the major cyclohexene product?\n\n(A) cis-fused, endo\n(B) cis-fused, exo\n(C) trans-fused, endo\n(D) trans-fused, exo\n\nReply with the single letter A, B, C, or D, then briefly justify."}]}
+{"messages": [{"role": "user", "content": "Which of the following is the correct order of events in eukaryotic translation initiation?\n\n(A) 40S binds mRNA cap; eIF2-GTP-Met-tRNA joins; scan to AUG; 60S joins\n(B) 60S binds mRNA cap; scan to AUG; eIF2-GTP-Met-tRNA joins; 40S joins\n(C) eIF2-GTP-Met-tRNA binds AUG; 40S joins; 60S joins; cap recognized\n(D) 80S forms first; eIF2 delivers Met-tRNA; cap recognized last\n\nReply with the single letter A, B, C, or D, then briefly justify."}]}
+{"messages": [{"role": "user", "content": "A main-sequence star with a mass of 25 solar masses is expected to end its life as which of the following?\n\n(A) White dwarf\n(B) Brown dwarf\n(C) Neutron star\n(D) Stellar-mass black hole\n\nReply with the single letter A, B, C, or D, then briefly justify."}]}
+{"messages": [{"role": "user", "content": "An ideal gas undergoes a reversible adiabatic expansion. Which thermodynamic quantity is unchanged?\n\n(A) Internal energy U\n(B) Enthalpy H\n(C) Entropy S\n(D) Gibbs free energy G\n\nReply with the single letter A, B, C, or D, then briefly justify."}]}
+{"messages": [{"role": "user", "content": "Which spectroscopic technique is most directly used to determine the local chemical environment of carbon atoms in an organic molecule in solution?\n\n(A) X-ray diffraction\n(B) 13C NMR spectroscopy\n(C) UV-Vis absorption\n(D) Mass spectrometry\n\nReply with the single letter A, B, C, or D, then briefly justify."}]}
+{"messages": [{"role": "user", "content": "In a diploid organism, a gene on the X chromosome with two alleles (A dominant, a recessive) has carrier-mother (X^A X^a) and unaffected-father (X^A Y). What is the probability that a son will be affected (X^a Y)?\n\n(A) 0\n(B) 1/4\n(C) 1/2\n(D) 1\n\nReply with the single letter A, B, C, or D, then briefly justify."}]}
+{"messages": [{"role": "user", "content": "A photon has wavelength 500 nm in vacuum. What is its momentum in units of 10^-27 kg m/s? (Use h = 6.626e-34 J s.)\n\n(A) 1.0\n(B) 1.3\n(C) 1.7\n(D) 2.1\n\nReply with the single letter A, B, C, or D, then briefly justify."}]}
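
Each line of the sample file is a standalone JSON object holding a single user turn under `messages`, with four lettered options embedded in the prompt text. The following sketch parses one record copied from the file and extracts the option letters. It only illustrates the record shape and is not code from the launcher; `option_letters` is a hypothetical helper.

```python
import json
import re

# One record copied from the sample file above. Raw strings keep the JSON
# "\n" escapes intact so that json.loads decodes them into real newlines.
SAMPLE_LINE = (
    r'{"messages": [{"role": "user", "content": "An ideal gas undergoes a '
    r'reversible adiabatic expansion. Which thermodynamic quantity is unchanged?'
    r'\n\n(A) Internal energy U\n(B) Enthalpy H\n(C) Entropy S'
    r'\n(D) Gibbs free energy G\n\nReply with the single letter A, B, C, or D, '
    r'then briefly justify."}]}'
)


def option_letters(record: dict) -> list:
    """Return the option letters that start a line in the user prompt."""
    content = record["messages"][0]["content"]
    return re.findall(r"^\(([A-D])\)", content, flags=re.MULTILINE)


if __name__ == "__main__":
    record = json.loads(SAMPLE_LINE)
    print(record["messages"][0]["role"], option_letters(record))
```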

tools/launcher/common/vllm/query.sh

Lines changed: 1 addition & 1 deletion
@@ -41,7 +41,7 @@ source ${SCRIPT_DIR}/../service_utils.sh
 # vLLM notes:
 # - vLLM manages GPU distribution internally; run with ntasks_per_node: 1
 #   in slurm_config and pass --tensor-parallel-size to match gpus_per_node.
-# - NVFP4 models require vllm/vllm-openai:v0.15.0+ on Blackwell GPUs.
+# - NVFP4 models require vllm/vllm-openai:v0.21.0+ on Blackwell GPUs.
 # - Use --trust-remote-code for models with custom architectures (e.g. Kimi).
 #
 # In a pipeline YAML task config:

tools/launcher/examples/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16/megatron_bridge_import.yaml

Lines changed: 4 additions & 4 deletions
@@ -1,14 +1,14 @@
 # Megatron-Bridge import for nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16.
 #
 # Imports HF weights to a Megatron-LM checkpoint via AutoBridge.import_ckpt
-# (use_cpu_initialization=True). Uses a single 8xH100 Slurm node — Megatron-Bridge
+# (use_cpu_initialization=True). Uses a single 4-GPU Slurm node — Megatron-Bridge
 # requires at least 1 GPU for nccl init even with CPU-resident weights.
 #
 # Usage:
 #   export SLURM_HOST=<slurm-host>
 #   export SLURM_ACCOUNT=<your-team>
 #   export SLURM_PARTITION=<gpu-partition> # default: batch
-#   export SLURM_JOB_DIR=/home/scratch.<user>/experiments
+#   export SLURM_JOB_DIR=<remote-job-dir>
 #   export HF_TOKEN=<your-hf-token> # gated model
 #   cd tools/launcher
 #   uv run launch.py --yaml examples/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16/megatron_bridge_import.yaml --yes
@@ -17,7 +17,7 @@ job_name: Nemotron-3-Super-120B_bridge_import
 pipeline:
   skip: false
   allow_to_fail: false
-  note: "HF -> MCore import via Megatron-Bridge (8xH100)"
+  note: "HF -> MCore import via Megatron-Bridge (1 node x 4 GPUs)"
 
   global_vars:
     # /cicd is the experiment_title mount = $SLURM_JOB_DIR/cicd on the host
@@ -37,5 +37,5 @@ pipeline:
       partition: batch
       nodes: 1
       ntasks_per_node: 1
-      gpus_per_node: 8
+      gpus_per_node: 4
       time: "04:00:00"
Lines changed: 91 additions & 0 deletions
@@ -0,0 +1,91 @@
+# Nemotron-3-Super-120B-A12B-BF16 PTQ quantization + export + vLLM smoke.
+# Tested on B200 Blackwell GPUs.
+#
+# Pipeline:
+#   task_0 (quantize): 1 node x 4 GPUs = 4 ranks, TP=1 PP=1 EP=4 ETP=1.
+#     Loads HF weights from /hf-local, saves PTQ ckpt to /cicd.
+#   task_1 (export): 1 node x 4 GPUs = 4 ranks, TP=1 PP=4 EP=1 ETP=1.
+#     88 layers / PP=4 = 22 layers/stage.
+#   task_2 (smoke): 1 node x 4 GPUs. Serve exported NVFP4 ckpt with vLLM and
+#     answer 8 GPQA-style questions.
+#
+# Usage:
+#   source .env-slurm
+#   cd tools/launcher
+#   uv run launch.py --yaml examples/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16/megatron_lm_ptq.yaml --yes
+
+job_name: Nemotron-3-Super-120B_PTQ
+pipeline:
+  skip: false
+  allow_to_fail: false
+  note: "PTQ on Nemotron-3-Super-120B (super-nvfp4): quantize + export + vLLM smoke, 1 node x 4 GPUs"
+
+  task_0:
+    script: common/megatron_lm/quantize/quantize.sh
+    args:
+      - --seq-length 4096 --max-position-embeddings 4096
+      - --skip-generate
+      # Fast calibration. Bump (e.g. --calib-size 512) for production.
+      - --calib-size 32
+    environment:
+      - MLM_MODEL_CFG: nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16
+      - QUANT_CFG: models/Nemotron-3-Super-120B-A12B/super-nvfp4
+      - HF_MODEL_CKPT: /hf-local/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16
+      # MMLU + Export run as separate tasks; quantize.sh does quantize only.
+      - RUN_MMLU: "false"
+      - RUN_EXPORT: "false"
+      - TP: "1"
+      - PP: "1"
+      - EP: "4"
+      - ETP: "1"
+    slurm_config:
+      _factory_: "slurm_factory"
+      container: nvcr.io/nvidia/nemo:26.04
+      modelopt_install_path: /opt/venv/lib/python3.12/site-packages/modelopt
+      partition: batch
+      nodes: 1
+      ntasks_per_node: 4
+      gpus_per_node: 4
+      time: "04:00:00"
+
+  task_1:
+    script: common/megatron_lm/export/export.sh
+    environment:
+      - MLM_MODEL_CFG: nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16
+      - QUANT_CFG: models/Nemotron-3-Super-120B-A12B/super-nvfp4
+      - HF_MODEL_CKPT: /hf-local/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16
+      - TP: "1"
+      - PP: "4"
+      - EP: "1"
+      - ETP: "1"
+    slurm_config:
+      _factory_: "slurm_factory"
+      container: nvcr.io/nvidia/nemo:26.04
+      modelopt_install_path: /opt/venv/lib/python3.12/site-packages/modelopt
+      partition: batch
+      nodes: 1
+      ntasks_per_node: 4
+      gpus_per_node: 4
+      time: "02:00:00"
+
+  # vLLM generation test: serve the exported HF NVFP4 ckpt and answer 8
+  # GPQA-style questions. Inspect responses under /cicd/vllm/<model>/.
+  task_2:
+    script: common/vllm/query.sh
+    args:
+      - --model /cicd/export/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16_super-nvfp4
+      - --tensor-parallel-size 4
+      - --trust-remote-code
+      - --
+      - --data common/vllm/gpqa_sample.jsonl
+      - --max-tokens 256
+      - --num-shards 1
+      - --save /cicd/vllm/NVIDIA-Nemotron-3-Super-120B-A12B-BF16_super-nvfp4
+    slurm_config:
+      _factory_: "slurm_factory"
+      container: vllm/vllm-openai:v0.21.0
+      partition: batch
+      nodes: 1
+      ntasks_per_node: 1
+      gpus_per_node: 4
+      time: "01:00:00"
Lines changed: 91 additions & 0 deletions
@@ -0,0 +1,91 @@
+# Nemotron-3-Ultra-550B-A55B-BF16 PTQ quantization + export + vLLM generation test.
+# Tested on B200 Blackwell GPUs. Uses Super NVFP4 mixed-FP8 max calibration recipe, similar to published NVFP4 checkpoint (which is Four Over Six scales).
+#
+# Pipeline:
+#   task_0 (quantize): 4 nodes x 4 GPUs = 16 ranks, TP=1 PP=1 EP=16 ETP=1.
+#     Loads HF weights from /hf-local, saves PTQ ckpt to /cicd.
+#   task_1 (export): 3 nodes x 4 GPUs = 12 ranks, TP=1 PP=12 EP=1 ETP=1.
+#     108 layers / PP=12 = 9 layers/stage.
+#   task_2 (generation test): 1 node x 4 GPUs. Serve exported NVFP4 ckpt with vLLM and
+#     answer 8 GPQA-style questions.
+#
+# Usage:
+#   source .env-slurm
+#   cd tools/launcher
+#   uv run launch.py --yaml examples/nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16/megatron_lm_ptq.yaml --yes
+
+job_name: Nemotron-3-Ultra_PTQ
+pipeline:
+  skip: false
+  allow_to_fail: false
+  note: "PTQ on Nemotron-3-Ultra-550B-A55B-BF16 (super-nvfp4-max-calib): quantize @ 4 nodes, export @ 3 nodes, vLLM generation test@ 1 node"
+
+  task_0:
+    script: common/megatron_lm/quantize/quantize.sh
+    args:
+      - --seq-length 4096 --max-position-embeddings 4096
+      - --skip-generate
+      # Fast calibration. Bump (e.g. --calib-size 512) for production.
+      - --calib-size 32
+    environment:
+      - MLM_MODEL_CFG: nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16
+      - QUANT_CFG: models/Nemotron-3-Super-120B-A12B/super-nvfp4-max-calib
+      - HF_MODEL_CKPT: /hf-local/nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16
+      # MMLU + Export run as separate tasks; quantize.sh does quantize only.
+      - RUN_MMLU: "false"
+      - RUN_EXPORT: "false"
+      - TP: "1"
+      - PP: "1"
+      - EP: "16"
+      - ETP: "1"
+    slurm_config:
+      _factory_: "slurm_factory"
+      container: nvcr.io/nvidia/nemo:26.04
+      modelopt_install_path: /opt/venv/lib/python3.12/site-packages/modelopt
+      partition: batch
+      nodes: 4
+      ntasks_per_node: 4
+      gpus_per_node: 4
+      time: "04:00:00"
+
+  task_1:
+    script: common/megatron_lm/export/export.sh
+    environment:
+      - MLM_MODEL_CFG: nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16
+      - QUANT_CFG: models/Nemotron-3-Super-120B-A12B/super-nvfp4-max-calib
+      - HF_MODEL_CKPT: /hf-local/nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16
+      - TP: "1"
+      - PP: "12"
+      - EP: "1"
+      - ETP: "1"
+    slurm_config:
+      _factory_: "slurm_factory"
+      container: nvcr.io/nvidia/nemo:26.04
+      modelopt_install_path: /opt/venv/lib/python3.12/site-packages/modelopt
+      partition: batch
+      nodes: 3
+      ntasks_per_node: 4
+      gpus_per_node: 4
+      time: "02:00:00"
+
+  # vLLM generation test: serve the exported HF NVFP4 ckpt and answer 8
+  # GPQA-style questions. Inspect responses under /cicd/vllm/<model>/.
+  task_2:
+    script: common/vllm/query.sh
+    args:
+      - --model /cicd/export/nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16_super-nvfp4-max-calib
+      - --tensor-parallel-size 4
+      - --trust-remote-code
+      - --
+      - --data common/vllm/gpqa_sample.jsonl
+      - --max-tokens 256
+      - --num-shards 1
+      - --save /cicd/vllm/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16_super-nvfp4-max-calib
+    slurm_config:
+      _factory_: "slurm_factory"
+      container: vllm/vllm-openai:v0.21.0
+      partition: batch
+      nodes: 1
+      ntasks_per_node: 1
+      gpus_per_node: 4
+      time: "01:00:00"
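
The header comments of the two PTQ pipelines state the rank counts and layers per pipeline stage for each task. A small sketch that re-derives those figures from the `slurm_config` and `PP` values is shown below. The helper names are hypothetical, and the layer counts (88 and 108) are taken from the YAML comments rather than read from a model config.

```python
def total_ranks(nodes: int, ntasks_per_node: int) -> int:
    """Ranks launched for a task: one per Slurm task on each node."""
    return nodes * ntasks_per_node


def layers_per_stage(n_layers: int, pp: int) -> int:
    """Layers on each pipeline stage; requires an even split."""
    if n_layers % pp != 0:
        raise ValueError(f"{n_layers} layers do not divide evenly across PP={pp}")
    return n_layers // pp


if __name__ == "__main__":
    # Ultra: quantize on 4 nodes, export on 3 nodes with PP=12.
    print(total_ranks(4, 4), total_ranks(3, 4), layers_per_stage(108, 12))
    # Super: export on 1 node with PP=4.
    print(total_ranks(1, 4), layers_per_stage(88, 4))
```

Raising on an uneven split is a design choice made for this sketch; it makes a mismatched `PP` visible before a job is submitted.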

tools/launcher/modules/Megatron-LM

Submodule Megatron-LM updated 686 files
