Commit f905b6e

Fixes based on Nemotron3 tests
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
1 parent 7ad01c0 commit f905b6e

8 files changed

Lines changed: 47 additions & 34 deletions

.github/workflows/example_tests.yml

Lines changed: 1 addition & 1 deletion
@@ -86,7 +86,7 @@ jobs:
  uses: ./.github/workflows/_example_tests_runner.yml
  secrets: inherit
  with:
-   docker_image: "nvcr.io/nvidia/nemo:26.02"
+   docker_image: "nvcr.io/nvidia/nemo:26.04"
    example: megatron_bridge
    timeout_minutes: 30
    pip_install_extras: "[hf,puzzletron,dev-test]"

examples/megatron_bridge/README.md

Lines changed: 8 additions & 2 deletions
@@ -16,7 +16,7 @@ This directory contains examples of using Model Optimizer with [NeMo Megatron-Br

  ## Pre-Requisites

- Running these examples requires many additional dependencies to be installed (e.g., Megatron-Bridge, Megatron-core, etc.), hence we strongly recommend directly using the NeMo container (e.g., `nvcr.io/nvidia/nemo:26.02`) which has all the dependencies installed.
+ Running these examples requires many additional dependencies to be installed (e.g., Megatron-Bridge, Megatron-core, etc.), hence we strongly recommend directly using the NeMo container (e.g., `nvcr.io/nvidia/nemo:26.04`) which has all the dependencies installed.

  To get the ModelOpt examples scripts, mount your Model-Optimizer repo to the container as follows:

@@ -26,7 +26,7 @@ if [ ! -d "${MODELOPT_DIR}" ]; then
      git clone https://github.com/NVIDIA/Model-Optimizer.git ${MODELOPT_DIR}
  fi

- export DOCKER_IMAGE=nvcr.io/nvidia/nemo:26.02
+ export DOCKER_IMAGE=nvcr.io/nvidia/nemo:26.04
  docker run \
    --gpus all \
    --shm-size=16GB \
@@ -49,6 +49,12 @@ hf auth login --token <your token>
  > [!WARNING]
  > Use `python -m pip` instead of `pip` to avoid conflicts with the system-wide installed packages in the NeMo containers. You may also refer to this [doc](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/docker/common/README.md#installing-packages-inside-the-container) on how to correctly install packages in the NeMo containers without breaking existing torch installation.

+ Also install additional dependencies from the [requirements.txt](./requirements.txt) file.
+
+ ```bash
+ python -m pip install -r requirements.txt
+ ```
+
  ## Pruning

  This section shows how to prune a HuggingFace model using Minitron algorithm in Megatron-Bridge framework. Checkout other available pruning algorithms, supported frameworks and models, and general pruning getting-started in the [pruning README](../pruning/README.md).

examples/megatron_bridge/prune_minitron.py

Lines changed: 7 additions & 6 deletions
@@ -161,11 +161,11 @@ def get_args() -> argparse.Namespace:
  parser.add_argument(
      "--prune_score_func",
      type=str,
-     default="mmlu_10pct",
+     default="mmlu_10pct_bs1",
      help=(
          "Score function to use for NAS-based pruning. Only supports MMLU at the moment. "
-         "Format: mmlu_<N>pct where <N> is the percentage of MMLU data to sample per subject "
-         "(e.g. mmlu_10pct for 10%, mmlu_100pct for full eval)."
+         "Format: mmlu_<N>pct_<bs> where <N> is the percentage of MMLU data to sample per subject and <bs> is "
+         "batch size for fast evaluation (default is mmlu_10pct_bs1)."
      ),
  )
  parser.add_argument(
@@ -343,16 +343,17 @@ def main(args: argparse.Namespace):
      "You can change this to be any other metric you want to maximize (e.g. negative validation loss)."
  )

- match = re.fullmatch(r"mmlu_(\d+)pct", args.prune_score_func)
+ match = re.fullmatch(r"mmlu_(\d+)pct_bs(\d+)", args.prune_score_func)
  if not match:
      raise ValueError(
-         f"Invalid score function: {args.prune_score_func}. Expected format: mmlu_<N>pct (e.g. mmlu_10pct)"
+         f"Invalid score function: {args.prune_score_func}. Expected format: mmlu_<N>pct_bs<bs>"
      )
  mmlu_frac = float(match.group(1)) / 100.0
+ batch_size = int(match.group(2))

  def score_func(m):
      return megatron_mmlu(
-         m, tokenizer, few_shots=0, fraction=mmlu_frac, batch_size=args.calib_mbs
+         m, tokenizer, few_shots=0, fraction=mmlu_frac, batch_size=batch_size
      )

  pruning_config["score_func"] = score_func
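
The new `--prune_score_func` format can be illustrated with a minimal sketch. The helper name `parse_score_func` is hypothetical and not part of this commit; it only reuses the regular expression and error message shown in the diff above. Note that the pattern requires the literal `bs` prefix before the batch size (as in `mmlu_10pct_bs1`), so the old `mmlu_10pct` form is now rejected.

```python
from __future__ import annotations

import re


def parse_score_func(spec: str) -> tuple[float, int]:
    """Hypothetical helper: split 'mmlu_<N>pct_bs<bs>' into (fraction, batch_size)."""
    # Same pattern as in the diff: <N> is a percentage, <bs> is the eval batch size.
    match = re.fullmatch(r"mmlu_(\d+)pct_bs(\d+)", spec)
    if not match:
        raise ValueError(
            f"Invalid score function: {spec}. Expected format: mmlu_<N>pct_bs<bs>"
        )
    mmlu_frac = float(match.group(1)) / 100.0
    batch_size = int(match.group(2))
    return mmlu_frac, batch_size


if __name__ == "__main__":
    print(parse_score_func("mmlu_10pct_bs1"))   # default value in the diff
    print(parse_score_func("mmlu_1pct_bs32"))   # value used in the example test
```

Encoding the batch size in the score-function string decouples MMLU evaluation from `--calib_mbs`, which the previous code passed as `batch_size`.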
Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+ # Saving some pruned models (e.g. Nemotron-3-Nano-30B-A3B-BF16) have issues with transformers>=5.0
+ transformers<5.0

examples/pruning/README.md

Lines changed: 1 addition & 1 deletion
@@ -27,7 +27,7 @@ This section focuses on applying Model Optimizer's state-of-the-art complementar

  ## Pre-Requisites

- For Minitron pruning for Megatron-Bridge / Megatron-LM models, use the NeMo container (e.g., `nvcr.io/nvidia/nemo:26.02`) which has all the dependencies installed.
+ For Minitron pruning for Megatron-Bridge / Megatron-LM models, use the NeMo container (e.g., `nvcr.io/nvidia/nemo:26.04`) which has all the dependencies installed.

  For FastNAS pruning for PyTorch Computer Vision models, no additional dependencies are required.

modelopt/torch/prune/plugins/mcore_minitron.py

Lines changed: 4 additions & 7 deletions
@@ -190,7 +190,7 @@ def _rprint(*renderables: Any) -> None:

  # Constraint keys that trigger the grid-search path in MCoreMinitronSearcher.
  # Order defines priority: first active key is used as the primary display/sort metric.
- _METRIC_CONSTRAINT_PRIORITY = ("params", "active_params", "memory_mb")
+ _METRIC_CONSTRAINT_PRIORITY = ("active_params", "params", "memory_mb")
  _METRIC_CONSTRAINTS = frozenset(_METRIC_CONSTRAINT_PRIORITY)


@@ -524,15 +524,15 @@ def search_best_arch_by_metrics(self) -> dict:
  _rprint(table)

  # 3. Optional Knowledge Distillation (KD) step for all top-k candidates
- print_rank_0(
-     "\nSkipping optional Knowledge Distillation (KD) step for candidates as it is a manual step. "
+ _rprint(
+     f"[yellow]\nSkipping optional Knowledge Distillation (KD) step for candidates as it is a manual step. "
      "As per the original paper (https://arxiv.org/pdf/2407.14679), ideally we need to perform a short "
      f"Knowledge Distillation on ~2B tokens for all top {top_k} candidates before evaluating the "
      "`score_func`, which will take a lot longer to prune, require splitting the pruning process into multiple "
      "stages and a lot more compute for pruning but can lead to better pruned model selection. If you are "
      f"interested to do this, you can take the top {top_k} candidates' `export_config` from the logs above and "
      "then export all models separately and perform Knowledge Distillation on each of them before evaluating "
-     "the `score_func`.\n"
+     f"the `score_func`.\n[/yellow]"
  )

  # 4. Validate top-k candidates using the score_func and return the best subnet
@@ -683,9 +683,6 @@ def _generate_search_space_combos(
  def _compute_candidate_metrics(self, ss_config: dict, max_num_layers: int) -> dict[str, float]:
      """Compute all active metric constraint values for a candidate config analytically.

-     Calls ``mcore_param_count`` at most once (covers both ``params`` and ``active_params``)
-     and ``mcore_memory_footprint_mb`` at most once (for ``memory_mb``).
-     Replaces the slow ``_prune → _param_num_dynamic → sample(max)`` loop used during search.
      Handles depth pruning by filtering the hybrid layer pattern to the kept (best) layers.
      """
      model = self.model
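
The effect of reordering `_METRIC_CONSTRAINT_PRIORITY` can be sketched as follows. This is an illustration only: `primary_metric` is a hypothetical helper, and treating a key as "active" when it is present in the constraints dict is an assumption, not something the diff confirms about `MCoreMinitronSearcher`. The constraint values are made up.

```python
from __future__ import annotations

# Priority order after this commit, copied from the diff.
METRIC_CONSTRAINT_PRIORITY = ("active_params", "params", "memory_mb")


def primary_metric(constraints: dict) -> str | None:
    """Hypothetical helper: return the first priority key present in `constraints`.

    Assumption: a constraint counts as "active" when its key is set to a non-None value.
    """
    for key in METRIC_CONSTRAINT_PRIORITY:
        if constraints.get(key) is not None:
            return key
    return None


if __name__ == "__main__":
    # With both constraints set, active_params now takes precedence over params.
    print(primary_metric({"params": 7492, "active_params": 5000}))
    # With only params and memory_mb set, params is the primary metric.
    print(primary_metric({"params": 7492, "memory_mb": 10.0}))
```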

tests/examples/megatron_bridge/test_prune_minitron.py

Lines changed: 1 addition & 1 deletion
@@ -38,7 +38,7 @@ def test_prune_minitron(tmp_path: Path, num_gpus):
  calib_num_samples=16,
  seq_length=32,
  prune_target_params=prune_target_params,
- prune_score_func="mmlu_1pct",
+ prune_score_func="mmlu_1pct_bs32",
  ss_channel_divisor=4,
  hparams_to_skip="num_attention_heads",
  top_k=1,

tests/gpu_megatron/torch/prune/plugins/test_mcore_mamba_minitron_pruning.py

Lines changed: 23 additions & 16 deletions
@@ -321,7 +321,12 @@ def _assert_top_k_candidates(searcher_state, constraint_key, expected_top_k, k=1
  assert len(top_k) == k
  for actual, (ss_config, metrics, score) in zip(top_k, expected_top_k):
      assert actual.ss_config == ss_config, (actual.ss_config, ss_config)
-     assert actual.metrics == metrics, (actual.metrics, metrics)
+     for metric_name, expected_value in metrics.items():
+         actual_value = actual.metrics[metric_name]
+         if isinstance(expected_value, float):
+             assert actual_value == pytest.approx(expected_value), (actual.metrics, metrics)
+         else:
+             assert actual_value == expected_value, (actual.metrics, metrics)
      assert actual.score == score, (actual.score, score)


@@ -338,7 +343,7 @@ def _test_mcore_mamba_hybrid_pruning_nas_params(rank, size, ckpt_dir):
  assert baseline_params == 14984, baseline_params
  constraints = {
      "params": int(baseline_params * 0.5),
-     "active_params": int(baseline_active * 0.7),
+     "active_params": int(baseline_active * 0.55),
  }

  # Capture stdout to assert search space output
@@ -373,8 +378,8 @@ def assert_row(key: str, value: str) -> None:
      model.share_embeddings_and_output_weights,
      hybrid_layer_pattern=_get_hybrid_layer_pattern(model),
  )
- assert pruned_params == 7154, pruned_params
- assert pruned_active_params == 7154, pruned_active_params
+ assert pruned_params == 6536, pruned_params
+ assert pruned_active_params == 6536, pruned_active_params

  # NOTE: Slight variation in layer ordering for MoE / Attention / MLP depending on PP configuration
  # This affects param counts when num_layers is pruned
@@ -384,26 +389,28 @@ def assert_row(key: str, value: str) -> None:
      # Winner is 3-layer: keeps layers [1,4,3] from "ME*-" → drops 'E' (layer 2) → "M*-"
      assert _get_hybrid_layer_pattern(model) == "M*-", _get_hybrid_layer_pattern(model)
      expected_top_k = [
-         # 4 four-layer models qualifying under params_thresh=7492
-         [{"num_layers": 4, "hidden_size": 12, "mamba_num_heads": 6, "mamba_head_dim": 12, "num_moe_experts": 6, "moe_ffn_hidden_size": 12, "ffn_hidden_size": 20}, {"params": 7418, "active_params": 6266}, 104], # noqa: E501
+         # position 1: the one qualifying 4-layer model (active=6542 > 3-layer H=12 active),
+         # demonstrating that active_params-first ranking can elevate 4-layer above 3-layer models
          [{"num_layers": 4, "hidden_size": 12, "mamba_num_heads": 6, "mamba_head_dim": 12, "num_moe_experts": 5, "moe_ffn_hidden_size": 12, "ffn_hidden_size": 32}, {"params": 7406, "active_params": 6542}, 115], # noqa: E501
-         [{"num_layers": 4, "hidden_size": 12, "mamba_num_heads": 6, "mamba_head_dim": 12, "num_moe_experts": 5, "moe_ffn_hidden_size": 12, "ffn_hidden_size": 28}, {"params": 7310, "active_params": 6446}, 111], # noqa: E501
-         [{"num_layers": 4, "hidden_size": 12, "mamba_num_heads": 6, "mamba_head_dim": 12, "num_moe_experts": 5, "moe_ffn_hidden_size": 12, "ffn_hidden_size": 24}, {"params": 7214, "active_params": 6350}, 107], # noqa: E501
-         # 6 depth-pruned (num_layers=3) models; params==active_params since MoE layer is dropped
-         [{"num_layers": 3, "hidden_size": 16, "mamba_num_heads": 6, "mamba_head_dim": 12, "num_moe_experts": 5, "moe_ffn_hidden_size": 12, "ffn_hidden_size": 32}, {"params": 7154, "active_params": 7154}, 118], # noqa: E501
-         [{"num_layers": 3, "hidden_size": 16, "mamba_num_heads": 6, "mamba_head_dim": 12, "num_moe_experts": 5, "moe_ffn_hidden_size": 16, "ffn_hidden_size": 32}, {"params": 7154, "active_params": 7154}, 122], # noqa: E501
-         [{"num_layers": 3, "hidden_size": 16, "mamba_num_heads": 6, "mamba_head_dim": 12, "num_moe_experts": 6, "moe_ffn_hidden_size": 12, "ffn_hidden_size": 32}, {"params": 7154, "active_params": 7154}, 119], # noqa: E501
-         [{"num_layers": 3, "hidden_size": 16, "mamba_num_heads": 6, "mamba_head_dim": 12, "num_moe_experts": 6, "moe_ffn_hidden_size": 16, "ffn_hidden_size": 32}, {"params": 7154, "active_params": 7154}, 123], # noqa: E501
-         [{"num_layers": 3, "hidden_size": 16, "mamba_num_heads": 6, "mamba_head_dim": 12, "num_moe_experts": 7, "moe_ffn_hidden_size": 12, "ffn_hidden_size": 32}, {"params": 7154, "active_params": 7154}, 120], # noqa: E501
-         [{"num_layers": 3, "hidden_size": 16, "mamba_num_heads": 6, "mamba_head_dim": 12, "num_moe_experts": 7, "moe_ffn_hidden_size": 16, "ffn_hidden_size": 32}, {"params": 7154, "active_params": 7154}, 124], # noqa: E501
+         # positions 2-9: 3-layer H=12 MNH=8 MHD=12 ffn=32 (active==params=6536, no MoE layer)
+         [{"num_layers": 3, "hidden_size": 12, "mamba_num_heads": 8, "mamba_head_dim": 12, "num_moe_experts": 5, "moe_ffn_hidden_size": 12, "ffn_hidden_size": 32}, {"params": 6536, "active_params": 6536}, 116], # noqa: E501
+         [{"num_layers": 3, "hidden_size": 12, "mamba_num_heads": 8, "mamba_head_dim": 12, "num_moe_experts": 5, "moe_ffn_hidden_size": 16, "ffn_hidden_size": 32}, {"params": 6536, "active_params": 6536}, 120], # noqa: E501
+         [{"num_layers": 3, "hidden_size": 12, "mamba_num_heads": 8, "mamba_head_dim": 12, "num_moe_experts": 6, "moe_ffn_hidden_size": 12, "ffn_hidden_size": 32}, {"params": 6536, "active_params": 6536}, 117], # noqa: E501
+         [{"num_layers": 3, "hidden_size": 12, "mamba_num_heads": 8, "mamba_head_dim": 12, "num_moe_experts": 6, "moe_ffn_hidden_size": 16, "ffn_hidden_size": 32}, {"params": 6536, "active_params": 6536}, 121], # noqa: E501
+         [{"num_layers": 3, "hidden_size": 12, "mamba_num_heads": 8, "mamba_head_dim": 12, "num_moe_experts": 7, "moe_ffn_hidden_size": 12, "ffn_hidden_size": 32}, {"params": 6536, "active_params": 6536}, 118], # noqa: E501
+         [{"num_layers": 3, "hidden_size": 12, "mamba_num_heads": 8, "mamba_head_dim": 12, "num_moe_experts": 7, "moe_ffn_hidden_size": 16, "ffn_hidden_size": 32}, {"params": 6536, "active_params": 6536}, 122], # noqa: E501
+         [{"num_layers": 3, "hidden_size": 12, "mamba_num_heads": 8, "mamba_head_dim": 12, "num_moe_experts": 8, "moe_ffn_hidden_size": 12, "ffn_hidden_size": 32}, {"params": 6536, "active_params": 6536}, 119], # noqa: E501
+         [{"num_layers": 3, "hidden_size": 12, "mamba_num_heads": 8, "mamba_head_dim": 12, "num_moe_experts": 8, "moe_ffn_hidden_size": 16, "ffn_hidden_size": 32}, {"params": 6536, "active_params": 6536}, 123], # noqa: E501
+         # position 10: first 3-layer H=12 MNH=6 MHD=16 ffn=32 candidate (active=6506)
+         [{"num_layers": 3, "hidden_size": 12, "mamba_num_heads": 6, "mamba_head_dim": 16, "num_moe_experts": 5, "moe_ffn_hidden_size": 12, "ffn_hidden_size": 32}, {"params": 6506, "active_params": 6506}, 118], # noqa: E501
      ]
  else:
      raise RuntimeError(f"FIXME: Non deterministic test, assertions may fail: {sorted_layers=}")
  # fmt: on

  _assert_top_k_candidates(
      searcher_state,
-     (("params", constraints["params"]), ("active_params", constraints["active_params"])),
+     (("active_params", constraints["active_params"]), ("params", constraints["params"])),
      expected_top_k,
  )
  run_mcore_inference_with_dummy_input(model, _NAS_BATCH_SIZE, model.config.hidden_size)
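
The per-metric comparison added to `_assert_top_k_candidates` can be sketched in a standalone form. This is a hedged illustration, not the test's code: `metrics_match` is a hypothetical helper, and it uses the standard library's `math.isclose` as a stand-in for `pytest.approx`, with tolerances chosen to roughly mirror pytest's defaults. The `memory_mb` values are made up.

```python
import math


def metrics_match(actual: dict, expected: dict, rel_tol: float = 1e-6) -> bool:
    """Hypothetical helper: compare floats approximately and other values exactly."""
    for name, expected_value in expected.items():
        actual_value = actual[name]
        if isinstance(expected_value, float):
            # Stand-in for pytest.approx: tolerate small floating-point differences.
            if not math.isclose(
                actual_value, expected_value, rel_tol=rel_tol, abs_tol=1e-12
            ):
                return False
        elif actual_value != expected_value:
            # Integer metrics such as parameter counts must match exactly.
            return False
    return True


if __name__ == "__main__":
    expected = {"params": 6536, "memory_mb": 0.3}
    # 0.1 + 0.2 differs from 0.3 only by floating-point rounding error.
    print(metrics_match({"params": 6536, "memory_mb": 0.1 + 0.2}, expected))
    print(metrics_match({"params": 6537, "memory_mb": 0.3}, expected))
```

Comparing per key rather than with a single dict equality keeps integer parameter counts strict while letting float-valued metrics absorb rounding differences.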
