Update update upstream by AleHD · Pull Request #84 · swiss-ai/Megatron-LM

AleHD · 2025-06-25T12:26:23Z

Upstreamception

… avoid shuffling of new tokens Co-authored-by: Shanmugam Ramasamy <shanmugamr@cw-dfw-cs-001-vscode-01.cm.cluster> Co-authored-by: Mcore Bot <mcore-bot@nvidia.com> Co-authored-by: Shanmugam Ramasamy <shanmugamr@shanmugamr-mlt.client.nvidia.com>

wdykas and others added 30 commits June 5, 2025 17:18

ADLR/megatron-lm!3241 - Llama4 inference

7af72f9

Co-authored-by: Chen-Han Yu <chenhany@cw-dfw-cs-001-login-01.cm.cluster> Co-authored-by: Chenhan Yu <chenhany@nvidia.com>

Merge branch 'llama4-inference' into 'main'

4eb36f8

Llama4 inference See merge request ADLR/megatron-lm!3241

ADLR/megatron-lm!3421 - Change default value of high_priority_stream_groups from [] to None

61a42f6

Merge branch 'comm-priority-patch' into 'main'

7c64be3

Change default value of high_priority_stream_groups from [] to None See merge request ADLR/megatron-lm!3421

ADLR/megatron-lm!3170 - [feat, moe]: FP8 padding optimization of MoE models by padding the routing map.

92d68da

Merge branch 'denliu/router_pad' into 'main'

140dce2

[feat, moe]: FP8 padding optimization of MoE models by padding the routing map. See merge request ADLR/megatron-lm!3170

ADLR/megatron-lm!3306 - Remove deprecated alltoall_seq dispatcher.

9e3adb5

Co-authored-by: Robin Zhang <robinz@nvidia.com>

Merge branch 'denliu/remove_alltoall_seq_dispatcher' into 'main'

823466e

Remove deprecated alltoall_seq dispatcher. See merge request ADLR/megatron-lm!3306

ADLR/megatron-lm!3347 - Fix flash decode bug caused by unnecessary rotary_pos_cos check

db07e3f

Merge branch 'hybrid_example' into 'main'

2e15d12

Fix flash decode bug caused by unnecessary rotary_pos_cos check See merge request ADLR/megatron-lm!3347

ADLR/megatron-lm!3404 - Fix perf issues with NVTX range profiling

1589517

Merge branch 'nvtx_perf_fix' into 'main'

b04c901

Fix perf issues with NVTX range profiling See merge request ADLR/megatron-lm!3404

ADLR/megatron-lm!3385 - Enforce param group ordering after checkpoint load; raise exception on anomalies

791454d

Merge branch 'skierat/fix_param_groups' into 'main'

40cb6e7

Enforce param group ordering after checkpoint load; raise exception on anomalies See merge request ADLR/megatron-lm!3385

ADLR/megatron-lm!3399 - [MM] [Bug Fix] model parameter dtype, embedding tying

54cdc7a

Co-authored-by: Mcore Bot <mcore-bot@nvidia.com>

Merge branch 'chcui/llama-nemotron-nano-vl-8b' into 'main'

d1409db

[MM] [Bug Fix] model parameter dtype, embedding tying See merge request ADLR/megatron-lm!3399

Revert "Merge branch 'chcui/llama-nemotron-nano-vl-8b' into 'main'"

629b615

This reverts commit d1409db, reversing changes made to 54cdc7a.

Reapply "Merge branch 'chcui/llama-nemotron-nano-vl-8b' into 'main'"

50a1247

This reverts commit 629b615.

Revert "ADLR/megatron-lm!3399 - [MM] [Bug Fix] model parameter dtype, embedding tying"

5ae21f8

This reverts commit 54cdc7a.

ADLR/megatron-lm!3332 - fix(mtp): Fix issue with MTP+VPP after !3108 and !3230, and fix typo in doc.

62e7e60

Merge branch 'shifang/fix_vp_stage' into 'main'

ad36348

fix(mtp): Fix issue with MTP+VPP after !3108 and !3230, and fix typo in doc. See merge request ADLR/megatron-lm!3332

ADLR/megatron-lm!3384 - Expose TE fused MLP with module spec

0f4f095

Co-authored-by: Michal Futrega <mfutrega@nvidia.com>

Merge branch 'mfutrega/fused_swiglu' into 'main'

0595ef2

Expose TE fused MLP with module spec See merge request ADLR/megatron-lm!3384

ADLR/megatron-lm!3403 - Moe inference functional tests

9e5fe7a

Co-authored-by: root <root@gpu-h100-0128.cm.cluster> Co-authored-by: William Dykas <wdykas@cw-pdx-cs-001-login-01.cm.cluster>

Merge branch 'moe-tests' into 'main'

0dea9a5

Moe inference functional tests See merge request ADLR/megatron-lm!3403

ADLR/megatron-lm!3458 - ci: Benchmark release tests suite with TE2.2 on H100

80d66ec

Co-authored-by: Oliver Koenig <okoenig@cw-dfw-cs-001-login-01.cm.cluster>

Merge branch 'ko3n1g/chore/release-benchmarks-dev' into 'main'

a3e2222

ci: Benchmark release tests suite with TE2.2 on H100 See merge request ADLR/megatron-lm!3458

ADLR/megatron-lm!3371 - Move data to GPU for TP data processing

15e4446

Merge branch 'pmannan/improve_data_processing' into 'main'

d58f062

Move data to GPU for TP data processing See merge request ADLR/megatron-lm!3371

Reapply "ADLR/megatron-lm!3399 - [MM] [Bug Fix] model parameter dtype, embedding tying"

f5cfc10

This reverts commit 5ae21f8.

jaredcasper and others added 29 commits June 16, 2025 15:37

Merge branch 'sbak/ckpt_manager_fix' into 'main'

c8f2f56

Revert `fork` to `spawn` based on stability issues in checkpointing See merge request ADLR/megatron-lm!3450

ADLR/megatron-lm!3301 - Add kitchen extension with per-layer configurable quantization configuration

f7e4641

Co-authored-by: Simon Layton <slayton@nvidia.com>

Merge branch 'kwyss/megatron_kitchen_extension' into 'main'

8c15450

Add kitchen extension with per-layer configurable quantization configuration See merge request ADLR/megatron-lm!3301

ADLR/megatron-lm!3474 - Add deprecation warning for legacy inference

1e8e9a4

Merge branch 'legacy_deprecation_warning' into 'main'

b87f147

Add deprecation warning for legacy inference See merge request ADLR/megatron-lm!3474

ADLR/megatron-lm!3181 - Change naming of original_max_position_embeddings to avoid conflicts

ab77e52

Merge branch 'boxiangw/mla-yarn-change-option-name' into 'main'

2386c6c

Change naming of original_max_position_embeddings to avoid conflicts See merge request ADLR/megatron-lm!3181

ADLR/megatron-lm!3472 - Make cudagraph replay check more descriptive when it fails arg checks

fee5600

Merge branch 'helenn-flag-specific-error-for-cudagraph-replay' into 'main'

c3dc507

Make cudagraph replay check more descriptive when it fails arg checks See merge request ADLR/megatron-lm!3472

ADLR/megatron-lm!3414 - M4 Taskforce: Disable T5 and encoder_and_decoder tests in CI for MCore Encoder Refactoring

db70ed4

Co-authored-by: yaoyu-33 <yaoyu.094@gmail.com> Co-authored-by: Mcore Bot <mcore-bot@nvidia.com>

Merge branch 'yuya/m4_remove_encoder_pp_tests_ci_add_deprecation' into 'main'

5615930

M4 Taskforce: Disable T5 and encoder_and_decoder tests in CI for MCore Encoder Refactoring See merge request ADLR/megatron-lm!3414

ADLR/megatron-lm!3444 - Quick fix for NeMo: handle alternate key names like 'pre_wd_mult' instead of 'wd_mult'

e0b2c60

Merge branch 'skierat/quick_nemo_fix' into 'main'

bfa39e8

Quick fix for NeMo: handle alternate key names like 'pre_wd_mult' instead of 'wd_mult' See merge request ADLR/megatron-lm!3444

ADLR/megatron-lm!3477 - chore: Bump version 0.14.0

0e3af7e

Merge branch 'ko3n1g/chore/release-version-0.14.0' into 'main'

27c9b6c

chore: Bump version 0.14.0 See merge request ADLR/megatron-lm!3477

ADLR/megatron-lm!3071 - Added offloading support for MCore layers

3987e89

Co-authored-by: Selvaraj Anandaraj <selvaraja@cw-dfw-cs-001-login-01.cm.cluster> Co-authored-by: Selvaraj Anandaraj <selvaraja@login-ptyche02.ptyche.clusters.nvidia.com>

Merge branch 'lora_offload' into 'main'

4a91173

Added offloading support for MCore layers See merge request ADLR/megatron-lm!3071

Merge branch 'bugFixDE' into 'main'

3b0f763

Bug fix to reset kv chunks assigned to -1 and avoid shuffling of new tokens See merge request ADLR/megatron-lm!3437

ADLR/megatron-lm!3483 - chore: Add init to tools

642a181

Merge branch 'ko3n1g/chore/tool-init' into 'main'

0710137

chore: Add init to tools See merge request ADLR/megatron-lm!3483

ADLR/megatron-lm!3480 - Fix unit test test_fp8_param.py blockwise scaling

171c351

Merge branch 'fix_2425' into 'main'

57082f9

Fix unit test test_fp8_param.py blockwise scaling See merge request ADLR/megatron-lm!3480

ADLR/megatron-lm!3492 - chore: Add init to examples

9f1c4b2

Merge branch 'ko3n1g/chore/examples-init' into 'main'

6ac5633

chore: Add init to examples See merge request ADLR/megatron-lm!3492

ADLR/megatron-lm!3493 - build: Force pin down setuptools

2074d19

Merge branch 'ko3n1g/build/fix-setuptools-version' into 'main'

0600a3c

build: Force pin down setuptools See merge request ADLR/megatron-lm!3493

ADLR/megatron-lm!3341 - Pad input tensors and enable fp8 weights for fp8 inference

a002d50

Merge branch 'fp8_inference' into 'main'

6a6cd47

Pad input tensors and enable fp8 weights for fp8 inference See merge request ADLR/megatron-lm!3341

AleHD self-assigned this Jun 26, 2025

AleHD wants to merge 136 commits into fp8 from newupstream

20 participants
