Update update upstream #84

Open: AleHD wants to merge 136 commits into fp8 from newupstream
AleHD (Collaborator) commented Jun 25, 2025

Upstreamception

wdykas and others added 30 commits June 5, 2025 17:18
Co-authored-by: Chen-Han Yu <chenhany@cw-dfw-cs-001-login-01.cm.cluster>
Co-authored-by: Chenhan Yu <chenhany@nvidia.com>
Llama4 inference

See merge request ADLR/megatron-lm!3241
Change default value of high_priority_stream_groups from [] to None

See merge request ADLR/megatron-lm!3421
[feat, moe]: FP8 padding optimization of MoE models by padding the routing map.

See merge request ADLR/megatron-lm!3170
Co-authored-by: Robin Zhang <robinz@nvidia.com>
Remove deprecated alltoall_seq dispatcher.

See merge request ADLR/megatron-lm!3306
Fix flash decode bug caused by unnecessary rotary_pos_cos check

See merge request ADLR/megatron-lm!3347
Fix perf issues with NVTX range profiling

See merge request ADLR/megatron-lm!3404
Enforce param group ordering after checkpoint load; raise exception on anomalies

See merge request ADLR/megatron-lm!3385
…ng tying

Co-authored-by: Mcore Bot <mcore-bot@nvidia.com>
[MM] [Bug Fix] model parameter dtype, embedding tying

See merge request ADLR/megatron-lm!3399
fix(mtp): Fix issue with MTP+VPP after !3108 and !3230, and fix typo in doc.

See merge request ADLR/megatron-lm!3332
Co-authored-by: Michal Futrega <mfutrega@nvidia.com>
Expose TE fused MLP with module spec

See merge request ADLR/megatron-lm!3384
Co-authored-by: root <root@gpu-h100-0128.cm.cluster>
Co-authored-by: William Dykas <wdykas@cw-pdx-cs-001-login-01.cm.cluster>
Moe inference functional tests

See merge request ADLR/megatron-lm!3403
…on H100

Co-authored-by: Oliver Koenig <okoenig@cw-dfw-cs-001-login-01.cm.cluster>
ci: Benchmark release tests suite with TE2.2 on H100

See merge request ADLR/megatron-lm!3458
Move data to GPU for TP data processing

See merge request ADLR/megatron-lm!3371
jaredcasper and others added 29 commits June 16, 2025 15:37
Revert `fork` to `spawn` based on stability issues in checkpointing

See merge request ADLR/megatron-lm!3450
…able quantization configuration

Co-authored-by: Simon Layton <slayton@nvidia.com>
Add kitchen extension with per-layer configurable quantization configuration

See merge request ADLR/megatron-lm!3301
Add deprecation warning for legacy inference

See merge request ADLR/megatron-lm!3474
Change naming of original_max_position_embeddings to avoid conflicts

See merge request ADLR/megatron-lm!3181
…main'

Make cudagraph replay check more descriptive when it fails arg checks

See merge request ADLR/megatron-lm!3472
…der tests in CI for MCore Encoder Refactoring

Co-authored-by: yaoyu-33 <yaoyu.094@gmail.com>
Co-authored-by: Mcore Bot <mcore-bot@nvidia.com>
…o 'main'

M4 Taskforce: Disable T5 and encoder_and_decoder tests in CI for MCore Encoder Refactoring

See merge request ADLR/megatron-lm!3414
Quick fix for NeMo: handle alternate key names like 'pre_wd_mult' instead of 'wd_mult'

See merge request ADLR/megatron-lm!3444
chore: Bump version 0.14.0

See merge request ADLR/megatron-lm!3477
Co-authored-by: Selvaraj Anandaraj <selvaraja@cw-dfw-cs-001-login-01.cm.cluster>
Co-authored-by: Selvaraj Anandaraj <selvaraja@login-ptyche02.ptyche.clusters.nvidia.com>
Added offloading support for MCore layers

See merge request ADLR/megatron-lm!3071
… avoid shuffling of new tokens

Co-authored-by: Shanmugam Ramasamy <shanmugamr@cw-dfw-cs-001-vscode-01.cm.cluster>
Co-authored-by: Mcore Bot <mcore-bot@nvidia.com>
Co-authored-by: Shanmugam Ramasamy <shanmugamr@shanmugamr-mlt.client.nvidia.com>
Bug fix to reset kv chunks assigned to -1 and avoid shuffling of new tokens

See merge request ADLR/megatron-lm!3437
chore: Add init to tools

See merge request ADLR/megatron-lm!3483
Fix unit test test_fp8_param.py blockwise scaling

See merge request ADLR/megatron-lm!3480
chore: Add init to examples

See merge request ADLR/megatron-lm!3492
build: Force pin down setuptools

See merge request ADLR/megatron-lm!3493
Pad input tensors and enable fp8 weights for fp8 inference

See merge request ADLR/megatron-lm!3341
@AleHD AleHD self-assigned this Jun 26, 2025