Update update upstream #84
Open
AleHD wants to merge 136 commits into
Co-authored-by: Chen-Han Yu <chenhany@cw-dfw-cs-001-login-01.cm.cluster> Co-authored-by: Chenhan Yu <chenhany@nvidia.com>
Llama4 inference See merge request ADLR/megatron-lm!3241
…groups from [] to None
Change default value of high_priority_stream_groups from [] to None See merge request ADLR/megatron-lm!3421
…models by padding the routing map.
[feat, moe]: FP8 padding optimization of MoE models by padding the routing map. See merge request ADLR/megatron-lm!3170
Co-authored-by: Robin Zhang <robinz@nvidia.com>
Remove deprecated alltoall_seq dispatcher. See merge request ADLR/megatron-lm!3306
…tary_pos_cos check
Fix flash decode bug caused by unnecessary rotary_pos_cos check See merge request ADLR/megatron-lm!3347
Fix perf issues with NVTX range profiling See merge request ADLR/megatron-lm!3404
… load; raise exception on anomalies
Enforce param group ordering after checkpoint load; raise exception on anomalies See merge request ADLR/megatron-lm!3385
…ng tying Co-authored-by: Mcore Bot <mcore-bot@nvidia.com>
[MM] [Bug Fix] model parameter dtype, embedding tying See merge request ADLR/megatron-lm!3399
This reverts commit 629b615.
… embedding tying" This reverts commit 54cdc7a.
…and !3230, and fix typo in doc.
fix(mtp): Fix issue with MTP+VPP after !3108 and !3230, and fix typo in doc. See merge request ADLR/megatron-lm!3332
Co-authored-by: Michal Futrega <mfutrega@nvidia.com>
Expose TE fused MLP with module spec See merge request ADLR/megatron-lm!3384
Co-authored-by: root <root@gpu-h100-0128.cm.cluster> Co-authored-by: William Dykas <wdykas@cw-pdx-cs-001-login-01.cm.cluster>
Moe inference functional tests See merge request ADLR/megatron-lm!3403
…on H100 Co-authored-by: Oliver Koenig <okoenig@cw-dfw-cs-001-login-01.cm.cluster>
ci: Benchmark release tests suite with TE2.2 on H100 See merge request ADLR/megatron-lm!3458
Move data to GPU for TP data processing See merge request ADLR/megatron-lm!3371
…, embedding tying" This reverts commit 5ae21f8.
Revert `fork` to `spawn` based on stability issues in checkpointing See merge request ADLR/megatron-lm!3450
…able quantization configuration Co-authored-by: Simon Layton <slayton@nvidia.com>
Add kitchen extension with per-layer configurable quantization configuration See merge request ADLR/megatron-lm!3301
Add deprecation warning for legacy inference See merge request ADLR/megatron-lm!3474
…ings to avoid conflicts
Change naming of original_max_position_embeddings to avoid conflicts See merge request ADLR/megatron-lm!3181
…when it fails arg checks
…main' Make cudagraph replay check more descriptive when it fails arg checks See merge request ADLR/megatron-lm!3472
…der tests in CI for MCore Encoder Refactoring Co-authored-by: yaoyu-33 <yaoyu.094@gmail.com> Co-authored-by: Mcore Bot <mcore-bot@nvidia.com>
…o 'main' M4 Taskforce: Disable T5 and encoder_and_decoder tests in CI for MCore Encoder Refactoring See merge request ADLR/megatron-lm!3414
…s like 'pre_wd_mult' instead of 'wd_mult'
Quick fix for NeMo: handle alternate key names like 'pre_wd_mult' instead of 'wd_mult' See merge request ADLR/megatron-lm!3444
chore: Bump version 0.14.0 See merge request ADLR/megatron-lm!3477
Co-authored-by: Selvaraj Anandaraj <selvaraja@cw-dfw-cs-001-login-01.cm.cluster> Co-authored-by: Selvaraj Anandaraj <selvaraja@login-ptyche02.ptyche.clusters.nvidia.com>
Added offloading support for MCore layers See merge request ADLR/megatron-lm!3071
… avoid shuffling of new tokens Co-authored-by: Shanmugam Ramasamy <shanmugamr@cw-dfw-cs-001-vscode-01.cm.cluster> Co-authored-by: Mcore Bot <mcore-bot@nvidia.com> Co-authored-by: Shanmugam Ramasamy <shanmugamr@shanmugamr-mlt.client.nvidia.com>
Bug fix to reset kv chunks assigned to -1 and avoid shuffling of new tokens See merge request ADLR/megatron-lm!3437
chore: Add init to tools See merge request ADLR/megatron-lm!3483
Fix unit test test_fp8_param.py blockwise scaling See merge request ADLR/megatron-lm!3480
chore: Add init to examples See merge request ADLR/megatron-lm!3492
build: Force pin down setuptools See merge request ADLR/megatron-lm!3493
Pad input tensors and enable fp8 weights for fp8 inference See merge request ADLR/megatron-lm!3341
Upstreamception