Add parallel MLA inference tests, fix corresponding issues #3787
Draft
janEbert wants to merge 7 commits into NVIDIA:main from
Conversation
MLA was previously not tested for parallel dynamic inference. This PR adds appropriate test parameters to `TestDynamicInferenceEngine.test_parallel_inference` so that MLA is tested both with and without latent caching (although MLA currently only works with latent caching).

The test would fail without further changes, so the PR also contributes several fixes and improvements, such as making FlashMLA optional. For example, FlashMLA expects a block size of exactly 64, while FlashAttention expects a block size divisible by 256. In the generative phase, FlashMLA is not used, only FlashAttention, so without the changes in this PR, any inference work other than decode-only tasks would not work with MLA.
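To make the block-size conflict concrete, here is a minimal sketch of how a backend-dependent choice could be expressed; `select_kv_block_size` and the constants are hypothetical names for illustration, not the actual Megatron-LM API:

```python
# A minimal sketch of backend-dependent block size selection, assuming
# hypothetical names; the actual Megatron-LM code differs.

FLASH_MLA_BLOCK_SIZE = 64      # FlashMLA requires a block size of exactly 64.
FLASH_ATTN_BLOCK_ALIGN = 256   # FlashAttention requires a multiple of 256.


def select_kv_block_size(use_flash_mla: bool) -> int:
    """Pick a paged-KV block size compatible with the active attention backend.

    No single value satisfies both constraints (64 is not divisible by 256),
    so the block size must follow the backend actually in use, which is part
    of why making FlashMLA optional matters.
    """
    if use_flash_mla:
        return FLASH_MLA_BLOCK_SIZE
    # The smallest block size FlashAttention accepts under its alignment rule.
    return FLASH_ATTN_BLOCK_ALIGN
```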
Since the added tests are only smoke tests, functional tests should also be added in future revisions to ensure correctness of the results.
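As a rough illustration of what such a functional test could check, the sketch below compares the logits of a parallel MLA run against a trusted reference run; this is an assumption about the test's shape, and the helper name is a hypothetical placeholder rather than an existing test utility:

```python
# Illustrative only: a functional test would compare parallel MLA inference
# results against a trusted reference (e.g. a single-rank run) instead of
# merely checking that the engine runs to completion.
import torch


def assert_matches_reference(parallel_logits: torch.Tensor,
                             reference_logits: torch.Tensor) -> None:
    """Fail loudly if the parallel MLA run diverges from the reference run."""
    torch.testing.assert_close(parallel_logits, reference_logits,
                               rtol=1e-3, atol=1e-3)
```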
Finally, tests for Mamba with the inference-optimized Transformer implementation are also enabled, since they pass without issues.