Draft(M-FSDP): Demonstrate the fix for Megatron-Bridge#3161
shjwudp wants to merge 13 commits into NVIDIA:main from …on-LM into mfsdp_fix_mcore_tp_detect

Conversation
/claude review
```python
debug_slices = []
for chunk_info, local_chunk in zip(local_chunks_info, all_local_chunks):
    offset = chunk_info["offset"]
    slices = tuple(slice(o, o + s) for o, s in zip(offset, local_chunk.shape))
    debug_slices.append(slices)
```
Leftover debug code: debug_slices is computed but never used. Should be removed.
Suggested change:

```diff
- debug_slices = []
- for chunk_info, local_chunk in zip(local_chunks_info, all_local_chunks):
-     offset = chunk_info["offset"]
-     slices = tuple(slice(o, o + s) for o, s in zip(offset, local_chunk.shape))
-     debug_slices.append(slices)
  # Reconstruct the full tensor by placing chunks at their correct offsets
```
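For context, here is a minimal, dependency-free sketch of the offset-based reconstruction the kept comment describes (hypothetical names, 1-D Python lists instead of torch tensors; the real code uses multi-dimensional slices):

```python
# Hypothetical sketch: each chunk record carries an "offset", and the full
# tensor is rebuilt by writing every local chunk at its recorded offset.
def reconstruct_full(local_chunks_info, all_local_chunks, full_len):
    full = [None] * full_len
    for chunk_info, local_chunk in zip(local_chunks_info, all_local_chunks):
        offset = chunk_info["offset"]
        full[offset:offset + len(local_chunk)] = local_chunk
    return full

# Uneven shards of sizes 3 and 2 recombine into the length-5 tensor:
chunks_info = [{"offset": 0}, {"offset": 3}]
chunks = [[1, 2, 3], [4, 5]]
print(reconstruct_full(chunks_info, chunks, 5))  # [1, 2, 3, 4, 5]
```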
```python
>>> mesh = DeviceMesh("cuda", [0, 1, 2, 3])
>>> # Create a DTensor with uneven sharding
>>> dtensor = DTensor(..., placements=[Shard(0)])
>>> full_tensor = gather_uneven_dtensor_to_full_tensor(dtensor)
```
Docstring example still references the old function name gather_uneven_dtensor_to_full_tensor — should be uneven_dtensor_to_full_tensor.
Suggested change:

```diff
->>> full_tensor = gather_uneven_dtensor_to_full_tensor(dtensor)
+>>> full_tensor = uneven_dtensor_to_full_tensor(dtensor)
```
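As background for why a dedicated gather helper is needed at all: sharding along a dimension can be uneven when the dimension does not divide evenly by the number of ranks. A hedged stand-alone sketch of torch.chunk-style split sizes (an assumption about the splitting semantics, not code from this PR):

```python
import math

# Sketch of torch.chunk-style split sizes: the first ranks take
# ceil(dim / num_chunks) elements each, so the tail ranks can end up
# with smaller (or even empty) shards.
def chunk_sizes(dim_size, num_chunks):
    full = math.ceil(dim_size / num_chunks)
    sizes = []
    remaining = dim_size
    for _ in range(num_chunks):
        take = min(full, remaining)
        sizes.append(take)
        remaining -= take
    return sizes

print(chunk_sizes(10, 4))  # [3, 3, 3, 1] -> uneven, hence the gather helper
```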
```diff
 def is_mcore_tensor_parallel_duplicated(param: torch.Tensor) -> bool:
     """
     Check if the given parameter is Megatron-Core tensor model parallel and duplicated.
     """
-    return getattr(param, "_tp_duplicated", False)
+    return get_mcore_tensor_parallel_partition_dim(param) is None
```
Bug: is_mcore_tensor_parallel_duplicated now returns True for any parameter without _tensor_parallel_mode set (including completely unannotated parameters). Combined with the change in param_and_grad_buffer.py where the guard switched from is_mcore_tensor_model_parallel(param) (per-param check) to using_tensor_parallel(dist_index) (global mesh check), any unannotated parameter will now be wrapped as Replicate() DTensor when TP is enabled.
If _detect_parallelism_type returns None for a module (e.g. a custom user module), its params won't get _tensor_parallel_mode set, but they'll still enter the TP DTensor path and be treated as duplicated. This may be the intended behavior, but the docstring is misleading — this function no longer checks if the param "is tensor model parallel and duplicated"; it returns True for anything that isn't column/row partitioned.
At minimum, consider guarding on hasattr(param, "_tensor_parallel_mode"):
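To make the misclassification concrete, here is a small self-contained illustration with hypothetical stand-ins (`SimpleNamespace` as a fake parameter, a toy `get_partition_dim` that mimics the helper's "None means not column/row partitioned" behavior):

```python
from types import SimpleNamespace

def get_partition_dim(param):
    # None both for explicitly duplicated params and for params
    # that were never annotated with _tensor_parallel_mode at all.
    mode = getattr(param, "_tensor_parallel_mode", None)
    return {"column": 0, "row": 1}.get(mode)

def is_dup_unguarded(param):
    return get_partition_dim(param) is None

def is_dup_guarded(param):
    return hasattr(param, "_tensor_parallel_mode") and get_partition_dim(param) is None

annotated_dup = SimpleNamespace(_tensor_parallel_mode="duplicate")
unannotated = SimpleNamespace()  # e.g. a param from a custom user module

print(is_dup_unguarded(unannotated))  # True  -> misclassified as duplicated
print(is_dup_guarded(unannotated))    # False -> falls through as before
print(is_dup_guarded(annotated_dup))  # True  -> still detected correctly
```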
```python
def is_mcore_tensor_parallel_duplicated(param: torch.Tensor) -> bool:
    """
    Check if the given parameter is Megatron-Core tensor model parallel and duplicated.
    """
    return hasattr(param, "_tensor_parallel_mode") and get_mcore_tensor_parallel_partition_dim(param) is None
```

```python
    else:
        device_type = "cpu"
        device = torch.device("cpu")
        backend = "gloo"

    # Initialize process group if not already initialized
    if not dist.is_initialized():
        dist.init_process_group(
            backend=backend,
            rank=rank,
            world_size=world_size,
        )

    yield {
        "rank": rank,
        "world_size": world_size,
        "local_rank": local_rank,
        "device_type": device_type,
        "device": device,
    }

    # Cleanup
    if dist.is_initialized():
        dist.destroy_process_group()


# ---------- Helper: distributed setup ----------


@pytest.fixture(scope="module")
def distributed_setup():
    """Setup torch.distributed and CUDA device for torchrun + pytest."""
    if "RANK" not in os.environ or "WORLD_SIZE" not in os.environ:
        pytest.skip("Not running under torchrun. Use torchrun to run this test file.")
```
The distributed_setup fixture is defined twice (first at line 31, again here). The second definition silently shadows the first. Remove the duplicate.
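The shadowing here is ordinary Python name rebinding: a second `def` with the same name in one module silently replaces the first, so pytest's fixture collection only ever sees the later definition. A minimal sketch:

```python
# Plain-Python sketch of the shadowing: the second definition rebinds the
# module-level name, so only it remains visible at collection time.
def distributed_setup():
    return "first definition"

def distributed_setup():  # silently shadows the one above
    return "second definition"

print(distributed_setup())  # second definition
```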
What does this PR do?
Contribution process
```mermaid
flowchart LR
    A[Pre-checks] --> B[PR Tests]
    subgraph Code Review/Approval
        C1[Expert Review] --> C2[Final Review]
    end
    B --> C1
    C2 --> D[Merge]
```

Pre-checks
Core 0.8)

Code review
The following process is enforced via the CODEOWNERS file for changes into megatron/core. For changes outside of megatron/core, it is up to the PR author whether or not to tag the Final Reviewer team.

For MRs into `main` branch
Feel free to message or comment @mcore-oncall to help accelerate your merge into main. The less complex your PR is, the faster it will be approved and merged!
(Step 1): Add PR label

Add the Expert Review label when your PR is ready for review.

(Step 2): Collect the expert reviewers' reviews

Final Review might get declined if these requirements are not fulfilled.

(Step 3): Final Review

Add the Final Review label.

(Optional Step 4): Cherry-pick into release branch
If this PR also needs to be merged into core_r* release branches, then after this PR has been merged, select Cherry-pick to open a new PR into the release branch.

For MRs into `dev` branch
The proposed review process for the `dev` branch is under active discussion. MRs are mergeable after one approval by either eharper@nvidia.com or zijiey@nvidia.com.

Merging your PR
Any member of core-adlr and core-nemo will be able to merge your PR.