DDP refactoring: Extract parameter layout computation into optimizer classmethod #3812
Draft

deepakn94 wants to merge 1 commit into NVIDIA:main from
Conversation
**Extract parameter layout computation into optimizer classmethod**

Move the optimizer-specific parameter layout logic (padding, bucket splitting) out of `_ParamAndGradBuffer.__init__` and into `DistributedOptimizer.compute_param_layout()`. This decouples the buffer from optimizer-specific assumptions, allowing future optimizer implementations to define custom parameter layouts by overriding the classmethod.

Introduces the `ParamLayout` dataclass and `_default_param_layout()` for the non-distributed case.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Deepak Narayanan <dnarayanan@nvidia.com>
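The new dataclass itself is not reproduced in the excerpts on this page. Below is a minimal sketch of what `ParamLayout` might hold; the two field names come from attributes referenced in the review comments further down (`bucket_indices`, `per_bucket_numel_unpadded`), while the types and docstring are assumptions:

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ParamLayout:
    """Sketch of a layout description for a flat parameter/grad buffer."""

    # (start, end) index of each bucket within the flat buffer; the end
    # index may be padded when the distributed optimizer is in use.
    bucket_indices: List[Tuple[int, int]]
    # Element count of each bucket before any padding was applied.
    per_bucket_numel_unpadded: List[int]
```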
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.
deepakn94 commented on Mar 11, 2026
```python
    Pads end index of bucket if using distributed optimizer (to ensure uniform sharding).
    """
    if self.ddp_config.use_distributed_optimizer:
        # Workaround for TE bug causing cuBLAS to pick an incompatible algorithm.
```
Don't delete this.
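For context, the docstring in the hunk above describes padding each bucket's end index when the distributed optimizer is active, so the bucket shards uniformly across data-parallel ranks. A self-contained sketch of that rounding; the helper name and signature here are illustrative, not taken from the diff:

```python
def pad_bucket_end_index(start: int, end: int, data_parallel_world_size: int) -> int:
    """Round a bucket's end index up so its element count divides evenly
    across data-parallel ranks, giving each rank an equal shard."""
    numel = end - start
    padded_numel = (
        (numel + data_parallel_world_size - 1)
        // data_parallel_world_size
        * data_parallel_world_size
    )
    return start + padded_numel


# A 1001-element bucket on 8 ranks pads to 1008 elements (126 per rank).
assert pad_bucket_end_index(0, 1001, 8) == 1008
```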
deepakn94 commented on Mar 11, 2026
```python
        self.bucket_indices = layout.bucket_indices
        per_bucket_numel_unpadded = layout.per_bucket_numel_unpadded


def _pad(number_to_be_padded: int, divisor: int) -> int:
```
Is this method used at all?
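The hunk shows only `_pad`'s signature, and the question above suggests it may be dead code after the refactor. For reference, the conventional implementation of such a round-up helper (an assumption, not taken from the diff) is:

```python
import math


def _pad(number_to_be_padded: int, divisor: int) -> int:
    """Round number_to_be_padded up to the nearest multiple of divisor."""
    return int(math.ceil(number_to_be_padded / divisor) * divisor)
```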
Move the optimizer-specific parameter layout logic (padding, bucket splitting) out of `_ParamAndGradBuffer.__init__` and into `DistributedOptimizer.compute_param_layout()`. This decouples the buffer from optimizer-specific assumptions, allowing future optimizer implementations to define custom parameter layouts by overriding the classmethod.

Introduces the `ParamLayout` dataclass and `_default_param_layout()` for the non-distributed case.
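Based only on the description above, the extension point presumably looks something like the following sketch: the buffer asks the optimizer class for a layout, the base implementation returns `_default_param_layout()`, and `DistributedOptimizer` overrides the classmethod to add padding. Class, method, and dataclass names come from the description; signatures, the `shard_size` parameter, and the bodies are assumptions:

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class ParamLayout:
    # Mirrors the sketch shown after the commit message above.
    bucket_indices: List[Tuple[int, int]]
    per_bucket_numel_unpadded: List[int]


def _default_param_layout(param_numels: Sequence[int]) -> ParamLayout:
    """Non-distributed case: one unpadded bucket covering all parameters."""
    total = sum(param_numels)
    return ParamLayout(bucket_indices=[(0, total)], per_bucket_numel_unpadded=[total])


class MegatronOptimizer:
    """Stand-in base class. The buffer asks the optimizer class for a layout
    instead of hard-coding padding/bucket splitting in its __init__."""

    @classmethod
    def compute_param_layout(cls, param_numels: Sequence[int], shard_size: int) -> ParamLayout:
        # Base optimizers need no padding; ignore shard_size.
        return _default_param_layout(param_numels)


class DistributedOptimizer(MegatronOptimizer):
    """Pads the bucket end index so it shards uniformly across ranks."""

    @classmethod
    def compute_param_layout(cls, param_numels: Sequence[int], shard_size: int) -> ParamLayout:
        total = sum(param_numels)
        padded = (total + shard_size - 1) // shard_size * shard_size
        return ParamLayout(bucket_indices=[(0, padded)], per_bucket_numel_unpadded=[total])


# Usage: a buffer with 1001 total elements sharded over 8 ranks.
layout = DistributedOptimizer.compute_param_layout([1000, 1], shard_size=8)
assert layout.bucket_indices == [(0, 1008)]
assert layout.per_bucket_numel_unpadded == [1001]
```

Under this reading, `_ParamAndGradBuffer` only ever consumes a `ParamLayout`; which classmethod produced it is irrelevant to the buffer, which is what lets future optimizers define custom layouts.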