@mkhona-nvidia
Contributor

No description provided.

@mkhona-nvidia mkhona-nvidia self-assigned this Nov 3, 2025
@copy-pr-bot

copy-pr-bot bot commented Nov 3, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

if partition_dim is None:
    # Fallback path for non TP params.
    # Handle 3D conv1d case
    if x.dim() == 3:
Contributor

This is not the right place to add this logic.
The function still handles 2D input; the reshape logic should live outside this function, probably in the Megatron-inherited Muon.

Contributor Author

The issue is the same as with unfusing QKV: in the optimizer state, the param shape will always be 3D, so we would need to add handling for it in the OrthogonalizedOptimizer class.
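A minimal sketch of the split discussed above, assuming PyTorch: the orthogonalization routine stays strictly 2D, and an optimizer-level caller reshapes a 3D conv1d update before calling it and restores the shape afterwards. `orthogonalize_2d` and `orthogonalize_param_update` are hypothetical names, not this repository's or Megatron's API. The exact SVD-based polar factor is a stand-in for the Newton-Schulz iteration a Muon-style optimizer would normally use. Flattening a `[out_channels, in_channels, kernel_size]` conv1d weight to `[out_channels, in_channels * kernel_size]` is one assumed convention, not necessarily what the PR does.

```python
import torch


def orthogonalize_2d(g: torch.Tensor) -> torch.Tensor:
    """Hypothetical 2D-only orthogonalization (stand-in for Newton-Schulz).

    Returns the polar factor U @ Vh of g, i.e. the nearest semi-orthogonal
    matrix, computed exactly with a thin SVD.
    """
    if g.dim() != 2:
        raise ValueError(f"expected a 2D matrix, got shape {tuple(g.shape)}")
    u, _, vh = torch.linalg.svd(g, full_matrices=False)
    return u @ vh


def orthogonalize_param_update(update: torch.Tensor) -> torch.Tensor:
    """Hypothetical caller that owns the reshape logic, so the 2D routine never sees 3D input."""
    if update.dim() == 2:
        return orthogonalize_2d(update)
    if update.dim() == 3:
        # Assumed convention: conv1d weight [out, in, k] -> [out, in * k].
        out_channels = update.size(0)
        flat = update.reshape(out_channels, -1)
        return orthogonalize_2d(flat).reshape(update.shape)
    raise ValueError(f"unsupported update rank {update.dim()}")
```

Keeping the shape handling in the caller means the orthogonalization function's contract stays "2D in, 2D out", and the 3D case (like unfused QKV) is handled once, where the optimizer state is read.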

2 participants