[PoC][MoE & EP] model code and various parallelisms #725

Closed
wants to merge 3 commits into from

@tianyu-l (Contributor) commented Dec 9, 2024

tianyu-l added a commit that referenced this pull request Dec 9, 2024
ghstack-source-id: ebaafd51eeab3a33f45012c5c204d7979a9c67f0
Pull Request resolved: #725
@facebook-github-bot added the CLA Signed label Dec 9, 2024
tianyu-l added a commit that referenced this pull request Dec 9, 2024
ghstack-source-id: 96435a0a0e70b5a2f6931608b3692ae264c89b58
Pull Request resolved: #725
@tianyu-l mentioned this pull request Dec 9, 2024
6 tasks
@tianyu-l changed the title from "remove Router Parallel, set use_local_output=True by default" to "[PoC][MoE & EP] model code and various parallelisms" Dec 9, 2024
@tianyu-l marked this pull request as draft December 9, 2024 21:45
The expert-choice MoE implementation is mostly from torchtune: pytorch/torchtune#1902

Temporary changes to unblock exploration
- [pytorch] comment out the check at https://github.com/pytorch/pytorch/blob/main/torch/distributed/tensor/parallel/api.py#L66
- [torchtitan] for dp2ep, turn off the optimizers' `foreach` option and skip `clip_grad_norm_`, because not all parameters are DTensors on the same mesh (e.g. in dp2ep, `moe.router.gate` is a replicated plain `torch.Tensor`)
- [torchtitan] for dp2ep, comment out `apply_fsdp`, which leaves the non-expert parameters replicated
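
The torchtitan workaround for the optimizer and gradient clipping could look roughly like the sketch below. It is not the code from this PR: `build_optimizer`, `train_step`, and the `dp2ep_enabled` flag are hypothetical names, and the model uses plain tensors rather than DTensors, so it only illustrates where the two switches sit in a training step.

```python
import torch
from torch import nn


def build_optimizer(model: nn.Module, lr: float = 1e-3) -> torch.optim.Optimizer:
    # foreach=False makes the optimizer update parameters one at a time
    # instead of batching them into multi-tensor calls.
    return torch.optim.AdamW(model.parameters(), lr=lr, foreach=False)


def train_step(
    model: nn.Module,
    optimizer: torch.optim.Optimizer,
    inputs: torch.Tensor,
    targets: torch.Tensor,
    *,
    dp2ep_enabled: bool,  # hypothetical flag, not a torchtitan option
    max_norm: float = 1.0,
) -> float:
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(inputs), targets)
    loss.backward()
    if not dp2ep_enabled:
        # Global-norm clipping is only applied when all parameters are of
        # one kind; under dp2ep it is skipped, as in the note above.
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
    optimizer.step()
    return loss.item()
```

Skipping clipping changes training behavior, so this is only reasonable as a temporary measure for exploration, which matches how the list above frames it.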

Todo
- FSDP / CP integration

Not yet worked on
- softmax scoring when Router Parallel is used (currently only sigmoid)
- token-choice MoE
- shared expert overlapping
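
For readers unfamiliar with the terms, here is a minimal pure-Python sketch of expert-choice routing with sigmoid scoring. It is an illustration only, not the torchtune or torchtitan implementation: `expert_choice_route` and `capacity` are hypothetical names, and real implementations work on batched tensors. In expert-choice routing each expert picks its top-scoring tokens, whereas in token-choice routing each token picks its experts.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def expert_choice_route(logits, capacity):
    """Select tokens per expert.

    logits[t][e] is the router logit of token t for expert e.
    Returns one list per expert holding (token_index, score) pairs for the
    `capacity` tokens with the highest sigmoid score for that expert.
    """
    num_tokens = len(logits)
    num_experts = len(logits[0])
    assignments = []
    for e in range(num_experts):
        # Sigmoid scores each (token, expert) pair independently; unlike
        # softmax it does not normalize across the expert dimension.
        scored = [(t, sigmoid(logits[t][e])) for t in range(num_tokens)]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        assignments.append(scored[:capacity])
    return assignments


# Four tokens, two experts, each expert takes two tokens.
logits = [[2.0, -1.0], [0.5, 0.0], [-3.0, 1.5], [1.0, 1.0]]
routes = expert_choice_route(logits, capacity=2)
chosen = [[t for t, _ in per_expert] for per_expert in routes]
print(chosen)  # [[0, 3], [2, 3]]
```

In this example token 3 is processed by both experts and token 1 by none, which is the characteristic trade-off of expert-choice routing: expert load is balanced by construction, but per-token coverage is not guaranteed.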

[ghstack-poisoned]
tianyu-l added a commit that referenced this pull request Dec 10, 2024
ghstack-source-id: f2ffdba21a4408392b1083ab8328909fb41fb16b
Pull Request resolved: #725
@tianyu-l (Contributor, Author)

closing as this is reorganized in #732

@tianyu-l closed this Dec 12, 2024
@tianyu-l deleted the gh/tianyu-l/22/head branch December 18, 2024 00:42
Labels
CLA Signed
2 participants