Support collective matmul optimization in mp #8855

Open · wants to merge 4 commits into master
Conversation

yaochengji (Collaborator) commented:
This PR:

  • adds a new environment variable, ENABLE_COLLECTIVE_MATMUL_IN_MP, that turns on the XLA config required for the collective matmul optimization on TPU
  • adds channel_id and use_global_device_ids arguments to xm.all_gather and xm.reduce_scatter, which are also required to enable collective matmul (a usage sketch follows below)
  • adds a utility function to find the best ring order for v5p x 8 and v6 x 8
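
A minimal usage sketch of the new knobs, assuming they land as described in the bullets above. The point at which the environment variable is read and the exact keyword defaults are assumptions here, not confirmed by this PR:

```python
# Sketch only: argument names follow the PR description; the merged
# signatures and defaults may differ.
import os

# New in this PR: enables the XLA config required for collective matmul.
# Assumed to be read at process start, so set it before importing torch_xla.
os.environ["ENABLE_COLLECTIVE_MATMUL_IN_MP"] = "1"

import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()
x = torch.randn(128, 128, device=device)

# channel_id and use_global_device_ids are the new arguments this PR adds;
# both are needed so XLA can rewrite the all-gather + matmul pattern into
# a collective matmul.
gathered = xm.all_gather(x, dim=0, channel_id=1, use_global_device_ids=True)
```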

@tengyifei (Collaborator) left a comment:

question: do you think it's possible to test that collective matmul is actually triggered? Maybe we could look for the "decomposed_reduce_scatter_while" etc. signatures in the optimized HLO from the XLA dump?
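
One way to do such a check, as a sketch: setting XLA_FLAGS with --xla_dump_to is standard XLA debugging practice, but the dump path, file-name pattern, and the exact signature string below are assumptions rather than code from this PR.

```python
# Sketch of the HLO-dump check suggested above. The dump directory,
# file-name pattern, and signature string are assumptions.
import glob
import os

# Must be set before XLA initializes (i.e., before importing torch_xla).
os.environ["XLA_FLAGS"] = "--xla_dump_to=/tmp/hlo_dump"

# ... run the model step that should trigger collective matmul ...

found = any(
    "decomposed_reduce_scatter_while" in open(path).read()
    for path in glob.glob("/tmp/hlo_dump/*after_optimizations*.txt")
)
assert found, "collective matmul was not triggered in the optimized HLO"
```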

@yaochengji (Collaborator, Author) replied:
> question: do you think it's possible to test that collective matmul is actually triggered? Maybe we could look for the "decomposed_reduce_scatter_while" etc. signatures in the optimized HLO from the XLA dump?

Thanks for your review, Yifei! I need to wait for tomorrow's libtpu nightly to enable collective matmul.

@tengyifei (Collaborator) replied:

> I need to wait for tomorrow's libtpu nightly to enable collective matmul.

Gotcha. I think it would be useful to add a test once we roll in a new libtpu; otherwise it's hard to tell whether this worked. LGTM otherwise.
