Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow MpDeviceLoader to shard dictionaries of tensor #8202

Merged
merged 4 commits into from
Oct 2, 2024
Merged

Conversation

bhavya01
Copy link
Collaborator

@bhavya01 bhavya01 commented Oct 1, 2024

Reland #7912

The new code is only used when input_sharding is a dictionary of ShardingSpec. So, it should be backward compatible. Before this PR, it is expected to be a ShardingSpec

@bhavya01 bhavya01 self-assigned this Oct 1, 2024
@bhavya01 bhavya01 added the tpuci label Oct 1, 2024
@bhavya01 bhavya01 requested a review from JackCaoG October 1, 2024 20:46
@bhavya01
Copy link
Collaborator Author

bhavya01 commented Oct 1, 2024

@ManfeiBai: I will cherry pick this into 2.5 release.
@zpcore: This should make SD2 script compatible with future nighty versions.

@JackCaoG
Copy link
Collaborator

JackCaoG commented Oct 2, 2024

I will take a look today

docs/spmd_advanced.md Outdated Show resolved Hide resolved
Copy link
Collaborator

@JackCaoG JackCaoG left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still prefer to separate the error handling from loading. iPutting an exception into the data_queue makes logic a bit messy IMO. I will approve to unblock, I can do a follow up here.

@bhavya01
Copy link
Collaborator Author

bhavya01 commented Oct 2, 2024

The test failure is not related to this PR. Merging.

@bhavya01 bhavya01 merged commit cc631b9 into master Oct 2, 2024
21 of 22 checks passed
dvhg pushed a commit to dvhg/xla that referenced this pull request Oct 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants