Add async collectives RFC. #2897

Open
mwhittaker wants to merge 3 commits into openxla:main from mwhittaker:async_collectives_rfc

Conversation

@mwhittaker
Member

No description provided.

mwhittaker requested a review from GleasonK February 9, 2026 20:49
@mwhittaker
Member Author

@mattjj @hawkinsp

Comment thread rfcs/20260209-async-collectives.md Outdated
Member

felixwqp left a comment

Can I assume implicitly this will only work for shard_map-based sharding? like how ragged_all_to_all is used?

@mwhittaker
Member Author

> Can I assume implicitly this will only work for shard_map-based sharding? like how ragged_all_to_all is used?

I'm not sure. I haven't thought that far ahead. When I support these async collectives in JAX, though, I do plan on only supporting shard_map at first.

mwhittaker force-pushed the async_collectives_rfc branch from 89da850 to 3926874 Compare March 13, 2026 18:08
mwhittaker self-assigned this Mar 13, 2026
@fhoushmand
Member

fhoushmand commented Mar 17, 2026

Would this RFC extend to async dynamic-slice/dynamic-update-slice?
We want to add those to jax as well. If this RFC covers it, great. If not we can talk about extending the support.

@GleasonK
Member

> Would this RFC extend to async dynamic-slice/dynamic-update-slice?

This should naturally extend to these ops. I'm OK to bring them in scope under the umbrella of "known ops that we want to have an async decomposition by a backend"

GleasonK requested a review from fhoushmand March 20, 2026 20:10
Comment thread rfcs/20260209-async-collectives.md Outdated

mattjj left a comment

Thanks!

copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Mar 23, 2026
See openxla/stablehlo#2897 for context.

This CL introduces a `stablehlo.future` type and `stablehlo.async_start` and
`stablehlo.async_done`. It does not add any translation for them. That will
come in a later change. I also did not make `async_start` and `async_done`
variadic for now, even though some collectives are variadic. We can add support
for that later if we need it.

PiperOrigin-RevId: 874740969
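
The start/done split described in the commit message above can be illustrated with a small Python analogy. This is only a sketch under stated assumptions, not StableHLO syntax and not the RFC's actual semantics: the function names `async_start` and `async_done` mirror the op names, Python's `concurrent.futures.Future` stands in for the `stablehlo.future` type, and `fake_all_reduce` and `overlapped_step` are hypothetical helpers invented for this example. Both halves take a single operand and return a single result, matching the non-variadic form mentioned above.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, List, Tuple

# Hypothetical worker pool standing in for whatever runs the collective.
_pool = ThreadPoolExecutor(max_workers=2)


def async_start(op: Callable[[List[float]], List[float]],
                operand: List[float]) -> Future:
    """Begin `op` on one operand and return a future immediately."""
    return _pool.submit(op, operand)


def async_done(future: Future) -> List[float]:
    """Block until the started operation finishes; return its one result."""
    return future.result()


def fake_all_reduce(shard: List[float]) -> List[float]:
    # Stand-in for a collective: pretend each of 4 replicas holds `shard`,
    # so the summed result is 4 times every element.
    return [4 * x for x in shard]


def overlapped_step(shard: List[float],
                    local: List[float]) -> Tuple[List[float], float]:
    fut = async_start(fake_all_reduce, shard)  # kick off "communication"
    local_sum = sum(x * x for x in local)      # independent work overlaps it
    reduced = async_done(fut)                  # wait only when the value is needed
    return reduced, local_sum
```

The point of the split is that work placed between the start and done calls does not depend on the collective's result, so it can overlap with the communication.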
copybara-service bot pushed a commit to openxla/xla that referenced this pull request Mar 23, 2026
PiperOrigin-RevId: 874740969
@mwhittaker
Member Author

I updated the RFC to include slice ops. I also made things less variadic for now. We can change that later if needed.

copybara-service bot pushed a commit to openxla/xla that referenced this pull request Mar 25, 2026
PiperOrigin-RevId: 874740969
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Mar 25, 2026
PiperOrigin-RevId: 874740969
copybara-service bot pushed a commit to openxla/xla that referenced this pull request Mar 26, 2026
PiperOrigin-RevId: 874740969
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Mar 26, 2026
PiperOrigin-RevId: 874740969
copybara-service bot pushed a commit to openxla/xla that referenced this pull request Mar 26, 2026
PiperOrigin-RevId: 890017379
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Mar 26, 2026
PiperOrigin-RevId: 890017379
copybara-service bot pushed a commit to openxla/shardy that referenced this pull request Mar 26, 2026
PiperOrigin-RevId: 890017379
7 participants