Conversation

@samlurye
Contributor

Summary:
Add an optional `ndslice::Point` as an argument to `RemoteSpawn::new`. When an actor is spawned remotely as part of a proc mesh, the proc mesh agent will plumb the cast point through `RemoteSpawn::gspawn` so that the actor being spawned has access to its cast rank and coordinates when it is created.

This will improve the experience for actors (like PythonActor) that need to know their position in an actor mesh before their full capabilities are available; previously, we would need to first spawn the actor, and then separately send a message with its cast point, which can cause race conditions.

Differential Revision: D91663308
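
The idea in the summary above can be sketched as follows. This is a simplified, synchronous model and not the actual Monarch code: the `Point` struct, the trait shape, the `Worker` actor, and the `spawn_mesh` helper are all assumptions made for illustration, and the real `RemoteSpawn::new` signature may differ.

```rust
/// Hypothetical stand-in for `ndslice::Point`: a cast rank plus its coordinates.
#[derive(Debug, Clone, PartialEq)]
pub struct Point {
    pub rank: usize,
    pub coords: Vec<usize>,
}

/// Simplified, synchronous model of a `RemoteSpawn`-style trait (assumed shape).
/// The optional point is passed at construction time.
pub trait RemoteSpawn: Sized {
    type Params;
    fn new(params: Self::Params, point: Option<Point>) -> Result<Self, String>;
}

/// Hypothetical actor that records its mesh position when it is created.
pub struct Worker {
    pub name: String,
    pub rank: Option<usize>,
    pub coords: Option<Vec<usize>>,
}

impl RemoteSpawn for Worker {
    type Params = String;

    fn new(name: String, point: Option<Point>) -> Result<Self, String> {
        // Split the optional point into the two fields the actor keeps.
        let (rank, coords) = match point {
            Some(p) => (Some(p.rank), Some(p.coords)),
            None => (None, None),
        };
        Ok(Worker { name, rank, coords })
    }
}

/// Hypothetical stand-in for the proc mesh agent: spawns one actor per
/// position of a rows x cols mesh and hands each its own point.
pub fn spawn_mesh<A>(params: &A::Params, shape: (usize, usize)) -> Result<Vec<A>, String>
where
    A: RemoteSpawn,
    A::Params: Clone,
{
    let (rows, cols) = shape;
    (0..rows * cols)
        .map(|rank| {
            let point = Point { rank, coords: vec![rank / cols, rank % cols] };
            A::new(params.clone(), Some(point))
        })
        .collect()
}
```

In this model every actor knows its rank and coordinates as soon as `new` returns, so no follow-up message carrying the cast point is needed.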

@meta-cla meta-cla bot added the CLA Signed label Jan 28, 2026
@meta-codesync
meta-codesync bot commented Jan 28, 2026

@samlurye has exported this pull request. If you are a Meta employee, you can view the originating Diff in D91663308.

samlurye added a commit to samlurye/monarch-1 that referenced this pull request Jan 28, 2026
Summary:

Add an optional `ndslice::Point` as an argument to `RemoteSpawn::new`. When an actor is spawned remotely as part of a proc mesh, the proc mesh agent will plumb the cast point through `RemoteSpawn::gspawn` so that the actor being spawned has access to its cast rank and coordinates when it is created.

This will improve the experience for actors (like PythonActor) that need to know their position in an actor mesh before their full capabilities are available; previously, we would need to first spawn the actor, and then separately send a message with its cast point, which can cause race conditions.

Differential Revision: D91663308
@samlurye samlurye force-pushed the export-D91663308 branch 2 times, most recently from 3420f9c to 147a97f on January 28, 2026 21:07
@samlurye samlurye force-pushed the export-D91663308 branch 2 times, most recently from 0b5b098 to a184b1e on January 29, 2026 04:59
samlurye added a commit to samlurye/monarch-1 that referenced this pull request Jan 29, 2026
Summary:
Pull Request resolved: meta-pytorch#2414

Add an optional `ndslice::Point` as an argument to `RemoteSpawn::new`. When an actor is spawned remotely as part of a proc mesh, the proc mesh agent will plumb the cast point through `RemoteSpawn::gspawn` so that the actor being spawned has access to its cast rank and coordinates when it is created.

This will improve the experience for actors (like PythonActor) that need to know their position in an actor mesh before their full capabilities are available; previously, we would need to first spawn the actor, and then separately send a message with its cast point, which can cause race conditions.

Differential Revision: D91663308
samlurye added a commit to samlurye/monarch-1 that referenced this pull request Jan 29, 2026
Summary:
Currently, spawning a `PythonActor` happens in two stages: first, we spawn the actors in the actor mesh; then, in a second, separate message, we construct the instance of the user's `Actor` implementation in Python. This opens the door for a race condition where:
1. Actor A spawns a Python actor mesh and passes a mesh ref to Actor B.
2. Actor B calls an endpoint on the mesh ref.
3. Actor B's message arrives between the actor spawn and the `Init` message from Actor A.
4. Actor B's call fails because the Python actor thinks it hasn't been properly initialized yet.

This PR solves the problem by processing the `Init` message as part of `PythonActor::init`, which is guaranteed to run before any other message is processed. This wasn't possible before meta-pytorch#2414 because we didn't have access to the point/rank of the actor until after `PythonActor::init`.

Differential Revision: D91739758
Differential Revision: D91829469
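
The race described in this commit message, and the effect of folding initialization into spawn, can be modeled with a toy mailbox. This is only a sketch: `PyActorModel`, `Message`, and `drain` are hypothetical names, and the real `PythonActor` and its `Init` handling are more involved.

```rust
use std::collections::VecDeque;

/// Hypothetical messages an actor can receive.
#[derive(Debug)]
pub enum Message {
    /// Second-stage initialization carrying the actor's rank.
    Init { rank: usize },
    /// An endpoint call from another actor.
    Call(String),
}

/// Toy model of an actor that needs its rank before it can serve calls.
#[derive(Debug)]
pub struct PyActorModel {
    rank: Option<usize>,
}

impl PyActorModel {
    /// Two-stage spawn: the actor exists but is not yet initialized.
    pub fn spawn_uninitialized() -> Self {
        Self { rank: None }
    }

    /// One-stage spawn: the rank is known at creation, so initialization
    /// completes before any message is handled.
    pub fn spawn_with_rank(rank: usize) -> Self {
        Self { rank: Some(rank) }
    }

    pub fn handle(&mut self, msg: Message) -> Result<String, String> {
        match msg {
            Message::Init { rank } => {
                self.rank = Some(rank);
                Ok("initialized".to_string())
            }
            Message::Call(endpoint) => match self.rank {
                Some(rank) => Ok(format!("{endpoint} ran on rank {rank}")),
                None => Err(format!("{endpoint} called before Init")),
            },
        }
    }
}

/// Processes the mailbox in arrival order and collects each result.
pub fn drain(actor: &mut PyActorModel, mut mailbox: VecDeque<Message>) -> Vec<Result<String, String>> {
    let mut results = Vec::new();
    while let Some(msg) = mailbox.pop_front() {
        results.push(actor.handle(msg));
    }
    results
}
```

With `spawn_uninitialized`, a mailbox in which `Call` arrives ahead of `Init` yields an error for the call. With `spawn_with_rank`, the same call succeeds because there is no window between spawn and initialization.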
samlurye added a commit to samlurye/monarch-1 that referenced this pull request Jan 30, 2026
Summary:
Pull Request resolved: meta-pytorch#2414

Add the ability to pass an `Attrs` object to `RemoteSpawn::new`, with "environment"-specific info that can be used during actor spawning. When an actor is spawned remotely as part of a proc mesh, the proc mesh agent will plumb its message headers through `RemoteSpawn::gspawn` so that the actor being spawned has access to its cast rank and coordinates when it is created.

This will improve the experience for actors (like PythonActor) that need to know their position in an actor mesh before their full capabilities are available; previously, we would need to first spawn the actor, and then separately send a message with its cast point, which can cause race conditions.

Differential Revision: D91663308
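
The `Attrs`-based variant described in this revision of the summary could look roughly like the sketch below. Everything here is assumed for illustration: `EnvAttrs` is a plain string map standing in for the real `Attrs` type, and the `cast_rank` key and `spawn_from_headers` helper are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an `Attrs` bag: string keys mapped to string values.
#[derive(Debug, Default, Clone)]
pub struct EnvAttrs(HashMap<String, String>);

impl EnvAttrs {
    pub fn set(&mut self, key: &str, value: impl ToString) {
        self.0.insert(key.to_string(), value.to_string());
    }

    /// Returns the value parsed as `usize`, or `None` if missing or malformed.
    pub fn get_usize(&self, key: &str) -> Option<usize> {
        self.0.get(key)?.parse().ok()
    }
}

/// Hypothetical header key under which the cast rank is stored.
pub const CAST_RANK: &str = "cast_rank";

/// Toy actor that reads its rank from the environment at construction.
pub struct RankAware {
    pub rank: Option<usize>,
}

impl RankAware {
    pub fn new(environment: &EnvAttrs) -> Self {
        Self { rank: environment.get_usize(CAST_RANK) }
    }
}

/// Hypothetical stand-in for the agent forwarding its message headers
/// to the actor constructor as the spawn environment.
pub fn spawn_from_headers(headers: &EnvAttrs) -> RankAware {
    RankAware::new(headers)
}
```

Compared with a dedicated point argument, a generic attribute bag lets other spawn-time information travel the same path without changing the constructor signature, at the cost of looking values up by key.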
samlurye added a commit to samlurye/monarch-1 that referenced this pull request Jan 30, 2026
Summary:
Pull Request resolved: meta-pytorch#2414

Add an optional `ndslice::Point` as an argument to `RemoteSpawn::new`. When an actor is spawned remotely as part of a proc mesh, the proc mesh agent will plumb the cast point through `RemoteSpawn::gspawn` so that the actor being spawned has access to its cast rank and coordinates when it is created.

This will improve the experience for actors (like PythonActor) that need to know their position in an actor mesh before their full capabilities are available; previously, we would need to first spawn the actor, and then separately send a message with its cast point, which can cause race conditions.

Differential Revision: D91663308
samlurye added a commit to samlurye/monarch-1 that referenced this pull request Jan 30, 2026
Summary:
Pull Request resolved: meta-pytorch#2426

Currently, spawning a `PythonActor` happens in two stages: first, we spawn the actors in the actor mesh; then, in a second, separate message, we construct the instance of the user's `Actor` implementation in Python. This opens the door for a race condition where:
1. Actor A spawns a Python actor mesh and passes a mesh ref to Actor B.
2. Actor B calls an endpoint on the mesh ref.
3. Actor B's message arrives between the actor spawn and the `Init` message from Actor A.
4. Actor B's call fails because the Python actor thinks it hasn't been properly initialized yet.

This PR solves the problem by processing the `Init` message as part of `PythonActor::init`, which is guaranteed to run before any other message is processed. This wasn't possible before meta-pytorch#2414 because we didn't have access to the point/rank of the actor until after `PythonActor::init`.

Differential Revision: D91739758
Labels

CLA Signed, fb-exported, meta-exported
