@samlurye
Contributor

Summary:
Currently, spawning a PythonActor happens in two stages -- first, we spawn the actors in the actor mesh; then, in a second, separate message, we construct the instance of the user's Actor implementation in Python. This opens the door for a race condition where:

  1. Actor A spawns a Python actor mesh and passes a mesh ref to Actor B.
  2. Actor B calls an endpoint on the mesh ref.
  3. Actor B's message arrives between the actor spawn and the Init message from Actor A.
  4. Actor B's call fails because the Python actor thinks it hasn't been properly initialized yet.
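
The ordering problem in the steps above can be reproduced with a small toy model. This is only a sketch: `ToyPythonActor`, its tuple-based messages, and `Greeter` are hypothetical stand-ins invented for illustration, not the real PythonActor or its message types. It assumes a single mailbox that is drained in arrival order.

```python
from collections import deque


class Greeter:
    """Hypothetical user Actor implementation with one endpoint."""

    def hello(self):
        return "hello"


class ToyPythonActor:
    """Toy two-stage actor: spawned first, user instance built by a later 'init' message."""

    def __init__(self):
        self.instance = None  # the user's Actor has not been constructed yet
        self.mailbox = deque()

    def deliver(self, message):
        self.mailbox.append(message)

    def run(self):
        # Drain the mailbox strictly in arrival order.
        results = []
        while self.mailbox:
            kind, payload = self.mailbox.popleft()
            if kind == "init":
                self.instance = payload()  # stage 2: construct the user's Actor
                results.append("initialized")
            elif kind == "call":
                if self.instance is None:
                    results.append("error: actor not initialized")
                else:
                    results.append(getattr(self.instance, payload)())
        return results


def simulate_race():
    actor = ToyPythonActor()          # stage 1: Actor A spawns the actor
    actor.deliver(("call", "hello"))  # Actor B's endpoint call arrives first
    actor.deliver(("init", Greeter))  # Actor A's init message arrives late
    return actor.run()
```

In this model, `simulate_race()` reports a failed call followed by a successful initialization, which mirrors step 4.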

This PR solves the problem by processing the Init message as part of PythonActor::init, which is guaranteed to run before any other message is processed. This wasn't possible before #2414 because we didn't have access to the point/rank of the actor until after PythonActor::init.
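
A minimal sketch of the idea, under the assumption that the runtime always invokes an init hook before it drains the mailbox. `ToyActorWithInit`, `Counter`, and the method names are hypothetical and only illustrate the ordering guarantee; they are not the actual PythonActor code.

```python
from collections import deque


class Counter:
    """Hypothetical user Actor that needs its rank at construction time."""

    def __init__(self, rank):
        self.rank = rank

    def whoami(self):
        return f"rank {self.rank}"


class ToyActorWithInit:
    """Toy actor whose user instance is built in init(), before any message is handled."""

    def __init__(self, actor_cls, rank):
        # The rank is already known at spawn time in this sketch.
        self.actor_cls = actor_cls
        self.rank = rank
        self.instance = None
        self.mailbox = deque()

    def init(self):
        # Loose analogue of an init hook: construct the user's Actor here.
        self.instance = self.actor_cls(self.rank)

    def deliver(self, endpoint_name):
        self.mailbox.append(endpoint_name)

    def run(self):
        # init() runs first, so no call can observe an uninitialized actor.
        self.init()
        results = []
        while self.mailbox:
            results.append(getattr(self.instance, self.mailbox.popleft())())
        return results


def simulate_fixed():
    actor = ToyActorWithInit(Counter, rank=3)
    actor.deliver("whoami")  # arrives before the actor has started running
    return actor.run()
```

Because construction no longer depends on a separate message, an early call from another actor is simply queued until `init()` has finished.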

Differential Revision: D91739758

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Jan 29, 2026
@meta-codesync

meta-codesync bot commented Jan 29, 2026

@samlurye has exported this pull request. If you are a Meta employee, you can view the originating Diff in D91739758.

Differential Revision: D91829469
Summary:
Pull Request resolved: meta-pytorch#2414

Add an optional `ndslice::Point` as an argument to `RemoteSpawn::new`. When an actor is spawned remotely as part of a proc mesh, the proc mesh agent will plumb the cast point through `RemoteSpawn::gspawn` so that the actor being spawned has access to its cast rank and coordinates when it is created.

This will improve the experience for actors (like PythonActor) that need to know their position in an actor mesh before their full capabilities are available; previously, we would need to first spawn the actor, and then separately send a message with its cast point, which can cause race conditions.
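
The effect of passing the point at construction can be sketched as follows. Everything here is an illustrative assumption: `ToyPoint` is not the `ndslice::Point` API, `spawn_mesh` is not the proc mesh agent, and the row-major rank ordering is chosen only for the example.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List, Optional


@dataclass
class ToyPoint:
    """Hypothetical stand-in for a cast point: a rank plus labeled coordinates."""

    rank: int
    coords: Dict[str, int]


class ToyActor:
    """Toy actor that optionally receives its point when it is created."""

    def __init__(self, point: Optional[ToyPoint] = None):
        # None models an actor spawned outside of a mesh.
        self.point = point


def spawn_mesh(shape: Dict[str, int]) -> List[ToyActor]:
    """Create one actor per mesh position, handing each its point up front."""
    labels = list(shape)
    actors = []
    # Enumerate positions in row-major order (an assumption of this sketch).
    for rank, index in enumerate(product(*(range(shape[label]) for label in labels))):
        point = ToyPoint(rank=rank, coords=dict(zip(labels, index)))
        actors.append(ToyActor(point))
    return actors
```

For example, `spawn_mesh({"hosts": 2, "gpus": 2})` yields four actors, each of which already knows its rank and coordinates without waiting for a follow-up message.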

Differential Revision: D91663308
Summary:
Pull Request resolved: meta-pytorch#2426

Currently, spawning a `PythonActor` happens in two stages -- first, we spawn the actors in the actor mesh; then, in a second, separate message, we construct the instance of the user's `Actor` implementation in Python. This opens the door for a race condition where:
1. Actor A spawns a Python actor mesh and passes a mesh ref to Actor B.
2. Actor B calls an endpoint on the mesh ref.
3. Actor B's message arrives between the actor spawn and the `Init` message from Actor A.
4. Actor B's call fails because the Python actor thinks it hasn't been properly initialized yet.

This PR solves the problem by processing the `Init` message as part of `PythonActor::init`, which is guaranteed to run before any other message is processed. This wasn't possible before meta-pytorch#2414 because we didn't have access to the point/rank of the actor until after `PythonActor::init`.

Differential Revision: D91739758

Labels

CLA Signed This label is managed by the Meta Open Source bot. fb-exported meta-exported
