Atomic PythonActor spawn + init message #2426

samlurye · 2026-01-29T05:26:46Z

Summary:
Currently, spawning a PythonActor happens in two stages. First, we spawn the actors in the actor mesh; then, in a second, separate message, we construct the instance of the user's Actor implementation in Python. This opens the door to a race condition where:

1. Actor A spawns a Python actor mesh and passes a mesh ref to Actor B.
2. Actor B calls an endpoint on the mesh ref.
3. Actor B's message arrives between the actor spawn and the Init message from Actor A.
4. Actor B's call fails because the Python actor has not yet been initialized.

This PR solves the problem by processing the Init message as part of PythonActor::init, which is guaranteed to run before any other message is processed. This wasn't possible before #2414 because we didn't have access to the point/rank of the actor until after PythonActor::init.
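
To make the ordering problem concrete, here is a minimal sketch in plain Python. It is not Monarch's API: `ToyPythonActor`, its mailbox, and the `("init", ...)` / `("call", ...)` message tuples are hypothetical stand-ins, assuming only that an actor handles mailbox messages in arrival order and that its init hook runs before any of them.

```python
from collections import deque


class ToyPythonActor:
    """Toy stand-in for an actor with a FIFO mailbox (hypothetical, not Monarch's API)."""

    def __init__(self, init_args=None):
        self.instance = None
        self.mailbox = deque()
        # Atomic variant: build the user instance inside the init hook,
        # before any mailbox message can be handled.
        if init_args is not None:
            self._construct(init_args)

    def _construct(self, args):
        # Stands in for constructing the user's Actor implementation.
        self.instance = {"name": args}

    def handle(self, msg):
        kind, payload = msg
        if kind == "init":
            self._construct(payload)
            return "initialized"
        if self.instance is None:
            raise RuntimeError("actor not initialized")
        return f"{self.instance['name']} handled {payload}"

    def drain(self):
        """Handle every queued message in arrival order, recording failures."""
        results = []
        while self.mailbox:
            msg = self.mailbox.popleft()
            try:
                results.append(self.handle(msg))
            except RuntimeError as err:
                results.append(f"error: {err}")
        return results


def two_stage_race():
    actor = ToyPythonActor()                  # stage 1: spawn only
    actor.mailbox.append(("call", "ping"))    # Actor B's call arrives first
    actor.mailbox.append(("init", "worker"))  # Actor A's Init arrives second
    return actor.drain()


def atomic_spawn():
    actor = ToyPythonActor(init_args="worker")  # spawn and init in one step
    actor.mailbox.append(("call", "ping"))
    return actor.drain()
```

In this toy model, `two_stage_race()` records an error for the early call, while `atomic_spawn()` cannot hit that interleaving because there is no window between spawn and initialization.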

Differential Revision: D91739758

meta-codesync · 2026-01-29T05:27:20Z

@samlurye has exported this pull request. If you are a Meta employee, you can view the originating Diff in D91739758.

Differential Revision: D91829469

Summary:
Pull Request resolved: meta-pytorch#2414

Add an optional `ndslice::Point` as an argument to `RemoteSpawn::new`. When an actor is spawned remotely as part of a proc mesh, the proc mesh agent will plumb the cast point through `RemoteSpawn::gspawn` so that the actor being spawned has access to its cast rank and coordinates when it is created.

This will improve the experience for actors (like PythonActor) that need to know their position in an actor mesh before their full capabilities are available; previously, we would need to first spawn the actor, and then separately send a message with its cast point, which can cause race conditions.

Differential Revision: D91663308
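
As a rough illustration of handing each actor its position at construction time, the following Python sketch mimics the idea. The names `Point`, `RankAwareActor`, and `spawn_mesh` are hypothetical and do not mirror the Rust signatures of `RemoteSpawn::new` or `RemoteSpawn::gspawn`; the row-major rank ordering is also an assumption made for the example.

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional


@dataclass
class Point:
    """Hypothetical stand-in for a cast point: a rank plus named coordinates."""
    rank: int
    coords: dict


class RankAwareActor:
    def __init__(self, params, point: Optional[Point] = None):
        # The point is optional: an actor spawned outside a mesh has none.
        self.params = params
        self.point = point

    def describe(self):
        if self.point is None:
            return "spawned outside a mesh"
        return f"rank {self.point.rank} at {self.point.coords}"


def spawn_mesh(actor_cls, params, shape):
    """Construct one actor per mesh position, passing its point up front.

    `shape` maps dimension names to sizes; ranks are assigned in
    row-major order (an assumption of this sketch).
    """
    names = list(shape)
    actors = []
    for rank, idx in enumerate(product(*(range(shape[n]) for n in names))):
        point = Point(rank=rank, coords=dict(zip(names, idx)))
        actors.append(actor_cls(params, point))
    return actors
```

Because the point is a constructor argument here, no follow-up message carrying the rank is needed, so there is no window in which an actor exists without knowing its position.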

Summary:
Pull Request resolved: meta-pytorch#2426

Currently, spawning a `PythonActor` happens in two stages. First, we spawn the actors in the actor mesh; then, in a second, separate message, we construct the instance of the user's `Actor` implementation in Python. This opens the door to a race condition where:

1. Actor A spawns a Python actor mesh and passes a mesh ref to Actor B.
2. Actor B calls an endpoint on the mesh ref.
3. Actor B's message arrives between the actor spawn and the `Init` message from Actor A.
4. Actor B's call fails because the Python actor has not yet been initialized.

This PR solves the problem by processing the `Init` message as part of `PythonActor::init`, which is guaranteed to run before any other message is processed. This wasn't possible before meta-pytorch#2414 because we didn't have access to the point/rank of the actor until after `PythonActor::init`.

Differential Revision: D91739758

meta-cla bot added the CLA Signed label Jan 29, 2026

meta-codesync bot added fb-exported meta-exported labels Jan 29, 2026

samlurye added 3 commits January 30, 2026 13:31

Temporary Commit at 1/29/2026, 1:38:55 PM

fe2423f

Differential Revision: D91829469

samlurye force-pushed the export-D91739758 branch from 6bc86f6 to 76f4d0e Compare January 30, 2026 22:57
