Confusing Nanny on_exit callback structure #7321

Open
@fjetter

Description

The Nanny manages a subprocess in which a Worker is started. If that process exits, a cascade of on_exit callbacks is triggered.

The order in which things happen is:

  • AsyncProcess._on_exit
    This one does little: it sets an event so that the process becomes joinable, then triggers the next on_exit
  • WorkerProcess._on_exit just calls WorkerProcess.mark_stopped
  • WorkerProcess.mark_stopped resets some state in WorkerProcess and calls the next on_exit
  • Nanny._on_worker_exit_sync schedules a coroutine on the loop, which is the next on_exit
  • Nanny._on_worker_exit unregisters the worker from the scheduler and, if need be, restarts the worker process
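To make the chain easier to follow, here is a minimal toy model of it. This is a sketch, not the actual distributed code: the `Toy*` classes, their constructor arguments, and the `calls` list are hypothetical and only mirror the method names listed above. It assumes each layer simply holds a reference to the next on_exit callback.

```python
import asyncio

calls = []  # records the order in which the callbacks fire


class ToyAsyncProcess:
    """Hypothetical stand-in for AsyncProcess."""

    def __init__(self, on_exit):
        self.exited = asyncio.Event()
        self.on_exit = on_exit

    def _on_exit(self, exitcode):
        calls.append("AsyncProcess._on_exit")
        self.exited.set()        # makes the process "joinable"
        self.on_exit(exitcode)   # triggers the next on_exit


class ToyWorkerProcess:
    """Hypothetical stand-in for WorkerProcess."""

    def __init__(self, on_exit):
        self.status = "running"
        self.on_exit = on_exit
        self.process = ToyAsyncProcess(on_exit=self._on_exit)

    def _on_exit(self, exitcode):
        calls.append("WorkerProcess._on_exit")
        self.mark_stopped(exitcode)

    def mark_stopped(self, exitcode):
        calls.append("WorkerProcess.mark_stopped")
        self.status = "stopped"  # reset some state
        self.on_exit(exitcode)   # triggers the next on_exit


class ToyNanny:
    """Hypothetical stand-in for Nanny."""

    def __init__(self):
        self.done = asyncio.Event()
        self.worker_process = ToyWorkerProcess(on_exit=self._on_worker_exit_sync)

    def _on_worker_exit_sync(self, exitcode):
        calls.append("Nanny._on_worker_exit_sync")
        # The final step is only *scheduled* here; it runs on a later loop iteration.
        asyncio.get_running_loop().call_soon(
            lambda: asyncio.ensure_future(self._on_worker_exit(exitcode))
        )

    async def _on_worker_exit(self, exitcode):
        calls.append("Nanny._on_worker_exit")
        self.done.set()


async def main():
    nanny = ToyNanny()
    nanny.worker_process.process._on_exit(exitcode=1)  # simulate process exit
    before_yield = list(calls)   # what has run synchronously so far
    await nanny.done.wait()      # yield to the loop so the deferred step runs
    return before_yield, list(calls)


before_yield, after_yield = asyncio.run(main())
print(before_yield)
print(after_yield)
```

In this model the first four steps run synchronously in one call stack, while the last one only appears after control returns to the event loop.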

This chain of events is not only confusing but also subject to race conditions. In particular, the final, most relevant on_exit callback is scheduled with loop.call_soon, which leaves a window in which other code can run before it.

These race conditions are currently not a direct issue. Most of them are buffered by various idempotent implementations of close/start, but once we touch this structure, things get shaky.
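The window opened by loop.call_soon, and the way a guarded handler buffers it, can be illustrated with a small sketch. Everything here is hypothetical (the `ToyNanny` class, its `status` strings, the `guard` flag, and the `restarts` counter); it is not how distributed implements this, only an assumption-laden illustration of the kind of race described above.

```python
import asyncio


class ToyNanny:
    """Hypothetical model: exit handling is deferred via loop.call_soon."""

    def __init__(self, guard):
        self.status = "running"
        self.guard = guard      # whether the deferred handler re-checks state
        self.restarts = 0

    def on_worker_exit_sync(self):
        # Deferred: the handler does not run until the loop gets control back.
        asyncio.get_running_loop().call_soon(self.on_worker_exit)

    def on_worker_exit(self):
        if self.guard and self.status != "running":
            return              # close() won the race; do nothing
        self.restarts += 1      # stands in for "restart the worker process"

    def close(self):
        self.status = "closed"


async def scenario(guard):
    nanny = ToyNanny(guard)
    nanny.on_worker_exit_sync()  # process exits, handler is only scheduled
    nanny.close()                # runs *before* the deferred handler
    await asyncio.sleep(0)       # let the deferred handler run
    return nanny.restarts


unguarded = asyncio.run(scenario(guard=False))
guarded = asyncio.run(scenario(guard=True))
print(unguarded, guarded)
```

Without the state check, the toy handler "restarts" a worker on an already closed nanny; with it, the late callback becomes a no-op. That check is the kind of buffering that is easy to lose when the callback structure is changed.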

I debugged this during the investigation of #7312, but this chain is not directly causing that issue. This issue is mostly to document the situation.

Labels: asyncio, hygiene (Improve code quality and reduce maintenance overhead)
