Skip to content

Reuse the shared aiodocker client in DockerKernel methods #11227

@rapsealk

Description

@rapsealk

Parent epic: #11216
Follow-up to: #11218 / #11226

Main idea

PR #11226 routed every per-op Docker() instantiation through a shared long-lived client except in DockerKernel, because plumbing the shared client through would change DockerKernel.__init__'s contract. The remaining throwaway connections are:

  • src/ai/backend/agent/docker/kernel.py:121get_logs()
  • src/ai/backend/agent/docker/kernel.py:224commit() (bare Docker(), not even closing_async — also worth fixing for correctness)
  • src/ai/backend/agent/docker/kernel.py:327download_file()
  • src/ai/backend/agent/docker/kernel.py:356download_single()
  • src/ai/backend/agent/docker/kernel.py:486 — module-level prepare_krunner_env_impl() (separate case — not a DockerKernel method)

Design

  • Add docker: Docker as a keyword-only argument to DockerKernel.__init__, store on self.
  • Update construction sites in DockerAgent (agent.py) to pass self.docker.
  • Update the four DockerKernel methods above to use self.docker instead of async with closing_async(Docker()) / bare Docker().
  • prepare_krunner_env_impl is a module-level function — evaluate the caller: if the caller has a shared Docker client handy, accept it as a parameter; otherwise leave it as-is and note why.
  • Handle pickling: DockerKernel has __getstate__/__setstate__ for RPC marshalling — the docker field must NOT be pickled. Exclude it explicitly in __getstate__ and re-inject on restore (the agent can re-attach self.docker when materializing a kernel from persisted state).

Alternative ideas

  • Instead of a constructor arg, expose a setter (kernel.attach_docker(self.docker)) called from DockerAgent. Slightly less explicit, but avoids touching __init__'s signature. Lower preference — constructor injection is clearer.
  • Inject via a context var. Rejected — implicit and harder to reason about.

Anything else?

  • monitor_docker_events() (in agent.py) must keep its own closing_async(Docker()) — it reconnects on stream drops. Do NOT touch.
  • PersistentServiceContainer in docker/utils.py is bootstrap-time only; not part of this issue.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No fields configured for Task.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions