Share a single DockerStatsStreamer across CPU/Memory intrinsic plugins #11232

@rapsealk

Parent epic: #11216
Follow-up to: #11224

Main idea

After #11224, CPUPlugin and MemoryPlugin each own their own DockerStatsStreamer, and each constructs its own aiodocker.Docker() client in init(). That means every running container opens two persistent Docker stats streams across two separate aiohttp.ClientSession instances.

The default TCPConnector(limit_per_host=30) in aiohttp means that at roughly 50 or more containers per agent, stream setups start queueing and backoff retries pile up. Memory and connector overhead are also double what they need to be.
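
A rough count of what one agent holds open before and after the change, assuming (as described above) that each streamer owns one client session and opens one persistent stream per running container. The `StreamFootprint` and `footprint` names are illustrative only and do not exist in the codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamFootprint:
    sessions: int  # aiohttp.ClientSession instances held by the agent
    streams: int   # persistent Docker stats streams held open


def footprint(containers: int, streamers: int) -> StreamFootprint:
    # Assumption: each streamer owns one session and opens one stream per container.
    return StreamFootprint(sessions=streamers, streams=streamers * containers)


before = footprint(containers=50, streamers=2)  # one streamer per plugin (today)
after = footprint(containers=50, streamers=1)   # one streamer per agent (proposed)
print(before)  # StreamFootprint(sessions=2, streams=100)
print(after)   # StreamFootprint(sessions=1, streams=50)
```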

The case for refactoring gets stronger once #11223 (sysfs-first CPU/memory stats) lands: the CPU/Memory api_impl paths will then fall through to Docker only as a fallback, so the persistent stream is really only needed for the network / blkio fields, which neither plugin consumes today.
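
For readers unfamiliar with the "sysfs-first, Docker as fallback" shape, here is a minimal sketch. It is not the implementation in #11223: the function name, the `docker_fallback` callable, and the choice of the cgroup v2 `memory.current` file are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path
from typing import Callable


def read_memory_bytes(cgroup_dir: Path, docker_fallback: Callable[[], int]) -> int:
    """Prefer the cgroup v2 memory.current file; use Docker only if it cannot be read."""
    try:
        return int((cgroup_dir / "memory.current").read_text().strip())
    except (OSError, ValueError):
        # Missing file, permission error, or unparsable content: fall back.
        return docker_fallback()


# Demo against a temporary directory standing in for a container's cgroup.
with tempfile.TemporaryDirectory() as tmp:
    cgroup = Path(tmp)
    (cgroup / "memory.current").write_text("4096\n")
    from_sysfs = read_memory_bytes(cgroup, docker_fallback=lambda: -1)
    from_docker = read_memory_bytes(cgroup / "missing", docker_fallback=lambda: 7)

print(from_sysfs, from_docker)  # 4096 7
```

When the sysfs read succeeds, the Docker path is never invoked, which is why the persistent stream stops mattering for CPU/memory numbers.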

Design

  • Own a single DockerStatsStreamer on DockerAgent (or AbstractAgent, if the abstraction generalizes to Kubernetes).
  • Both CPUPlugin and MemoryPlugin consume via a shared reference (set during plugin init, or accessed through the existing computer_ctx).
  • Lifecycle (start / stop) still hangs off _handle_start_event / _handle_clean_event; the agent just dispatches to a single streamer instead of each plugin.
  • Post-#11223 (refactor(BA-5860): Default to sysfs-first CPU/memory stats on native Linux): consider moving the streamer out of the intrinsic plugins entirely and into a network/IO consumer, since that's what actually needs the stream.
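
The ownership model in the list above could look roughly like this. It is a sketch only: `StreamerSketch`, `PluginSketch`, and `AgentSketch` are hypothetical stand-ins for DockerStatsStreamer, the CPU/Memory plugins, and DockerAgent, and the streamer records container IDs instead of talking to Docker.

```python
import asyncio
from typing import Optional


class StreamerSketch:
    """Stand-in for DockerStatsStreamer; tracks open streams, makes no Docker calls."""

    def __init__(self) -> None:
        self.open_streams: set[str] = set()
        self.latest: dict[str, dict[str, int]] = {}

    async def start(self, container_id: str) -> None:
        # Idempotent: a repeated start for the same container opens nothing new.
        self.open_streams.add(container_id)
        self.latest[container_id] = {"cpu": 0, "mem": 0}  # placeholder sample

    async def stop(self, container_id: str) -> None:
        self.open_streams.discard(container_id)
        self.latest.pop(container_id, None)


class PluginSketch:
    """Stand-in for CPUPlugin / MemoryPlugin: holds a reference, owns nothing."""

    def __init__(self, field: str, streamer: StreamerSketch) -> None:
        self.field = field
        self.streamer = streamer

    def read(self, container_id: str) -> Optional[int]:
        sample = self.streamer.latest.get(container_id)
        return None if sample is None else sample[self.field]


class AgentSketch:
    """Stand-in for DockerAgent: the single owner of the streamer."""

    def __init__(self) -> None:
        self.streamer = StreamerSketch()
        self.cpu = PluginSketch("cpu", self.streamer)
        self.mem = PluginSketch("mem", self.streamer)

    async def handle_start_event(self, container_id: str) -> None:
        await self.streamer.start(container_id)  # one dispatch, not one per plugin

    async def handle_clean_event(self, container_id: str) -> None:
        await self.streamer.stop(container_id)


async def demo() -> tuple[int, bool, int]:
    agent = AgentSketch()
    await agent.handle_start_event("c1")
    await agent.handle_start_event("c2")
    opened = len(agent.streamer.open_streams)
    shared = agent.cpu.streamer is agent.mem.streamer
    await agent.handle_clean_event("c1")
    return opened, shared, len(agent.streamer.open_streams)


result = asyncio.run(demo())
print(result)  # (2, True, 1)
```

Two containers produce two streams rather than four, and both plugins read from the same object. Whether the reference is injected at plugin init or looked up through computer_ctx does not change this shape.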

Dependencies

Scope

  • Refactor ownership of DockerStatsStreamer.
  • Unify the two streamers into one instance per agent.
  • Update notify_container_started/destroyed dispatch path accordingly.
  • No behavioral change for operators — this is purely a resource-consumption fix.

Out of scope

  • Changing the streaming protocol or cadence.
  • Kubernetes agent (src/ai/backend/agent/kubernetes/) — if it's affected, evaluate in parallel.

JIRA Issue: BA-5861
