Main idea
After #11224, CPUPlugin and MemoryPlugin each own their own DockerStatsStreamer, and each constructs its own aiodocker.Docker() client in init(). As a result, every running container opens two persistent Docker stats streams across two separate aiohttp.ClientSession instances.
aiohttp's default TCPConnector(limit_per_host=30) means that at roughly 50 or more containers per agent, stream setups start queueing and backoff retries stack up. Memory and connector overhead are also double what they need to be.
The refactor becomes even more clearly worthwhile once #11223 (sysfs-first CPU/memory stats) lands: the CPU/Memory api_impl paths will then fall through to Docker only as a fallback, so the persistent stream is really only needed for the network / blkio fields, which neither plugin consumes today.
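To make the doubling described above concrete, here is a rough back-of-the-envelope model. It assumes each streamer owns exactly one session and that every open stats stream holds one connection for as long as its container runs. The function name `stream_footprint` is hypothetical, and the limit of 30 is simply the figure quoted above, not something verified here.

```python
def stream_footprint(
    containers: int, streamers: int, limit_per_host: int = 30
) -> dict[str, int]:
    """Estimate stream counts under the assumptions stated in the lead-in."""
    # Streams beyond the per-host limit wait for a free connection slot.
    queued_per_session = max(0, containers - limit_per_host)
    return {
        "sessions": streamers,
        "streams_wanted": streamers * containers,
        "queued": streamers * queued_per_session,
    }


# Current layout: one streamer per plugin (CPU and memory).
per_plugin = stream_footprint(containers=50, streamers=2)
# Proposed layout: one streamer shared by both plugins.
shared = stream_footprint(containers=50, streamers=1)
```

Under this model a shared streamer halves every number, but it does not by itself remove queueing once the container count exceeds the connector limit.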
Design
Own a single DockerStatsStreamer on DockerAgent (or AbstractAgent, if the abstraction generalizes to Kubernetes).
Both CPUPlugin and MemoryPlugin consume via a shared reference (set during plugin init, or accessed through the existing computer_ctx).
Lifecycle (start / stop) still hangs off _handle_start_event / _handle_clean_event; the agent just dispatches to a single streamer instead of each plugin.
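The ownership model above could be sketched roughly as follows. This is a minimal illustration, not the actual agent code: `StatsStreamerSketch`, `PluginSketch`, and `AgentSketch` are hypothetical stand-ins, no real Docker client is created, and the stored sample values are placeholders. Only the method names `notify_container_started` / `notify_container_destroyed` and `_handle_start_event` / `_handle_clean_event` are taken from this issue; their real signatures may differ.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StatsStreamerSketch:
    """Hypothetical stand-in for DockerStatsStreamer (no Docker connection)."""

    active: set[str] = field(default_factory=set)
    latest: dict[str, dict[str, int]] = field(default_factory=dict)

    async def notify_container_started(self, container_id: str) -> None:
        # A real streamer would open one persistent stats stream here.
        self.active.add(container_id)
        self.latest[container_id] = {"cpu": 0, "mem": 0}  # placeholder sample

    async def notify_container_destroyed(self, container_id: str) -> None:
        # A real streamer would cancel the stream for this container here.
        self.active.discard(container_id)
        self.latest.pop(container_id, None)


class PluginSketch:
    """Hypothetical plugin: receives the shared streamer, never builds one."""

    def __init__(self, key: str, streamer: StatsStreamerSketch) -> None:
        self.key = key
        self.streamer = streamer

    def read(self, container_id: str) -> Optional[int]:
        sample = self.streamer.latest.get(container_id)
        return None if sample is None else sample[self.key]


class AgentSketch:
    """Hypothetical agent that owns the only streamer and its lifecycle."""

    def __init__(self) -> None:
        self.streamer = StatsStreamerSketch()
        self.cpu = PluginSketch("cpu", self.streamer)
        self.mem = PluginSketch("mem", self.streamer)

    async def _handle_start_event(self, container_id: str) -> None:
        await self.streamer.notify_container_started(container_id)

    async def _handle_clean_event(self, container_id: str) -> None:
        await self.streamer.notify_container_destroyed(container_id)


async def demo() -> tuple[bool, int, int]:
    agent = AgentSketch()
    await agent._handle_start_event("c1")
    await agent._handle_start_event("c2")
    same_streamer = agent.cpu.streamer is agent.mem.streamer
    streams_while_running = len(agent.streamer.active)
    await agent._handle_clean_event("c1")
    return same_streamer, streams_while_running, len(agent.streamer.active)
```

Running `asyncio.run(demo())` exercises one start/clean cycle. The point of the sketch is that each container is registered once with a single streamer, while both plugins read from the same object.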
Dependencies
Parent epic: #11216
Follow-up to: #11224
Scope
DockerStatsStreamer.notify_container_started/destroyed dispatch path accordingly.
Out of scope
src/ai/backend/agent/kubernetes/); if it's affected, evaluate in parallel.
JIRA Issue: BA-5861