[Server] Recycle guaranteed executor workers via max_tasks_per_child#9964
aylei wants to merge 1 commit
Guaranteed executor workers (the PoolExecutor pool) live for the lifetime of the API server, so a worker's RSS only ever grows to the high-water mark of the heaviest request it has served and is never reclaimed. Under sustained load the pool's cumulative memory creeps toward the container limit. Add an opt-in `SKYPILOT_API_SERVER_WORKER_MAX_TASKS_PER_CHILD` env var that maps to ProcessPoolExecutor's `max_tasks_per_child` (added in Python 3.11), recycling a worker after it has handled that many requests so its memory is returned to the OS. Unset by default (no behavior change); ignored with a warning on Python < 3.11. Applies to the guaranteed pool only — burst workers are already disposed after each task.
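The opt-in behavior described above could be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the helper name `read_max_tasks_per_child`, its parameters, and the choice to treat invalid or non-positive values as unset (with a warning) are assumptions made for the example.

```python
import logging
import os
import sys
from typing import Mapping, Optional, Tuple

logger = logging.getLogger(__name__)

ENV_VAR = 'SKYPILOT_API_SERVER_WORKER_MAX_TASKS_PER_CHILD'


def read_max_tasks_per_child(
        env: Optional[Mapping[str, str]] = None,
        python_version: Optional[Tuple[int, int]] = None) -> Optional[int]:
    """Hypothetical helper: return the configured limit, or None if unset.

    `env` and `python_version` are injectable only to make the sketch easy
    to test; by default the real environment and interpreter are used.
    """
    env = os.environ if env is None else env
    if python_version is None:
        python_version = (sys.version_info.major, sys.version_info.minor)

    raw = env.get(ENV_VAR)
    if raw is None or not raw.strip():
        # Unset by default: no behavior change.
        return None
    try:
        value = int(raw)
    except ValueError:
        # Assumption: invalid values are ignored rather than fatal.
        logger.warning('Ignoring %s=%r: not an integer.', ENV_VAR, raw)
        return None
    if value <= 0:
        # ProcessPoolExecutor itself rejects non-positive values.
        logger.warning('Ignoring %s=%r: must be positive.', ENV_VAR, raw)
        return None
    if python_version < (3, 11):
        # max_tasks_per_child only exists on Python 3.11+.
        logger.warning('Ignoring %s: requires Python 3.11+.', ENV_VAR)
        return None
    return value
```

A caller would then pass the result to `ProcessPoolExecutor(max_tasks_per_child=...)` only when it is not `None`, so older interpreters never see the keyword.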
Code Review
This pull request introduces support for recycling guaranteed worker processes after they have handled a specified number of tasks, bounding their high-water-mark RSS. This is achieved by exposing a new environment variable, SKYPILOT_API_SERVER_WORKER_MAX_TASKS_PER_CHILD, which maps to ProcessPoolExecutor's max_tasks_per_child parameter (available in Python 3.11+). The configuration is propagated through the server configuration to the worker executors, and comprehensive unit tests are added to verify the recycling behavior and environment variable parsing. There are no review comments to assess, and I have no additional feedback to provide.
Summary
Guaranteed executor workers (`PoolExecutor`) live for the lifetime of the API server. A worker's RSS therefore only ever grows to the high-water mark of the heaviest request it has handled and is never reclaimed, so under sustained load the pool's cumulative memory creeps toward the container limit.

This adds an opt-in `SKYPILOT_API_SERVER_WORKER_MAX_TASKS_PER_CHILD` environment variable that maps to `ProcessPoolExecutor`'s `max_tasks_per_child` (added in Python 3.11): a guaranteed worker is recycled after it has handled that many requests, returning its memory to the OS. This is the official replacement for the existing `DisposableExecutor` workaround noted in `process.py` (`TODO(aylei): use the official max_tasks_per_child when upgrade to 3.11`).

The setting is unset by default (no behavior change) and is ignored with a warning on Python < 3.11. It applies to the guaranteed pool only: burst workers (`DisposableExecutor`) are already disposed after each task, and the setting is preserved across a `BrokenProcessPool` rebuild.

Test plan
`tests/unit_tests/test_sky/server/requests/test_process.py`:
- `test_pool_executor_recycles_after_max_tasks`: with `max_tasks_per_child=2` and a single worker, submitting 4 tasks yields worker PIDs `[A, A, B, B]` (recycled after every 2 tasks). Skipped on Python < 3.11.
- `test_pool_executor_no_recycle_by_default`: without the setting, all tasks run on one PID.
- `test_burstable_executor_max_tasks_per_child_routing`: the setting reaches the guaranteed pool kwargs and is not forwarded to burst workers.

`tests/unit_tests/test_sky/server/test_config.py`: env parsing (valid / invalid / unset), the Python < 3.11 gate, and propagation into both worker configs.

`pytest tests/unit_tests/test_sky/server/requests/test_process.py tests/unit_tests/test_sky/server/requests/test_executor.py tests/unit_tests/test_sky/server/test_config.py`: all pass on Python 3.11.
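For readers who want to see the recycling behavior in isolation, here is a rough standalone sketch that uses the standard-library `ProcessPoolExecutor` directly rather than SkyPilot's `PoolExecutor`. The helpers `pool_kwargs` and `worker_pids` are hypothetical names for this example and are not taken from the PR's tests.

```python
import os
import sys
from concurrent.futures import ProcessPoolExecutor


def pool_kwargs(max_tasks_per_child=None):
    """Only pass max_tasks_per_child when set, so Python < 3.11 still works."""
    if max_tasks_per_child is None:
        return {}
    return {'max_tasks_per_child': max_tasks_per_child}


def worker_pids(num_tasks, max_tasks_per_child=None):
    """Run tasks one at a time on a single worker and record its PID each time."""
    kwargs = pool_kwargs(max_tasks_per_child)
    with ProcessPoolExecutor(max_workers=1, **kwargs) as pool:
        # Waiting on each result before submitting the next keeps the
        # ordering deterministic: task i always finishes before task i+1.
        return [pool.submit(os.getpid).result() for _ in range(num_tasks)]


# The guard matters: with max_tasks_per_child set, workers are not forked,
# so the main module may be re-imported in each child process.
if __name__ == '__main__':
    if sys.version_info >= (3, 11):
        pids = worker_pids(4, max_tasks_per_child=2)
        # Expect the shape [A, A, B, B]: a new worker after every 2 tasks.
        print('recycled:', pids[0] == pids[1] != pids[2] == pids[3])
    # Without the setting, every task should land on the same worker.
    print('single worker:', len(set(worker_pids(4))) == 1)
```

Note that `max_tasks_per_child` is incompatible with the `fork` start method, so a pool configured this way starts its workers differently from a default pool on Linux; that is a standard-library constraint rather than something specific to this change.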