[Serve] limit num_workers in replica's ThreadPoolExecutor to num_cpus #60271
abrarsheikh merged 8 commits into ray-project:master
Code Review
This pull request introduces a useful optimization to limit the number of workers in the replica's ThreadPoolExecutor based on the num_cpus specified in ray_actor_options. This helps prevent CPU oversubscription when using asyncio.to_thread. The implementation is clean and the logic for calculating the number of workers is robust. I've added one suggestion to enhance the test coverage by including more edge cases, which will help ensure the feature is solid.
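For context, asyncio.to_thread submits work to the running loop's default executor, so capping that executor caps how many blocking calls run at once. The following is a minimal sketch of that mechanism using only the standard library; it is not the Serve replica code, and the names blocking_task, peak_concurrency, and run_demo are hypothetical.

```python
import asyncio
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_task(active, peak, lock):
    """Simulate blocking work while tracking how many calls overlap."""
    with lock:
        active[0] += 1
        peak[0] = max(peak[0], active[0])
    time.sleep(0.05)
    with lock:
        active[0] -= 1


async def peak_concurrency(max_workers: int, num_tasks: int = 8) -> int:
    """Return the highest number of to_thread calls observed running at once."""
    loop = asyncio.get_running_loop()
    # asyncio.to_thread uses the loop's default executor, so this cap applies to it.
    loop.set_default_executor(ThreadPoolExecutor(max_workers=max_workers))
    active, peak, lock = [0], [0], threading.Lock()
    await asyncio.gather(
        *(asyncio.to_thread(blocking_task, active, peak, lock) for _ in range(num_tasks))
    )
    return peak[0]


def run_demo(max_workers: int) -> int:
    # asyncio.run creates a fresh loop and shuts the default executor down on exit.
    return asyncio.run(peak_concurrency(max_workers))
```

With max_workers=2, run_demo should never observe more than two overlapping calls, even though eight are submitted.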
harshit-anyscale left a comment
lgtm
cc: @abrarsheikh for further review
# __ray_parallel_end__
if __name__ == "__main__":
include your serve app here so that it runs as part of doc test
Ah, good catch. I hadn’t noticed the doc test flow, thanks. Added the CustomThreadPool example to main so it runs there now. Please take another look.
CI is failing; please merge master into your branch.
Signed-off-by: yaommen <myanstu@163.com>
Signed-off-by: yaommen <myanstu@163.com>
Signed-off-by: yaommen <myanstu@163.com>
00f56de to ebafa28
Update: rebased on latest master.
Signed-off-by: yaommen <myanstu@163.com>
Signed-off-by: yaommen <myanstu@163.com>
CI failure (https://buildkite.com/ray-project/premerge/builds/58429/steps/canvas?jid=019be912-9a16-4ce9-8684-eb64454ce290): test_asyncio_default_executor_limited_by_num_cpus had too many parametrized cases and timed out, and teardown hit the pytest-timeout SIGTERM. To address this, I reduced the parametrized cases from 5 to 3 (0, 2.2, 30) to keep core coverage while lowering runtime.
Signed-off-by: yaommen <myanstu@163.com>
Hi @abrarsheikh, rebased on master and CI is green. When you have a moment, could you give it another look?
The test_replica_sync_methods_with_run_sync_in_threadpool test was configured with size="small" (60s timeout), but after PR ray-project#60271 added the test_asyncio_default_executor_limited_by_num_cpus test with a num_cpus=30 parameter, the test suite started timing out during teardown.

Change the test size from "small" to "medium" (300s timeout) to give the test suite enough time to complete, including cleanup.

Fixes flaky test: test_asyncio_default_executor_limited_by_num_cpus[30-32]

Signed-off-by: Seiji Eicher <seiji@anyscale.com>
…ray-project#60271)

## Description
- Limit the user-code event loop’s default ThreadPoolExecutor size to the deployment’s ray_actor_options["num_cpus"] (fractional values round up, <=0 leaves defaults).
- This ensures asyncio.to_thread in Serve replicas respects the CPU reservation and avoids oversubscription.
- Added a Serve test that verifies the default executor’s max_workers matches num_cpus.

## Related issues
> Link related issues: "Fixes ray-project#59750", "Closes ray-project#59750", or "Related to ray-project#59750".

## Additional information
- Tests run:
  - python -m pytest python/ray/serve/tests/unit/test_user_callable_wrapper.py
  - python -m pytest python/ray/serve/tests/test_replica_sync_methods.py

---------

Signed-off-by: yaommen <myanstu@163.com>
Signed-off-by: jinbum-kim <jinbum9958@gmail.com>
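The rounding rule in the description (fractional values round up, <=0 leaves defaults) can be sketched as a small helper. This is an illustration under stated assumptions, not the merged Serve code: the function names executor_workers_for and build_executor are hypothetical, and the real implementation may treat other cases (for example very large values) differently.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import Optional


def executor_workers_for(num_cpus: Optional[float]) -> Optional[int]:
    """Map a num_cpus reservation to a max_workers value (hypothetical helper).

    Returns None when num_cpus is missing or <= 0, meaning "leave the
    executor's default size alone". Fractional values round up.
    """
    if num_cpus is None or num_cpus <= 0:
        return None
    return math.ceil(num_cpus)


def build_executor(num_cpus: Optional[float]) -> ThreadPoolExecutor:
    # ThreadPoolExecutor accepts max_workers=None and then picks its own default.
    return ThreadPoolExecutor(max_workers=executor_workers_for(num_cpus))
```

Under this sketch, a deployment reserving 2.2 CPUs would get three worker threads, and a reservation of 0 would keep the standard library default.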