[core] Optimize ScheduleAndGrantLeases by snapshotting node availability per round #60800
g199209 wants to merge 4 commits into ray-project:master
Conversation
Code Review
This pull request introduces a significant performance optimization for ScheduleAndGrantLeases by snapshotting node availability at the beginning of each scheduling round. The implementation is well-structured, introducing BeginSchedulingRound and EndSchedulingRound to manage the snapshot's lifecycle and making it reentrant-safe. The changes are well-tested with new unit tests covering the snapshotting logic, reentrancy, and interaction with node draining status. My review includes a couple of suggestions to improve const-correctness and robustness. Overall, this is a great improvement.
Nice optimization! The snapshot approach is a clean win given that the Raylet event loop is single-threaded: the GCS liveness cache can't change mid-round, so caching it just eliminates provably redundant work. Two questions:
How I found this problem
You're right, fixed.
This PR optimizes the performance of ScheduleAndGrantLeases in the Raylet.

The Problem:
Previously, the scheduling policy performed a full liveness check for every candidate node for every lease being scheduled. In scenarios with a large number of leases (e.g., 2000+) and multiple nodes (e.g., 50+), this resulted in ~100k expensive calls per scheduling round. Each check went through an expensive path: StringIdMap::Get (mutex + string copy) -> NodeID::FromBinary (hash + allocation) -> GCS cache lookup (mutex). This caused ScheduleAndGrantLeases to take more than 60 seconds in reported cases, blocking the event loop.
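To make the cost concrete, here is a minimal, self-contained C++ sketch of that access pattern. The classes below (IdToStringMap, LivenessCache) are hypothetical stand-ins, not Ray's actual StringIdMap or GCS client; the point is only that each check pays a mutex acquisition, a string copy, and a hash lookup, repeated once per (lease, candidate node) pair.

```cpp
// Minimal, self-contained sketch of the old access pattern (stand-in classes,
// not Ray's real ones). Every liveness check takes a mutex, copies a string,
// and does a hash lookup, once per (lease, candidate node) pair.
#include <cstdint>
#include <iostream>
#include <mutex>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Stand-in for the mutex-guarded id -> name map (akin to StringIdMap::Get).
class IdToStringMap {
 public:
  void Insert(int64_t id, std::string name) {
    std::lock_guard<std::mutex> lock(mu_);
    map_[id] = std::move(name);
  }
  std::string Get(int64_t id) const {  // mutex + string copy on every call
    std::lock_guard<std::mutex> lock(mu_);
    return map_.at(id);
  }

 private:
  mutable std::mutex mu_;
  std::unordered_map<int64_t, std::string> map_;
};

// Stand-in for the mutex-guarded GCS liveness cache.
class LivenessCache {
 public:
  void MarkAlive(const std::string &node) {
    std::lock_guard<std::mutex> lock(mu_);
    alive_.insert(node);
  }
  bool IsAlive(const std::string &node) const {  // hash lookup under a mutex
    std::lock_guard<std::mutex> lock(mu_);
    return alive_.count(node) > 0;
  }

 private:
  mutable std::mutex mu_;
  std::unordered_set<std::string> alive_;
};

int main() {
  constexpr int kNumNodes = 50;
  constexpr int kNumLeases = 2000;

  IdToStringMap id_map;
  LivenessCache gcs_cache;
  for (int64_t i = 0; i < kNumNodes; ++i) {
    id_map.Insert(i, "node-" + std::to_string(i));
    gcs_cache.MarkAlive("node-" + std::to_string(i));
  }

  // Old shape of the scheduling loop: a fresh liveness check for every
  // candidate node of every lease -> kNumLeases * kNumNodes (~100k) calls.
  int64_t checks = 0;
  for (int lease = 0; lease < kNumLeases; ++lease) {
    for (int64_t node_id = 0; node_id < kNumNodes; ++node_id) {
      bool alive = gcs_cache.IsAlive(id_map.Get(node_id));
      (void)alive;
      ++checks;
    }
  }
  std::cout << "liveness checks this round: " << checks << "\n";  // 100000
  return 0;
}
```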
The Fix:
Introduced a per-round node availability snapshot mechanism in ClusterResourceScheduler. NodeAvailable() calls within the same round use this O(1) snapshot (see the sketch after the list below).

Key Changes
- Added BeginSchedulingRound() and EndSchedulingRound() to ClusterResourceScheduler to manage the snapshot lifecycle.
- Updated NodeAvailable() to prioritize the snapshot for remote node liveness.
- Wrapped the ScheduleAndGrantLeases loops in ClusterLeaseManager and LocalLeaseManager with the snapshot API (reentrant-safe).
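As a rough illustration only, here is a hypothetical, simplified version of the idea: a depth-counted BeginSchedulingRound()/EndSchedulingRound() pair plus a memoizing NodeAvailable(). This is not Ray's actual ClusterResourceScheduler code, and unlike the real change it does not special-case draining status (which the PR deliberately keeps live and uncached).

```cpp
// Hypothetical, simplified sketch of the per-round snapshot (illustrative names,
// not Ray's actual ClusterResourceScheduler). Draining status is intentionally
// left out here; the real change keeps draining checks live and uncached.
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>

class SnapshotScheduler {
 public:
  explicit SnapshotScheduler(std::function<bool(int64_t)> is_node_available_fn)
      : is_node_available_fn_(std::move(is_node_available_fn)) {}

  // Reentrant-safe: only the outermost Begin/End pair creates and clears the
  // snapshot, so a cluster-level round that triggers a local-level round keeps
  // reusing the same cached results. Assumes Begin/End calls are paired.
  void BeginSchedulingRound() {
    if (round_depth_++ == 0) {
      availability_snapshot_.clear();
    }
  }
  void EndSchedulingRound() {
    if (--round_depth_ == 0) {
      availability_snapshot_.clear();
    }
  }

  bool NodeAvailable(int64_t node_id) {
    if (round_depth_ > 0) {
      auto it = availability_snapshot_.find(node_id);
      if (it != availability_snapshot_.end()) {
        return it->second;  // O(1) snapshot hit; skips the expensive path
      }
      bool available = is_node_available_fn_(node_id);  // at most once per round
      availability_snapshot_.emplace(node_id, available);
      return available;
    }
    return is_node_available_fn_(node_id);  // outside a round: live check
  }

 private:
  std::function<bool(int64_t)> is_node_available_fn_;
  std::unordered_map<int64_t, bool> availability_snapshot_;
  int round_depth_ = 0;
};
```

The depth counter is what makes nesting safe: when a cluster-level round triggers a local-level round, the inner Begin/End pair reuses the outer snapshot instead of resetting it.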
Test plan
src/ray/raylet/scheduling/tests/cluster_resource_scheduler_test.cc:
- NodeAvailableSnapshotTest: Confirms is_node_available_fn_ is called only once per node per round (sketched below).
- NodeAvailableSnapshotReentrantTest: Confirms nested scheduling rounds (e.g., Cluster calling Local) work correctly.
- NodeAvailableSnapshotDrainingTest: Confirms draining status is NOT cached and remains live.
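Assuming the hypothetical SnapshotScheduler sketch above, a gtest-style check of the "called once per node per round" property might look like the following. This is illustrative only, not the PR's actual test code.

```cpp
// Illustrative gtest-style test against the hypothetical SnapshotScheduler above,
// not the PR's actual cluster_resource_scheduler_test.cc code.
#include <gtest/gtest.h>

#include <cstdint>
#include <unordered_map>

TEST(NodeAvailableSnapshotSketch, LivenessCheckedOncePerNodePerRound) {
  std::unordered_map<int64_t, int> call_counts;
  SnapshotScheduler scheduler([&call_counts](int64_t node_id) {
    ++call_counts[node_id];  // count underlying liveness callback invocations
    return true;
  });

  scheduler.BeginSchedulingRound();
  for (int lease = 0; lease < 100; ++lease) {
    for (int64_t node_id = 0; node_id < 5; ++node_id) {
      EXPECT_TRUE(scheduler.NodeAvailable(node_id));
    }
  }
  scheduler.EndSchedulingRound();

  // 100 leases x 5 nodes = 500 NodeAvailable() calls, but the callback ran
  // exactly once per node during the round.
  for (int64_t node_id = 0; node_id < 5; ++node_id) {
    EXPECT_EQ(call_counts[node_id], 1);
  }
}
```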