[Core] Add fallback strategy support to placement groups (#59024)
ryanaoleary wants to merge 14 commits into ray-project:master
Conversation
cc: @MengjinYan
Code Review
This pull request introduces a fallback_strategy for placement groups, allowing for alternative scheduling options if the primary configuration is not feasible. The changes are well-implemented across the Python API, Cython layer, and C++ core, including updates to the GCS scheduler. The addition of comprehensive unit and integration tests ensures the new functionality is robust. The code is well-structured, and I've identified one minor opportunity for code simplification to enhance maintainability.
This pull request has been automatically marked as stale because it has not had recent activity. You can always ask for help on our discussion forum or Ray's public Slack channel. If you'd like to keep this open, just leave any comment, and the stale label will be removed.
not stale
MengjinYan left a comment:
Some high level comments. Will take a more detailed look after those are resolved. Thanks!
MengjinYan left a comment:
Thanks for the patience! Added one high level comment.
src/ray/gcs/gcs_placement_group.h (Outdated)
Under the new semantics of the `bundles` field, I think the field should be empty in the beginning, and probably the same with the strategy if we add it to the placement group scheduling options.
Tried adding this, and it runs into a lot of issues with the Python client and the autoscaler, since they expect `bundles` to be populated for the pending placement group. To fix it we end up having to modify how bundles get cached and essentially return `scheduling_strategy` index 0 (the primary strategy) any time `bundles` would get called anyway.
The change I tried to implement ended up being pretty extensive. Is this something we should try to add in a follow-up PR? Or do you know if I might be missing something and there's an easier way to support the new semantics of the field?
Understood. I think it is a very involved change because the semantics change.
The current placement group management logic in all places assumes that there is only one set of requirements for each placement group, so the requirements and the state tracking are all in the `bundles` field.
At the same time, with the fallback semantics, while the user-facing API remains the same (`bundles` will still be used for specifying the highest-priority resource requirement), we changed the assumption in the GCS placement group management logic and split the original `bundles` field in `PlacementGroupTableData` into:
- `scheduling_strategy`, which only contains the scheduling requirements for the bundles, and
- `bundles`, which only tracks the bundle id and the corresponding node id for the currently chosen placement group. Only `bundle_id` and `node_id` need to be populated, and we might need an additional field to indicate the index of the chosen set of requirements in `scheduling_strategy`.
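To make the split concrete, here is a rough Python model of the two resulting fields. This is an illustrative sketch, not Ray's actual data model; in particular `ChosenBundle` and `chosen_strategy_index` are assumed names standing in for the proto fields discussed above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

Bundle = Dict[str, float]  # e.g. {"CPU": 2.0}

@dataclass
class ChosenBundle:
    bundle_id: int
    node_id: str  # node the bundle was placed on

@dataclass
class PlacementGroupTableData:
    # Index 0 is the primary requirement set; indices 1..N are fallbacks.
    scheduling_strategy: List[List[Bundle]]
    # Stays empty until the scheduler commits to one of the options above.
    bundles: List[ChosenBundle] = field(default_factory=list)
    # The "additional field" mentioned above: index of the chosen option.
    chosen_strategy_index: Optional[int] = None

data = PlacementGroupTableData(
    scheduling_strategy=[[{"CPU": 2.0}] * 4, [{"CPU": 1.0}] * 4],
)
assert data.bundles == []  # nothing placed yet

# Once the scheduler picks fallback index 1 and places the four bundles:
data.chosen_strategy_index = 1
data.bundles = [ChosenBundle(i, f"node-{i % 2}") for i in range(4)]
```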
Looking into the code, I think on a high level the semantic change means:
- The user-facing APIs should remain the same in terms of the `bundles` field and just add the additional `fallback_options` field:
  - semantics in user-facing APIs (`placement_group.py`) shouldn't change
  - semantics in `PlacementGroupSpecification`, `BundleSpecification`, etc. shouldn't change
- On the GCS side, the semantics in `GcsPlacementGroup` should change to the new semantics. This means:
  (1) The operations related to the `bundles` field (`GetBundles`, `GetUnplacedBundles`, etc.) should be changed to the semantics of assuming this is the chosen set of resource requirements; `cached_bundle_specs_` should be the chosen set as well.
  (2) The placement group scheduling logic needs to consider all the resource requirements in `scheduling_strategy` when the whole placement group needs to be scheduled.
  (3) The logic to sync the pending placement groups from GCS to the autoscaler needs to provide the full list of resource requirements.
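Point (2) can be sketched in a few lines. This is a toy model, not Ray's scheduler: it treats the cluster as a single resource pool instead of packing bundles onto individual nodes, and the function names only echo the C++ ones.

```python
from typing import Dict, List, Optional

Bundle = Dict[str, float]

def is_feasible(available: Dict[str, float], bundles: List[Bundle]) -> bool:
    # Toy feasibility check: can one shared pool cover the summed demand of
    # every bundle? (The real scheduler places bundles on individual nodes.)
    need: Dict[str, float] = {}
    for bundle in bundles:
        for resource, amount in bundle.items():
            need[resource] = need.get(resource, 0.0) + amount
    return all(available.get(r, 0.0) >= amt for r, amt in need.items())

def schedule_unplaced_bundles(
    available: Dict[str, float],
    scheduling_strategy: List[List[Bundle]],
) -> Optional[int]:
    # Try the primary option (index 0), then each fallback in order; return
    # the index of the first feasible option, or None to leave the PG pending.
    for index, bundles in enumerate(scheduling_strategy):
        if is_feasible(available, bundles):
            return index
    return None

cluster = {"CPU": 4.0}
options = [
    [{"CPU": 2.0, "GPU": 1.0}] * 2,  # primary: needs GPUs, infeasible here
    [{"CPU": 2.0}] * 2,              # fallback: CPU-only, fits
]
assert schedule_unplaced_bundles(cluster, options) == 1
```

The key property is the strict priority order: a fallback is only considered after every earlier option has been found infeasible.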
In terms of the PRs, it looks like the current PR handles (2) and the related changes to update the `PlacementGroupTableData` and `PlacementGroupSpec` data model. And if the current PR didn't change the existing behavior of having one set of requirements in `bundles`, I think we can add the additional changes that cover (1) and (3) in separate PRs.
Let me know whether the above makes sense or I missed some aspects in handling the `bundles` semantic changes.
cc: @edoakes for visibility and if you have additional comment.
Makes sense, thanks for outlining it. I feel like that made it a lot clearer to implement. I went through, and I think with af673db I've implemented everything from (2) and part of (1). I changed `bundles` to remain empty until an active strategy has been selected, but updated `GetBundles` to return the highest-priority strategy when `bundles` is empty so that it still works with pending PGs. I think a follow-up PR would still be needed so that this field fully follows the new semantics and returns an empty list before a strategy has been chosen (and also updates all downstream consumers to check for
For the autoscaler, I updated `GetPendingGangResourceRequests`, since it reads from the proto, to check for `scheduling_strategy(0)` for PENDING groups; a follow-up PR that addresses the autoscaling side can then have it actually consider the full list of fallback options when scheduling.
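The interim `GetBundles` behavior described above can be sketched as follows. Plain dicts stand in for the proto message here, and the function name only mirrors the C++ method; this is an assumption-laden illustration, not Ray's code.

```python
def get_bundles(table_data: dict) -> list:
    # If a strategy has been chosen, bundles holds the active placements;
    # otherwise fall back to the highest-priority option (index 0) so callers
    # that inspect pending placement groups keep working.
    if table_data["bundles"]:
        return table_data["bundles"]
    return table_data["scheduling_strategy"][0]

strategies = [[{"CPU": 2}] * 4, [{"CPU": 1}] * 4]

pending = {"bundles": [], "scheduling_strategy": strategies}
assert get_bundles(pending) == [{"CPU": 2}] * 4  # primary option

placed = {"bundles": [{"CPU": 1}] * 4, "scheduling_strategy": strategies}
assert get_bundles(placed) == [{"CPU": 1}] * 4  # chosen fallback
```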
Force-pushed from af673db to 5650ee9.
Signed-off-by: ryanaoleary <ryanaoleary@google.com>
cc: @MengjinYan I had a bad rebase that ended up dropping some of my changes to the files; I've now added them back in and re-tested. The semantics of the fields are now being followed per #59024 (comment).
```diff
  // Label selector associated with this bundle.
  // Populated from bundle_label_selector if provided.
- map<string, string> label_selector = 4;
+ LabelSelector label_selector = 4;
```
Proto files modified without RPC fault-tolerance review
Low Severity
This PR modifies .proto files. Please review the RPC fault-tolerance & idempotency standards guide here:
https://github.com/ray-project/ray/tree/master/doc/source/ray-core/internals/rpc-fault-tolerance.rst
Additional Locations (1)
```cpp
// Index 1..N: fallback strategies.
placement_group_table_data_.mutable_scheduling_options()->MergeFrom(
    placement_group_spec.fallback_strategy());
```
Empty bundles field breaks bundle index validation
High Severity
The `GcsPlacementGroup` constructor no longer populates the `bundles` field (it only populates `scheduling_options`). This means `bundles` is empty for all newly created placement groups until the scheduler calls `UpdateActiveBundles`. Since `HandleGetPlacementGroup` serves from in-memory state, client-side `bundle_count` returns 0 before scheduling runs. This causes `check_placement_group_index` to reject any `placement_group_bundle_index >= 0` with a `ValueError`. The race window can be significant when other PGs are scheduling concurrently (`IsSchedulingInProgress` blocks the queue).
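The failure mode flagged above can be illustrated with a small sketch. The validator's signature here is assumed for illustration, not copied from Ray's `check_placement_group_index`.

```python
def check_placement_group_index(bundle_count: int, bundle_index: int) -> None:
    # Mirrors the validation described above: any requested bundle index at
    # or beyond the reported bundle count raises.
    if bundle_index >= bundle_count:
        raise ValueError(
            f"placement group bundle index {bundle_index} is invalid "
            f"(the placement group reports {bundle_count} bundles)"
        )

# Before a strategy is chosen, the empty bundles field makes bundle_count 0,
# so even index 0 is rejected during the race window:
try:
    check_placement_group_index(bundle_count=0, bundle_index=0)
    raised = False
except ValueError:
    raised = True
assert raised
```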


Description

This PR adds a `fallback_strategy` argument to placement groups. For Ray placement groups, `fallback_strategy` is a list of scheduling option dicts, where the arguments can be `bundles` and (optionally) `bundle_label_selector`, to try in order until one can be scheduled. This PR also updates the GCS scheduler to consider these fallbacks when calling `ScheduleUnplacedBundles`. If a fallback strategy is chosen, the bundles in the placement group table data will be updated to the selected value.

Edit: We've updated the proto fields so that `bundles` now only shows the actively chosen bundles for scheduling, and `scheduling_strategy` contains the list of all scheduling options (primary bundles and fallback bundles).

Related issues

#51564
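To summarize the proposed API, here is a usage sketch based only on the argument shapes this PR describes; the label key, resource amounts, and the exact call signature are assumptions.

```python
# Each scheduling option is a dict with "bundles" and, optionally,
# "bundle_label_selector"; options are tried in order until one is feasible.
fallback_strategy = [
    # First fallback: same bundle count, CPU-only.
    {"bundles": [{"CPU": 2}] * 4},
    # Second fallback: fewer bundles, restricted to (made-up) labeled nodes.
    {
        "bundles": [{"CPU": 1}] * 2,
        "bundle_label_selector": [{"node-type": "spot"}] * 2,
    },
]

# With Ray installed, this would accompany the primary request, roughly:
# pg = ray.util.placement_group(
#     bundles=[{"CPU": 2, "GPU": 1}] * 4,
#     strategy="PACK",
#     fallback_strategy=fallback_strategy,
# )

assert all("bundles" in option for option in fallback_strategy)
```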