docs(scheduler): update semi-preemptible HLD with minSubGroups and multi-level tree semantics #1697
Open
SiorMeir wants to merge 7 commits into
…lti-level tree semantics

Addresses gaps identified in issues #1585 and #1596:
- Clarify that the core/elastic split applies exclusively at minMember leaf PodSets
- Define behavior for multi-level trees with minSubGroups intermediate nodes
- Add immutability constraint section (validation webhook requirement)
- Document future minNonPreemptible field: scope, work required, new complexity introduced

Signed-off-by: SiorMeir <msior@nvidia.com>
…ble mode

Signed-off-by: SiorMeir <msior@nvidia.com>
…sticity

Replace the pod-only semi-elasticity model with a tree-level one: a SubGroupSet's minSubGroup defines how many child subgroups are core (non-preemptible) while surplus subgroups are elastic, alongside the existing leaf minMember pod split. The non-preemptible resource count is redefined as the tree's minimal satisfying set, matching what the allocator/eviction already compute.

Document the interaction with segmented subgroups (subgroup-level elasticity is what gives segmented workloads an elastic tier, and whole-segment eviction preserves the forced topology shape) and record the leaf-only quota-accounting follow-up gap for the impl branch.

Signed-off-by: SiorMeir <msior@nvidia.com>
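For readers of this commit, the "minimal satisfying set" can be illustrated with a small sketch. This is not the scheduler's code: the `SubGroup` class, its field names, the single GPU resource, and the two rules marked as assumptions in the comments are hypothetical stand-ins chosen to make the idea concrete.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SubGroup:
    """Hypothetical stand-in for one node of the subgroup tree."""
    name: str
    min_member: int = 0                  # leaf only: pods that form the core
    gpus_per_pod: int = 0                # leaf only: illustrative resource unit
    min_sub_group: Optional[int] = None  # intermediate only: number of core children
    children: List["SubGroup"] = field(default_factory=list)


def core_gpus(node: SubGroup) -> int:
    """GPUs held by the minimal satisfying set rooted at `node`."""
    if not node.children:
        # Leaf: the first min_member pods are core, the rest are elastic.
        return node.min_member * node.gpus_per_pod
    child_costs = sorted(core_gpus(child) for child in node.children)
    # Assumption: an unset min_sub_group means every child is required.
    required = len(child_costs) if node.min_sub_group is None else node.min_sub_group
    # Assumption: the minimal set takes the cheapest `required` children.
    return sum(child_costs[:required])


# Three worker replicas of which two are core, plus one mandatory leader.
workers = SubGroup(
    name="workers",
    min_sub_group=2,
    children=[
        SubGroup(name=f"replica-{i}", min_member=2, gpus_per_pod=8)
        for i in range(3)
    ],
)
leader = SubGroup(name="leader", min_member=1, gpus_per_pod=1)
tree = SubGroup(name="root", children=[workers, leader])

non_preemptible_gpus = core_gpus(tree)  # 2 replicas * 16 GPUs + 1 leader GPU
```

In this sketch the third worker replica, and any pods beyond `min_member` in a leaf, fall outside the minimal set and would be the elastic tier.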
8f1b83c to 2fd554c
Document that semi-preemptible reuses the existing preemptibility API (a third enum/label value, no new field): show it set on the PodGroup spec, via the kai.scheduler/preemptibility workload label, and on a multi-level tree using minSubGroup for subgroup-level core/elastic.

Signed-off-by: SiorMeir <msior@nvidia.com>
Signed-off-by: SiorMeir <msior@nvidia.com>
Signed-off-by: SiorMeir <msior@nvidia.com>
Signed-off-by: SiorMeir <msior@nvidia.com>
Description
Updates the semi-preemptible design document to reflect the current API (which now includes `minSubGroups`) and resolves open questions from issues #1585 and #1596.

Key changes:
- The core/elastic split applies exclusively at `minMember` leaf PodSets; `minSubGroups` intermediate nodes define scheduling gates only, not preemption thresholds
- `minMember` and `minSubGroups` must not increase on semi-preemptible PodGroups post-creation (validation webhook requirement)
- Future `minNonPreemptible` field: scope, work required, and the three-tier pod ordering complexity it introduces

Related Issues
Fixes #1585
Fixes #1596
Checklist
Breaking Changes
None — documentation only.
Additional Notes
This is an HLD update only. Technical design and implementation are tracked separately.
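As a rough illustration of the immutability constraint listed under the key changes, the check a validation webhook would need could look like the sketch below. Everything here is an assumption made for illustration: the `Node` class, matching children by name, and the choice to ignore minimums that were previously unset. The real webhook would operate on the PodGroup API types.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for one node of a PodGroup's subgroup tree."""
    min_member: Optional[int] = None
    min_sub_groups: Optional[int] = None
    children: Dict[str, "Node"] = field(default_factory=dict)


def find_increases(old: Node, new: Node, path: str = "root") -> List[str]:
    """List every minimum that grew between the old and the new spec."""
    violations: List[str] = []
    for attr in ("min_member", "min_sub_groups"):
        before, after = getattr(old, attr), getattr(new, attr)
        # Assumption: only a rise of an already-set minimum is rejected.
        if before is not None and after is not None and after > before:
            violations.append(f"{path}.{attr}: {before} -> {after}")
    # Assumption: children are matched by name; added or removed nodes are skipped.
    for name, old_child in old.children.items():
        new_child = new.children.get(name)
        if new_child is not None:
            violations.extend(find_increases(old_child, new_child, f"{path}/{name}"))
    return violations


old_spec = Node(min_sub_groups=2, children={"workers": Node(min_member=4)})
raised = Node(min_sub_groups=2, children={"workers": Node(min_member=5)})
lowered = Node(min_sub_groups=1, children={"workers": Node(min_member=4)})

rejected = find_increases(old_spec, raised)
accepted = find_increases(old_spec, lowered)
```

A webhook built along these lines would deny the update whenever the returned list is non-empty and the PodGroup is semi-preemptible; lowering a minimum passes because it only shrinks the non-preemptible core.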