docs(scheduler): update semi-preemptible HLD with minSubGroups and multi-level tree semantics #1697

Open
SiorMeir wants to merge 7 commits into main from siormeir/semi-preemptible-design
@SiorMeir

Collaborator

Description

Updates the semi-preemptible design document to reflect the current API (which now includes minSubGroups) and resolves open questions from issues #1585 and #1596.

Key changes:

  • Clarifies that the core/elastic pod split applies exclusively at minMember leaf PodSets — minSubGroups intermediate nodes define scheduling gates only, not preemption thresholds
  • Defines multi-level tree behavior: semi-elasticity is a pod-level concept; subgroups are atomic scheduling units (not semi-elastic)
  • Adds immutability constraint: minMember and minSubGroups must not increase on semi-preemptible PodGroups post-creation (validation webhook requirement)
  • Documents future minNonPreemptible field: scope, work required, and the three-tier pod ordering complexity it introduces
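
The immutability constraint above can be illustrated with a minimal sketch of the check a validation webhook might perform. This is not the project's webhook code: the `Node` type, the path-based node matching, and the error strings are hypothetical, and the sketch deliberately ignores nodes that are added or removed in the update.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical, simplified view of one node in a PodGroup subgroup tree."""
    name: str
    min_member: Optional[int] = None      # set on leaf PodSets
    min_sub_groups: Optional[int] = None  # set on intermediate nodes
    children: List["Node"] = field(default_factory=list)


def collect_minimums(node: Node, path: str = "") -> Dict[str, Tuple[Optional[int], Optional[int]]]:
    """Map each node path to its (min_member, min_sub_groups) pair."""
    here = f"{path}/{node.name}"
    result = {here: (node.min_member, node.min_sub_groups)}
    for child in node.children:
        result.update(collect_minimums(child, here))
    return result


def validate_update(old: Node, new: Node) -> List[str]:
    """Return a list of violations; an empty list means the update is allowed."""
    old_mins = collect_minimums(old)
    errors = []
    for path, (new_mm, new_msg) in collect_minimums(new).items():
        if path not in old_mins:
            continue  # assumption: newly added nodes are out of scope for this sketch
        old_mm, old_msg = old_mins[path]
        if old_mm is not None and new_mm is not None and new_mm > old_mm:
            errors.append(f"{path}: minMember increased {old_mm} -> {new_mm}")
        if old_msg is not None and new_msg is not None and new_msg > old_msg:
            errors.append(f"{path}: minSubGroups increased {old_msg} -> {new_msg}")
    return errors
```

Under these assumptions, lowering a minimum or leaving it unchanged passes, while raising `min_member` on a leaf from 2 to 3 is reported as a violation for that leaf's path.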

Related Issues

Fixes #1585
Fixes #1596

Checklist

  • Self-reviewed
  • Added/updated tests (if needed)
  • Updated documentation (if needed)

Breaking Changes

None — documentation only.

Additional Notes

This is an HLD update only. Technical design and implementation are tracked separately.

@coderabbitai

coderabbitai Bot commented Jun 15, 2026

Contributor

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ff9c0026-6a55-49e2-a960-82d655147118

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

SiorMeir added 3 commits June 24, 2026 15:02
…lti-level tree semantics

Addresses gaps identified in issues #1585 and #1596:
- Clarify that the core/elastic split applies exclusively at minMember leaf PodSets
- Define behavior for multi-level trees with minSubGroups intermediate nodes
- Add immutability constraint section (validation webhook requirement)
- Document future minNonPreemptible field: scope, work required, new complexity introduced

Signed-off-by: SiorMeir <msior@nvidia.com>
…ble mode

Signed-off-by: SiorMeir <msior@nvidia.com>
…sticity

Replace the pod-only semi-elasticity model with a tree-level one: a
SubGroupSet's minSubGroup defines how many child subgroups are core
(non-preemptible) while surplus subgroups are elastic, alongside the
existing leaf minMember pod split. The non-preemptible resource count
is redefined as the tree's minimal satisfying set, matching what the
allocator/eviction already compute.

Document the interaction with segmented subgroups (subgroup-level
elasticity is what gives segmented workloads an elastic tier, and
whole-segment eviction preserves the forced topology shape) and record
the leaf-only quota-accounting follow-up gap for the impl branch.

Signed-off-by: SiorMeir <msior@nvidia.com>
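
One way to read the "minimal satisfying set" from the commit message above is sketched below, counting pods rather than resources. Several points are assumptions not stated in the source: the `TreeNode` type is hypothetical, an intermediate node without a `min_sub_groups` value is treated as requiring all of its children, and "minimal" is interpreted as choosing the children with the smallest core counts. The actual allocator and eviction logic may select subgroups differently.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TreeNode:
    """Hypothetical subgroup-tree node; a node with no children is a leaf PodSet."""
    name: str
    min_member: Optional[int] = None      # leaf: pods that form the core
    min_sub_groups: Optional[int] = None  # intermediate: child subgroups that form the core
    children: List["TreeNode"] = field(default_factory=list)


def core_pod_count(node: TreeNode) -> int:
    """Count pods in the smallest set that satisfies every minimum in the tree."""
    if not node.children:
        # Leaf: the core is exactly minMember pods; anything above it is elastic.
        return node.min_member or 0

    # Intermediate node: compute each child's core size, cheapest first.
    child_counts = sorted(core_pod_count(child) for child in node.children)

    # Assumption: an unset minimum means every child subgroup is required.
    required = node.min_sub_groups
    if required is None:
        required = len(node.children)

    # Only the `required` cheapest children belong to the core; the rest are elastic.
    return sum(child_counts[:required])
```

For example, a root with `min_sub_groups=2` over three replica leaves that each have `min_member=4` has a core of 8 pods under this reading; the third replica is entirely elastic.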
SiorMeir force-pushed the siormeir/semi-preemptible-design branch from 8f1b83c to 2fd554c on June 24, 2026 12:03
SiorMeir added 4 commits June 25, 2026 11:08
Document that semi-preemptible reuses the existing preemptibility API
(a third enum/label value, no new field): show it set on the PodGroup
spec, via the kai.scheduler/preemptibility workload label, and on a
multi-level tree using minSubGroup for subgroup-level core/elastic.

Signed-off-by: SiorMeir <msior@nvidia.com>
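
The commit message above describes semi-preemptible as a third value of the existing preemptibility setting, readable from either the PodGroup spec or the `kai.scheduler/preemptibility` label. The sketch below shows how such a value might be resolved. Only the label key comes from the source; the value strings, the `Preemptibility` enum, the `resolve_preemptibility` helper, and the spec-over-label precedence are all assumptions made for illustration.

```python
from enum import Enum
from typing import Mapping, Optional

# Label key as named in the commit message.
PREEMPTIBILITY_LABEL = "kai.scheduler/preemptibility"


class Preemptibility(Enum):
    # Assumed value spellings; check the design document for the real ones.
    PREEMPTIBLE = "preemptible"
    NON_PREEMPTIBLE = "non-preemptible"
    SEMI_PREEMPTIBLE = "semi-preemptible"


def resolve_preemptibility(
    spec_value: Optional[str], labels: Mapping[str, str]
) -> Optional[Preemptibility]:
    """Pick the mode from the spec if set, else from the workload label.

    Returns None when neither source sets a value, leaving the caller to
    apply whatever default the scheduler uses. Raises ValueError for an
    unrecognized value.
    """
    # Assumption: an explicit spec value takes precedence over the label.
    raw = spec_value or labels.get(PREEMPTIBILITY_LABEL)
    if raw is None:
        return None
    return Preemptibility(raw)
```

Because the mode is carried as one more enum value rather than a new field, existing manifests that set only the first two values would resolve unchanged in this sketch.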
Signed-off-by: SiorMeir <msior@nvidia.com>
Signed-off-by: SiorMeir <msior@nvidia.com>
Development

Successfully merging this pull request may close these issues.

  • [Semi preemptible] Update design according to the new requirements
  • Add new Semi-Preemptible 3rd preemptibility mode