Releases: NVIDIA/KAI-Scheduler
v0.10.0-rc5
What's Changed
- test: Remove explicit queue v2 storage in env tests by @itsomri in #607
- chore(ci,docs): add conventional PR title guidelines and validation + pull request template by @gshaibi in #581
- fix: queue version in docs to v2 by @enoodle in #608
- refactor: move PR title validation to a separate workflow file by @gshaibi in #610
- feat: Time aware simulator runner by @itsomri in #609
- feat: Better topology allocation for non homogeneous jobs by @davidLif in #604
- fix: Convert Ephemeral-Storage in MaxNodePoolResources from Bytes to GB by @lakshyaj02 in #566
- fix: Fix ray grouper by @itsomri in #617
- feat(admission): Explicitly apply 'nvidia' runtimeClass to GPU pods by @omeryahud in #602
- fix(chart): Fix templating indentation for service resource configuration by @omeryahud in #620
- refactor: add configurable resource names in operator by @enoodle in #613
- feat(operator): Add default podAntiAffinity and service-level Affinity support by @omeryahud in #619
New Contributors
- @lakshyaj02 made their first contribution in #566
- @omeryahud made their first contribution in #602
Full Changelog: v0.10.0-rc3...v0.10.0-rc5
v0.10.0-rc3
What's Changed
- feat: Configure kValue for time aware fairness by @itsomri in #583
- feat: add stale issue action by @SiorMeir in #592
- fix: topology plugin crash with no requested resources by @enoodle in #595
- refactor: Time aware fairness refactor by @itsomri in #600
- Topology plugin node sorting performance improvement by @gshaibi in #588
- test: Time aware fairness burst simulation by @itsomri in #598
- docs: expand topology scheduling strategy section by @gshaibi in #603
- Added multilevel topology doc by @romanbaron in #596
- Bump github.com/argoproj/argo-workflows/v3 from 3.6.4 to 3.6.12 by @dependabot[bot] in #568
- Topology domain sort based on resource ratio by @davidLif in #601
- feat: add externalURL config by @SiorMeir in #563
- Topology plugin goes into infinite loop on empty tasks list by @romanbaron in #606
Full Changelog: v0.10.0-rc2...v0.10.0-rc3
v0.10.0-rc2
v0.10.0-rc1
What's Changed
- Added PodGroup Validating webhook that is served by the PodGroup Controller by @romanbaron in #515
- Refactor topology constraint PodGroupInfo by @omer-dayan in #513
- Bugfix - TopologyConstraintInfo clone by @omer-dayan in #520
- Hierarchical subgroup structure by @romanbaron in #518
- Topology plugin implemented as SubsetNodes by @omer-dayan in #503
- Binder env tests by @itsomri in #517
- Priority-Preemptibility Separation Design by @gshaibi in #521
- Refactor topology IDE warnings by @omer-dayan in #514
- Pod-group-controller env tests by @itsomri in #524
- Topology scheduling - on require single relevant level by @omer-dayan in #527
- Queuecontroller env tests by @itsomri in #530
- add pod affinity tests by @enoodle in #532
- Topology scheduling plugin - Multi domain decision by @omer-dayan in #529
- CI E2E - Deploy image registry by @omer-dayan in #542
- Moved SubGroupInfo into a separate package by @romanbaron in #538
- Topology plugin - Filter out worse case domains by @omer-dayan in #531
- Moved TopologyConstraintInfo to a separate package by @romanbaron in #539
- Set job fit error for topology job misconfiguration by @omer-dayan in #545
- Topology consolidation test by @omer-dayan in #547
- Introducing PodSet struct for subgroups by @romanbaron in #540
- Add default queue creation and configuration by @singh1203 in #499
- Removed SetDefaultMinAvailable from PodGroupInfo by @romanbaron in #541
- feat: add TSDB PVC by @SiorMeir in #511
- support k8s 1.34 DRA by @enoodle in #533
- Priority-Preemptibility Separation P0 Implementation by @gshaibi in #526
- Topology Plugin Small Refactor by @gshaibi in #553
- TAS: Normalize usage to cluster capacity in prometheus by @itsomri in #555
- E2E Flakiness fix by @gshaibi in #557
- Prepare infra for time aware env tests by @itsomri in #534
- enable scheduler deployment by operator & SchedulingShards by @enoodle in #551
- add delay in tests to allow cache wrappers to update by @enoodle in #559
- Fix SCC for OCP by @itsomri in #544
- Topology aware subGroupSet by @omer-dayan in #556
- Changed SubSetNodes signature to use SubGroupInfo instead of SubGroupSet by @romanbaron in #561
- configure webhook names by @enoodle in #564
- Topology constraint at any subgroup hierarchy level by @romanbaron in #560
- fix(deployments/kai-scheduler): respect helm values set under nodescaleadjuster.scalingPodImage by @BradenM in #572
- Fix: Preserve default SchedulingShard on Helm upgrades by @gshaibi in #573
- configurable reservation runtime class by @enoodle in #569
- Simplifying subgroup tests by @romanbaron in #567
- Scheduler logger enhancements by @itsomri in #579
- feat: set up service monitor by @SiorMeir in #552
- Renamed SubGroupOrderFn to PodSetOrderFn by @romanbaron in #578
- Extending hierarchical podgroup structure to support multiple levels o… by @romanbaron in #427
- Added SubGroupSetOrderFn by @romanbaron in #580
- Fixed bug in setSubGroups method by @romanbaron in #585
- fix: update default scaling pod image name by @avi-airis in #582
- fix: improve docs and READMEs by @SiorMeir in #525
- Topology Plugin - Domain Packing + Node Sorting by @gshaibi in #558
- Added topology docs by @romanbaron in #584
New Contributors
- @BradenM made their first contribution in #572
- @avi-airis made their first contribution in #582
Full Changelog: v0.9.3...v0.10.0-rc1
v0.9.6
v0.9.5
v0.6.15
v0.4.16
v0.9.4
v0.9.3
What's Changed
- Update the ray grouper plugin by @davidLif in #507
- feat: add prometheus operand by @SiorMeir in #477
- Time fairshare - support tumbling window config by @davidLif in #490
- Fix podgroup condition update by @itsomri in #485
- NodeSet plugin infra by @omer-dayan in #496
- Add Scheduler Concepts to developer documentation by @gshaibi in #509
- adding scheduler operand by @enoodle in #493
- adding operator integration tests by @enoodle in #510
- Added more docs about workload dedicated namespaces by @romanbaron in #516
New Contributors
Full Changelog: v0.9.2...v0.9.3