Releases: kai-scheduler/KAI-Scheduler

v0.14.2

07 May 11:53
e94f9ef

What's Changed

Changed

  • Suppressed noisy Reconciler error logs and PodGrouperWarning events on transient PodGroup update conflicts. The podgrouper now treats IsConflict errors as expected and silently requeues the reconcile instead of surfacing the apiserver's "object has been modified" message.
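
The conflict handling described above can be sketched roughly as follows. This is a minimal illustration in Python, not the podgrouper's actual code: ConflictError, Result, and reconcile are hypothetical names standing in for the apiserver's "object has been modified" conflict error and the controller's reconcile result.

```python
from dataclasses import dataclass


class ConflictError(Exception):
    """Hypothetical stand-in for an 'object has been modified' update conflict."""


@dataclass
class Result:
    requeue: bool = False       # ask the controller to run the reconcile again
    error_logged: bool = False  # whether an error log / warning event was emitted


def reconcile(update_pod_group) -> Result:
    """Run one reconcile attempt around a PodGroup update callback."""
    try:
        update_pod_group()
    except ConflictError:
        # Expected under optimistic concurrency: retry quietly, no log or event.
        return Result(requeue=True)
    except Exception:
        # Any other failure is still surfaced and retried.
        return Result(requeue=True, error_logged=True)
    return Result()
```

Under these assumptions, a conflicting update yields Result(requeue=True, error_logged=False), while any other error is still reported.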

Fixed

  • Fixed kai-operator not reconciling on Prometheus and ServiceMonitor changes. The Config controller now watches owned Prometheus and ServiceMonitor resources, so deletions and drift trigger reconciliation. CRD presence is checked at startup against the API server (the scheme-only check used previously could not detect missing CRDs), and the watch is registered only when the CRDs are installed. #877
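
To illustrate why a scheme-only check cannot detect missing CRDs, here is a small sketch. It assumes a simplified model in which the "scheme" is the set of types compiled into the operator and the "served kinds" are what the API server reports at startup; scheme_knows and kinds_to_watch are hypothetical helpers, not the operator's API.

```python
def scheme_knows(kind, scheme_kinds):
    """Scheme-only check: true whenever the type is compiled into the operator."""
    return kind in scheme_kinds


def kinds_to_watch(wanted, served_kinds):
    """Server-side check: keep only the kinds the API server reports as installed."""
    return [kind for kind in wanted if kind in served_kinds]


# Hypothetical cluster where the Prometheus Operator CRDs are not installed.
scheme_kinds = {"Prometheus", "ServiceMonitor"}
served_kinds = {"Deployment"}

# The scheme-only check passes even though the CRDs are absent from the cluster.
print(scheme_knows("Prometheus", scheme_kinds))
# The server-side check registers no watches, so startup does not depend on them.
print(kinds_to_watch(["Prometheus", "ServiceMonitor"], served_kinds))
```

The scheme reflects only what the binary was built with, so the decision to register a watch has to come from the cluster itself.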

Full Changelog: v0.14.1...v0.14.2

v0.12.20

07 May 11:55
d9fc2a8

What's Changed

Changed

  • Suppressed noisy Reconciler error logs and PodGrouperWarning events on transient PodGroup update conflicts. The podgrouper now treats IsConflict errors as expected and silently requeues the reconcile instead of surfacing the apiserver's "object has been modified" message.

Fixed

  • Fixed kai-operator not reconciling on Prometheus and ServiceMonitor changes. The Config controller now watches owned Prometheus and ServiceMonitor resources, so deletions and drift trigger reconciliation. CRD presence is checked at startup against the API server (the scheme-only check used previously could not detect missing CRDs), and the watch is registered only when the CRDs are installed. #877

Full Changelog: v0.12.19...v0.12.20

v0.6.20

07 May 14:15
ba1948c

What's Changed

Changed

  • Suppressed noisy Reconciler error logs and PodGrouperWarning events on transient PodGroup update conflicts. The podgrouper now treats IsConflict errors as expected and silently requeues the reconcile instead of surfacing the apiserver's "object has been modified" message.

Full Changelog: v0.6.19...v0.6.20

v0.4.20

05 May 14:00
b87c798

What's Changed

  • fix(scheduler): bind plugin server to localhost by @gshaibi in #998
  • ci: add approval gatekeeper workflow for external contributor PRs (#973) by @gshaibi in #1007
  • chore: auto-resolve CHANGELOG.md merge conflicts with union strategy by @KaiPilotBot in #1056
  • chore(deps): bump github.com/NVIDIA/go-nvml from 0.12.4-1 to 0.13.0-1 by @dependabot[bot] in #1065
  • chore(deps): bump knative.dev/serving from 0.44.0 to 0.48.1 by @dependabot[bot] in #1071
  • chore(deps): bump github.com/gin-contrib/pprof from 1.5.2 to 1.5.3 by @dependabot[bot] in #1134
  • chore(deps): bump github.com/grafana/pyroscope-go from 1.2.1 to 1.2.7 by @dependabot[bot] in #1133
  • chore(deps): bump github.com/onsi/gomega from 1.38.2 to 1.39.1 by @dependabot[bot] in #1137
  • chore(deps): bump github.com/onsi/ginkgo/v2 from 2.28.0 to 2.28.1 by @dependabot[bot] in #1201
  • chore(deps): bump google.golang.org/grpc from 1.77.0 to 1.79.2 by @dependabot[bot] in #1194
  • fix(scheduler): Do not include resources with a count of 0. by @KaiPilotBot in #1141
  • Add dco github action by @KaiPilotBot in #1269
  • chore(deps): bump google.golang.org/grpc from 1.79.2 to 1.79.3 by @dependabot[bot] in #1262
  • ci: Skip dco checkout for dependabot PRs by @KaiPilotBot in #1276
  • build: upgrade Go to 1.25.6, golangci-lint to v2.11.3, controller-gen to v0.20.1 - v0.4 by @davidLif in #1284
  • ci: auto-pass DCO check for dependabot on merge_group events by @KaiPilotBot in #1288
  • chore(deps): bump github.com/gin-gonic/gin from 1.10.0 to 1.12.0 by @dependabot[bot] in #1259
  • ci: Skip DCO check in merge queue commits, due to github shenanigans by @KaiPilotBot in #1296
  • ci: Do not skip the DCO action with github action level ifs, but adjust with bash if and an exclude pattern by @KaiPilotBot in #1304
  • fix: v0.4- pod invariant predicate implement by @enoodle in #1542

Full Changelog: v0.4.19...v0.4.20

v0.9.17

07 May 11:56
a3ce96e

What's Changed

Changed

  • Suppressed noisy Reconciler error logs and PodGrouperWarning events on transient PodGroup update conflicts. The podgrouper now treats IsConflict errors as expected and silently requeues the reconcile instead of surfacing the apiserver's "object has been modified" message.

Full Changelog: v0.9.16...v0.9.17

v0.14.1

29 Apr 09:08
baf4847

What's Changed

Added

Fixed

Full Changelog: v0.14.0...v0.14.1

v0.12.19

29 Apr 09:06
64175d7

What's Changed

Fixed

  • Do not include resources with a count of 0. by @KaiPilotBot in #1142
  • Add resourceclaims/binding RBAC for DRA granular status authorization by @KaiPilotBot in #1377
  • Fixed the multi-device GPU memory quota check to account for device count by @enoodle in #1391
  • Check active BindRequests before deleting reservation pods by @enoodle in #1387
  • Do not assume DRA claims for completed/failed pods by @davidLif in #1457
  • imagePullSecrets fixes by @enoodle in #1468
  • fix: propagate priorityClass, preemptibility by @SiorMeir in #1479

Full Changelog: v0.12.18...v0.12.19

v0.9.16

29 Apr 08:46
3bb273f

What's Changed

Fixed

  • Fixed resource enumeration to exclude resources with a count of 0. by @KaiPilotBot in #1139
  • Fixed flaky subgroups e2e test (v0.9) by @enoodle in #1474
  • Fixed imagePullSecrets issues by @enoodle in #1467
  • Fixed the multi-device GPU memory quota check to account for device count by @enoodle in #1392
  • Fixed reservation pod deletion to check for active BindRequests first by @enoodle in #1388

Full Changelog: v0.9.15...v0.9.16

v0.6.19

29 Apr 08:43
b3db30a

What's Changed

Fixed

  • Fixed the multi-device GPU memory quota check to account for device count by @enoodle in #1393
  • Fixed reservation pod deletion to check for active BindRequests first by @enoodle in #1389

Full Changelog: v0.6.18...v0.6.19

v0.14.0

30 Mar 14:36
d6ea335

What's Changed

Added

  • Added queue validation webhook to queuecontroller with optional quota validation for parent-child relationships - AdheipSingh
  • Added support for VPA configuration for the different components of the KAI Scheduler. Users who have VPA installed on their cluster can now use it for vertical autoscaling of these components. - jrosenboimnvidia
  • Added FOSSA scanning for the repository context. Scans will also be performed for submitted PRs. The results can be found here. #1178 - davidLif
  • Added support for Ray subgroup topology-aware scheduling by specifying the kai.scheduler/topology, kai.scheduler/topology-required-placement, and kai.scheduler/topology-preferred-placement annotations.
  • Allowed subgroups to set "minAvailable" to 0, which means that all pods in the subgroup are "elastic extra pods". #1216 - davidLif

Changed

  • Auto-enabled leader election when operator.replicaCount > 1 to prevent concurrent reconciliation. #1218
  • Updated Go version to v1.26.1, with corresponding upgrades to the base Docker images, linter, and controller generator. #1222 - davidLif

Fixed

  • Updated resource enumeration logic to exclude resources with count of 0. #1120
  • Fixed scheduler on k8s < 1.34 with DRA disabled.
  • Fixed pod group controller failing to track DRA GPU resources on Kubernetes 1.32-1.33 clusters. #1214
  • Fixed scheduling-constraints signature hashing for Priority and container HostPort by encoding full int32 values, preventing byte-truncation collisions and flaky signature tests.
  • Fixed rollback in scheduling simulations with DRA. #1168 - itsomri
  • Fixed a potential state corruption in DRA scheduling simulations. #1219 - itsomri
  • Fixed operator reconcile loop caused by status-only updates triggering re-reconciliation. #1229 - cypres
  • Fixed scheduler not starting on k8s clusters with DRA disabled, due to the ResourceSliceTracker not syncing. #1241 - cypres
  • Fixed webhook reconcile loop on AKS, by retaining the cloud-provider-injected namespaceSelector rules during reconciliation. #1292 - cypres
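
The byte-truncation collision mentioned in the signature hashing fix above can be reproduced with a small sketch. The hash function (SHA-256), the byte order, and the function names are assumptions made for illustration only; the scheduler's actual hashing code may differ.

```python
import hashlib
import struct


def signature_truncated(priority, host_port):
    """Buggy variant: feeds only the low byte of each value into the hash."""
    h = hashlib.sha256()
    h.update(bytes([priority & 0xFF]))
    h.update(bytes([host_port & 0xFF]))
    return h.hexdigest()


def signature_full(priority, host_port):
    """Fixed variant: encodes each value as a full 4-byte signed integer."""
    h = hashlib.sha256()
    h.update(struct.pack("<i", priority))
    h.update(struct.pack("<i", host_port))
    return h.hexdigest()


# Priority 0 vs 256 and HostPort 80 vs 336 share the same low byte,
# so the truncated variant cannot tell the two pods apart.
print(signature_truncated(0, 80) == signature_truncated(256, 336))
print(signature_full(0, 80) == signature_full(256, 336))
```

With truncation, any two values that differ by a multiple of 256 hash identically, which is how distinct scheduling constraints could end up with the same signature.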

Full Changelog: v0.13.4...v0.14.0