## v0.14.0

Changes since `v0.13.0`:

## Urgent Upgrade Notes

### (No, really, you MUST read this before you upgrade)

- ProvisioningRequest: Remove setting deprecated ProvisioningRequest annotations on Kueue-managed Pods:
  - cluster-autoscaler.kubernetes.io/consume-provisioning-request
  - cluster-autoscaler.kubernetes.io/provisioning-class-name

  If you are implementing a ProvisioningRequest reconciler used by Kueue, make sure the new annotations are supported:
  - autoscaling.x-k8s.io/consume-provisioning-request
  - autoscaling.x-k8s.io/provisioning-class-name (#6381, @kannon92)
- Rename the cert-manager.io/v1 Certificate kueue-metrics-certs to kueue-metrics-cert in the cert-manager manifests used when installing Kueue with the Kustomize configuration.

  If you're using cert-manager and have deployed Kueue using the Kustomize configuration, you must delete the existing kueue-metrics-certs cert-manager.io/v1 Certificate before applying the new changes to avoid conflicts (a command sketch follows this list). (#6345, @mbobrovskyi)
- Replace the "DeactivatedXYZ" values of the "reason" label with "Deactivated" and introduce an "underlying_cause" label on the following metrics:
  - "pods_ready_to_evicted_time_seconds"
  - "evicted_workloads_total"
  - "local_queue_evicted_workloads_total"
  - "evicted_workloads_once_total"

  If you rely on the "DeactivatedXYZ" "reason" label values, you can migrate to the "Deactivated" "reason" label value combined with the following "underlying_cause" label values (an example query is sketched after this list):
  - ""
  - "WaitForStart"
  - "WaitForRecovery"
  - "AdmissionCheck"
  - "MaximumExecutionTimeExceeded"
  - "RequeuingLimitExceeded" (#6590, @mykysha)
- TAS: Enforce a stricter value of the `kueue.x-k8s.io/podset-group-name` annotation in the creation webhook.

  Make sure the values of the `kueue.x-k8s.io/podset-group-name` annotation are not numbers. (#6708, @kshalot)
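
For the kueue-metrics-certs rename above, the following is a minimal deletion sketch. It assumes the Certificate lives in the default kueue-system namespace; adjust the namespace to match your installation.

```shell
# Delete the old Certificate so the renamed kueue-metrics-cert can be applied without conflicts.
kubectl delete certificate.cert-manager.io kueue-metrics-certs -n kueue-system
```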
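
If you are migrating dashboards or alerts away from the "DeactivatedXYZ" reason values, the sketch below shows one way to group evictions by the new underlying_cause label. It assumes the metrics are scraped into a Prometheus instance reachable at $PROM and uses the kueue_-prefixed metric name exposed by the controller; verify the exact metric name in your setup.

```shell
# Sum evictions labeled reason="Deactivated" by their underlying cause.
curl -sG "$PROM/api/v1/query" \
  --data-urlencode 'query=sum by (underlying_cause) (kueue_evicted_workloads_total{reason="Deactivated"})'
```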

## Upgrading steps

### 1. Back Up Topology Resources (skip if you are not using Topology API):

kubectl get topologies.kueue.x-k8s.io -o yaml > topologies.yaml

### 2. Update apiVersion in Backup File (skip if not using Topology API):
Replace `v1alpha1` with `v1beta1` in topologies.yaml for all resources:

sed -i -e 's/v1alpha1/v1beta1/g' topologies.yaml

### 3. Delete Old CRDs:

kubectl delete crd topologies.kueue.x-k8s.io

### 4. Remove Finalizers from Topologies (skip if you are not using Topology API):

kubectl get topology.kueue.x-k8s.io -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}' | while read -r name; do
  kubectl patch topology.kueue.x-k8s.io "$name" -p '{"metadata":{"finalizers":[]}}' --type='merge'
done

### 5. Install Kueue v0.14.0:
Follow the instructions [here](https://kueue.sigs.k8s.io/docs/installation/#install-a-released-version) to install.

### 6. Restore Topology Resources (skip if not using Topology API):

kubectl apply -f topologies.yaml

## Changes by Kind

### Deprecation

- Stop serving the QueueVisibility feature, but keep the APIs (`.status.pendingWorkloadsStatus`) to avoid breaking changes.

  If you rely on the QueueVisibility feature (`.status.pendingWorkloadsStatus` in the ClusterQueue), you must migrate to
  [VisibilityOnDemand](https://kueue.sigs.k8s.io/docs/tasks/manage/monitor_pending_workloads/pending_workloads_on_demand). (#6631, @vladikkuzn)
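
As a starting point for the migration, the sketch below queries pending workloads on demand through the visibility API. The ClusterQueue name is an example, and the API group/version shown is the one documented for recent releases; check the linked page if your cluster serves a different version.

```shell
# List pending workloads for an example ClusterQueue named "cluster-queue".
kubectl get --raw "/apis/visibility.kueue.x-k8s.io/v1beta1/clusterqueues/cluster-queue/pendingworkloads"
```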

### API Change

- TAS: Graduated TopologyAwareScheduling to Beta. (#6830, @mbobrovskyi)
- TAS: Support multiple nodes for failure handling by ".status.unhealthyNodes" in Workload. The "alpha.kueue.x-k8s.io/node-to-replace" annotation is no longer used (#6648, @pajakd)

### Feature

- Add an alpha integration for Kubeflow Trainer to Kueue. (#6597, @kaisoz)
- Add an exponential backoff for the TAS scheduler second pass. (#6753, @mykysha)
- Added priority_class label for kueue_local_queue_admitted_workloads_total metric. (#6845, @vladikkuzn)
- Added priority_class label for kueue_local_queue_evicted_workloads_total metric (#6898, @vladikkuzn)
- Added priority_class label for kueue_local_queue_quota_reserved_workloads_total metric. (#6897, @vladikkuzn)
- Added priority_class label for the following metrics:
  - kueue_admitted_workloads_total
  - kueue_evicted_workloads_total
  - kueue_evicted_workloads_once_total
  - kueue_quota_reserved_workloads_total
  - kueue_admission_wait_time_seconds
  - kueue_quota_reserved_wait_time_seconds
  - kueue_admission_checks_wait_time_seconds (#6951, @mbobrovskyi)
- Added priority_class to kueue_local_queue_admission_checks_wait_time_seconds (#6902, @vladikkuzn)
- Added priority_class to kueue_local_queue_admission_wait_time_seconds (#6899, @vladikkuzn)
- Added priority_class to kueue_local_queue_quota_reserved_wait_time_seconds (#6900, @vladikkuzn)
- Added workload_priority_class label for optional metrics (if waitForPodsReady is enabled):

  - kueue_ready_wait_time_seconds (Histogram)
  - kueue_admitted_until_ready_wait_time_seconds (Histogram)
  - kueue_local_queue_ready_wait_time_seconds (Histogram)
  - kueue_local_queue_admitted_until_ready_wait_time_seconds (Histogram) (#6944, @IrvingMg)
- DRA: Alpha support for Dynamic Resource Allocation in Kueue. (#5873, @alaypatel07)
- ElasticJobs: Support in-tree RayAutoscaler for RayCluster (#6662, @VassilisVassiliadis)
- KueueViz: The following endpoint customizations and optimizations are now available (see the Helm sketch after this list):
  - The frontend and backend Ingress no longer have hardcoded NGINX annotations. You can now set your own annotations in Helm's values.yaml using kueueViz.backend.ingress.annotations and kueueViz.frontend.ingress.annotations
  - The Ingress resources for the KueueViz frontend and backend no longer require hardcoded TLS. You can now choose to use HTTP only by not providing kueueViz.backend.ingress.tlsSecretName and kueueViz.frontend.ingress.tlsSecretName
  - You can set environment variables like KUEUEVIZ_ALLOWED_ORIGINS directly from values.yaml using kueueViz.backend.env (#6682, @Smuger)
- MultiKueue: Support external frameworks.
  Introduced a generic MultiKueue adapter to support external, custom Job-like workloads. This allows users to integrate custom Job-like CRDs (e.g., Tekton PipelineRuns) with MultiKueue for resource management across multiple clusters. This feature is guarded by the `MultiKueueGenericJobAdapter` feature gate; an example of enabling the gate is sketched after this list. (#6760, @khrm)
- MultiKueue × ElasticJobs: The elastic `batchv1/Job` supports MultiKueue. (#6445, @ichekrygin)
- ProvisioningRequest: Graduate the ProvisioningACC feature to GA (#6382, @kannon92)
- TAS: Graduated to Beta the following feature gates responsible for enabling and default configuration of the Node Hot Swap mechanism:
  TASFailedNodeReplacement, TASFailedNodeReplacementFailFast, TASReplaceNodeOnPodTermination. (#6890, @mbobrovskyi)
- TAS: Implicit mode schedules consecutive indexes as close as possible (rank-ordering). (#6615, @PBundyra)
- TAS: Introduce validation against using PodSet grouping and PodSet slicing for the same PodSet,
  which is currently not supported. More precisely, the `kueue.x-k8s.io/podset-group-name` annotation
  cannot be set along with any of: `kueue.x-k8s.io/podset-slice-size`, `kueue.x-k8s.io/podset-slice-required-topology`. (#7051, @kshalot)
- The following limits for the ClusterQueue quota specification have been relaxed:
  - the number of Flavors per ResourceGroup is increased from 16 to 64
  - the number of Resources per Flavor, within a ResourceGroup, is increased from 16 to 64

  We also enforce the following additional limits:
  - the total number of Flavors across all ResourceGroups is <= 256
  - the total number of Resources across all ResourceGroups is <= 256
  - the total number of (Flavor, Resource) pairs within a ResourceGroup is <= 512 (#6906, @LarsSven)
- Visibility API: Add support for securing the APIService. (#6798, @MaysaMacedo)
- WorkloadRequestUseMergePatch: Allows switching the Status Patch type from Apply to Merge for admission-related patches. (#6765, @mszadkow)
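
For the KueueViz customizations above, a hedged Helm sketch is shown below. The chart reference, release name, namespace, and origin URL are examples, and the env list is assumed to follow the standard Kubernetes name/value form; verify the keys against the chart's values.yaml.

```shell
# Write an override file with the KueueViz values mentioned in the release note.
cat > kueueviz-values.yaml <<'EOF'
kueueViz:
  backend:
    env:
      - name: KUEUEVIZ_ALLOWED_ORIGINS
        value: "https://kueue-viz.example.com"
    ingress:
      annotations:
        nginx.ingress.kubernetes.io/ssl-redirect: "true"
EOF

# Apply it to an example release named "kueue" in the kueue-system namespace.
helm upgrade --install kueue kueue/kueue -n kueue-system -f kueueviz-values.yaml
```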
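
Several entries in this list are guarded by feature gates (for example `MultiKueueGenericJobAdapter`). The patch below is a minimal sketch for enabling such a gate on an existing installation; it assumes the default kueue-system namespace, that the manager container is the first container in the Deployment, and that no `--feature-gates` argument is already set. Prefer setting feature gates through your Helm values or Kustomize overlay.

```shell
# Append the feature-gates flag to the kueue-controller-manager arguments.
kubectl -n kueue-system patch deployment kueue-controller-manager --type=json \
  -p='[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--feature-gates=MultiKueueGenericJobAdapter=true"}]'
```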

### Bug or Regression

- AFS: Fixed a kueue-controller-manager crash when the AdmissionFairSharing feature gate was enabled without an AdmissionFairSharing config. (#6670, @mbobrovskyi)
- ElasticJobs: Fix the bug for the ElasticJobsViaWorkloadSlices feature where, in case of a Job resize followed by eviction
  of the "old" workload, the newly created workload could get admitted along with the "old" workload.
  The two workloads would overcommit the quota. (#6221, @ichekrygin)
- ElasticJobs: Fix the bug that scheduling of Pending workloads was not triggered on scale-down of a running
  elastic Job, even though the scale-down could allow admitting one or more of the queued workloads. (#6395, @ichekrygin)
- ElasticJobs: Workloads now correctly trigger workload preemption in response to a scale-up event. (#6973, @ichekrygin)
- FS: Fix the algorithm bug for identifying preemption candidates, as it could return a different
  set of preemption target workloads (pseudo-random) in consecutive attempts in tie-break scenarios,
  resulting in excessive preemptions. (#6764, @PBundyra)
- FS: Fix the following FairSharing bugs:
  - Incorrect DominantResourceShare caused by rounding (large quotas or high FairSharing weight)
  - Preemption loop caused by zero FairSharing weight (#6925, @gabesaba)
- FS: Fix a bug where a preemptor ClusterQueue was unable to reclaim its nominal quota when the preemptee ClusterQueue could borrow a large number of resources from the parent ClusterQueue / Cohort (#6617, @pajakd)
- FS: Validate FairSharing.Weight against small values which lose precision (0 < value <= 10^-9) (#6986, @gabesaba)
- Fix accounting for the `evicted_workloads_once_total` metric:
  - the metric wasn't incremented for workloads evicted due to a stopped LocalQueue (LocalQueueStopped reason)
  - the reason used for the metric was "Deactivated" for workloads deactivated by users and Kueue; now the reason label can have the following values: Deactivated, DeactivatedDueToAdmissionCheck, DeactivatedDueToMaximumExecutionTimeExceeded, DeactivatedDueToRequeuingLimitExceeded. This approach aligns the metric with `evicted_workloads_total`.
  - the metric was incremented during preemption before the preemption request was issued. Thus, it could be incorrectly over-counted in case of a preemption request failure.
  - the metric was not incremented for workloads evicted due to NodeFailures (TAS)

  The existing and introduced DeactivatedDueToXYZ reason label values will be replaced by the single "Deactivated" reason label value and an underlying_cause label in a future release. (#6332, @mimowo)
- Fix a bug in the workload usage removal simulation that resulted in inaccurate flavor assignment (#7077, @gabesaba)
- Fix support for the PodGroup integration used by external controllers which determine the
  target LocalQueue and the group size only later. In that case the hash would not be
  computed, resulting in downstream issues for ProvisioningRequest.

  Now such an external controller can indicate control over the PodGroup by adding
  the `kueue.x-k8s.io/pod-suspending-parent` annotation, and later patch the Pods by setting
  other metadata, like the kueue.x-k8s.io/queue-name label, to initiate scheduling of the PodGroup (see the sketch after this list). (#6286, @pawloch00)
- Fix the bug for the StatefulSet integration which would occasionally cause a StatefulSet
  to be stuck without a workload after renaming the "queue-name" label. (#7028, @IrvingMg)
- Fix the bug that a workload going repeatedly through the preemption and re-admission cycle would accumulate the
  "Previously" prefix in the condition message, e.g.: "Previously: Previously: Previously: Preempted to accommodate a workload ...". (#6819, @amy)
- Fix the bug which could occasionally cause workloads evicted by the built-in AdmissionChecks
  (ProvisioningRequest and MultiKueue) to get stuck in the evicted state which didn't allow re-scheduling.
  This could happen when the AdmissionCheck controller would trigger eviction by setting the
  Admission check state to "Retry". (#6283, @mimowo)
- Fix the validation messages when attempting to remove the queue-name label from a Deployment or StatefulSet. (#6715, @Panlq)
- Fixed a bug that prevented adding the kueue- prefix to the secretName field in cert-manager manifests when installing Kueue using the Kustomize configuration. (#6318, @mbobrovskyi)
- HC: When multiple borrowing flavors are available, prefer the flavor which
  results in borrowing more locally (closer to the ClusterQueue, further from the root Cohort).

  This fixes the scenario where a flavor would be selected which required borrowing
  from the root Cohort in one flavor, while in a second flavor, quota was
  available from the nearest parent Cohort. (#7024, @gabesaba)
- Helm: Fix a bug where the internal cert manager assumed that the Helm release name is "kueue". (#6869, @cmtly)
- Helm: Fixed a bug preventing Kueue from starting after installing via Helm with a release name other than "kueue" (#6799, @mbobrovskyi)
- Helm: Fixed a bug where webhook configurations assumed the Helm release name "kueue". (#6918, @cmtly)
- KueueViz: Fix CORS configuration for development environments (#6603, @yankay)
- KueueViz: Fix a bug that only localhost is a usable domain. (#7011, @kincl)
- Pod-integration now correctly handles pods stuck in the Terminating state within pod groups, preventing them from being counted as active and avoiding blocked quota release. (#6872, @ichekrygin)
- ProvisioningRequest: Fix a bug that Kueue didn't recreate the next ProvisioningRequest instance after the
  second (and consecutive) failed attempt. (#6322, @PBundyra)
- Support disabling client-side ratelimiting in the Config API clientConnection.qps with a negative value (e.g., -1) (#6300, @tenzen-y)
- TAS: Fix a bug where the node failure controller tried to re-schedule Pods on the failed node even after the Node recovered and reappeared (#6325, @pajakd)
- TAS: Fix a bug where new Workloads starve, caused by inadmissible workloads frequently requeueing due to unrelated Node LastHeartbeatTime update events. (#6570, @utam0k)
- TAS: Fix the scenario when Node Hot Swap cannot find a replacement. In particular, if slices are used,
  this could result in generating an invalid assignment, causing a panic in the TopologyUngater.
  Now such a workload is evicted. (#6914, @PBundyra)
- TAS: Node Hot Swap allows replacing a node for workloads using PodSet slices,
  i.e., when the `kueue.x-k8s.io/podset-slice-size` annotation is used. (#6942, @pajakd)
- TAS: Fix the bug that Kueue crashes when a PodSet has size 0, e.g. no workers in a LeaderWorkerSet instance. (#6501, @mimowo)
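
For the external PodGroup controller flow mentioned above, the commands below sketch the two steps with hypothetical Pod, controller, and LocalQueue names; a real controller would typically set this metadata through its own client rather than kubectl.

```shell
# 1. At creation time, mark the Pod as managed by an external suspending parent so Kueue waits.
kubectl annotate pod my-group-pod-0 kueue.x-k8s.io/pod-suspending-parent=my-controller

# 2. Later, once the target LocalQueue and group size are known, set the queue name to start scheduling.
kubectl label pod my-group-pod-0 kueue.x-k8s.io/queue-name=my-local-queue
```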

### Other (Cleanup or Flake)

- Promote ConfigurableResourceTransformations feature gate to stable. (#6599, @mbobrovskyi)
- Support for Kubernetes 1.34 (#6689, @mbobrovskyi)
- TAS: Stop setting the "kueue.x-k8s.io/tas" label on Pods.

  If the implicit TAS mode is used, the `kueue.x-k8s.io/podset-unconstrained-topology=true` annotation
  is set on Pods. (#6895, @mimowo)
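
A quick way to check the new behavior on a running Pod is sketched below; the Pod name is an example.

```shell
# The kueue.x-k8s.io/tas label should no longer appear; in implicit TAS mode the
# podset-unconstrained-topology annotation should be present.
kubectl get pod sample-job-0-abcde -o yaml | grep -E "kueue.x-k8s.io/(tas|podset-unconstrained-topology)"
```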