## v0.14.0

Changes since `v0.13.0`:

## Urgent Upgrade Notes

### (No, really, you MUST read this before you upgrade)

- ProvisioningRequest: Remove setting deprecated ProvisioningRequest annotations on Kueue-managed Pods:
  - cluster-autoscaler.kubernetes.io/consume-provisioning-request
  - cluster-autoscaler.kubernetes.io/provisioning-class-name

  If you are implementing a ProvisioningRequest reconciler used by Kueue, make sure the new annotations are supported:
  - autoscaling.x-k8s.io/consume-provisioning-request
  - autoscaling.x-k8s.io/provisioning-class-name (#6381, @kannon92)
- Rename the cert-manager.io/v1 Certificate kueue-metrics-certs to kueue-metrics-cert in the cert-manager manifests used when installing Kueue with the Kustomize configuration.

  If you're using cert-manager and have deployed Kueue using the Kustomize configuration, you must delete the existing kueue-metrics-certs cert-manager.io/v1 Certificate before applying the new changes to avoid conflicts (a command sketch follows this list). (#6345, @mbobrovskyi)
- Replace the "DeactivatedXYZ" values of the "reason" label with "Deactivated" and introduce an "underlying_cause" label on the following metrics:
  - "pods_ready_to_evicted_time_seconds"
  - "evicted_workloads_total"
  - "local_queue_evicted_workloads_total"
  - "evicted_workloads_once_total"

  If you rely on the "DeactivatedXYZ" "reason" label values, you can migrate to the "Deactivated" "reason" label value combined with the following "underlying_cause" label values (an example query is sketched after this list):
  - ""
  - "WaitForStart"
  - "WaitForRecovery"
  - "AdmissionCheck"
  - "MaximumExecutionTimeExceeded"
  - "RequeuingLimitExceeded" (#6590, @mykysha)
- TAS: Enforce a stricter value of the `kueue.x-k8s.io/podset-group-name` annotation in the creation webhook.

  Make sure the values of the `kueue.x-k8s.io/podset-group-name` annotation are not numbers. (#6708, @kshalot)
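
For the kueue-metrics-certs rename above, the following is a minimal deletion sketch. It assumes the Certificate lives in the default kueue-system namespace; adjust the namespace to match your installation.

```shell
# Delete the old Certificate so the renamed kueue-metrics-cert can be applied without conflicts.
kubectl delete certificate.cert-manager.io kueue-metrics-certs -n kueue-system
```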
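
If you are migrating dashboards or alerts away from the "DeactivatedXYZ" reason values, the sketch below shows one way to group evictions by the new underlying_cause label. It assumes the metrics are scraped into a Prometheus instance reachable at $PROM and uses the kueue_-prefixed metric name exposed by the controller; verify the exact metric name in your setup.

```shell
# Sum evictions labeled reason="Deactivated" by their underlying cause.
curl -sG "$PROM/api/v1/query" \
  --data-urlencode 'query=sum by (underlying_cause) (kueue_evicted_workloads_total{reason="Deactivated"})'
```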

## Upgrading steps

### 1. Back Up Topology Resources (skip if you are not using Topology API):

kubectl get topologies.kueue.x-k8s.io -o yaml > topologies.yaml

### 2. Update apiVersion in Backup File (skip if not using Topology API):
Replace `v1alpha1` with `v1beta1` in topologies.yaml for all resources:

sed -i -e 's/v1alpha1/v1beta1/g' topologies.yaml

### 3. Delete Old CRDs:

kubectl delete crd topologies.kueue.x-k8s.io

### 4. Remove Finalizers from Topologies (skip if you are not using Topology API):

kubectl get topology.kueue.x-k8s.io -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}' | while read -r name; do
  kubectl patch topology.kueue.x-k8s.io "$name" -p '{"metadata":{"finalizers":[]}}' --type='merge'
done

### 5. Install Kueue v0.14.0:
Follow the instructions [here](https://kueue.sigs.k8s.io/docs/installation/#install-a-released-version) to install.

### 6. Restore Topology Resources (skip if not using Topology API):

kubectl apply -f topologies.yaml

## Changes by Kind

### Deprecation

- Stop serving the QueueVisibility feature, but keep the APIs (`.status.pendingWorkloadsStatus`) to avoid breaking changes.

  If you rely on the QueueVisibility feature (`.status.pendingWorkloadsStatus` in the ClusterQueue), you must migrate to
  [VisibilityOnDemand](https://kueue.sigs.k8s.io/docs/tasks/manage/monitor_pending_workloads/pending_workloads_on_demand). (#6631, @vladikkuzn)
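
As a starting point for the migration, the sketch below queries pending workloads on demand through the visibility API. The ClusterQueue name is an example, and the API group/version shown is the one documented for recent releases; check the linked page if your cluster serves a different version.

```shell
# List pending workloads for an example ClusterQueue named "cluster-queue".
kubectl get --raw "/apis/visibility.kueue.x-k8s.io/v1beta1/clusterqueues/cluster-queue/pendingworkloads"
```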

### API Change

- TAS: Graduated TopologyAwareScheduling to Beta. (#6830, @mbobrovskyi)
- TAS: Support multiple nodes for failure handling by ".status.unhealthyNodes" in Workload. The "alpha.kueue.x-k8s.io/node-to-replace" annotation is no longer used (#6648, @pajakd)

### Feature

- Add an alpha integration for Kubeflow Trainer to Kueue. (#6597, @kaisoz)
- Add an exponential backoff for the TAS scheduler second pass. (#6753, @mykysha)
- Added priority_class label for kueue_local_queue_admitted_workloads_total metric. (#6845, @vladikkuzn)
- Added priority_class label for kueue_local_queue_evicted_workloads_total metric (#6898, @vladikkuzn)
- Added priority_class label for kueue_local_queue_quota_reserved_workloads_total metric. (#6897, @vladikkuzn)
- Added priority_class label for the following metrics:
  - kueue_admitted_workloads_total
  - kueue_evicted_workloads_total
  - kueue_evicted_workloads_once_total
  - kueue_quota_reserved_workloads_total
  - kueue_admission_wait_time_seconds
  - kueue_quota_reserved_wait_time_seconds
  - kueue_admission_checks_wait_time_seconds (#6951, @mbobrovskyi)
- Added priority_class to kueue_local_queue_admission_checks_wait_time_seconds (#6902, @vladikkuzn)
- Added priority_class to kueue_local_queue_admission_wait_time_seconds (#6899, @vladikkuzn)
- Added priority_class to kueue_local_queue_quota_reserved_wait_time_seconds (#6900, @vladikkuzn)
- Added workload_priority_class label for optional metrics (if waitForPodsReady is enabled):

  - kueue_ready_wait_time_seconds (Histogram)
  - kueue_admitted_until_ready_wait_time_seconds (Histogram)
  - kueue_local_queue_ready_wait_time_seconds (Histogram)
  - kueue_local_queue_admitted_until_ready_wait_time_seconds (Histogram) (#6944, @IrvingMg)
- DRA: Alpha support for Dynamic Resource Allocation in Kueue. (#5873, @alaypatel07)
- ElasticJobs: Support in-tree RayAutoscaler for RayCluster (#6662, @VassilisVassiliadis)
- KueueViz: The following endpoint customizations and optimizations are now available (see the Helm sketch after this list):
  - The frontend and backend Ingress no longer have hardcoded NGINX annotations. You can now set your own annotations in Helm's values.yaml using kueueViz.backend.ingress.annotations and kueueViz.frontend.ingress.annotations
  - The Ingress resources for the KueueViz frontend and backend no longer require hardcoded TLS. You can now choose to use HTTP only by not providing kueueViz.backend.ingress.tlsSecretName and kueueViz.frontend.ingress.tlsSecretName
  - You can set environment variables like KUEUEVIZ_ALLOWED_ORIGINS directly from values.yaml using kueueViz.backend.env (#6682, @Smuger)
- MultiKueue: Support external frameworks.
  Introduced a generic MultiKueue adapter to support external, custom Job-like workloads. This allows users to integrate custom Job-like CRDs (e.g., Tekton PipelineRuns) with MultiKueue for resource management across multiple clusters. This feature is guarded by the `MultiKueueGenericJobAdapter` feature gate; an example of enabling the gate is sketched after this list. (#6760, @khrm)
- MultiKueue × ElasticJobs: The elastic `batchv1/Job` supports MultiKueue. (#6445, @ichekrygin)
- ProvisioningRequest: Graduate the ProvisioningACC feature to GA (#6382, @kannon92)
- TAS: Graduated to Beta the following feature gates responsible for enabling and default configuration of the Node Hot Swap mechanism:
  TASFailedNodeReplacement, TASFailedNodeReplacementFailFast, TASReplaceNodeOnPodTermination. (#6890, @mbobrovskyi)
- TAS: Implicit mode schedules consecutive indexes as close as possible (rank-ordering). (#6615, @PBundyra)
- TAS: Introduce validation against using PodSet grouping and PodSet slicing for the same PodSet,
  which is currently not supported. More precisely, the `kueue.x-k8s.io/podset-group-name` annotation
  cannot be set along with any of: `kueue.x-k8s.io/podset-slice-size`, `kueue.x-k8s.io/podset-slice-required-topology`. (#7051, @kshalot)
- The following limits for the ClusterQueue quota specification have been relaxed:
  - the number of Flavors per ResourceGroup is increased from 16 to 64
  - the number of Resources per Flavor, within a ResourceGroup, is increased from 16 to 64

  We also enforce the following additional limits:
  - the total number of Flavors across all ResourceGroups is <= 256
  - the total number of Resources across all ResourceGroups is <= 256
  - the total number of (Flavor, Resource) pairs within a ResourceGroup is <= 512 (#6906, @LarsSven)
- Visibility API: Add support for securing the APIService. (#6798, @MaysaMacedo)
- WorkloadRequestUseMergePatch: Allows switching the Status Patch type from Apply to Merge for admission-related patches. (#6765, @mszadkow)
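
For the KueueViz customizations above, a hedged Helm sketch is shown below. The chart reference, release name, namespace, and origin URL are examples, and the env list is assumed to follow the standard Kubernetes name/value form; verify the keys against the chart's values.yaml.

```shell
# Write an override file with the KueueViz values mentioned in the release note.
cat > kueueviz-values.yaml <<'EOF'
kueueViz:
  backend:
    env:
      - name: KUEUEVIZ_ALLOWED_ORIGINS
        value: "https://kueue-viz.example.com"
    ingress:
      annotations:
        nginx.ingress.kubernetes.io/ssl-redirect: "true"
EOF

# Apply it to an example release named "kueue" in the kueue-system namespace.
helm upgrade --install kueue kueue/kueue -n kueue-system -f kueueviz-values.yaml
```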
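
Several entries in this list are guarded by feature gates (for example `MultiKueueGenericJobAdapter`). The patch below is a minimal sketch for enabling such a gate on an existing installation; it assumes the default kueue-system namespace, that the manager container is the first container in the Deployment, and that no `--feature-gates` argument is already set. Prefer setting feature gates through your Helm values or Kustomize overlay.

```shell
# Append the feature-gates flag to the kueue-controller-manager arguments.
kubectl -n kueue-system patch deployment kueue-controller-manager --type=json \
  -p='[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--feature-gates=MultiKueueGenericJobAdapter=true"}]'
```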

### Bug or Regression

- AFS: Fixed a kueue-controller-manager crash when the AdmissionFairSharing feature gate was enabled without an AdmissionFairSharing config. (#6670, @mbobrovskyi)
- ElasticJobs: Fix the bug for the ElasticJobsViaWorkloadSlices feature where, in case of a Job resize followed by eviction
  of the "old" workload, the newly created workload could get admitted along with the "old" workload.
  The two workloads would overcommit the quota. (#6221, @ichekrygin)
- ElasticJobs: Fix the bug that scheduling of Pending workloads was not triggered on scale-down of a running
  elastic Job, even though the scale-down could allow admitting one or more of the queued workloads. (#6395, @ichekrygin)
- ElasticJobs: Workloads now correctly trigger workload preemption in response to a scale-up event. (#6973, @ichekrygin)
- FS: Fix the algorithm bug for identifying preemption candidates, as it could return a different
  set of preemption target workloads (pseudo-random) in consecutive attempts in tie-break scenarios,
  resulting in excessive preemptions. (#6764, @PBundyra)
- FS: Fix the following FairSharing bugs:
  - Incorrect DominantResourceShare caused by rounding (large quotas or high FairSharing weight)
  - Preemption loop caused by zero FairSharing weight (#6925, @gabesaba)
- FS: Fix a bug where a preemptor ClusterQueue was unable to reclaim its nominal quota when the preemptee ClusterQueue could borrow a large number of resources from the parent ClusterQueue / Cohort (#6617, @pajakd)
- FS: Validate FairSharing.Weight against small values which lose precision (0 < value <= 10^-9) (#6986, @gabesaba)
- Fix accounting for the `evicted_workloads_once_total` metric:
  - the metric wasn't incremented for workloads evicted due to a stopped LocalQueue (LocalQueueStopped reason)
  - the reason used for the metric was "Deactivated" for workloads deactivated by users and Kueue; now the reason label can have the following values: Deactivated, DeactivatedDueToAdmissionCheck, DeactivatedDueToMaximumExecutionTimeExceeded, DeactivatedDueToRequeuingLimitExceeded. This approach aligns the metric with `evicted_workloads_total`.
  - the metric was incremented during preemption before the preemption request was issued. Thus, it could be incorrectly over-counted in case of a preemption request failure.
  - the metric was not incremented for workloads evicted due to NodeFailures (TAS)

  The existing and introduced DeactivatedDueToXYZ reason label values will be replaced by the single "Deactivated" reason label value and an underlying_cause label in a future release. (#6332, @mimowo)
- Fix a bug in the workload usage removal simulation that resulted in inaccurate flavor assignment (#7077, @gabesaba)
- Fix support for the PodGroup integration used by external controllers which determine the
  target LocalQueue and the group size only later. In that case the hash would not be
  computed, resulting in downstream issues for ProvisioningRequest.

  Now such an external controller can indicate control over the PodGroup by adding
  the `kueue.x-k8s.io/pod-suspending-parent` annotation, and later patch the Pods by setting
  other metadata, like the kueue.x-k8s.io/queue-name label, to initiate scheduling of the PodGroup (see the sketch after this list). (#6286, @pawloch00)
- Fix the bug for the StatefulSet integration which would occasionally cause a StatefulSet
  to be stuck without a workload after renaming the "queue-name" label. (#7028, @IrvingMg)
- Fix the bug that a workload going repeatedly through the preemption and re-admission cycle would accumulate the
  "Previously" prefix in the condition message, e.g.: "Previously: Previously: Previously: Preempted to accommodate a workload ...". (#6819, @amy)
- Fix the bug which could occasionally cause workloads evicted by the built-in AdmissionChecks
  (ProvisioningRequest and MultiKueue) to get stuck in the evicted state which didn't allow re-scheduling.
  This could happen when the AdmissionCheck controller would trigger eviction by setting the
  Admission check state to "Retry". (#6283, @mimowo)
- Fix the validation messages when attempting to remove the queue-name label from a Deployment or StatefulSet. (#6715, @Panlq)
- Fixed a bug that prevented adding the kueue- prefix to the secretName field in cert-manager manifests when installing Kueue using the Kustomize configuration. (#6318, @mbobrovskyi)
- HC: When multiple borrowing flavors are available, prefer the flavor which
  results in borrowing more locally (closer to the ClusterQueue, further from the root Cohort).

  This fixes the scenario where a flavor would be selected which required borrowing
  from the root Cohort in one flavor, while in a second flavor, quota was
  available from the nearest parent Cohort. (#7024, @gabesaba)
- Helm: Fix a bug where the internal cert manager assumed that the Helm release name is "kueue". (#6869, @cmtly)
- Helm: Fixed a bug preventing Kueue from starting after installing via Helm with a release name other than "kueue" (#6799, @mbobrovskyi)
- Helm: Fixed a bug where webhook configurations assumed the Helm release name "kueue". (#6918, @cmtly)
- KueueViz: Fix CORS configuration for development environments (#6603, @yankay)
- KueueViz: Fix a bug that only localhost is a usable domain. (#7011, @kincl)
- Pod-integration now correctly handles pods stuck in the Terminating state within pod groups, preventing them from being counted as active and avoiding blocked quota release. (#6872, @ichekrygin)
- ProvisioningRequest: Fix a bug that Kueue didn't recreate the next ProvisioningRequest instance after the
  second (and consecutive) failed attempt. (#6322, @PBundyra)
- Support disabling client-side ratelimiting in the Config API clientConnection.qps with a negative value (e.g., -1) (#6300, @tenzen-y)
- TAS: Fix a bug where the node failure controller tried to re-schedule Pods on the failed node even after the Node recovered and reappeared (#6325, @pajakd)
- TAS: Fix a bug where new Workloads starve, caused by inadmissible workloads frequently requeueing due to unrelated Node LastHeartbeatTime update events. (#6570, @utam0k)
- TAS: Fix the scenario when Node Hot Swap cannot find a replacement. In particular, if slices are used,
  this could result in generating an invalid assignment, causing a panic in the TopologyUngater.
  Now such a workload is evicted. (#6914, @PBundyra)
- TAS: Node Hot Swap allows replacing a node for workloads using PodSet slices,
  i.e., when the `kueue.x-k8s.io/podset-slice-size` annotation is used. (#6942, @pajakd)
- TAS: Fix the bug that Kueue crashes when a PodSet has size 0, e.g. no workers in a LeaderWorkerSet instance. (#6501, @mimowo)
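
For the external PodGroup controller flow mentioned above, the commands below sketch the two steps with hypothetical Pod, controller, and LocalQueue names; a real controller would typically set this metadata through its own client rather than kubectl.

```shell
# 1. At creation time, mark the Pod as managed by an external suspending parent so Kueue waits.
kubectl annotate pod my-group-pod-0 kueue.x-k8s.io/pod-suspending-parent=my-controller

# 2. Later, once the target LocalQueue and group size are known, set the queue name to start scheduling.
kubectl label pod my-group-pod-0 kueue.x-k8s.io/queue-name=my-local-queue
```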

### Other (Cleanup or Flake)

- Promote ConfigurableResourceTransformations feature gate to stable. (#6599, @mbobrovskyi)
- Support for Kubernetes 1.34 (#6689, @mbobrovskyi)
- TAS: Stop setting the "kueue.x-k8s.io/tas" label on Pods.

  If the implicit TAS mode is used, the `kueue.x-k8s.io/podset-unconstrained-topology=true` annotation
  is set on Pods. (#6895, @mimowo)
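
A quick way to check the new behavior on a running Pod is sketched below; the Pod name is an example.

```shell
# The kueue.x-k8s.io/tas label should no longer appear; in implicit TAS mode the
# podset-unconstrained-topology annotation should be present.
kubectl get pod sample-job-0-abcde -o yaml | grep -E "kueue.x-k8s.io/(tas|podset-unconstrained-topology)"
```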