Releases: kubernetes-sigs/kueue
v0.8.2
Changes since v0.8.1:
Feature
Bug or Regression
- Fix a bug that could delay the election of a new leader when Kueue runs with multiple replicas. (#3096, @tenzen-y)
- Fix resource consumption computation for partially admitted workloads. (#3206, @trasc)
- Fix restoring parallelism on eviction for partially admitted batch/Jobs. (#3208, @trasc)
- Fix some scenarios for partial admission which are affected by wrong calculation of resources
used by the incoming workload which is partially admitted and preempting. (#3205, @trasc)
- Fix webhook validation for batch/Job to allow partial admission of a Job to use all available resources.
It also fixes a scenario of partial re-admission when some of the Pods are already reclaimed. (#3207, @trasc)
- Prevent job webhooks from dropping fields for newer API fields when Kueue libraries are behind the latest released CRDs. (#3358, @mbobrovskyi)
- RayJob's implementation of Finished() now inspects JobDeploymentStatus (#3128, @andrewsykim)
Other (Cleanup or Flake)
- Add a jobframework.BaseWebhook that can be used for custom job integrations (#3355, @mbobrovskyi)
Kueue v0.9.0-rc.1
Changes since v0.8.0:
Urgent Upgrade Notes
(No, really, you MUST read this before you upgrade)
-
Changed the type of Pending events, emitted when a Workload can't be admitted, from Normal to Warning.
Update tools that process this event if they depend on the event type. (#3264, @kebe7jun)
-
Deprecated the SingleInstanceInClusterQueue and FlavorIndependent status conditions.
The AdmissionCheck status conditions "FlavorIndependent" and "SingleInstanceInClusterQueue" are no longer supported by default.
If you were using any of these conditions for your external AdmissionCheck, you need to enable the AdmissionCheckValidationRules feature gate.
In future releases, you will need to provide validation through an external controller. (#3254, @mszadkow)
- Promote MultiKueue API and feature gate to Beta. The MultiKueue feature gate is now beta and enabled by default.
The MultiKueue-specific types are now part of Kueue's v1beta1 API. v1alpha types are no longer supported. (#3230, @trasc)
- Promoted VisibilityOnDemand to Beta and enabled by default.
The v1alpha1 Visibility API is deprecated and will be removed in the next release. Please use v1beta1 instead. (#3008, @mbobrovskyi)
- Provides more details on the reasons for ClusterQueues being inactive.
If you were watching for the reason CheckNotFoundOrInactive in the ClusterQueue condition, watch AdmissionCheckNotFound and AdmissionCheckInactive instead. (#3127, @trasc)
- The QueueVisibility feature and its corresponding API were deprecated and will be removed in v1beta2. Please use VisibilityOnDemand (https://kueue.sigs.k8s.io/docs/tasks/manage/monitor_pending_workloads/pending_workloads_on_demand/) instead. (#3110, @mbobrovskyi)
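For tools affected by the Pending event and ClusterQueue reason changes in these notes, a minimal Python sketch of tolerant checks might look like the following. It works on plain dictionaries shaped like Kubernetes Event and condition objects. The helper names, the sample data, and the assumption that the event's reason field is "Pending" are hypothetical; only the type and reason strings themselves come from the notes.

```python
# Reasons taken from the upgrade notes; the old reason is kept so the
# check works both before and after the upgrade.
OLD_REASON = "CheckNotFoundOrInactive"
NEW_REASONS = {"AdmissionCheckNotFound", "AdmissionCheckInactive"}


def is_pending_event(event: dict) -> bool:
    """Match Pending events whether they are typed Normal (old) or Warning (new).

    Assumes the event carries reason == "Pending"; adjust if your events differ.
    """
    return event.get("reason") == "Pending" and event.get("type") in {"Normal", "Warning"}


def inactive_due_to_admission_check(conditions: list) -> bool:
    """Return True if any condition carries the old reason or one of the new ones."""
    accepted = NEW_REASONS | {OLD_REASON}
    return any(c.get("reason") in accepted for c in conditions)


if __name__ == "__main__":
    # Hypothetical sample objects, not real cluster output.
    event = {"type": "Warning", "reason": "Pending", "message": "..."}
    conditions = [{"type": "Active", "status": "False", "reason": "AdmissionCheckInactive"}]
    print(is_pending_event(event))
    print(inactive_due_to_admission_check(conditions))
```

Accepting both the old and new values lets the same tool run against clusters on either side of the upgrade.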
Changes by Kind
Feature
-
Add gauge metric admission_cycle_preemption_skips that reports the number of Workloads in a ClusterQueue
that had preemption candidates, but had to be skipped in the last cycle. (#2919, @alculquicondor)
-
Add integration for Deployment, where each Pod is treated as a separate Workload. (#2813, @vladikkuzn)
-
Add integration for StatefulSet where Pods are managed by the pod-group integration. (#3001, @vladikkuzn)
-
Added FlowSchema and PriorityLevelConfiguration for Visibility API. (#3043, @mbobrovskyi)
-
Added a new optional resource.transformations section to the Configuration API that enables limited customization
of how the resource requirements of a Workload are computed from the resource requests and limits of a Job. (#3026, @dgrove-oss)
-
Added a way to specify dependencies between job integrations. (#2768, @trasc)
-
Best effort support for scenarios when the Job is created at the same time as prebuilt workload or momentarily before the workload. In that case an error is logged to indicate that creating a Job before prebuilt-workload is outside of the intended use. (#3255, @mbobrovskyi)
-
CLI: Added EXEC TIME column on kueuectl list workload command. (#2977, @mbobrovskyi)
-
CLI: Added list pods for a job command. (#2280, @Kavinraja-G)
-
CLI: Use protobuf encoding for core K8s APIs in kueuectl. (#3077, @tosi3k)
-
Calculate AllocatableResourceGeneration more accurately. This fixes a bug where a workload might not have the Flavors it was assigned in a previous scheduling cycle invalidated, when the resources in the Cohort had changed. This bug could occur when other ClusterQueues were deleted from the Cohort. (#2984, @gabesaba)
-
Detect and enable support for job CRDs installed after Kueue starts. (#2574, @ChristianZaccaria)
-
Exposed available ResourceFlavors from the ClusterQueue in the LocalQueue status. (#3143, @mbobrovskyi)
-
Graduated LendingLimit to Beta and enabled by default. (#2909, @macsko)
-
Graduated MultiplePreemptions to Beta and enabled by default. (#2864, @macsko)
-
Helm: Support the topologySpreadConstraints and PodDisruptionBudget (#3295, @woehrl01)
-
Hierarchical Cohorts, introduced with the v1alpha1 Cohorts API, allow users to group resources in an arbitrary tree structure. Additionally, quotas and limits can now be defined directly at the Cohort level. See #79 for more details. (#2693, @gabesaba)
-
Included visibility-api.yaml as a part of main.yaml (#3084, @mbobrovskyi)
-
Introduce APIs for Topology Aware Scheduling. (#3235, @mimowo)
-
Introduce the "kueue.x-k8s.io/pod-group-fast-admission" annotation to Plain Pod integration.
If the Plain Pod has the annotation and is part of a Plain PodGroup, Kueue will admit the Plain Pod regardless of whether all PodGroup Pods are created. (#3189, @vladikkuzn)
-
Introduce the new PodTemplate annotation kueue.x-k8s.io/workload, and label kueue.x-k8s.io/podset.
The annotation and label are alpha-level and gated by the new TopologyAwareScheduling feature gate. (#3228, @mimowo)
-
Kjobctl: Support --profile and --local queue options for listing jobs, which are used to filter the listed jobs. (#2736, @mbobrovskyi)
-
Label kueue.x-k8s.io/managed is now added to PodTemplates created via ProvisioningRequest by Kueue (#2877, @PBundyra)
-
MultiKueue: Support for the Kubeflow MPIJob (#2880, @mszadkow)
-
MultiKueue: Support for the Kubeflow PaddleJob (#2744, @mszadkow)
-
MultiKueue: Support for the Kubeflow PyTorchJob (#2735, @mszadkow)
-
MultiKueue: Support for the Kubeflow TFJob (#2626, @mszadkow)
-
MultiKueue: Support for the Kubeflow XGBoostJob (#2746, @mszadkow)
-
ProvisioningRequest: Record the ProvisioningRequest creation errors to event and ProvisioningRequest status. (#3056, @IrvingMg)
-
Support for Kubernetes 1.31 (#2402, @mbobrovskyi)
-
[workload] Add MaximumExecutionTimeSeconds, a mechanism to automatically deactivate a workload if its execution takes more time than expected. (#3184, @trasc)
Documentation
- Adds a guide for installing the kubectl-kueue plugin via Krew. (#2666, @mbobrovskyi)
- Documentation on how to use Kueue for Deployments is added (#2698, @vladikkuzn)
Bug or Regression
- CLI: Delete the corresponding Job when deleting a Workload. (#2992, @mbobrovskyi)
- CLI: Support "-" and "." in the resource flavor name on create cq (#2703, @trasc)
- Fix a bug that could delay the election of a new leader when Kueue runs with multiple replicas. (#3093, @tenzen-y)
- Fix over-admission after deleting resources from borrowing ClusterQueue. (#2873, @mbobrovskyi)
- Fix resource consumption computation for partially admitted workloads. (#3118, @trasc)
- Fix restoring parallelism on eviction for partially admitted batch/Jobs. (#3153, @trasc)
- Fix some scenarios for partial admission which are affected by wrong calculation of resources
used by the incoming workload which is partially admitted and preempting. (#2826, @trasc)
- Fix support for kuberay 1.2.x (#2960, @mbobrovskyi)
- Fix webhook validation for batch/Job to allow partial admission of a Job to use all available resources.
It also fixes a scenario of partial re-admission when some of the Pods are already reclaimed. (#3152, @trasc)
- Helm: Fix a bug for "unclosed action error". (#2683, @mbobrovskyi)
- Prevent infinite preemption loop when PrioritySortingWithinCohort=false
is used together with borrowWithinCohort. (#2807, @mimowo)
- Prevent job webhooks from dropping fields for newer API fields when Kueue libraries are behind the latest released CRDs. (#3132, @alculquicondor)
- RayJob's implementation of Finished() now inspects JobDeploymentStatus (#3120, @andrewsykim)
- Support for helm charts in the us-central1-docker.pkg.dev/k8s-staging-images/charts repository (#2680, @IrvingMg)
- Update Flavor selection logic to prefer Flavors which allow reclamation of lent nominal quota, over Flavors which require preempting workloads within the ClusterQueue. This matches the behavior in the single Flavor case. (#2811, @gabesaba)
Other (Cleanup or Flake)
- Add a jobframework.BaseWebhook that can be used for custom job integrations (#3102, @alculquicondor)
Kueue v0.8.1
Changes since v0.8.0:
Feature
- Add gauge metric admission_cycle_preemption_skips that reports the number of Workloads in a ClusterQueue
that had preemption candidates, but had to be skipped in the last cycle. (#2942, @alculquicondor)
- Publish images via artifact registry (#2832, @alculquicondor)
Bug or Regression
- CLI: Support "-" and "." in the resource flavor name on create cq (#2706, @trasc)
- Detect and enable support for job CRDs installed after Kueue starts. (#2991, @ChristianZaccaria)
- Fix over-admission after deleting resources from borrowing ClusterQueue. (#2879, @mbobrovskyi)
- Fix support for kuberay 1.2.x (#2983, @mbobrovskyi)
- Helm: Fix a bug for "unclosed action error". (#2688, @mbobrovskyi)
- Prevent infinite preemption loop when PrioritySortingWithinCohort=false
is used together with borrowWithinCohort. (#2831, @mimowo)
- Support for helm charts in the us-central1-docker.pkg.dev/k8s-staging-images/charts repository (#2834, @IrvingMg)
- Update Flavor selection logic to prefer Flavors which allow reclamation of lent nominal quota, over Flavors which require preempting workloads within the ClusterQueue. This matches the behavior in the single Flavor case. (#2829, @gabesaba)
Kueue v0.8.0
Changes since v0.7.0:
Urgent Upgrade Notes
(No, really, you MUST read this before you upgrade)
-
Use a single rate limiter for all API types clients.
Consider adjusting clientConnection.qps and clientConnection.burst if you observe any performance degradation. (#2462, @trasc)
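To build intuition for the two settings, the following is a minimal Python sketch of a token-bucket limiter, the scheme commonly used for Kubernetes client rate limiting: qps is the steady refill rate and burst is the bucket capacity. The TokenBucket class is purely illustrative and is not Kueue's implementation; time is passed in explicitly so the behaviour is easy to follow.

```python
class TokenBucket:
    """Illustrative token bucket: refills at `qps` tokens/second, holds at most `burst`."""

    def __init__(self, qps: float, burst: int) -> None:
        self.qps = qps
        self.burst = burst
        self.tokens = float(burst)  # start full, so an initial burst is allowed
        self.last = 0.0             # timestamp of the last refill, in seconds

    def try_acquire(self, now: float) -> bool:
        """Return True if a request may be sent at time `now` (seconds)."""
        elapsed = max(0.0, now - self.last)
        # Refill proportionally to elapsed time, never above the capacity.
        self.tokens = min(float(self.burst), self.tokens + elapsed * self.qps)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


if __name__ == "__main__":
    limiter = TokenBucket(qps=2.0, burst=3)
    # Five requests at t=0: only as many as the bucket holds can pass.
    print([limiter.try_acquire(0.0) for _ in range(5)])
    # One second later, the bucket has been refilled at the qps rate.
    print([limiter.try_acquire(1.0) for _ in range(3)])
```

Because a single limiter is now shared by all API type clients, every request draws from the same bucket, which is one reason a busy installation might need higher values.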
Feature
-
Add a column to workload indicating if it is finished (#2615, @highpon)
-
Add preempted_workloads_total metric that tracks the number of preemptions issued by a ClusterQueue (#2538, @vladikkuzn)
-
Add the following events for eviction on the workload indicating the reason for eviction:
- "EvictedDueToPodsReadyTimeout"
- "EvictedDueToAdmissionCheck"
- "EvictedDueToClusterQueueStopped"
- "EvictedDueToInactiveWorkload" (renamed from InactiveWorkload)
If you were watching for the Normal-typed event with the InactiveWorkload reason, use the EvictedDueToInactiveWorkload reason instead. (#2376, @mbobrovskyi)
-
AdmissionChecks: A workload with a Rejected AdmissionCheck gets deactivated (#2363, @PBundyra)
-
Allow stopping admission from a specific LocalQueue. (#2173, @mbobrovskyi)
-
Allow usage of the pod integration for pods belonging to jobs that Kueue supports, if the support for the job type is explicitly disabled (#2493, @trasc)
-
CLI: Added Node Labels column on resource flavor list. (#2557, @mbobrovskyi)
-
CLI: Added create resourceflavor command. (#2517, @mbobrovskyi)
-
CLI: Added list resourceflavor command. (#2525, @mbobrovskyi)
-
CLI: Added resourceflavor to pass-through commands. (#2518, @mbobrovskyi)
-
CLI: Added version command. (#2346, @mbobrovskyi)
-
CLI: Support autocompletion (#2314, @mbobrovskyi)
-
CLI: Support paging on kueue CLI list commands. (#2313, @mbobrovskyi)
-
CLI: kubectl-kueue tar.gz archives are part of the release artifacts. (#2513, @mbobrovskyi)
-
Do not start Kueue when the visibility server cannot be started, but is requested. (#2636, @mbobrovskyi)
-
Experimental support for helm charts in the gcr.io/k8s-staging-kueue/charts/kueue repository (#2377, @IrvingMg)
-
Improved logging for scheduling and preemption in levels 4 and 5 (#2504, @alculquicondor)
-
Introduce the MultiplePreemptions flag, which allows more than one
preemption to occur in the same scheduling cycle, even with overlapping
FlavorResources (#2641, @gabesaba)
-
More granular Preemption condition reasons: InClusterQueue, InCohortReclamation, InCohortFairSharing, InCohortReclaimWhileBorrowing (#2411, @vladikkuzn)
-
MultiKueue: Allow for defaulting of the spec.managedBy field for Jobs managed by MultiKueue.
The defaulting is enabled by the MultiKueueBatchJobWithManagedBy feature-gate. (#2401, @vladikkuzn)
-
MultiKueue: Remove remote objects synchronously when the worker cluster is reachable. (#2347, @trasc)
-
MultiKueue: Use batch/Job spec.managedBy field (#2331, @trasc)
-
MultiKueue: Batch reconcile events for remote workloads. (#2380, @trasc)
-
ProvisioningRequest: Support for ProvisioningRequest's condition BookingExpired (#2445, @PBundyra)
-
ProvisioningRequest: Support for ProvisioningRequest's condition CapacityRevoked. ProvisioningRequest objects persist until the corresponding Job or the Workload is deleted (#2196, @PBundyra)
Documentation
- Added detailed documentation for the kubectl-kueue plugin. (#2613, @mbobrovskyi)
- Improve the documentation for waitForPodsReady (#2541, @mimowo)
Bug or Regression
- Added raycluster roles to manifests.yaml (#2618, @mbobrovskyi)
- CLI: Fixed no Auth Provider found for name "oidc" error. (#2602, @Kavinraja-G)
- Fix check that prevents preemptions when a workload requests 0 for a resource that is at nominal or over it. (#2520, @alculquicondor)
- Fix a scenario where a workload that doesn't match some resource flavors due to affinity or taints
could be continuously retried. (#2407, @KunWuLuan)
- Fix missing fairSharingStatus in ClusterQueue (#2424, @mbobrovskyi)
- Fix missing metric cluster_queue_status (#2474, @mbobrovskyi)
- Fix panic that could occur when a ClusterQueue is deleted while Kueue was updating the ClusterQueue status. (#2461, @mbobrovskyi)
- Fix panic when there is not enough quota to assign flavors to a Workload in the cohort, when FairSharing is enabled. (#2439, @mbobrovskyi)
- Fix performance issue in logging when processing LocalQueues. (#2485, @alexandear)
- Fix race condition on delete workload from queue manager. (#2460, @mbobrovskyi)
- Fix race condition on requeue workload. (#2509, @mbobrovskyi)
- Fix race condition on run garbage collection in multikueuecluster reconciler. (#2479, @mbobrovskyi)
- Fix the validation messages, to report the new value rather than old, for the following immutable labels:
kueue.x-k8s.io/queue-name, kueue.x-k8s.io/prebuilt-workload-name, and kueue.x-k8s.io/priority-class. (#2544, @xuxianzhang)
- Fixed issue that prevented restoring the startTime and pod template when evicting a batch/v1 Job, if any API errors happened in the process (#2567, @mbobrovskyi)
- MultiKueue: Do not reject a JobSet if the corresponding cluster queue doesn't exist (#2425, @vladikkuzn)
- MultiKueue: Skip garbage collection for disconnected clients which could occasionally result in panic. (#2369, @trasc)
- Show weightedShare in ClusterQueue status.fairSharing even if the value is zero (#2521, @alculquicondor)
- Skip duplicate Tolerations when an admission check introduces a toleration that the job also set. (#2498, @trasc)
Other (Cleanup or Flake)
- Importer: corrects the field name observedFirstIn in logs. (#2500, @alexandear)
- Use Patch instead of Update on jobframework multikueue adapters to prevent the risk of dropping fields. (#2590, @mbobrovskyi)
- Use Patch instead of Update on jobframework to prevent the risk of dropping fields. (#2553, @mbobrovskyi)
Kueue v0.7.1
Changes since v0.7.0:
Feature
- Improved logging for scheduling and preemption in levels 4 and 5 (#2510, @gabesaba, @alculquicondor)
- MultiKueue: Remove remote objects synchronously when the worker cluster is reachable. (#2360, @trasc)
Bug or Regression
- Fix check that prevents preemptions when a workload requests 0 for a resource that is at nominal or over it. (#2524, @mbobrovskyi, @alculquicondor)
- Fix a scenario where a workload that doesn't match some resource flavors due to affinity or taints
could be continuously retried. (#2440, @KunWuLuan)
- Fix missing fairSharingStatus in ClusterQueue (#2432, @mbobrovskyi)
- Fix missing metric cluster_queue_status. (#2475, @mbobrovskyi)
- Fix panic that could occur when a ClusterQueue is deleted while Kueue was updating the ClusterQueue status. (#2464, @mbobrovskyi)
- Fix panic when there is not enough quota to assign flavors to a Workload in the cohort, when FairSharing is enabled. (#2449, @mbobrovskyi)
- Fix performance issue in logging when processing LocalQueues. (#2492, @alexandear)
- Fix race condition on delete workload from queue manager. (#2465, @mbobrovskyi)
- MultiKueue: Do not reject a JobSet if the corresponding cluster queue doesn't exist (#2442, @vladikkuzn)
- MultiKueue: Skip garbage collection for disconnected clients which could occasionally result in panic. (#2370, @trasc)
- Show weightedShare in ClusterQueue status.fairSharing even if the value is zero (#2522, @alculquicondor)
- Skip duplicate Tolerations when an admission check introduces a toleration that the job also set. (#2499, @trasc)
Kueue v0.7.0
Changes since v0.6.0:
Urgent Upgrade Notes
(No, really, you MUST read this before you upgrade)
-
Added CRD validation rules to AdmissionCheck.
-
Added CRD validation rules to ClusterQueue.
-
Added CRD validation rules to LocalQueue.
-
Added CRD validation rules to ResourceFlavor.
-
Added CRD validation rules to Workload.
-
Increased the default value of .waitForPodsReady.requeuingStrategy.backoffBaseSeconds to 60.
You can configure .waitForPodsReady.requeuingStrategy.backoffBaseSeconds as needed. (#2251, @mbobrovskyi)
-
Upgrade RayJob API to v1.
If you use KubeRay older than v1.0.0, you'll have to upgrade your existing installation
to KubeRay v1.0.0, or any more recent version that supports the KubeRay v1 APIs, for it to
remain compatible with Kueue. (#1802, @astefanutti)
-
When using admission checks, and they are not satisfied yet, the reason for the Admitted condition with status=False is now UnsatisfiedChecks.
If you were watching for the reason NoChecks in the Admitted condition, use UnsatisfiedChecks instead. (#2150, @trasc)
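As a rough illustration of how the backoffBaseSeconds value affects requeuing, the sketch below computes an exponential backoff schedule in Python. The base of 60 seconds comes from the note in this section. The doubling factor, the 3600-second cap, and the function name requeue_delay are assumptions made for the example, and the real formula may differ (for instance by adding jitter).

```python
def requeue_delay(attempt: int, base_seconds: int = 60, factor: int = 2,
                  max_seconds: int = 3600) -> int:
    """Delay in seconds before the `attempt`-th requeue (1-based), capped at `max_seconds`.

    `factor` and `max_seconds` are illustrative assumptions, not confirmed defaults.
    """
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    return min(base_seconds * factor ** (attempt - 1), max_seconds)


if __name__ == "__main__":
    # Print the delay for the first few consecutive requeues.
    for n in range(1, 9):
        print(n, requeue_delay(n))
```

Under these assumptions, a larger base pushes every retry further out, so lowering the setting is mostly useful when Pods normally become ready quickly.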
API Change
- Make ClusterQueue queueingStrategy field mutable. The field can be mutated while there are pending workloads. (#1934, @mimowo)
- User can now pass parameters to ProvisioningRequest using job's annotations (#1869, @PBundyra)
Feature
-
A new condition of type Preempted makes it possible to distinguish the different reasons for a preemption (#1942, @mimowo)
-
Add configuration to register Kinds as being managed by an external Kueue-compatible controller (#2059, @dgrove-oss)
-
Add fair sharing when borrowing unused resources from other ClusterQueues in a cohort.
Fair sharing is based on DRF for usage above nominal quotas.
When fair sharing is enabled, Kueue prefers to admit workloads from ClusterQueues with the lowest share first.
Administrators can enable and configure fair sharing preemption using a combination of two policies: LessThanOrEqualtoFinalShare, LessThanInitialShare.
You can define a fair sharing weight for ClusterQueues. The weight determines how much of the unused resources each ClusterQueue can take in comparison to others. (#2070, @alculquicondor)
-
Add metric evicted_workloads: the number of evicted workloads per 'cluster_queue' (#1955, @lowang-bh)
-
Add recommended Kubernetes labels to uniquely identify Pods and other resources installed with Kueue.
The Deployment selector remains unchanged to allow for a seamless upgrade. (#1695, @astefanutti)
-
Added label copying from Pod/Job into the Kueue Workload. (#1959, @pajakd)
-
Added non-negative validations for the ".queueVisibility.clusterQueues.maxCount" in the Configuration. (#2309, @tenzen-y)
-
Added validations for the ".internalCertManagement" in the Configuration. (#2169, @tenzen-y)
-
Added validations for the "multiKueue.origin", ".multiKueue.gcInterval" and the "multiKueue.workerLostTimeout" in the Configuration. (#2129, @tenzen-y)
-
Added validations for the "waitForPodsReady.timeout" in the Configuration. (#2214, @tenzen-y)
-
Adds ObservedGeneration in conditions (#1939, @vladikkuzn)
-
Adds the BackoffMaxSeconds property to limit the retry period length for re-queuing workloads. (#2264, @IrvingMg)
-
Allow for workload.spec.podSet.[*].count to be 0 (#2268, @mszadkow)
-
CLI: Add command to list ClusterQueues (#2156, @vladikkuzn)
-
CLI: Add commands to stop and resume a ClusterQueue (#2200, @vladikkuzn)
-
CLI: Add kubectl kueue plugin that allows creating LocalQueues without writing YAML. (#2027, @mbobrovskyi)
-
CLI: Add list LocalQueue command (#2157, @mbobrovskyi)
-
CLI: Add stop/resume workload commands (#2134, @mbobrovskyi)
-
CLI: Add validation for ClusterQueue on creating LocalQueue (#2122, @mbobrovskyi)
-
CLI: Added list workloads command. (#2195, @mbobrovskyi)
-
CLI: Added pass-through commands support in kubectl-kueue for get, describe, edit, patch and delete. (#2181, @trasc)
-
CLI: kubectl-kueue is part of the release artifacts (#2306, @mbobrovskyi)
-
Helm: Allow configuration of ipFamilyPolicy for ipDualStack Kubernetes clusters (#1933, @dongjiang1989)
-
Helm: Allow configuration of custom annotations on Service and Deployment's Pod (#2030, @tozastation)
-
Improve metrics related to workload's quota reservation and admission:
- fix admission_wait_time_seconds - to measure the time to "Admitted" condition since creation time or last requeue (as opposed to the "QuotaReserved" condition as before)
- add quota_reserved_wait_time_seconds - measures time to "QuotaReserved" condition since creation time, or last eviction time
- add quota_reserved_workloads_total - counts the number of workloads that got admitted
- admission_checks_wait_time_seconds - measures the time to admit a workload with admission checks since quota reservation
- use longer buckets (up to 10240s) for histogram metrics: admission_wait_time_seconds, quota_reserved_wait_time_seconds, admission_checks_wait_time_seconds (#1977, @mbobrovskyi)
-
Improve the kubectl output for workloads using admission checks. (#1991, @vladikkuzn)
-
Make the PodsReady base delay for requeuing configurable (#2040, @mimowo)
-
MultiKueue: Manage worker cluster unavailability (#1681, @trasc)
-
MultiKueue: Add support for JobSet spec.managedBy field (#1870, @trasc)
-
MultiKueue: Add the managedBy field to JobSets assigned to a ClusterQueue configured for MultiKueue (#2048, @vladikkuzn)
-
MultiKueue: Add worker connection monitoring and reconnect (#1806, @trasc)
-
Pod Integration: Add condition WaitingForReplacementPods to Workloads of pod groups with incomplete number of pods (#2234, @mbobrovskyi)
-
Pod Integration: The reason for stopping a pod is now specified in the pod TerminationTarget condition (#2160, @pajakd)
-
Pods created by Kueue now have the ProvisioningRequest's classname annotation (#2052, @PBundyra)
-
ProvisioningRequest: Graduated to Beta and enabled by default (#1968, @pajakd)
-
ProvisioningRequest: Propagate the message for a ProvisioningRequest being provisioned (which might include an ETA, depending on the implementation) to the Workload status (#2007, @pajakd)
-
Show fair share of a CQ in status and a metric (#2276, @mbobrovskyi)
-
Updates in admission check messages are recorded as events for jobs/pods. (#2147, @pajakd)
-
Workload finished reason replaced with succeeded and failed reasons (#2026, @vladikkuzn)
-
You can configure Kueue to ignore container resources that match specified prefixes. (#2267, @pajakd)
-
You can define AdmissionChecks per ResourceFlavor in the ClusterQueue API, using admissionChecksStrategy (#1960, @PBundyra)
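The fair sharing entry in this list describes a DRF-based share for usage above nominal quotas, scaled by a per-ClusterQueue weight. A simplified Python sketch of that idea follows. The function name weighted_share, the input shapes, and the exact normalisation (borrowed amount divided by the cohort total for that resource, then divided by the weight) are assumptions for illustration and may differ from Kueue's actual computation.

```python
def weighted_share(usage: dict, nominal: dict, cohort_total: dict,
                   weight: float = 1.0) -> float:
    """Dominant share of resources used above nominal quota, divided by weight."""
    if weight <= 0:
        raise ValueError("weight must be positive")
    ratios = []
    for resource, used in usage.items():
        # Only usage above the nominal quota counts towards the share.
        borrowed = max(0.0, used - nominal.get(resource, 0.0))
        total = cohort_total.get(resource, 0.0)
        if total > 0:
            ratios.append(borrowed / total)
    # DRF looks at the dominant (largest) ratio across resources.
    return max(ratios, default=0.0) / weight


if __name__ == "__main__":
    # Hypothetical cohort and ClusterQueues.
    cohort = {"cpu": 100.0, "memory": 400.0}
    shares = {
        "team-a": weighted_share({"cpu": 30.0, "memory": 40.0},
                                 {"cpu": 10.0, "memory": 40.0}, cohort),
        "team-b": weighted_share({"cpu": 20.0, "memory": 200.0},
                                 {"cpu": 20.0, "memory": 100.0}, cohort, weight=2.0),
    }
    # The ClusterQueue with the lowest share would be preferred for admission.
    print(min(shares, key=shares.get), shares)
```

In this sketch a higher weight lowers the computed share, so a ClusterQueue with a larger weight can borrow more before it stops being preferred.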
Bug or Regression
-
Avoid unnecessary preemptions when there are multiple candidates for preemption with the same admission timestamp (#1875, @alculquicondor)
-
Change the default pprof port to 8083 to fix a bug that causes conflicting listening ports between pprof and the visibility server. (#2228, @amy)
-
Check the containers limits for used resources in provisioning admission check controller and include them in the ProvisioningRequest as requests (#2286, @trasc)
-
Do not default to suspending a job whose parent is already managed by Kueue (#1846, @astefanutti)
-
Fix handling of eviction in StrictFIFO to ensure the evicted workload is in the head.
Previously, in case of priority-based preemption, it was possible that the lower-priority
workload might get admitted while the higher priority workload is being evicted. (#2061, @mimowo)
-
Fix incorrect quota management when lendingLimit enabled in preemption (#1770, @kerthcet)
-
Fix preemption algorithm to reduce the number of preemptions within a ClusterQueue when reclamation is not possible, and when using .preemption.borrowWithinCohort (#2110, @alculquicondor)
-
Fix preemption algorithm to reduce the number of preemptions within a ClusterQueue when reclamation is not possible. (#1979, @mimowo)
-
Fix preemption to reclaim quota that is blocked by an earlier pending Workload from another ClusterQueue in the same cohort. (#1866, @alculquicondor)
-
Fix support for MPIJobs when using a ProvisioningRequest engine that applies updates only to worker templates. (#2265, @trasc)
-
Fix the counter of pending workloads in cluster queue status.
The counter would not count the head workload for StrictFIFO queues, if the workload cannot get admitted.
This change also includes the blocked workload in the metrics and the visibility API for the list of pending workloads. (#1936, @mimowo)
-
Fix the resource requests computation taking into account sidecar containers. (#2099, @IrvingMg)
-
Helm: Fix a bug that prevented Kueue from working with cert-manager. (#2087, @EladDolev)
-
Helm: Fix a bug where the configuration for integrations.podOptions.namespaceSelector didn't have an effect due to indentation issues. (#2086, @EladDolev)
-
Helm: Fix chart values configuration for the number of reconcilers for the Pod integration. (#2046, @alculquicondor)
-
Kueue visibility API is no longer installed by default. Users can install it via helm or applying the visibility-api.yaml artifact. (#1746, @trasc)
-
Make the defaults for PodsReadyTimeout backoff more practical, as with the original values
the first couple of requeues seemed immediate to users (below 10s, which
is negligible compared to the time spent waiting for PodsReady).
The default values for the formula to determine the exponential backoff are changed as follows:
- base: 1s -> 10s
- exponent:...
Kueue v0.7.0-rc.2
Changes since v0.6.0:
Urgent Upgrade Notes
(No, really, you MUST read this before you upgrade)
-
Added CRD validation rules to AdmissionCheck.
-
Added CRD validation rules to ClusterQueue.
-
Added CRD validation rules to LocalQueue.
-
Added CRD validation rules to ResourceFlavor.
-
Added CRD validation rules to Workload.
-
Increased the default value of .waitForPodsReady.requeuingStrategy.backoffBaseSeconds to 60.
You can configure .waitForPodsReady.requeuingStrategy.backoffBaseSeconds as needed. (#2251, @mbobrovskyi)
-
Upgrade RayJob API to v1.
If you use KubeRay older than v1.0.0, you'll have to upgrade your existing installation
to KubeRay v1.0.0, or any more recent version that supports the KubeRay v1 APIs, for it to
remain compatible with Kueue. (#1802, @astefanutti)
-
When using admission checks, and they are not satisfied yet, the reason for the Admitted condition with status=False is now UnsatisfiedChecks.
If you were watching for the reason NoChecks in the Admitted condition, use UnsatisfiedChecks instead. (#2150, @trasc)
API Change
- Make ClusterQueue queueingStrategy field mutable. The field can be mutated while there are pending workloads. (#1934, @mimowo)
- User can now pass parameters to ProvisioningRequest using job's annotations (#1869, @PBundyra)
Feature
-
A new condition of type Preempted makes it possible to distinguish the different reasons for a preemption (#1942, @mimowo)
-
Add configuration to register Kinds as being managed by an external Kueue-compatible controller (#2059, @dgrove-oss)
-
Add fair sharing when borrowing unused resources from other ClusterQueues in a cohort.
Fair sharing is based on DRF for usage above nominal quotas.
When fair sharing is enabled, Kueue prefers to admit workloads from ClusterQueues with the lowest share first.
Administrators can enable and configure fair sharing preemption using a combination of two policies: LessThanOrEqualtoFinalShare, LessThanInitialShare.
You can define a fair sharing weight for ClusterQueues. The weight determines how much of the unused resources each ClusterQueue can take in comparison to others. (#2070, @alculquicondor)
-
Add metric evicted_workloads: the number of evicted workloads per 'cluster_queue' (#1955, @lowang-bh)
-
Add recommended Kubernetes labels to uniquely identify Pods and other resources installed with Kueue.
The Deployment selector remains unchanged to allow for a seamless upgrade. (#1695, @astefanutti)
-
Added label copying from Pod/Job into the Kueue Workload. (#1959, @pajakd)
-
Added non-negative validations for the ".queueVisibility.clusterQueues.maxCount" in the Configuration. (#2309, @tenzen-y)
-
Added validations for the ".internalCertManagement" in the Configuration. (#2169, @tenzen-y)
-
Added validations for the "multiKueue.origin", ".multiKueue.gcInterval" and the "multiKueue.workerLostTimeout" in the Configuration. (#2129, @tenzen-y)
-
Added validations for the "waitForPodsReady.timeout" in the Configuration. (#2214, @tenzen-y)
-
Adds ObservedGeneration in conditions (#1939, @vladikkuzn)
-
Adds the BackoffMaxSeconds property to limit the retry period length for re-queuing workloads. (#2264, @IrvingMg)
-
Allow for workload.spec.podSet.[*].count to be 0 (#2268, @mszadkow)
-
CLI: Add command to list ClusterQueues (#2156, @vladikkuzn)
-
CLI: Add commands to stop and resume a ClusterQueue (#2200, @vladikkuzn)
-
CLI: Add kubectl kueue plugin that allows creating LocalQueues without writing YAML. (#2027, @mbobrovskyi)
-
CLI: Add list LocalQueue command (#2157, @mbobrovskyi)
-
CLI: Add stop/resume workload commands (#2134, @mbobrovskyi)
-
CLI: Add validation for ClusterQueue on creating LocalQueue (#2122, @mbobrovskyi)
-
CLI: Added list workloads command. (#2195, @mbobrovskyi)
-
CLI: Added pass-through commands support in kubectl-kueue for get, describe, edit, patch and delete. (#2181, @trasc)
-
Helm: Allow configuration of ipFamilyPolicy for ipDualStack Kubernetes clusters (#1933, @dongjiang1989)
-
Helm: Allow configuration of custom annotations on Service and Deployment's Pod (#2030, @tozastation)
-
Improve metrics related to workload's quota reservation and admission:
- fix admission_wait_time_seconds - to measure the time to "Admitted" condition since creation time or last requeue (as opposed to the "QuotaReserved" condition as before)
- add quota_reserved_wait_time_seconds - measures time to "QuotaReserved" condition since creation time, or last eviction time
- add quota_reserved_workloads_total - counts the number of workloads that got admitted
- admission_checks_wait_time_seconds - measures the time to admit a workload with admission checks since quota reservation
- use longer buckets (up to 10240s) for histogram metrics: admission_wait_time_seconds, quota_reserved_wait_time_seconds, admission_checks_wait_time_seconds (#1977, @mbobrovskyi)
-
Improve the kubectl output for workloads using admission checks. (#1991, @vladikkuzn)
-
Make the PodsReady base delay for requeuing configurable (#2040, @mimowo)
-
MultiKueue: Manage worker cluster unavailability (#1681, @trasc)
-
MultiKueue: Add support for JobSet spec.managedBy field (#1870, @trasc)
-
MultiKueue: Add the managedBy field to JobSets assigned to a ClusterQueue configured for MultiKueue (#2048, @vladikkuzn)
-
MultiKueue: Add worker connection monitoring and reconnect (#1806, @trasc)
-
Pod Integration: Add condition WaitingForReplacementPods to Workloads of pod groups with an incomplete number of pods (#2234, @mbobrovskyi)
-
Pod Integration: The reason for stopping a pod is now specified in the pod TerminationTarget condition (#2160, @pajakd)
-
Pods created by Kueue now have the ProvisioningRequest's classname annotation (#2052, @PBundyra)
-
ProvisioningRequest: Graduated to Beta and enabled by default (#1968, @pajakd)
-
ProvisioningRequest: Propagate the message for a ProvisioningRequest being provisioned (which might include an ETA, depending on the implementation) to the Workload status (#2007, @pajakd)
-
Show fair share of a CQ in status and a metric (#2276, @mbobrovskyi)
-
Updates in admission check messages are recorded as events for jobs/pods. (#2147, @pajakd)
-
Workload finished reason replaced with succeeded and failed reasons (#2026, @vladikkuzn)
-
You can configure Kueue to ignore container resources that match specified prefixes. (#2267, @pajakd)
-
You can define AdmissionChecks per ResourceFlavor in the ClusterQueue API, using admissionChecksStrategy (#1960, @PBundyra)
Bug or Regression
-
Avoid unnecessary preemptions when there are multiple candidates for preemption with the same admission timestamp (#1875, @alculquicondor)
-
Change the default pprof port to 8083 to fix a bug that causes conflicting listening ports between pprof and the visibility server. (#2228, @amy)
-
Check the containers limits for used resources in provisioning admission check controller and include them in the ProvisioningRequest as requests (#2286, @trasc)
-
Do not default to suspending a job whose parent is already managed by Kueue (#1846, @astefanutti)
-
Fix handling of eviction in StrictFIFO to ensure the evicted workload is in the head.
Previously, in case of priority-based preemption, it was possible that the lower-priority
workload might get admitted while the higher priority workload is being evicted. (#2061, @mimowo)
-
Fix incorrect quota management when lendingLimit is enabled in preemption (#1770, @kerthcet)
-
Fix preemption algorithm to reduce the number of preemptions within a ClusterQueue when reclamation is not possible, and when using .preemption.borrowWithinCohort (#2110, @alculquicondor)
-
Fix preemption algorithm to reduce the number of preemptions within a ClusterQueue when reclamation is not possible. (#1979, @mimowo)
-
Fix preemption to reclaim quota that is blocked by an earlier pending Workload from another ClusterQueue in the same cohort. (#1866, @alculquicondor)
-
Fix support for MPIJobs when using a ProvisioningRequest engine that applies updates only to worker templates. (#2265, @trasc)
-
Fix the counter of pending workloads in cluster queue status.
The counter would not count the head workload for StrictFIFO queues, if the workload cannot get admitted.
This change also includes the blocked workload in the metrics and the visibility API for the list of pending workloads. (#1936, @mimowo)
-
Fix the resource requests computation taking into account sidecar containers. (#2099, @IrvingMg)
-
Helm: Fix a bug that prevented Kueue from working with cert-manager. (#2087, @EladDolev)
-
Helm: Fix a bug where the configuration for integrations.podOptions.namespaceSelector didn't have an effect due to indentation issues. (#2086, @EladDolev)
-
Helm: Fix chart values configuration for the number of reconcilers for the Pod integration. (#2046, @alculquicondor)
-
Kueue visibility API is no longer installed by default. Users can install it via helm or applying the visibility-api.yaml artifact. (#1746, @trasc)
-
Make the defaults for PodsReadyTimeout backoff more practical. With the original values,
the first couple of requeues appeared immediate to users (below 10s, which
is negligible compared to the time spent waiting for PodsReady).
The default values for the formula that determines the exponential backoff are changed as follows:
- base: 1s -> 10s
- exponent: 1.41284738 -> 2
So, now the consecutive times to requeue a workload are...
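As a rough illustration of the PodsReadyTimeout backoff defaults listed above, the sketch below computes consecutive requeue delays. It assumes the delay for the n-th requeue is base * exponent^(n-1); that formula, the function name requeue_delays, and the absence of jitter or an upper cap are assumptions made for illustration and may not match Kueue's implementation.

```python
def requeue_delays(base_seconds: float, exponent: float, count: int) -> list[float]:
    """Return the first `count` delays of a plain exponential backoff.

    Assumes delay_n = base_seconds * exponent ** (n - 1), with no jitter
    and no upper cap (both are simplifications for illustration).
    """
    return [base_seconds * exponent ** n for n in range(count)]


# Old defaults named in the entry above: base 1s, exponent 1.41284738.
old = requeue_delays(1, 1.41284738, 4)
# New defaults named in the entry above: base 10s, exponent 2.
new = requeue_delays(10, 2, 4)

print([round(d, 2) for d in old])
print(new)
```

Under these assumptions, the first four delays with the old defaults all stay below 3 seconds, while the new defaults start at 10 seconds and double on each requeue.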
Kueue v0.6.3
Changes since v0.6.2:
Feature
- Improve the kubectl output for workloads using admission checks. (#2014, @vladikkuzn)
Bug or Regression
-
Change the default pprof port to 8083 to fix a bug that causes conflicting listening ports between pprof and the visibility server. (#2232, @amy)
-
Check the containers limits for used resources in provisioning admission check controller and include them in the ProvisioningRequest as requests (#2293, @trasc)
-
Consider deleted pods without spec.nodeName inactive and subject to pod replacement. (#2217, @trasc)
-
Fix a bug that causes the reactivated Workload to be immediately deactivated even though it doesn't exceed the backoffLimit. (#2220, @tenzen-y)
-
Fix a bug where the ".waitForPodsReady.requeuingStrategy.backoffLimitCount" is ignored when the ".waitForPodsReady.requeuingStrategy.timestamp" is not set. (#2224, @tenzen-y)
-
Fix chart values configuration for the number of reconcilers for the Pod integration. (#2050, @alculquicondor)
-
Fix handling of eviction in StrictFIFO to ensure the evicted workload is in the head.
Previously, in case of priority-based preemption, it was possible that the lower-priority
workload might get admitted while the higher priority workload is being evicted. (#2081, @mimowo)
-
Fix preemption algorithm to reduce the number of preemptions within a ClusterQueue when reclamation is not possible, and when using .preemption.borrowWithinCohort (#2111, @alculquicondor)
-
Fix support for MPIJobs when using a ProvisioningRequest engine that applies updates only to worker templates. (#2281, @trasc)
-
Fix support for jobset v0.5.x (#2271, @alculquicondor)
-
Fix the resource requests computation taking into account sidecar containers. (#2159, @IrvingMg)
-
Helm Chart: Fix a bug where Kueue does not work with cert-manager. (#2098, @EladDolev)
-
Helm Chart: Fix a bug where integrations.podOptions.namespaceSelector is not propagated. (#2095, @EladDolev)
-
JobFramework: The eviction by inactivation mechanism was moved to the workload controller.
This fixes a problem where pod groups would remain with condition QuotaReserved set to True when replacement pods are missing. (#2229, @mbobrovskyi)
-
Make the defaults for PodsReadyTimeout backoff more practical. With the original values,
the first couple of requeues appeared immediate to users (below 10s, which
is negligible compared to the time spent waiting for PodsReady).
The default values for the formula that determines the exponential backoff are changed as follows:
-
MultiKueue: Fix a bug that could delay a cluster from joining when its MultiKueueCluster is created. (#2167, @trasc)
-
Prevent Pod from being deleted when admitted via ProvisioningRequest that has pod updates on tolerations (#2262, @vladikkuzn)
-
Use PATCH updates for pods. This fixes support for Pods when using the latest features in Kubernetes v1.29 (#2089, @mbobrovskyi)
Other (Cleanup or Flake)
Kueue v0.7.0-rc.1
Changes since v0.6.0:
Urgent Upgrade Notes
(No, really, you MUST read this before you upgrade)
-
Added CRD validation rules to AdmissionCheck.
-
Added CRD validation rules to ClusterQueue.
Requires Kubernetes 1.25 or newer (#1972, @IrvingMg)
- Added CRD validation rules to ResourceFlavor.
Requires Kubernetes 1.25 or newer (#1958, @IrvingMg)
- Added CRD validation rules to Workload.
Requires Kubernetes 1.25 or newer (#2008, @IrvingMg)
- Replaced LocalQueue admission webhook with CRD validation rules.
Requires Kubernetes 1.25 or newer (#1938, @IrvingMg)
- Upgrade RayJob API to v1
If you use KubeRay older than v1.0.0, you'll have to upgrade your existing installation
to KubeRay v1.0.0, or any more recent version, that supports KubeRay v1 APIs, for it to
remain compatible with Kueue. (#1802, @astefanutti)
- Use recommended labels and a uniquely identifying selector for Kueue deployment resources.
You need to recreate the Kueue deployment if you had it previously installed,
as the label selector field is immutable. (#1695, @astefanutti)
Changes by Kind
API Change
- Make ClusterQueue queueingStrategy field mutable. The field can be mutated while there are pending workloads. (#1934, @mimowo)
- Users can now pass parameters to ProvisioningRequest using the job's annotations (#1869, @PBundyra)
Feature
-
A new condition with type Preempted makes it possible to distinguish the different reasons for a preemption (#1942, @mimowo)
-
Add MultiKueue support for JobSet spec.managedBy field. (#1870, @trasc)
-
Add configuration to register Kinds as being managed by an external Kueue-compatible controller (#2059, @dgrove-oss)
-
Add fair sharing when borrowing unused resources from other ClusterQueues in a cohort.
Fair sharing is based on DRF (Dominant Resource Fairness) for usage above nominal quotas.
When fair sharing is enabled, Kueue prefers to admit workloads from ClusterQueues with the lowest share first.
Administrators can enable and configure fair sharing preemption using a combination of two policies: LessThanOrEqualToFinalShare, LessThanInitialShare.
You can define a fair sharing weight for ClusterQueues. The weight determines how much of the unused resources each ClusterQueue can take in comparison to others. (#2070, @alculquicondor)
-
Add kubectl kueue plugin that allows creating LocalQueues without writing YAML. (#2027, @mbobrovskyi)
-
Add support for configuring ipFamilyPolicy for ipDualStack Kubernetes clusters (#1933, @dongjiang1989)
-
Add support for configuring custom annotations on the Service and the Deployment's Pod (#2030, @tozastation)
-
Added MultiKueue worker connection monitoring and reconnect. (#1806, @trasc)
-
Added label copying from Pod/Job into the Kueue Workload. (#1959, @pajakd)
-
Added scalability test for scheduling performance (#1931, @trasc)
-
Added validations for the "multiKueue.origin", "multiKueue.gcInterval" and "multiKueue.workerLostTimeout" fields in the Configuration. (#2129, @tenzen-y)
-
Adds ObservedGeneration in conditions (#1939, @vladikkuzn)
-
Improve metrics related to workload's quota reservation and admission:
- fix admission_wait_time_seconds - to measure the time to "Admitted" condition since creation time or last requeue (as opposed to the "QuotaReserved" condition as before)
- add quota_reserved_wait_time_seconds - measures time to "QuotaReserved" condition since creation time, or last eviction time
- add quota_reserved_workloads_total - counts the number of workloads that got admitted
- admission_checks_wait_time_seconds - measures the time to admit a workload with admission checks since quota reservation
- use longer buckets (up to 10240s) for histogram metrics: admission_wait_time_seconds, quota_reserved_wait_time_seconds, admission_checks_wait_time_seconds (#1977, @mbobrovskyi)
-
Improve the kubectl output for workloads using admission checks. (#1991, @vladikkuzn)
-
Make the PodsReady base delay for requeuing configurable (#2040, @mimowo)
-
MultiKueue: Manage worker cluster unavailability (#1681, @trasc)
-
Pods created by Kueue now have the ProvisioningRequest's classname annotation (#2052, @PBundyra)
-
Provisioning Admission Check Controller (ProvisioningACC) feature is now enabled by default (#1968, @pajakd)
-
The message for a ProvisioningRequest being provisioned (which might include an ETA, depending on the implementation) is now propagated to workloads. (#2007, @pajakd)
-
Use PATCH updates for pods. This fixes support for Pods when using the latest features in Kubernetes v1.29 (#2074, @mbobrovskyi)
-
Users can define AdmissionChecks per ResourceFlavor in the ClusterQueue API, using admissionChecksStrategy. (#1960, @PBundyra)
-
Workload finished reason replaced with succeeded and failed reasons (#2026, @vladikkuzn)
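To make the fair sharing entry in the list above more concrete, here is a minimal sketch of a weighted dominant-resource share. It assumes that a ClusterQueue's share is the largest ratio, across resources, of its usage above nominal quota to the amount the cohort can lend, divided by the queue's weight. The function names, the data layout, and the exact formula are illustrative assumptions and may differ from what Kueue computes.

```python
def dominant_share(usage, nominal, lendable, weight=1.0):
    """Weighted dominant share of one queue (illustrative formula).

    usage, nominal, lendable: dicts mapping resource name -> quantity.
    Only usage above the nominal quota counts as borrowing.
    weight is assumed to be positive.
    """
    ratios = []
    for resource, used in usage.items():
        borrowed = max(0.0, used - nominal.get(resource, 0.0))
        total = lendable.get(resource, 0.0)
        if total > 0:
            ratios.append(borrowed / total)
    return max(ratios, default=0.0) / weight


def pick_next_queue(queues, lendable):
    """Return the name of the queue with the lowest weighted share."""
    return min(
        queues,
        key=lambda name: dominant_share(
            queues[name]["usage"],
            queues[name]["nominal"],
            lendable,
            queues[name].get("weight", 1.0),
        ),
    )


# Hypothetical cohort that can lend 20 CPUs and 40 GiB of memory.
lendable = {"cpu": 20, "memory": 40}
queues = {
    "team-a": {"usage": {"cpu": 14, "memory": 8},
               "nominal": {"cpu": 10, "memory": 10}},
    "team-b": {"usage": {"cpu": 12, "memory": 30},
               "nominal": {"cpu": 10, "memory": 10},
               "weight": 2.0},
}

print(pick_next_queue(queues, lendable))
```

In this made-up cohort, team-b borrows more in absolute terms, but its weight of 2 halves its share, so the two queues end up close; team-a still has the lower share and would be preferred under the assumed rule.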
Bug or Regression
-
Avoid unnecessary preemptions when there are multiple candidates for preemption with the same admission timestamp (#1875, @alculquicondor)
-
Do not default to suspending a job whose parent is already managed by Kueue (#1846, @astefanutti)
-
Exclude Pod labels, preemptionPolicy and container images when determining whether pods in a pod group have the same shape. (#1758, @alculquicondor)
-
Fix Pods in Pod groups stuck with finalizers when deleted immediately after Succeeded (#1905, @alculquicondor)
-
Fix chart values configuration for the number of reconcilers for the Pod integration. (#2046, @alculquicondor)
-
Fix handling of eviction in StrictFIFO to ensure the evicted workload is in the head.
Previously, in case of priority-based preemption, it was possible that the lower-priority
workload might get admitted while the higher priority workload is being evicted. (#2061, @mimowo)
-
Fix incorrect quota management when lendingLimit is enabled in preemption (#1770, @kerthcet)
-
Fix preemption algorithm to reduce the number of preemptions within a ClusterQueue when reclamation is not possible, and when using .preemption.borrowWithinCohort (#2110, @alculquicondor)
-
Fix preemption algorithm to reduce the number of preemptions within a ClusterQueue when reclamation is not possible. (#1979, @mimowo)
-
Fix preemption to reclaim quota that is blocked by an earlier pending Workload from another ClusterQueue in the same cohort. (#1866, @alculquicondor)
-
Fix the configuration for the number of reconcilers for the Pod integration. It was only reconciling one group at a time. (#1835, @alculquicondor)
-
Fix the counter of pending workloads in cluster queue status.
The counter would not count the head workload for StrictFIFO queues, if the workload cannot get admitted.
This change also includes the blocked workload in the metrics and the visibility API for the list of pending workloads. (#1936, @mimowo)
-
Fix the resource requests computation taking into account sidecar containers. (#2099, @IrvingMg)
-
Fix transitions of Requeued condition. (#2063, @mbobrovskyi)
-
Helm Chart: Fix a bug where Kueue does not work with cert-manager. (#2087, @EladDolev)
-
Helm Chart: Fix a bug where integrations.podOptions.namespaceSelector is not propagated. (#2086, @EladDolev)
-
Kueue visibility API is no longer installed by default. Users can install it via helm or applying the visibility-api.yaml artifact. (#1746, @trasc)
-
Make the defaults for PodsReadyTimeout backoff more practical. With the original values,
the first couple of requeues appeared immediate to users (below 10s, which
is negligible compared to the time spent waiting for PodsReady).
The default values for the formula that determines the exponential backoff are changed as follows:
-
Reduce number of Workload reconciliations due to wrong equality check. (#1897, @gabesaba)
-
The Failed pods in a pod-group are finalized once replacement pods are created. (#1766, @trasc)
-
WaitForPodsReady: Fix a bug that the requeueState isn't reset. (#1838, @tenzen-y)
-
Clear RequeueAt when the workload backoff finishes. (#2143, @mbobrovskyi)
Other (Cleanup or Flake)
- Avoid API calls for admission attempts when Workload already has condition Admitted=false (#1820, @alculquicondor)
- Correctly log workload status for workloads with quota reserved, but awaiting admission checks. (#2062, @mimowo)
- Dropped the usage of kueue.x-k8s.io/parent-workload annotation in favor of an object ownership based approach. (#1747, @trasc)
- JobFramework: The eviction by inactivation mechanism was moved to the workload controller. (#2131, @tenzen-y)
- Skip requeueing of Workloads when there is a status update for a ClusterQueue, saving on API calls for Workloads that were already attempted for admission. (#1822, @alculquicondor)
- The hash suffix of the workload's name is now influenced by the job's object UID. Recreated jobs with the same name and kind will use different workload names. (#1732, @trasc)
Kueue v0.6.2
Changes since v0.6.1:
Bug or Regression
- Avoid unnecessary preemptions when there are multiple candidates for preemption with the same admission timestamp (#1880, @alculquicondor)
- Fix Pods in Pod groups stuck with finalizers when deleted immediately after Succeeded (#1916, @alculquicondor)
- Fix preemption to reclaim quota that is blocked by an earlier pending Workload from another ClusterQueue in the same cohort. (#1868, @alculquicondor)
- Reduce number of Workload reconciliations due to wrong equality check. (#1917, @gabesaba)