Commit 7e47a65

Merge branch 'main' into release-0.2
Change-Id: I47993710ef423009836f848680240078a1b9fe34
2 parents ddadb5b + 8016971 commit 7e47a65

16 files changed

+153
-143
lines changed

CHANGELOG/CHANGELOG-0.2.md

Lines changed: 19 additions & 11 deletions
Original file line number | Diff line number | Diff line change
@@ -3,24 +3,32 @@
33
Changes since `v0.1.0`:
44

55
### Features
6-
- Bumped the API version from v1alpha1 to v1alpha2. v1alpha1 is no longer supported and Queue is now named LocalQueue.
7-
- Add webhooks to validate and add defaults to all kueue APIs.
6+
7+
- Upgrade the API version from v1alpha1 to v1alpha2. v1alpha1 is no longer supported.
8+
v1alpha2 includes the following changes:
9+
- Rename Queue to LocalQueue.
10+
- Remove ResourceFlavor.labels. Use ResourceFlavor.metadata.labels instead.
11+
- Add webhooks to validate and to add defaults to all kueue APIs.
12+
- Add internal cert manager to serve webhooks with TLS.
13+
- Use finalizers to prevent ClusterQueues and ResourceFlavors in use from being
14+
deleted prematurely.
815
- Support [codependent resources](/docs/concepts/cluster_queue.md#codepedent-resources)
916
by assigning the same flavor to codependent resources in a pod set.
1017
- Support [pod overhead](https://kubernetes.io/docs/concepts/scheduling-eviction/pod-overhead/)
1118
in Workload pod sets.
12-
- Default requests to limits if requests are not set in a Workload pod set, to
13-
match internal defaulting for k8s Pods.
14-
- Added [prometheus metrics](/docs/reference/metrics.md) to monitor health of
19+
- Set requests to limits if requests are not set in a Workload pod set,
20+
matching [internal defaulting for k8s Pods](https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#resources).
21+
- Add [prometheus metrics](/docs/reference/metrics.md) to monitor health of
1522
the system and the status of ClusterQueues.
23+
- Use Server Side Apply for Workload admission to reduce API conflicts.
1624

1725
### Bug fixes
1826

19-
- Prevent Workloads that don't match the ClusterQueue's namespaceSelector from
20-
blocking other Workloads in a StrictFIFO ClusterQueue.
21-
- Fixed number of pending workloads in a BestEffortFIFO ClusterQueue.
22-
- Fixed bug in a BestEffortFIFO ClusterQueue where a workload might not be
27+
- Fix bug that caused Workloads that don't match the ClusterQueue's
28+
namespaceSelector to block other Workloads in StrictFIFO ClusterQueues.
29+
- Fix the number of pending workloads in BestEffortFIFO ClusterQueues status.
30+
- Fix a bug in BestEffortFIFO ClusterQueues where a workload might not be
2331
retried after a transient error.
24-
- Fixed requeuing an out-of-date workload when failed to admit it.
25-
- Fixed bug in a BestEffortFIFO ClusterQueue where unadmissible workloads
32+
- Fix requeuing an out-of-date workload when failed to admit it.
33+
- Fix a bug in BestEffortFIFO ClusterQueues where inadmissible workloads
2634
were not removed from the ClusterQueue when removing the corresponding Queue.
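
The changelog entry about setting requests to limits can be illustrated with a small sketch. This is not Kueue's implementation: `ResourceList` and `defaultRequestsToLimits` are hypothetical names, quantities are kept as plain strings, and the sketch assumes the defaulting is applied per resource, as it is for Pods.

```go
package main

import "fmt"

// ResourceList is a simplified, hypothetical stand-in for a Kubernetes
// resource list. Quantities are plain strings for illustration only.
type ResourceList map[string]string

// defaultRequestsToLimits returns a copy of requests in which every resource
// that has a limit but no request gets the limit as its request.
// Explicit requests are never overwritten.
func defaultRequestsToLimits(requests, limits ResourceList) ResourceList {
	out := make(ResourceList, len(requests)+len(limits))
	for name, quantity := range requests {
		out[name] = quantity
	}
	for name, quantity := range limits {
		if _, ok := out[name]; !ok {
			out[name] = quantity
		}
	}
	return out
}

func main() {
	requests := ResourceList{"cpu": "500m"}
	limits := ResourceList{"cpu": "1", "memory": "1Gi"}
	// cpu keeps its explicit request; memory falls back to its limit.
	fmt.Println(defaultRequestsToLimits(requests, limits))
}
```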

README.md

Lines changed: 8 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -8,16 +8,15 @@ created) and when it should stop (as in active pods should be deleted).
88
## Why use Kueue
99

1010
Kueue is a lean controller that you can install on top of a vanilla Kubernetes
11-
cluster without replacing any components. It is compatible with cloud
12-
environments where:
13-
- Nodes and other compute resources can be scaled up and down.
11+
cluster. Kueue does not replace any existing Kubernetes components. Kueue is
12+
compatible with cloud environments where:
13+
- Compute resources are elastic and can be scaled up and down.
1414
- Compute resources are heterogeneous (in architecture, availability, price, etc.).
1515

1616
Kueue APIs allow you to express:
1717
- Quotas and policies for fair sharing among tenants.
1818
- Resource fungibility: if a [resource flavor](docs/concepts/cluster_queue.md#resourceflavor-object)
19-
is fully utilized, run the [job](docs/concepts/workload.md) using a different
20-
flavor.
19+
is fully utilized, Kueue can admit the job using a different flavor.
2120

2221
The main design principle for Kueue is to avoid duplicating mature functionality
2322
in [Kubernetes components](https://kubernetes.io/docs/concepts/overview/components/)
@@ -62,11 +61,11 @@ Learn more about:
6261

6362
<!-- TODO(#64) Remove links to google docs once the contents have been migrated to this repo -->
6463

65-
Learn more about the architecture of Kueue in the design docs:
64+
Learn more about the architecture of Kueue with the following design docs:
6665

67-
- [bit.ly/kueue-apis](https://bit.ly/kueue-apis) (please join the [mailing list](https://groups.google.com/a/kubernetes.io/g/wg-batch)
68-
to get access) discusses the API proposal and a high-level description of how it
69-
operates.
66+
- [bit.ly/kueue-apis](https://bit.ly/kueue-apis) discusses the API proposal and a high
67+
level description of how Kueue operates. Join the [mailing list](https://groups.google.com/a/kubernetes.io/g/wg-batch)
68+
to get document access.
7069
- [bit.ly/kueue-controller-design](https://bit.ly/kueue-controller-design)
7170
presents the detailed design of the controller.
7271

apis/kueue/v1alpha2/resourceflavor_types.go

Lines changed: 5 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -24,19 +24,15 @@ import (
2424
//+kubebuilder:object:root=true
2525
//+kubebuilder:resource:scope=Cluster
2626

27-
// ResourceFlavor is the Schema for the resourceflavors API
27+
// ResourceFlavor is the Schema for the resourceflavors API.
28+
//
29+
// .metadata.labels associated with this flavor are matched against or
30+
// converted to node affinity constraints on the workload’s pods.
31+
// .metadata.labels can be up to 8 elements.
2832
type ResourceFlavor struct {
2933
metav1.TypeMeta `json:",inline"`
3034
metav1.ObjectMeta `json:"metadata,omitempty"`
3135

32-
// labels associated with this flavor. They are matched against or
33-
// converted to node affinity constraints on the workload’s pods.
34-
// For example, cloud.provider.com/accelerator: nvidia-tesla-k80.
35-
// More info: http://kubernetes.io/docs/user-guide/labels
36-
//
37-
// labels can be up to 8 elements.
38-
Labels map[string]string `json:"labels,omitempty"`
39-
4036
// taints associated with this flavor that workloads must explicitly
4137
// “tolerate” to be able to use this flavor.
4238
// For example, cloud.provider.com/preemptible="true":NoSchedule

apis/kueue/v1alpha2/zz_generated.deepcopy.go

Lines changed: 0 additions & 7 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

apis/kueue/webhooks/resourceflavor_webhook.go

Lines changed: 1 addition & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -88,11 +88,9 @@ func (w *ResourceFlavorWebhook) ValidateDelete(ctx context.Context, obj runtime.
8888
func ValidateResourceFlavor(rf *kueue.ResourceFlavor) field.ErrorList {
8989
var allErrs field.ErrorList
9090

91-
labelsPath := field.NewPath("labels")
9291
if len(rf.Labels) > 8 {
93-
allErrs = append(allErrs, field.TooMany(labelsPath, len(rf.Labels), 8))
92+
allErrs = append(allErrs, field.TooMany(field.NewPath("metadata", "labels"), len(rf.Labels), 8))
9493
}
95-
allErrs = append(allErrs, metavalidation.ValidateLabels(rf.Labels, labelsPath)...)
9694

9795
taintsPath := field.NewPath("taints")
9896
if len(rf.Taints) > 8 {
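
After this change, the webhook keeps only the count check on the object's metadata labels and reports it under the `metadata.labels` path. The behavior can be sketched with the standard library alone, under stated assumptions: `validateLabelCount` is a hypothetical helper, and the error text is illustrative rather than the exact output of `field.TooMany`.

```go
package main

import "fmt"

// maxLabels mirrors the limit of 8 used in the diff above.
const maxLabels = 8

// validateLabelCount is a hypothetical, stdlib-only stand-in for the count
// check in ValidateResourceFlavor. It returns an error when the label map
// has more than maxLabels entries, and nil otherwise.
func validateLabelCount(labels map[string]string) error {
	if len(labels) > maxLabels {
		// The message format is illustrative, not the apimachinery format.
		return fmt.Errorf("metadata.labels: too many labels: %d (at most %d allowed)",
			len(labels), maxLabels)
	}
	return nil
}

func main() {
	labels := make(map[string]string)
	for i := 0; i < 9; i++ {
		labels[fmt.Sprintf("key-%d", i)] = "value"
	}
	// Nine labels exceed the limit, so an error is printed.
	fmt.Println(validateLabelCount(labels))
}
```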

apis/kueue/webhooks/resourceflavor_webhook_test.go

Lines changed: 1 addition & 19 deletions
Original file line numberDiff line numberDiff line change
@@ -50,24 +50,6 @@ func TestValidateResourceFlavor(t *testing.T) {
5050
Effect: corev1.TaintEffectNoSchedule,
5151
}).Obj(),
5252
},
53-
{
54-
name: "invalid label name",
55-
rf: utiltesting.MakeResourceFlavor("resource-flavor").MultiLabels(map[string]string{
56-
"foo@bar": "",
57-
}).Obj(),
58-
wantErr: field.ErrorList{
59-
field.Invalid(field.NewPath("labels"), nil, ""),
60-
},
61-
},
62-
{
63-
name: "invalid label value",
64-
rf: utiltesting.MakeResourceFlavor("resource-flavor").MultiLabels(map[string]string{
65-
"foo": "@abcdefg",
66-
}).Obj(),
67-
wantErr: field.ErrorList{
68-
field.Invalid(field.NewPath("labels"), nil, ""),
69-
},
70-
},
7153
{
7254
// Taint validation is not exhaustively tested, because the code was copied from upstream k8s.
7355
name: "invalid taint",
@@ -88,7 +70,7 @@ func TestValidateResourceFlavor(t *testing.T) {
8870
return m
8971
}()).Obj(),
9072
wantErr: field.ErrorList{
91-
field.TooMany(field.NewPath("labels"), 9, 8),
73+
field.TooMany(field.NewPath("metadata", "labels"), 9, 8),
9274
},
9375
},
9476
{

config/components/crd/bases/kueue.x-k8s.io_resourceflavors.yaml

Lines changed: 4 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -18,7 +18,10 @@ spec:
1818
- name: v1alpha2
1919
schema:
2020
openAPIV3Schema:
21-
description: ResourceFlavor is the Schema for the resourceflavors API
21+
description: "ResourceFlavor is the Schema for the resourceflavors API. \n
22+
.metadata.labels associated with this flavor are matched against or converted
23+
to node affinity constraints on the workload’s pods. .metadata.labels can
24+
be up to 8 elements."
2225
properties:
2326
apiVersion:
2427
description: 'APIVersion defines the versioned schema of this representation
@@ -30,15 +33,6 @@ spec:
3033
object represents. Servers may infer this from the endpoint the client
3134
submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
3235
type: string
33-
labels:
34-
additionalProperties:
35-
type: string
36-
description: "labels associated with this flavor. They are matched against
37-
or converted to node affinity constraints on the workload’s pods. For
38-
example, cloud.provider.com/accelerator: nvidia-tesla-k80. More info:
39-
http://kubernetes.io/docs/user-guide/labels \n labels can be up to 8
40-
elements."
41-
type: object
4236
metadata:
4337
type: object
4438
taints:

docs/concepts/cluster_queue.md

Lines changed: 43 additions & 32 deletions
Original file line numberDiff line numberDiff line change
@@ -1,9 +1,9 @@
11
# Cluster Queue
22

33
A ClusterQueue is a cluster-scoped object that governs a pool of resources
4-
such as CPU, memory and hardware accelerators. A `ClusterQueue` defines:
5-
- The [resource _flavors_](#resourceflavor-object) that it manages, with usage
6-
limits and order of consumption.
4+
such as CPU, memory, and hardware accelerators. A ClusterQueue defines:
5+
- The [resource _flavors_](#resourceflavor-object) that the ClusterQueue manages,
6+
with usage limits and order of consumption.
77
- Fair sharing rules across the tenants of the cluster.
88

99
Only [cluster administrators](/docs/tasks#batch-administrator) should create `ClusterQueue` objects.
@@ -39,29 +39,29 @@ You can specify the quota as a [quantity](https://kubernetes.io/docs/reference/k
3939
## Resources
4040
4141
In a ClusterQueue, you can define quotas for multiple [compute resources](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#resource-types)
42-
(cpu, memory, GPUs, etc.).
42+
(CPU, memory, GPUs, etc.).
4343
44-
For each resource, you can define quotas for multiple _flavors_. A
45-
flavor represents different variations of a resource. The variations can be
46-
defined in a [ResourceFlavor object](#resourceflavor-object).
44+
For each resource, you can define quotas for multiple _flavors_.
45+
Flavors represent different variations of a resource (for example, different GPU
46+
models). A flavor is defined using a [ResourceFlavor object](#resourceflavor-object).
4747
48-
In a process called [admission](.#admission), Kueue assigns
49-
[Workload pod sets](workload.md#pod-sets) a flavor for each resource it requests.
48+
In a process called [admission](.#admission), Kueue assigns to the
49+
[Workload pod sets](workload.md#pod-sets) a flavor for each resource the pod set
50+
requests.
5051
Kueue assigns the first flavor in the ClusterQueue's `.spec.resources[*].flavors`
5152
list that has enough unused `min` quota in the ClusterQueue or the
5253
ClusterQueue's [cohort](#cohort).
5354

5455
### Codepedent resources
5556

56-
It is possible that multiple resources are tied to the same flavors. This is
57-
typical for `cpu` and `memory`, where the flavors are generally tied to a
58-
machine family or availability guarantees.
57+
It is possible that multiple resources in a ClusterQueue have the same flavors.
58+
This is typical for `cpu` and `memory`, where the flavors are generally tied to
59+
a machine family or VM availability policies. When two or more resources in a
60+
ClusterQueue match their flavors, they are said to be codependent resources.
5961

60-
If this is the case, the resources in the ClusterQueue must list the same
61-
flavors in the same order. When two or more resources match their flavors,
62-
they are said to be codependent. During admission, for each pod set in a
63-
Workload, Kueue assigns the same flavor to the codependent resources that the
64-
pod set requests.
62+
To manage codependent resources, you should list the flavors in the ClusterQueue
63+
resources in the same order. During admission, for each pod set in a Workload,
64+
Kueue assigns the same flavor to the codependent resources that the pod set requests.
6565

6666
An example of a ClusterQueue with codependent resources looks like the following:
6767
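
As a rough illustration of the codependency rule described in this hunk (this is not the ClusterQueue manifest that the last context line refers to, which lies outside the diff), two resources can be treated as codependent when they list the same flavors in the same order. The helper name `sameFlavorOrder` and the flavor names are hypothetical.

```go
package main

import "fmt"

// sameFlavorOrder reports whether two resources list exactly the same
// flavor names in the same order. In this sketch, that is the condition
// for treating the two resources as codependent.
func sameFlavorOrder(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

func main() {
	// Hypothetical flavor lists for three resources in one ClusterQueue.
	cpu := []string{"on-demand", "spot"}
	memory := []string{"on-demand", "spot"}
	gpu := []string{"a100", "t4"}

	fmt.Println("cpu and memory codependent:", sameFlavorOrder(cpu, memory))
	fmt.Println("cpu and gpu codependent:", sameFlavorOrder(cpu, gpu))
}
```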

@@ -150,8 +150,8 @@ Resources in a cluster are typically not homogeneous. Resources could differ in:
150150
- architecture (ex: x86 vs ARM CPUs)
151151
- brands and models (ex: Radeon 7000 vs Nvidia A100 vs T4 GPUs)
152152

153-
A ResourceFlavor is an object that represents these variations and allows you
154-
to associate them with node labels and taints.
153+
A ResourceFlavor is an object that represents these resource variations and
154+
allows you to associate them with node labels and taints.
155155

156156
**Note**: If your cluster is homogeneous, you can use an [empty ResourceFlavor](#empty-resourceflavor)
157157
instead of adding labels to custom ResourceFlavors.
@@ -163,8 +163,8 @@ apiVersion: kueue.x-k8s.io/v1alpha1
163163
kind: ResourceFlavor
164164
metadata:
165165
name: spot
166-
labels:
167-
instance-type: spot
166+
labels:
167+
instance-type: spot
168168
taints:
169169
- effect: NoSchedule
170170
key: spot
@@ -177,7 +177,7 @@ ClusterQueue in the `.spec.resources[*].flavors[*].name` field.
177177
### ResourceFlavor labels
178178

179179
To associate a ResourceFlavor with a subset of nodes of your cluster, you can
180-
configure the `.labels` field with matching node labels that uniquely identify
180+
configure the `.metadata.labels` field with matching node labels that uniquely identify
181181
the nodes. If you are using [cluster autoscaler](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler)
182182
(or equivalent controllers), make sure it is configured to add those labels when
183183
adding new nodes.
@@ -197,8 +197,8 @@ steps:
197197

198198
For example, for a [batch/v1.Job](https://kubernetes.io/docs/concepts/workloads/controllers/job/),
199199
Kueue adds the labels to the `.spec.template.spec.nodeSelector` field. This
200-
guarantees that the workload Pods run on the nodes associated to the flavor
201-
that Kueue decided that the workload should use.
200+
guarantees that the Workload's Pods can only be scheduled on the nodes
201+
targeted by the flavor that Kueue assigned to the Workload.
202202

203203
### ResourceFlavor taints
204204
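
The label handling described above for a batch/v1 Job can be sketched as a map merge into the Pod template's node selector. This is a simplified illustration, not Kueue's code: `applyFlavorLabels` is a hypothetical helper, and overwriting a conflicting key is an assumption of the sketch.

```go
package main

import "fmt"

// applyFlavorLabels returns a copy of nodeSelector with the labels of the
// assigned flavor added. Assumption of this sketch: if a key exists in both
// maps, the flavor's value wins.
func applyFlavorLabels(nodeSelector, flavorLabels map[string]string) map[string]string {
	out := make(map[string]string, len(nodeSelector)+len(flavorLabels))
	for key, value := range nodeSelector {
		out[key] = value
	}
	for key, value := range flavorLabels {
		out[key] = value
	}
	return out
}

func main() {
	// A selector the job author already set (hypothetical value).
	nodeSelector := map[string]string{"kubernetes.io/arch": "amd64"}
	// Labels of the "spot" flavor from the example earlier in the file.
	flavorLabels := map[string]string{"instance-type": "spot"}

	fmt.Println(applyFlavorLabels(nodeSelector, flavorLabels))
}
```

Returning a copy leaves the caller's maps untouched, which keeps the helper safe to call on a cached object.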

@@ -208,7 +208,7 @@ with taints.
208208
Taints on the ResourceFlavor work similarly to [node taints](https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/).
209209
For Kueue to admit a workload to use the ResourceFlavor, the PodSpecs in the
210210
workload should have a toleration for it. As opposed to the behavior for
211-
[ResourceFlavor labels](#resourceflavor-labels), Kueue will not add tolerations
211+
[ResourceFlavor labels](#resourceflavor-labels), Kueue does not add tolerations
212212
for the flavor taints.
213213

214214
### Empty ResourceFlavor
@@ -238,16 +238,27 @@ ClusterQueue.
238238

239239
### Flavors and borrowing semantics
240240

241-
When borrowing, Kueue satisfies the following admission semantics:
241+
When a ClusterQueue is part of a cohort, Kueue satisfies the following admission
242+
semantics:
242243

243244
- When assigning flavors, Kueue goes through the list of flavors in the
244245
ClusterQueue's `.spec.resources[*].flavors`. For each flavor, Kueue attempts
245-
to fit a Workload's pod set using the `min` quota of the ClusterQueue or the
246-
unused `min` quota of other ClusterQueues in the cohort, up to the `max` quota
247-
of the ClusterQueue. If the workload doesn't fit, Kueue proceeds evaluating the next
248-
flavor in the list.
249-
- A ClusterQueue can only borrow quota of flavors it defines and it can only
250-
borrow quota for one flavor.
246+
to fit a Workload's pod set according to the quota defined in the
247+
ClusterQueue for the flavor and the unused quota in the cohort.
248+
If the workload doesn't fit, Kueue evaluates the next flavor in the list.
249+
- A Workload's pod set resource fits in a flavor defined for a ClusterQueue
250+
resource if the sum of requests for the resource:
251+
1. Is less than or equal to the unused `.quota.min` for the flavor in the
252+
ClusterQueue; or
253+
2. Is less than or equal to the sum of unused `.quota.min` for the flavor in
254+
the ClusterQueues in the cohort, and
255+
3. Is less than or equal to the unused `.quota.max` for the flavor in the
256+
ClusterQueue.
257+
In Kueue, when (2) and (3) are satisfied, but not (1), this is called
258+
_borrowing quota_.
259+
- A ClusterQueue can only borrow quota for flavors that the ClusterQueue defines.
260+
- For each pod set resource in a Workload, a ClusterQueue can only borrow quota
261+
for one flavor.
251262

252263
### Borrowing example
253264
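
The fit conditions (1) to (3) in the "Flavors and borrowing semantics" hunk above can be sketched as follows. This is a minimal illustration under stated assumptions, not Kueue's scheduler: the type and function names are hypothetical, quantities are plain integers, and `CohortUnusedMin` is assumed to be the precomputed sum of unused `min` quota across the cohort.

```go
package main

import "fmt"

// FlavorQuota holds the unused quota for one flavor of one resource.
// All fields are hypothetical simplifications of the quota in a ClusterQueue.
type FlavorQuota struct {
	Name            string
	UnusedMin       int64 // unused .quota.min in this ClusterQueue
	CohortUnusedMin int64 // sum of unused .quota.min across the cohort
	UnusedMax       int64 // unused .quota.max in this ClusterQueue
}

// Decision is the outcome of trying to fit a request into one flavor.
type Decision int

const (
	NoFit Decision = iota
	Fit
	FitByBorrowing
)

// evaluate applies conditions (1), (2) and (3) from the text above.
func evaluate(request int64, q FlavorQuota) Decision {
	switch {
	case request <= q.UnusedMin: // condition (1)
		return Fit
	case request <= q.CohortUnusedMin && request <= q.UnusedMax: // (2) and (3)
		return FitByBorrowing
	default:
		return NoFit
	}
}

// assignFlavor walks the flavors in order and returns the first one that fits.
func assignFlavor(request int64, flavors []FlavorQuota) (string, Decision) {
	for _, f := range flavors {
		if d := evaluate(request, f); d != NoFit {
			return f.Name, d
		}
	}
	return "", NoFit
}

func main() {
	flavors := []FlavorQuota{
		{Name: "on-demand", UnusedMin: 2, CohortUnusedMin: 3, UnusedMax: 4},
		{Name: "spot", UnusedMin: 8, CohortUnusedMin: 8, UnusedMax: 16},
	}
	name, decision := assignFlavor(6, flavors)
	fmt.Println(name, decision == Fit)
}
```

Evaluating the flavors in list order mirrors the statement that Kueue moves to the next flavor only when the workload does not fit the current one.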

docs/concepts/local_queue.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -4,9 +4,9 @@ A `LocalQueue` is a namespaced object that groups closely related workloads
44
belonging to a single tenant. A `LocalQueue` points to one [`ClusterQueue`](cluster_queue.md)
55
from which resources are allocated to run its workloads.
66

7-
Users submit jobs to a `LocalQueue`, instead of directly to a `ClusterQueue`.
7+
Users submit jobs to a `LocalQueue`, instead of to a `ClusterQueue` directly.
88
Tenants can discover which queues they can submit jobs to by listing the
9-
local queues in their namespace. The command looks similar to the following:
9+
local queues in their namespace. The command is similar to the following:
1010

1111
```sh
1212
kubectl get -n my-namespace localqueues

docs/setup/install.md

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -50,7 +50,8 @@ kubectl delete -f https://github.com/kubernetes-sigs/kueue/releases/download/$VE
5050

5151
### Upgrading from 0.1 to 0.2
5252

53-
Upgrading from `0.1.x` to `0.2.y` is not supported due to breaking API changes.
53+
Upgrading from `0.1.x` to `0.2.y` is not supported because of breaking API
54+
changes.
5455
To install Kueue `0.2.y`, [uninstall](#uninstall) the older version first.
5556

5657
## Install a custom-configured released version
