Commit 8016971

Recover updates to docs and release notes (#356)
* Recover updates to docs and release notes
* Add more features to release notes
  Change-Id: I1783a3e890da9a0599b83452e77d956e58d83ec6
* Better wording
  Change-Id: Idb8737e5ba5f55863438d601743a15e5967f6ea1
1 parent c87be91 commit 8016971

5 files changed: 71 additions and 52 deletions

CHANGELOG/CHANGELOG-0.2.md

Lines changed: 19 additions & 11 deletions
@@ -3,24 +3,32 @@
 Changes since `v0.1.0`:
 
 ### Features
-- Bumped the API version from v1alpha1 to v1alpha2. v1alpha1 is no longer supported and Queue is now named LocalQueue.
-- Add webhooks to validate and add defaults to all kueue APIs.
+
+- Upgrade the API version from v1alpha1 to v1alpha2. v1alpha1 is no longer supported.
+v1alpha2 includes the following changes:
+- Rename Queue to LocalQueue.
+- Remove ResourceFlavor.labels. Use ResourceFlavor.metadata.labels instead.
+- Add webhooks to validate and to add defaults to all kueue APIs.
+- Add internal cert manager to serve webhooks with TLS.
+- Use finalizers to prevent ClusterQueues and ResourceFlavors in use from being
+deleted prematurely.
 - Support [codependent resources](/docs/concepts/cluster_queue.md#codepedent-resources)
 by assigning the same flavor to codependent resources in a pod set.
 - Support [pod overhead](https://kubernetes.io/docs/concepts/scheduling-eviction/pod-overhead/)
 in Workload pod sets.
-- Default requests to limits if requests are not set in a Workload pod set, to
-match internal defaulting for k8s Pods.
-- Added [prometheus metrics](/docs/reference/metrics.md) to monitor health of
+- Set requests to limits if requests are not set in a Workload pod set,
+matching [internal defaulting for k8s Pods](https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#resources).
+- Add [prometheus metrics](/docs/reference/metrics.md) to monitor health of
 the system and the status of ClusterQueues.
+- Use Server Side Apply for Workload admission to reduce API conflicts.
 
 ### Bug fixes
 
-- Prevent Workloads that don't match the ClusterQueue's namespaceSelector from
-blocking other Workloads in a StrictFIFO ClusterQueue.
-- Fixed number of pending workloads in a BestEffortFIFO ClusterQueue.
-- Fixed bug in a BestEffortFIFO ClusterQueue where a workload might not be
+- Fix bug that caused Workloads that don't match the ClusterQueue's
+namespaceSelector to block other Workloads in StrictFIFO ClusterQueues.
+- Fix the number of pending workloads in BestEffortFIFO ClusterQueues status.
+- Fix a bug in BestEffortFIFO ClusterQueues where a workload might not be
 retried after a transient error.
-- Fixed requeuing an out-of-date workload when failed to admit it.
-- Fixed bug in a BestEffortFIFO ClusterQueue where unadmissible workloads
+- Fix requeuing an out-of-date workload when failed to admit it.
+- Fix a bug in BestEffortFIFO ClusterQueues where inadmissible workloads
 were not removed from the ClusterQueue when removing the corresponding Queue.

README.md

Lines changed: 8 additions & 9 deletions
@@ -8,16 +8,15 @@ created) and when it should stop (as in active pods should be deleted).
 ## Why use Kueue
 
 Kueue is a lean controller that you can install on top of a vanilla Kubernetes
-cluster without replacing any components. It is compatible with cloud
-environments where:
-- Nodes and other compute resources can be scaled up and down.
+cluster. Kueue does not replace any existing Kubernetes components. Kueue is
+compatible with cloud environments where:
+- Compute resources are elastic and can be scaled up and down.
 - Compute resources are heterogeneous (in architecture, availability, price, etc.).
 
 Kueue APIs allow you to express:
 - Quotas and policies for fair sharing among tenants.
 - Resource fungibility: if a [resource flavor](docs/concepts/cluster_queue.md#resourceflavor-object)
-is fully utilized, run the [job](docs/concepts/workload.md) using a different
-flavor.
+is fully utilized, Kueue can admit the job using a different flavor.
 
 The main design principle for Kueue is to avoid duplicating mature functionality
 in [Kubernetes components](https://kubernetes.io/docs/concepts/overview/components/)
@@ -62,11 +61,11 @@ Learn more about:
 
 <!-- TODO(#64) Remove links to google docs once the contents have been migrated to this repo -->
 
-Learn more about the architecture of Kueue in the design docs:
+Learn more about the architecture of Kueue with the following design docs:
 
-- [bit.ly/kueue-apis](https://bit.ly/kueue-apis) (please join the [mailing list](https://groups.google.com/a/kubernetes.io/g/wg-batch)
-to get access) discusses the API proposal and a high-level description of how it
-operates.
+- [bit.ly/kueue-apis](https://bit.ly/kueue-apis) discusses the API proposal and a high
+level description of how Kueue operates. Join the [mailing list](https://groups.google.com/a/kubernetes.io/g/wg-batch)
+to get document access.
 - [bit.ly/kueue-controller-design](https://bit.ly/kueue-controller-design)
 presents the detailed design of the controller.
 

docs/concepts/cluster_queue.md

Lines changed: 40 additions & 29 deletions
@@ -1,9 +1,9 @@
 # Cluster Queue
 
 A ClusterQueue is a cluster-scoped object that governs a pool of resources
-such as CPU, memory and hardware accelerators. A `ClusterQueue` defines:
-- The [resource _flavors_](#resourceflavor-object) that it manages, with usage
-limits and order of consumption.
+such as CPU, memory, and hardware accelerators. A ClusterQueue defines:
+- The [resource _flavors_](#resourceflavor-object) that the ClusterQueue manages,
+with usage limits and order of consumption.
 - Fair sharing rules across the tenants of the cluster.
 
 Only [cluster administrators](/docs/tasks#batch-administrator) should create `ClusterQueue` objects.
@@ -39,29 +39,29 @@ You can specify the quota as a [quantity](https://kubernetes.io/docs/reference/k
 ## Resources
 
 In a ClusterQueue, you can define quotas for multiple [compute resources](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#resource-types)
-(cpu, memory, GPUs, etc.).
+(CPU, memory, GPUs, etc.).
 
-For each resource, you can define quotas for multiple _flavors_. A
-flavor represents different variations of a resource. The variations can be
-defined in a [ResourceFlavor object](#resourceflavor-object).
+For each resource, you can define quotas for multiple _flavors_.
+Flavors represent different variations of a resource (for example, different GPU
+models). A flavor is defined using a [ResourceFlavor object](#resourceflavor-object).
 
-In a process called [admission](.#admission), Kueue assigns
-[Workload pod sets](workload.md#pod-sets) a flavor for each resource it requests.
+In a process called [admission](.#admission), Kueue assigns to the
+[Workload pod sets](workload.md#pod-sets) a flavor for each resource the pod set
+requests.
 Kueue assigns the first flavor in the ClusterQueue's `.spec.resources[*].flavors`
 list that has enough unused `min` quota in the ClusterQueue or the
 ClusterQueue's [cohort](#cohort).
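
The first-fit rule in the paragraph above can be sketched in Python. This is a simplified illustration under stated assumptions, not Kueue's implementation: `FlavorQuota`, its fields, and `assign_flavor` are hypothetical names, the cohort is reduced to a single number of unused `min` quota that other ClusterQueues could lend, and the `max` limit is ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FlavorQuota:
    """Hypothetical per-flavor bookkeeping for one resource in a ClusterQueue."""
    name: str
    min_quota: int          # the flavor's `min` quota
    used: int = 0           # quota already consumed by admitted workloads
    cohort_unused: int = 0  # assumption: unused `min` quota lendable by the cohort


def assign_flavor(flavors: list[FlavorQuota], request: int) -> Optional[str]:
    """Return the first flavor, in list order, with enough unused quota."""
    for flavor in flavors:
        unused = flavor.min_quota - flavor.used
        if request <= unused + flavor.cohort_unused:
            return flavor.name
    return None  # no flavor fits, so the pod set is not assigned a flavor


# The flavor order is significant: "spot" is tried before "on-demand".
flavors = [
    FlavorQuota("spot", min_quota=8, used=6),
    FlavorQuota("on-demand", min_quota=8, used=0),
]
print(assign_flavor(flavors, request=4))  # on-demand
```

Because the list is scanned in order, placing cheaper or preferred flavors first makes them the default choice, and later flavors act as fallbacks.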
 
 ### Codepedent resources
 
-It is possible that multiple resources are tied to the same flavors. This is
-typical for `cpu` and `memory`, where the flavors are generally tied to a
-machine family or availability guarantees.
+It is possible that multiple resources in a ClusterQueue have the same flavors.
+This is typical for `cpu` and `memory`, where the flavors are generally tied to
+a machine family or VM availability policies. When two or more resources in a
+ClusterQueue match their flavors, they are said to be codependent resources.
 
-If this is the case, the resources in the ClusterQueue must list the same
-flavors in the same order. When two or more resources match their flavors,
-they are said to be codependent. During admission, for each pod set in a
-Workload, Kueue assigns the same flavor to the codependent resources that the
-pod set requests.
+To manage codependent resources, you should list the flavors in the ClusterQueue
+resources in the same order. During admission, for each pod set in a Workload,
+Kueue assigns the same flavor to the codependent resources that the pod set requests.
 
 An example of a ClusterQueue with codependent resources looks like the following:
 
@@ -150,8 +150,8 @@ Resources in a cluster are typically not homogeneous. Resources could differ in:
 - architecture (ex: x86 vs ARM CPUs)
 - brands and models (ex: Radeon 7000 vs Nvidia A100 vs T4 GPUs)
 
-A ResourceFlavor is an object that represents these variations and allows you
-to associate them with node labels and taints.
+A ResourceFlavor is an object that represents these resource variations and
+allows you to associate them with node labels and taints.
 
 **Note**: If your cluster is homogeneous, you can use an [empty ResourceFlavor](#empty-resourceflavor)
 instead of adding labels to custom ResourceFlavors.
@@ -197,8 +197,8 @@ steps:
 
 For example, for a [batch/v1.Job](https://kubernetes.io/docs/concepts/workloads/controllers/job/),
 Kueue adds the labels to the `.spec.template.spec.nodeSelector` field. This
-guarantees that the workload Pods run on the nodes associated to the flavor
-that Kueue decided that the workload should use.
+guarantees that the Workload's Pods can only be scheduled on the nodes
+targeted by the flavor that Kueue assigned to the Workload.
 
 ### ResourceFlavor taints
 
@@ -208,7 +208,7 @@ with taints.
 Taints on the ResourceFlavor work similarly to [node taints](https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/).
 For Kueue to admit a workload to use the ResourceFlavor, the PodSpecs in the
 workload should have a toleration for it. As opposed to the behavior for
-[ResourceFlavor labels](#resourceflavor-labels), Kueue will not add tolerations
+[ResourceFlavor labels](#resourceflavor-labels), Kueue does not add tolerations
 for the flavor taints.
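
To make the toleration requirement concrete, here is a minimal Python sketch of the check. The matching rule in `tolerates` follows the standard Kubernetes taint and toleration semantics. The class names, `flavor_admissible`, and the assumption that every taint on the flavor must be tolerated are illustrative and are not taken from Kueue's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str  # for example "NoSchedule"


@dataclass(frozen=True)
class Toleration:
    key: str = ""            # empty key with "Exists" matches every key
    operator: str = "Equal"  # "Equal" or "Exists"
    value: str = ""
    effect: str = ""         # empty effect matches every effect


def tolerates(toleration: Toleration, taint: Taint) -> bool:
    """Kubernetes-style matching of one toleration against one taint."""
    if toleration.effect and toleration.effect != taint.effect:
        return False
    if toleration.key and toleration.key != taint.key:
        return False
    if toleration.operator == "Exists":
        return True
    return toleration.value == taint.value


def flavor_admissible(taints: list[Taint], tolerations: list[Toleration]) -> bool:
    """Assumption: a flavor is usable only if each of its taints is tolerated."""
    return all(any(tolerates(tol, taint) for tol in tolerations) for taint in taints)


spot_taint = Taint(key="spot", value="true", effect="NoSchedule")
print(flavor_admissible([spot_taint], []))                                          # False
print(flavor_admissible([spot_taint], [Toleration(key="spot", operator="Exists")]))  # True
```

Since Kueue does not inject tolerations, a workload without a matching toleration in its PodSpecs is simply never assigned the tainted flavor in this model.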
 
 ### Empty ResourceFlavor
@@ -238,16 +238,27 @@ ClusterQueue.
 
 ### Flavors and borrowing semantics
 
-When borrowing, Kueue satisfies the following admission semantics:
+When a ClusterQueue is part of a cohort, Kueue satisfies the following admission
+semantics:
 
 - When assigning flavors, Kueue goes through the list of flavors in the
 ClusterQueue's `.spec.resources[*].flavors`. For each flavor, Kueue attempts
-to fit a Workload's pod set using the `min` quota of the ClusterQueue or the
-unused `min` quota of other ClusterQueues in the cohort, up to the `max` quota
-of the ClusterQueue. If the workload doesn't fit, Kueue proceeds evaluating the next
-flavor in the list.
-- A ClusterQueue can only borrow quota of flavors it defines and it can only
-borrow quota for one flavor.
+to fit a Workload's pod set according to the quota defined in the
+ClusterQueue for the flavor and the unused quota in the cohort.
+If the workload doesn't fit, Kueue evaluates the next flavor in the list.
+- A Workload's pod set resource fits in a flavor defined for a ClusterQueue
+resource if the sum of requests for the resource:
+1. Is less than or equal to the unused `.quota.min` for the flavor in the
+ClusterQueue; or
+2. Is less than or equal to the sum of unused `.quota.min` for the flavor in
+the ClusterQueues in the cohort, and
+3. Is less than or equal to the unused `.quota.max` for the flavor in the
+ClusterQueue.
+In Kueue, when (2) and (3) are satisfied, but not (1), this is called
+_borrowing quota_.
+- A ClusterQueue can only borrow quota for flavors that the ClusterQueue defines.
+- For each pod set resource in a Workload, a ClusterQueue can only borrow quota
+for one flavor.
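
The three fit conditions listed above can be written as a small predicate. The Python sketch below is illustrative only: `Usage`, `fits`, and the sample numbers are hypothetical, unused quota is assumed to be the quota minus current usage (clamped at zero), and the cohort list is assumed to include the ClusterQueue itself.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Hypothetical quota state of one flavor in one ClusterQueue."""
    quota_min: int
    quota_max: int
    used: int

    @property
    def unused_min(self) -> int:
        # Assumption: a queue that is already borrowing has no unused `min`.
        return max(self.quota_min - self.used, 0)

    @property
    def unused_max(self) -> int:
        return max(self.quota_max - self.used, 0)


def fits(request: int, own: Usage, cohort: list[Usage]) -> tuple[bool, bool]:
    """Return (fits, borrows) for the summed request of a pod set resource."""
    within_own_min = request <= own.unused_min                      # condition (1)
    within_cohort = request <= sum(cq.unused_min for cq in cohort)  # condition (2)
    within_own_max = request <= own.unused_max                      # condition (3)
    if within_own_min:
        return True, False
    if within_cohort and within_own_max:
        return True, True  # (2) and (3) hold but not (1): borrowing quota
    return False, False


team_a = Usage(quota_min=10, quota_max=15, used=8)
team_b = Usage(quota_min=10, quota_max=15, used=2)
cohort = [team_a, team_b]

print(fits(2, team_a, cohort))  # (True, False): fits within team_a's own min
print(fits(5, team_a, cohort))  # (True, True): borrows unused quota from team_b
print(fits(9, team_a, cohort))  # (False, False): would exceed team_a's max
```

In the last call the cohort has enough unused `min` quota (10), but `team_a` has only 7 units left under its `max`, so the request does not fit.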
 
 ### Borrowing example
 
docs/concepts/local_queue.md

Lines changed: 2 additions & 2 deletions
@@ -4,9 +4,9 @@ A `LocalQueue` is a namespaced object that groups closely related workloads
 belonging to a single tenant. A `LocalQueue` points to one [`ClusterQueue`](cluster_queue.md)
 from which resources are allocated to run its workloads.
 
-Users submit jobs to a `LocalQueue`, instead of directly to a `ClusterQueue`.
+Users submit jobs to a `LocalQueue`, instead of to a `ClusterQueue` directly.
 Tenants can discover which queues they can submit jobs to by listing the
-local queues in their namespace. The command looks similar to the following:
+local queues in their namespace. The command is similar to the following:
 
 ```sh
 kubectl get -n my-namespace localqueues

docs/setup/install.md

Lines changed: 2 additions & 1 deletion
@@ -50,7 +50,8 @@ kubectl delete -f https://github.com/kubernetes-sigs/kueue/releases/download/$VE
 
 ### Upgrading from 0.1 to 0.2
 
-Upgrading from `0.1.x` to `0.2.y` is not supported due to breaking API changes.
+Upgrading from `0.1.x` to `0.2.y` is not supported because of breaking API
+changes.
 To install Kueue `0.2.y`, [uninstall](#uninstall) the older version first.
 
 ## Install a custom-configured released version
