Commit c43c5d7

KEP-3008: update
- update k8s version in kep.yaml
- update the container runtimes section, mention NRI API
- add Cluster autoscaler in the future work
- Motivation: mention NRI API and a small rewording
- clarify goals
- small updates on user stories
1 parent 2cbaa02 commit c43c5d7

2 files changed, +39 -34 lines changed

keps/sig-node/3008-qos-class-resources/README.md

+37-32
@@ -221,7 +221,9 @@ resources.
 This KEP identifies two technologies that can immediately be enabled with
 QoS-class resources. However, these are just two examples and the proposed
 changes are generic (and not tied to these two QoS-class resource types in any
-way), making it easier to implement new QoS-class resource types.
+way), making it easier to implement new QoS-class resource types. For example,
+the [NRI API][nri-api] would be good mechanism to implement new QoS-class
+resources.
 
 [Intel RDT][intel-rdt] implements a class-based mechanism for controlling the
 cache and memory bandwidth QoS of applications. All processes in the same
@@ -251,28 +253,24 @@ annotations on a Kubernetes Pod. The goal of this KEP is to get these types of
 resources first class citizens and properly supported in Kubernetes, providing
 visibility, a well-defined user interface, and permission controls.
 
-
-
-We can identify two types, container-level and pod-level QoS-class resources.
-Container-level resources enable QoS on per-container granularity, for example
-container-level cgroups in Linux or cache and memory bandwidth control
-technologies. Examples for pod-level QoS include e.g. pod-level cgroups or
-network QoS that cannot support per-container granularity.
+Two types of QoS-class resources are identified, container-level and pod-level
+QoS-class resources. Container-level resources enable QoS on per-container
+granularity, for example container-level cgroups in Linux or cache and memory
+bandwidth control technologies. Examples for pod-level QoS include e.g.
+pod-level cgroups or network QoS that cannot support per-container granularity.
 
 ### Goals
 
-- Make it possible to request QoS-class resources
-- Support RDT class assignment of containers. This is already supported by
-the containerd and CRI-O runtime and part of the OCI runtime-spec
-- Support blockio class assignment of containers.
-- Support Pod-level (sandbox-level) QoS-class resources
-- Make the API to support updating QoS-class resource assignment of running containers
-- Make the extensions flexible, enabling simple addition of other QoS-class
-resource types in the future.
-- Make QoS-class resources opaque (as possible) to the CRI client
-- Discovery of the available QoS-class resources
-- Resource status/capacity
+- Make it possible to request QoS-class resources from the PodSpec
+- Container-level QoS-class resources
+- Pod-level (sandbox-level) QoS-class resources
+- Make it simple to implement new types QoS-class resource
+- Make QoS-class resources opaque (as possible) to Kubernetes
+- Support automatic discovery of the available QoS-class resources
+- Support per-node status/capacity of QoS-class resources
 - Access control ([future work](#future-work))
+- Support updating QoS-class resource assignment of running containers
+([future work](#in-place-pod-vertical-scaling))
 
 ### Non-Goals
 
@@ -479,6 +477,13 @@ Use field name `Ceiling` instead `Capacity` in QOSResourceClassLimit.
 Not supporting Max (i.e. only supporting Default) in LimitRanges could simplify
 the API.
 
+#### Cluster autoscaler
+
+The cluster autoscaler support will be extended to support QoS-class resources.
+The behavior will be comparable to extended resources. The expectation would be
+that all nodes in a node group would have an identical set of QoS-class
+resources.
+
 #### API objects for resources and classes
 
 `<<[UNRESOLVED]>>`
@@ -585,7 +590,8 @@ spec:
 As a vendor I want to implement custom QoS controls as an extension of the
 container runtime. I want my QoS control to be visible in the cluster and
 integrated e.g. in the Kubernetes sheduler and not rely e.g. on Pod annotations
-to communicate QoS requests.
+to communicate QoS requests. I will implement my QoS-class resources as an
+[NRI API][nri-api] plugin.
 
 #### Defaults and limits
 
@@ -1458,19 +1464,17 @@ Container QoS resources:
 
 ### Container runtimes
 
-Currently, there is support (container-level QoS-class resources) for Intel RDT
-and blockio in CRI-O and containerd runtimes:
-
-- cri-o:
-- [~~Add support for Intel RDT~~](https://github.com/cri-o/cri-o/pull/4830)
-- [~~Support for cgroups blockio~~](https://github.com/cri-o/cri-o/pull/4873)
-- containerd:
-- [~~Support Intel RDT~~](https://github.com/containerd/containerd/pull/5439)
-- [~~Support for cgroups blockio~~](https://github.com/containerd/containerd/pull/5490)
+There is support (container-level QoS-class resources) for Intel RDT
+and blockio in CRI-O ([~~#4830~~](https://github.com/cri-o/cri-o/pull/4830),
+[~~#4873~~](https://github.com/cri-o/cri-o/pull/4873)) and containerd
+([~~#5439~~](https://github.com/containerd/containerd/pull/5439),
+[~~#5490~~](https://github.com/containerd/containerd/pull/5490)) runtimes.
+The current user interface is provided through pod and container annotations.
+The plan is to start using QoS-class resources instead of annotations.
 
-The design paradigm here is that the container runtime configures the QoS-class
-resources according to a given configuration file. Enforcement on containers is
-done via OCI. User interface is provided through pod and container annotations.
+The plan is also to extend the [NRI API][nri-api]
+(Node Resource Interface) to support QoS-class resources, allowing for example
+the implementation of new types of QoS-class resources as NRI plugins.
 
 Container runtimes will be updated to support the
 [CRI API extensions](#cri-api)
@@ -2347,3 +2351,4 @@ required.
 [oci-runtime-rdt]: https://github.com/opencontainers/runtime-spec/blob/v1.0.2/config-linux.md#IntelRdt
 [pod-qos-class]: https://kubernetes.io/docs/concepts/workloads/pods/pod-qos/
 [dra-kep]: https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/3063-dynamic-resource-allocation
+[nri-api]: https://github.com/containerd/nri

keps/sig-node/3008-qos-class-resources/kep.yaml

+2-2
@@ -17,11 +17,11 @@ stage: alpha
 # The most recent milestone for which work toward delivery of this KEP has been
 # done. This can be the current (upcoming) milestone, if it is being actively
 # worked on.
-latest-milestone: "v1.29"
+latest-milestone: "v1.30"
 
 # The milestone at which this feature was, or is targeted to be, at each stage.
 milestone:
-  alpha: "v1.29"
+  alpha: "v1.30"
 
 # The following PRR answers are required at alpha release
 # List the feature gate name and the components for which it must be enabled
