The Kubernetes control plane consists of the Kubernetes API Server, Kubernetes Controller Manager, Scheduler, and other components that are required for Kubernetes to function. Scalability limits of these components differ depending on what you're running in the cluster, but the areas with the biggest impact on scaling include the Kubernetes version, utilization, and individual Node scaling.

== Limit workload and node bursting

[IMPORTANT]

The mechanism used by Kubernetes to configure how these inflight requests are divided among different request types is called https://kubernetes.io/docs/concepts/cluster-administration/flow-control/[API Priority and Fairness]. The API Server configures the total number of inflight requests it can accept by summing together the values specified by the `--max-requests-inflight` and `--max-mutating-requests-inflight` flags. EKS uses the default values of 400 and 200 requests for these flags, allowing a total of 600 requests to be dispatched at a given time. However, as EKS scales the control plane to larger sizes in response to increased utilization and workload churn, it correspondingly increases the inflight request quota, up to 2000 per API Server (subject to change). APF specifies how this inflight request quota is further subdivided among different request types. Note that EKS control planes are highly available, with at least 2 API Servers registered to each cluster. This means the total number of inflight requests your cluster can handle is twice the per-server inflight quota (or higher if scaled out further horizontally). This amounts to several thousand requests per second on the largest EKS clusters.

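The quota arithmetic above can be sketched as follows; the flag values are the EKS defaults quoted in this section, and the replica count of 2 is the stated minimum (EKS may scale both the per-server quota and the replica count higher):

```python
# Sketch of the inflight-capacity arithmetic described above.
# Flag values are the EKS defaults quoted in this section; the
# replica count of 2 is the stated minimum, used for illustration.

max_requests_inflight = 400           # --max-requests-inflight (default)
max_mutating_requests_inflight = 200  # --max-mutating-requests-inflight (default)
api_server_replicas = 2               # EKS registers at least 2 API Servers

per_server_quota = max_requests_inflight + max_mutating_requests_inflight
cluster_quota = per_server_quota * api_server_replicas
print(per_server_quota, cluster_quota)  # 600 1200
```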
Two kinds of Kubernetes objects, called PriorityLevelConfigurations and FlowSchemas, configure how the total number of requests is divided between different request types. These objects are maintained by the API Server automatically and EKS uses the default configuration of these objects for the given Kubernetes minor version. PriorityLevelConfigurations represent a fraction of the total number of allowed requests. For example, the workload-high PriorityLevelConfiguration is allocated 98 out of the total of 600 requests. The sum of requests allocated to all PriorityLevelConfigurations will equal 600 (or slightly above 600 because the API Server will round up if a given level is granted a fraction of a request). To check the PriorityLevelConfigurations in your cluster and the number of requests allocated to each, you can run the following command. These are the defaults on EKS 1.32:

 $ kubectl get --raw /metrics | grep apiserver_flowcontrol_nominal_limit_seats
 apiserver_flowcontrol_nominal_limit_seats{priority_level="catch-all"} 13
 apiserver_flowcontrol_nominal_limit_seats{priority_level="exempt"} 0
 apiserver_flowcontrol_nominal_limit_seats{priority_level="global-default"} 49
 apiserver_flowcontrol_nominal_limit_seats{priority_level="leader-election"} 25
 apiserver_flowcontrol_nominal_limit_seats{priority_level="node-high"} 98
 apiserver_flowcontrol_nominal_limit_seats{priority_level="system"} 74
 apiserver_flowcontrol_nominal_limit_seats{priority_level="workload-high"} 98
 apiserver_flowcontrol_nominal_limit_seats{priority_level="workload-low"} 245

The second kind of object is the FlowSchema. API Server requests with a given set of properties are classified under the same FlowSchema. These properties include either the authenticated user or attributes of the request, such as the API group, namespace, or resource. A FlowSchema also specifies which PriorityLevelConfiguration this type of request should map to. The two objects together say, "I want this type of request to count towards this share of inflight requests." When a request hits the API Server, it will check each of its FlowSchemas until it finds one that matches all the required properties. If multiple FlowSchemas match a request, the API Server will choose the FlowSchema with the lowest matching precedence, which is specified as a property of the object.
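The selection behavior can be sketched as follows; this is an illustrative model of the matching rule, not the API Server's actual implementation, and the FlowSchema definitions here are hypothetical:

```python
# Illustrative sketch of APF FlowSchema selection (not the real
# kube-apiserver code): among all FlowSchemas whose predicate matches
# the request, the one with the lowest matchingPrecedence wins.

def select_flow_schema(request, flow_schemas):
    matches = [fs for fs in flow_schemas if fs["matches"](request)]
    if not matches:
        return None
    return min(matches, key=lambda fs: fs["matchingPrecedence"])

# Hypothetical FlowSchemas for illustration only.
flow_schemas = [
    {"name": "service-accounts", "matchingPrecedence": 9000,
     "matches": lambda r: r["user"].startswith("system:serviceaccount:")},
    {"name": "list-events-default-service-accounts", "matchingPrecedence": 8000,
     "matches": lambda r: (r["user"].startswith("system:serviceaccount:")
                           and r["verb"] == "list" and r["resource"] == "events")},
]

req = {"user": "system:serviceaccount:default:app", "verb": "list", "resource": "events"}
# Both schemas match this request; the lower matchingPrecedence (8000) wins.
print(select_flow_schema(req, flow_schemas)["name"])
```

A real FlowSchema expresses its predicate declaratively in its `rules` field; the lambdas here only stand in for that matching logic.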

To check how close a given PriorityLevelConfiguration is to receiving 429s or experiencing increased latency due to queuing, you can compare the difference between the concurrency limit and the concurrency in use. In this example, we have a buffer of 100 requests.

----
% kubectl get --raw /metrics | grep 'apiserver_flowcontrol_nominal_limit_seats.*workload-low'
apiserver_flowcontrol_nominal_limit_seats{priority_level="workload-low"} 245

% kubectl get --raw /metrics | grep 'apiserver_flowcontrol_current_executing_seats.*workload-low'
apiserver_flowcontrol_current_executing_seats{flow_schema="service-accounts",priority_level="workload-low"} 145
----
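This comparison can also be automated by parsing the Prometheus text output. The following is a minimal sketch (in Python, for illustration) that uses the sample values shown above; in a real cluster the metrics text would come from `kubectl get --raw /metrics`:

```python
# Minimal sketch: compute the remaining concurrency buffer for a
# priority level from Prometheus text-format metrics. The sample
# values below are the ones shown in this section.

def metric_value(metrics_text, metric_name, priority_level):
    # Sum all samples of the named metric for the given priority level.
    total = 0.0
    for line in metrics_text.splitlines():
        if line.startswith(metric_name) and f'priority_level="{priority_level}"' in line:
            total += float(line.rsplit(" ", 1)[1])
    return total

metrics = """\
apiserver_flowcontrol_nominal_limit_seats{priority_level="workload-low"} 245
apiserver_flowcontrol_current_executing_seats{flow_schema="service-accounts",priority_level="workload-low"} 145
"""

limit = metric_value(metrics, "apiserver_flowcontrol_nominal_limit_seats", "workload-low")
in_use = metric_value(metrics, "apiserver_flowcontrol_current_executing_seats", "workload-low")
print(f"buffer: {limit - in_use:.0f} seats")  # buffer: 100 seats
```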

To check if a given PriorityLevelConfiguration is experiencing queuing but not necessarily dropped requests, the metric for `apiserver_flowcontrol_current_inqueue_requests` can be referenced:

When making changes to APF defaults, these metrics should be monitored on a non-production cluster to ensure the changes do not cause unintended 429s:

. The metric for `apiserver_flowcontrol_rejected_requests_total` should be monitored for all FlowSchemas to ensure that no buckets start to drop requests.
. The values for `apiserver_flowcontrol_nominal_limit_seats` and `apiserver_flowcontrol_current_executing_seats` should be compared to ensure that the concurrency in use is not at risk of breaching the limit for that priority level.
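Under the same assumptions as above, the second check can be sketched as a script that flags any priority level whose executing seats approach the nominal limit; the 80% threshold and the sample metric values are illustrative only:

```python
# Sketch: flag priority levels where concurrency in use is close to
# the nominal limit. Threshold and sample values are illustrative.
import re

SAMPLE_METRICS = """\
apiserver_flowcontrol_nominal_limit_seats{priority_level="workload-low"} 245
apiserver_flowcontrol_nominal_limit_seats{priority_level="workload-high"} 98
apiserver_flowcontrol_current_executing_seats{flow_schema="service-accounts",priority_level="workload-low"} 145
apiserver_flowcontrol_current_executing_seats{flow_schema="kube-system-service-accounts",priority_level="workload-high"} 90
"""

def parse(metrics_text, metric_name):
    # Map priority_level -> summed sample value for the given metric.
    totals = {}
    pattern = re.compile(
        rf'^{metric_name}\{{.*priority_level="([^"]+)".*\}} (\S+)$')
    for line in metrics_text.splitlines():
        m = pattern.match(line)
        if m:
            level, value = m.group(1), float(m.group(2))
            totals[level] = totals.get(level, 0.0) + value
    return totals

limits = parse(SAMPLE_METRICS, "apiserver_flowcontrol_nominal_limit_seats")
executing = parse(SAMPLE_METRICS, "apiserver_flowcontrol_current_executing_seats")

for level, limit in limits.items():
    used = executing.get(level, 0.0)
    if limit and used / limit > 0.8:  # arbitrary example threshold
        print(f"{level}: {used:.0f}/{limit:.0f} seats in use")
```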

One common use-case for defining a new FlowSchema and PriorityLevelConfiguration is isolation. Suppose we want to isolate long-running list event calls from pods to their own share of requests. This prevents important requests from pods using the existing service-accounts FlowSchema from receiving 429s and being starved of request capacity. Recall that the total number of inflight requests is finite; however, this example shows how APF settings can be modified to better divide request capacity for a given workload:

Example FlowSchema object to isolate list event requests:

----
apiVersion: flowcontrol.apiserver.k8s.io/v1
kind: FlowSchema
metadata:
  name: list-events-default-service-accounts