Commit aff621c

KEP-2862: Graduate to BETA.
1 parent 62039f1 commit aff621c

3 files changed, +82 -4 lines changed
keps/prod-readiness/sig-node/2862.yaml

+2
@@ -4,3 +4,5 @@
 kep-number: 2862
 alpha:
   approver: "@jpbetz"
+beta:
+  approver: "@jpbetz"

keps/sig-node/2862-fine-grained-kubelet-authz/README.md

+77 -2
@@ -784,13 +784,20 @@ rollout. Similarly, consider large clusters and how enablement/disablement
 will rollout across nodes.
 -->
 
+We have designed a fallback mechanism that prevents failed rollouts or rollbacks
+from impacting an already running workload's ability to interact with the kubelet API.
+
+Please see the [Design Details](#design-details) section for more information.
+
 ###### What specific metrics should inform a rollback?
 
 <!--
 What signals should users be paying attention to when the feature is young
 that might indicate a serious problem?
 -->
 
+An increase in failed requests to the kubelet API from workloads.
+
 ###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
 
 <!--
@@ -799,11 +806,28 @@ Longer term, we may want to require automated upgrade/rollback tests, but we
 are missing a bunch of machinery and tooling and can't do that now.
 -->
 
+We have tested the following upgrade scenarios manually:
+
+| Scenario | Result |
+| -------- | ------ |
+| Upgrade both kubelet and kube-apiserver so that the feature gate is enabled in both. | Workloads and kube-apiserver are able to reach kubelet. |
+| Upgrade only kubelet to enable the feature gate. | Workloads and kube-apiserver are able to reach kubelet. |
+| Upgrade only kube-apiserver to enable the feature gate. | Workloads and kube-apiserver are able to reach kubelet. |
+
+We have tested the following rollback scenarios manually:
+
+| Scenario | Result |
+| -------- | ------ |
+| Roll back both kubelet and kube-apiserver so that the feature gate is disabled in both. | Workloads and kube-apiserver are able to reach kubelet. |
+| Roll back only kubelet to disable the feature gate. | Workloads and kube-apiserver are able to reach kubelet. |
+| Roll back only kube-apiserver to disable the feature gate. | Workloads and kube-apiserver are able to reach kubelet. |
+
 ###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
 
 <!--
 Even if applying deprecation policies, they may still surprise some users.
 -->
+No.
 
 ### Monitoring Requirements
 
@@ -822,6 +846,28 @@ checking if there are objects with field X set) may be a last resort. Avoid
 logs or events for this purpose.
 -->
 
+Users can check if this feature is enabled in kube-apiserver by running the
+following command:
+
+```sh
+kubectl get --raw /metrics | grep kubernetes_feature_enabled | grep KubeletFineGrainedAuthz
+```
+
+Users can check if this feature is enabled in the kubelet by running the
+following command in a pod that is running on the node:
+
+If the read-only port is enabled:
+```sh
+curl http://$MY_NODE_IP:10255/metrics | grep kubernetes_feature_enabled | grep KubeletFineGrainedAuthz
+```
+
+If the read-only port is not enabled:
+```sh
+curl -k https://$MY_NODE_IP:10250/metrics | grep kubernetes_feature_enabled | grep KubeletFineGrainedAuthz
+```
+
+NOTE: For port 10250, the pod needs the right RBAC bindings (if RBAC is enabled) to view the metrics.
+
 ###### How can someone using this feature know that it is working for their instance?
 
 <!--
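The `grep` pipelines in the hunk above print the raw metric line and leave the reader to interpret its value. As a minimal sketch, the helper below reads metrics text on stdin and reports the gate state. It assumes the metric is exposed in the usual Prometheus text form, `kubernetes_feature_enabled{name="...",stage="..."} <0|1>`; the function name `gate_state` is hypothetical and not part of this commit.

```sh
# Hypothetical helper: reads Prometheus-format metrics on stdin and prints
# "enabled", "disabled", or "unknown" for the feature gate named in $1.
# Assumes lines of the form:
#   kubernetes_feature_enabled{name="<gate>",stage="<stage>"} <0|1>
gate_state() {
  # Take the last field (the sample value) of the first matching metric line.
  value=$(awk -v gate="$1" 'index($0, "kubernetes_feature_enabled{") == 1 && index($0, "name=\"" gate "\"") { print $NF; exit }')
  case "$value" in
    1) echo enabled ;;
    0) echo disabled ;;
    *) echo unknown ;;   # metric line not found, or unexpected value
  esac
}
```

For example, `kubectl get --raw /metrics | gate_state KubeletFineGrainedAuthz` checks kube-apiserver. For the kubelet on port 10250, the request would typically also need credentials, for instance `curl -sk -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" "https://$MY_NODE_IP:10250/metrics" | gate_state KubeletFineGrainedAuthz`, assuming the pod's service account token is mounted at the default path.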
@@ -838,8 +884,8 @@ Recall that end users cannot usually observe component logs or access metrics.
 - [ ] API .status
   - Condition name:
   - Other field:
-- [ ] Other (treat as last resort)
-  - Details:
+- [x] Other (treat as last resort)
+  - Details: By replacing the `nodes/proxy` permission in RBAC with the fine-grained permissions required by the workload, such as `nodes/metrics` and `nodes/pods`, and then confirming that requests to the kubelet succeed without authorization errors.
 
 ###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
 
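The verification step described in the "Other" details above relies on swapping `nodes/proxy` for narrower subresources. As a rough sketch of what such a role might look like, the function below prints a ClusterRole that grants `get` on `nodes/metrics` and `nodes/pods` only. The role name `kubelet-metrics-reader` and the function name are hypothetical, and the verbs a real workload needs may differ.

```sh
# Hypothetical example: prints a ClusterRole limited to the fine-grained
# kubelet subresources named in the KEP text, instead of nodes/proxy.
# The role name "kubelet-metrics-reader" is an assumption for illustration.
fine_grained_role() {
  cat <<'EOF'
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kubelet-metrics-reader
rules:
- apiGroups: [""]
  resources: ["nodes/metrics", "nodes/pods"]
  verbs: ["get"]
EOF
}
```

The output could be applied with `fine_grained_role | kubectl apply -f -` and bound to the workload's service account. Once bound, `kubectl auth can-i get nodes --subresource=metrics --as=system:serviceaccount:<namespace>:<name>` gives a quick check of the resulting permission before testing against the kubelet itself.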

@@ -858,6 +904,8 @@ These goals will help you determine what you need to measure (SLIs) in the next
 question.
 -->
 
+Same SLOs as the kubelet API currently offers.
+
 ###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
 
 <!--
@@ -871,13 +919,17 @@ Pick one more of these and delete the rest.
 - [ ] Other (treat as last resort)
   - Details:
 
+Same SLIs as the kubelet API currently offers.
+
 ###### Are there any missing metrics that would be useful to have to improve observability of this feature?
 
 <!--
 Describe the metrics themselves and the reasons why they weren't added (e.g., cost,
 implementation difficulties, etc.).
 -->
 
+No.
+
 ### Dependencies
 
 <!--
@@ -901,6 +953,8 @@ and creating new ones, as well as about cluster-level services (e.g. DNS):
 - Impact of its degraded performance or high-error rates on the feature:
 -->
 
+This feature only comes into play if the kubelet authorization mode is set to `Webhook`.
+
 ### Scalability
 
 <!--
@@ -1024,6 +1078,9 @@ details). For now, we leave it here.
 
 ###### How does this feature react if the API server and/or etcd is unavailable?
 
+No different from how the kubelet behaves without this feature. If kube-apiserver
+is unavailable, any SubjectAccessReview (SAR) from the kubelet will fail.
+
 ###### What are other known failure modes?
 
 <!--
@@ -1039,8 +1096,22 @@ For each of them, fill in the following information by copying the below templat
 - Testing: Are there any tests for failure mode? If not, describe why.
 -->
 
+If requests to the kubelet API start failing due to authorization issues, users can
+disable the feature gate.
+
+Users can check the Kubernetes audit logs for SubjectAccessReview requests
+created by `system:nodes:*` and check the reason they failed.
+
 ###### What steps should be taken if SLOs are not being met to determine the problem?
 
+1. Check that the feature gate is enabled in kube-apiserver and kubelet.
+2. Check that the workload has the right permissions. Requests are expected to
+fail if you are using fine-grained subresources but the feature gate is not enabled
+in kubelet.
+3. Check the audit logs for SubjectAccessReview requests created by `system:nodes:*`
+and check the reason these requests failed.
+4. Check kubelet logs.
+
 ## Implementation History
 
 <!--
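The audit-log check in the troubleshooting steps above can be narrowed with a simple filter. The sketch below assumes the audit backend writes one compact JSON event per line and that kubelets authenticate with usernames of the form `system:node:<node-name>` (members of the `system:nodes` group). The function name `node_sar_events` is hypothetical. Whether the failure reason is visible depends on the audit policy level, since the SubjectAccessReview response body is only recorded at `RequestResponse`.

```sh
# Hypothetical helper: filters JSON-lines audit events on stdin down to
# SubjectAccessReview requests issued by node identities.
# Assumes compact JSON (no spaces after colons), one event per line.
node_sar_events() {
  grep '"resource":"subjectaccessreviews"' \
    | grep '"username":"system:node:' \
    || true   # no matches is not treated as an error
}
```

For example, `node_sar_events < "$AUDIT_LOG"`, where `AUDIT_LOG` is whatever path the cluster's `--audit-log-path` points at.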
@@ -1054,6 +1125,10 @@ Major milestones might include:
 - when the KEP was retired or superseded
 -->
 
+2024-09-28: [KEP-2862](https://github.com/kubernetes/enhancements/pull/4760) merged as implementable and PRR approved for ALPHA.
+2024-10-17: Alpha Code implementation [PR](https://github.com/kubernetes/kubernetes/pull/126347) merged.
+2024-10-22: Alpha Documentation [PR](https://github.com/kubernetes/website/pull/48412) merged.
+
 ## Drawbacks
 
 <!--

keps/sig-node/2862-fine-grained-kubelet-authz/kep.yaml

+3 -2
@@ -19,16 +19,17 @@ see-also:
 replaces:
 
 # The target maturity stage in the current dev cycle for this KEP.
-stage: alpha
+stage: beta
 
 # The most recent milestone for which work toward delivery of this KEP has been
 # done. This can be the current (upcoming) milestone, if it is being actively
 # worked on.
-latest-milestone: "v1.32"
+latest-milestone: "v1.33"
 
 # The milestone at which this feature was, or is targeted to be, at each stage.
 milestone:
   alpha: "v1.32"
+  beta: "v1.33"
 
 # The following PRR answers are required at alpha release
 # List the feature gate name and the components for which it must be enabled
