KEP-3751: Update release signoff Checklist before GA #5024
base: master
It looks like this PR does not contain this merged change: https://github.com/kubernetes/enhancements/pull/5028/files
/retitle KEP-3751: Update release signoff Checklist before GA
f1bfa08 to 4b306b5
/assign johnbelamaric
4b306b5 to 37fdc1a
[APPROVALNOTIFIER] This PR is NOT APPROVED.
This pull request has been approved by: sunnylovestiramisu
37fdc1a to 17629e7
Stress tests are still WIP: kubernetes/kubernetes#129918. They run at a low rate, similar to our other stress tests (10 pods): create a pod and volume, then modify the volume.
17629e7 to e6cbc70
The pod and volume are both up and running.

4. Turn on both feature flag and beta API in api-server again. ``kubectl get vac`` shows the VACs again. Change PVC back to VAC1, modify is applied.
Ok, everything looks pretty good. The only thing I'd like to see is some updates to the sections below, for example:
- SLOs: Did beta confirm the stated SLOs are appropriate?
- For this one: "Yes. The VAC protection controller will be expensive because it needs to LIST PVCs/PVs but the call volume should be low." Can you explain a bit further? I think when you say "volume should be low" what you mean is "volume is user-initiated, not triggered by any upstream controller, and is expected to be low in the typical use case".
- Was this done? Can you include / point to any results? "Yes, the feature may impact CreateVolume. We will measure this impact during beta and provide feedback to operators."
- Was this done? Can you include / point to any results? "Stress tests will determine increase in resource usage at varying amounts of concurrent volume modifications."
- Yes, it is still appropriate, and it is used by AWS's driver in production workloads right now.
- Your understanding is correct: volume modification is triggered by the user changing a PVC's VAC, and most COs have a rate limit (for example, you can modify once per 4 hours). So we are not expecting a high call rate.
- This one needs an update, because not many AWS customers are creating volumes with a VAC. That said, there is no impact on existing CreateVolume calls if a VAC is not set.
- Please see the other comment.
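The per-volume rate limit mentioned in the second point can be pictured as a simple cooldown check. This is a minimal sketch only, assuming a 4-hour window (the figure given above purely as an example; real limits depend on the provider). The names `ModifyCooldown` and `try_modify` are hypothetical and do not come from any driver.

```python
from datetime import datetime, timedelta


class ModifyCooldown:
    """Tracks the last accepted modification per volume (hypothetical helper)."""

    def __init__(self, window=timedelta(hours=4)):
        # Assumed window; the 4-hour value is only the example from the thread.
        self.window = window
        self._last = {}  # volume ID -> time of the last accepted modification

    def try_modify(self, volume_id, now):
        """Return True and record the call if the volume is outside its cooldown."""
        last = self._last.get(volume_id)
        if last is not None and now - last < self.window:
            return False  # still cooling down: the request would be rejected
        self._last[volume_id] = now
        return True
```

Because each volume has its own window, the aggregate call rate is bounded by the number of volumes a user chooses to modify, not by any controller loop.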
Updated these in the actual KEP files as well
e6cbc70 to e271c70
@@ -953,7 +969,7 @@ Using this feature may result in non-negligible increase of resource usage IF cu
 - external-resizer CPU and memory will see a non-negligible increase if users increased the number of concurrent operations via the `--workers` flag. We follow the strategy of sharing that limit between `ControllerExpandVolume` and `ControllerModifyVolume` RPCs, similar to how external-provisioner functions.
 - The API-Server may see a spike of CPU when processing relevant changes.

-Stress tests will determine increase in resource usage at varying amounts of concurrent volume modifications.
+Stress tests will determine increase in resource usage at varying amounts of concurrent volume modifications. Before promoting to Beta in 1.29, 250 modifications at a rate of 4 patches per second was tested on AWS and the bottle-neck is AWS limits.
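To illustrate the shared limit described in the hunk above, here is a minimal Python sketch in which expand and modify operations draw from one pool sized by a single `workers` value, so raising that value raises the concurrency ceiling for both kinds of call. This is not the external-resizer implementation (which is written in Go); `SharedWorkerPool` and its methods are hypothetical names used only for illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SharedWorkerPool:
    """One worker limit shared by expand and modify operations (hypothetical)."""

    def __init__(self, workers):
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._lock = threading.Lock()
        self._active = 0
        self.peak = 0  # highest number of operations observed running at once

    def _run(self, op):
        # Count the operation as active for as long as it is running.
        with self._lock:
            self._active += 1
            self.peak = max(self.peak, self._active)
        try:
            return op()
        finally:
            with self._lock:
                self._active -= 1

    def submit_expand(self, op):
        """Queue an expand-style operation on the shared pool."""
        return self._pool.submit(self._run, op)

    def submit_modify(self, op):
        """Queue a modify-style operation on the same shared pool."""
        return self._pool.submit(self._run, op)

    def shutdown(self):
        self._pool.shutdown(wait=True)
```

With `workers=2`, no more than two operations run at once regardless of how they are split between expand and modify, which is why CPU and memory scale with the flag value and not with the mix of calls.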
Do you have a link for this stress test?
No link; these are their internal tests. I have asked Drew to comment here with their numbers. They still need to test CreateVolume with VAC at a much higher rate than the stress tests added in the upstream PR.
Yeah I'm not comfortable with putting this existing language about tests done on AWS before promoting to beta into the KEP itself.
AWS EBS did not perform official K8s stress tests when helping promote this KEP from alpha to Beta (because stress tests were required for GA, not beta).
I did perform a spot check of modifying 250 volumes concurrently with EBS CSI Driver before EKS flipped VAC feature gate on by default in EKS 1.31.
And ideally these stress tests would be performed with CSI Mock driver, so cloud provider and CSI Driver implementations do not affect test results, right?
As discussed offline, I just performed the following quick test:
Patched 250 PVCs with a new volumeAttributesClassName at 5 patches per second. It took 100 seconds total for all modifications to finish, the bottleneck being the slow patch rate due to AWS modification limits.
- EKS 1.32 (VolumeAttributesClass feature gate & v1beta1 storage API group enabled)
- external-resizer:v1.12.0
- AWS EBS CSI Driver v1.39.0
- Resizer container peaked at 30 milli-vCPU and 34 MiB RAM (negligible usage)
Here's a gist with commands and manifests: https://gist.github.com/AndrewSirenko/4ab8ec5a1eee51674b5d3ac18835f4e9
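For readers who want the shape of such a test without opening the gist, the patch body and the rate arithmetic can be sketched in Python. This is an illustration only: the helper names (`vac_merge_patch`, `kubectl_patch_command`, `min_patch_seconds`) and any PVC or VAC names passed to them are hypothetical, and the commands actually used are the ones in the gist.

```python
import json
import shlex


def vac_merge_patch(vac_name):
    """Return a JSON merge patch that sets spec.volumeAttributesClassName on a PVC."""
    return json.dumps({"spec": {"volumeAttributesClassName": vac_name}})


def kubectl_patch_command(pvc_name, vac_name):
    """Build (but do not run) a kubectl command that patches one PVC."""
    args = ["kubectl", "patch", "pvc", pvc_name,
            "--type", "merge", "-p", vac_merge_patch(vac_name)]
    return shlex.join(args)  # quotes the JSON so it can be pasted into a shell


def min_patch_seconds(pvc_count, patches_per_second):
    """Lower bound on the time needed just to submit all patches at a fixed rate."""
    return pvc_count / patches_per_second
```

At 5 patches per second, submitting 250 patches takes at least 50 seconds, so the patch rate alone accounts for a large share of the 100 seconds reported.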
Yes, there's a WIP stress test mentioned here: #5024 (comment)
e271c70 to 478076c
* E2E tests using mock driver to cause failure on create, update and recovering cases
* [K8s storage framework](https://github.com/kubernetes/kubernetes/tree/master/test/e2e/storage/testsuites)
* [csi-tes](https://github.com/kubernetes-csi/csi-test)
* E2E tests: https://github.com/kubernetes/kubernetes/blob/master/test/e2e/storage/volumeattributesclass.go
* Test coverage of quota usage with ResourceQuota and LimitRange
* Measure latency impact to CreateVolume during beta and provide feedback to operators
* Upgrade and rollback test when the feature gate changes to beta
Has anyone used/tested quota features of this KEP before we move the whole KEP to GA? This seems like a problem.
The latest quota change is this: kubernetes/kubernetes#124360
But it has not been merged yet.
Sure. But it seems weird to merge that code directly with the GA feature gate when it was never tested or merged before.
863f877 to 0bfdef2
0bfdef2 to 61c778e
Ok, I see the stress test is in progress; I would expect that to be done as part of this GA. The other PRR answers seem good. Once there is SIG approval, I can add the approval.
/assign @msau42