KEP-4650: StatefulSet Support for Updating Volume Claim Template #4651
base: master
Conversation
huww98 commented on May 22, 2024 (edited)
- One-line PR description: initial proposal of KEP-4650: StatefulSet Support for Updating Volume Claim Template
- Issue link: StatefulSet Support for Updating Volume Claim Template #4650
- Other comments: Inspired by and should replace KEP-0661: StatefulSet volume resize #3412
This KEP is inspired by the previous proposal in #3412. However, there are several major differences, so I decided to make a new KEP for this. Differences:

Please read more in the "Alternatives" section in the KEP. This KEP also contains more details about how to coordinate the update of Pod and PVC, which was another main concern of the previous attempt.
```yaml
owning-sig: sig-storage
participating-sigs:
  - sig-app
```
I think the main issues with #3412 were around StatefulSet controller and its behavior in error cases. IMO it should be owned by sig-apps.
OK, given that the code changes should mainly happen in the StatefulSet controller, and that the previous attempts all put sig-apps as the owning-sig (it is weird that they are placed in the sig-storage folder):

/sig apps
You're missing the production readiness questionnaire filled in, and a PRR file in https://github.com/kubernetes/enhancements/tree/master/keps/prod-readiness/sig-apps matching the template.
```md
* Allow users to update the `volumeClaimTemplates` of a `StatefulSet` in place.
* Automatically update the associated PersistentVolumeClaim objects in-place if applicable.
```
What do you mean by `in-place` here?
For example, the disk resize feature can automatically adjust the disk and filesystem sizes while the pod is running. In addition to this, there will be new features like VolumeAttributesClass that will also support in-place changes to the storage being used.
I use the word in-place to explicitly distinguish this from the Delete/Create style of update used for Pods. We should never delete a PVC automatically.
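To make that concrete, here is a minimal sketch of what patching a live claim in place looks like today (the claim name and VolumeAttributesClass name are hypothetical); both annotated fields are mutable on an existing PVC, so no delete/recreate is involved:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-web-0                     # hypothetical claim stamped out from a template
spec:
  accessModes: ["ReadWriteOnce"]
  volumeAttributesClassName: fast-iops # hypothetical VAC; performance changes apply in place
  resources:
    requests:
      storage: 20Gi                    # grown from 10Gi; expansion proceeds while the pod runs
```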
Production Readiness review, etc.
Updated "Production Readiness Review Questionnaire" as requested. During the last sig-apps meeting, First, this KEP is still mainly targeting the use-cases where we can update the PVC in-place, without interrupting the running pods. The migration and other use-cases are just by-product of enabling editing the volumeClaimTemplates. But still, migration by editing sts is simpler, just requiring a rolling restart, which should already be familiar by most k8s operators. Editing the sts does not require switching traffic between 2 sts, for example. And the name of sts won't change after migration. Yes, we cannot rollback easily with this procedure. The user should delete the PVC again to rollback. And the data in the PVC may not be recovered if retention policy is Q2: @soltysh suggested we may still go along the way of KEP-0661, then do more after we got more experience. Here is why I don't want to proceed that way:
Please read more in the @soltysh said we don't want the STS to be stuck in a permanently broken state. Of course. With this KEP, as we are not validating the templates. It is actually very hard to get stuck. Larger PVC is compatible with smaller template, so just rolling back the template should unblock us from future rollout, leaving PVCs in the expanding state, or try to cancel the expansion if I think the only way we may get stuck is patching VAC, if one replica is successfully updated to the new VAC, but another replica failed. and rolling back the first replica to the old VAC also failed. Even in this case, the user can just set Q3: @kow3ns thinks it is not appropriate to delete and recreate the PVC to alter the performance characteristics of volumes. VAC is the KEP that actually parameterizing the storage class and allow us to specify and update the performance characteristics of volumes without interrupting the running pod, by patching the existing PVC. So this KEP should also integrate with VAC. The update to VAC in the volumeClaimTemplates should not require re-creating the PVC, and is fully automated if everything goes well. Q4: @kow3ns asked how we should handle each field of This is described in the KEP in "How to update PVCs" section in "Updated Reconciliation Logic". Basically, patch what we can, skip the rest. It seems the wording "update PVC in-place" causes many mis-understandings. I will replace it with "patch PVC". We didn’t actually decide anything during the last meeting. I think these core questions should be decided to push this KEP forward:
The general sentiment from that sig-apps call (see https://docs.google.com/document/d/1LZLBGW2wRDwAfdBNHJjFfk9CFoyZPcIYGWU7R1PQ3ng/edit#heading=h.2utc2e8dj14) was that the smaller changes have a greater chance of moving forward. Also, it's worth noting that the smaller changes do not stand in opposition to the changes proposed here; they just take the gradual approach by focusing on a minimal subset of changes.
Okay, we agreed on focusing on the minimal subset of changes. @huww98 and @vie-serendipity will proceed with only …
At the last sig-apps meeting, we decided that we should:
But for the validation of the template, I think we still need more discussion; it can be a major blocking point of this KEP. @soltysh thinks that we should not allow decreasing the size in the template, and that we can remove the validation later if desired. But I think validation has many drawbacks which may block normal usage of this feature and should be resolved in the initial version:
By contrast, if we just don't add the validation, we can avoid all these issues and lose nothing. The user can already expand a PVC independently today, so the state where the template is smaller than the PVC is already very common and stable. The strategy in this state is to not even try to shrink the PVC. I think this is well defined and easy to follow. If Kubernetes ever supports shrinking in the future, we will still need to support drivers that can't shrink. So, even then, we can only support shrinking with a new …

To take one step back, I think validating the template across resources violates the high-level design. The template describes a desired final state, rather than an immediate instruction. A lot of things can happen externally after we update the template. For example, I have an IaaS platform, which tries to …

To conclude, I don't want to add the validation; we shouldn't add it just to remove it in the future.
Agree. By the way, …
That's one way of looking at it. Also, in those cases where a mistake happens (I consider that a rare occurrence), you can always use #3335 and migrate to a new, smaller StatefulSet.
@huww98 and @liubog2008, are you planning to push this through for 1.32?
Users have been waiting for many years to be able to scale up StatefulSet volumes. I agree we shouldn't overcomplicate that use case by trying to solve other issues at the same time. Let's focus on the very common use case, and then we can reevaluate other features after that is completed.
@soltysh The Enhancement Freeze date is Oct 11th, right? We are on vacation right now and may not make this date. cc @huww98 @vie-serendipity
Roger that. In that case, let's still try to work on the enhancement within the 1.32 time frame, so that it's ready in time for 1.33.
Okay, that sounds great~
```md
We propose to patch the PVC as a whole, so it can only succeed if the immutable fields match.

If only expansion is supported, patching regardless of the immutable fields can be a logical choice.
But this KEP also integrates with VAC. VAC is closely coupled with storage class.
```
Use the full name of VAC the first time the acronym is used - I had to infer what it was from the comments (https://github.com/kubernetes/enhancements/blob/master/keps/sig-storage/3751-volume-attributes-class/README.md).
```md
Introduce a new field in StatefulSet `spec`: `volumeClaimUpdatePolicy` to
specify how to coordinate the update of PVCs and Pods. Possible values are:

- `OnDelete`: the default value, only update the PVC when the old PVC is deleted.
```
Could have some value in calling this `OnPVCDelete` to clarify it is only updated if the PVC is deleted, especially if there could be other deletion-based alternatives like `OnPodDelete` in the future.
You're talking twice about almost the same thing. Maybe just present this as:

```md
## Proposal

### Kubernetes API Changes
<here describe the fields and specific changes>

### Kubernetes Controller Changes
<here describe the controller changes>
```

This will make it more reasonable to follow.
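For concreteness, a StatefulSet using the proposed field might look like the sketch below. The field name and the `OnDelete` default come from the KEP text above; the `InPlace` spelling and all other values here are my assumptions:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web                        # hypothetical StatefulSet
spec:
  serviceName: web
  replicas: 3
  volumeClaimUpdatePolicy: InPlace # assumed spelling of the non-default value
  selector:
    matchLabels: {app: web}
  template:
    metadata:
      labels: {app: web}
    spec:
      containers:
      - name: web
        image: registry.k8s.io/nginx-slim:0.8
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 20Gi            # raised from 10Gi; existing PVCs would be patched in place
```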
```md
### What PVC is compatible

A PVC is compatible with the template if:

- All the immutable fields match exactly; and
```
(including storage class) matches (inferring from prior discussion that storage class changes will be left to the operator to manually manage)
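Reading this together with the mutable-field list later in this PR, a hypothetical annotated template may make the rule concrete (all values are illustrative, and the immutable/mutable split shown is my reading of the KEP):

```yaml
# volumeClaimTemplate entry; comments note how each field affects compatibility
spec:
  accessModes: ["ReadWriteOnce"]  # immutable: must match the existing PVC exactly
  storageClassName: standard      # immutable: must match, including the storage class
  volumeAttributesClassName: gold # mutable: may differ; the controller can patch it
  resources:
    requests:
      storage: 10Gi               # mutable: an already-larger PVC (e.g. 20Gi) stays compatible
```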
- "@xing-yang" | ||
- "@soltysh" | ||
approvers: | ||
- "@kow3ns" |
- "@kow3ns" | |
- "@soltysh" |
```yaml
# The most recent milestone for which work toward delivery of this KEP has been
# done. This can be the current (upcoming) milestone, if it is being actively
# worked on.
latest-milestone: "v1.31"
```
latest-milestone: "v1.31" | |
latest-milestone: "v1.33" |
```yaml
# The milestone at which this feature was, or is targeted to be, at each stage.
milestone:
  alpha: "v1.32"
```
alpha: "v1.32" | |
alpha: "v1.33" |
```md
- [ ] (R) Production readiness review approved
- [ ] "Implementation History" section is up-to-date for milestone
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
```
Nit: make sure to check appropriate boxes above.
```md
This enhancement proposes to support modifications to the `volumeClaimTemplates`,
automatically patching the associated PersistentVolumeClaim objects if applicable.
Currently, PVC `spec.resources.requests.storage`, `spec.volumeAttributesClassName`,
`metadata.labels`, and `metadata.annotations` can be patched.
```
This should clearly express the goal of this proposal, something like:

> This enhancement proposes relaxing validation of StatefulSet's VolumeClaim template. Specifically, we will:
>
> 1. allow increasing the requested storage size (`spec.volumeClaimTemplates.spec.resources.requests.storage`);
> 2. allow modifying the VolumeAttributesClass used by the claim (`.spec.volumeClaimTemplates.spec.volumeAttributesClassName`);
> 3. allow modifying the VolumeClaim template's labels (`spec.volumeClaimTemplates.metadata.labels`);
> 4. allow modifying the VolumeClaim template's annotations (`spec.volumeClaimTemplates.metadata.annotations`).
>
> Additionally, to support no. 1 we'll introduce a new field allowing users to express how volume claims should be updated (in-place vs OnDelete).
I still wonder about that `volumeClaimUpdatePolicy`: do we need it if we're not doing shrinking just yet?
```md
###### What specific metrics should inform a rollback?
```
I'd encourage you to at least think through the metrics, to allow proper monitoring of the operations proposed in this document.
```md
### Dependencies
```
I'd expect details here about relying on in-place PVC updates in compatible drivers.
```md
StatefulSet:
- `spec`: 2 new enum fields, ~10B
```
Why 2 new enum fields? `volumeClaimUpdatePolicy` is just one field, isn't it?
```md
###### What steps should be taken if SLOs are not being met to determine the problem?

## Implementation History
```
This needs to be filled in.
```md
No.

### Troubleshooting
```
Given this feature might cause unexpected changes, I'd expect detailed troubleshooting explained here.
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: huww98
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment.
```md
and if `volumeClaimTemplates` and actual PVC only differ in mutable fields
(`spec.resources.requests.storage`, `spec.volumeAttributesClassName`, `metadata.labels`, and `metadata.annotations` currently),
patch the PVC to the extent possible.

- `spec.resources.requests.storage` is patched to max(template spec, PVC status)
```
Hi, I appreciate your work.

It says `max(template spec, PVC status)` here, but isn't it `max(template spec, PVC spec)`? If we use PVC `status`, it might interrupt the user's PVC expansion as follows:

1. Initial state: PVC template: 3G, PVC spec: 3G, PVC status: 3G
2. User updates PVC spec: PVC template: 3G, PVC spec: 5G, PVC status: 3G
3. User updates PVC template before the CSI driver expands the volume: PVC template: 4G, PVC spec: 5G, PVC status: 3G
4. The PVC expansion in step 2 won't be reflected: PVC template: 4G, PVC spec: 4G, PVC status: 3G

What do you think?
I think this is acceptable behavior. I see your link to pvc-autoresizer. For a controller to work with this KEP, it should keep trying to reconcile the PVC spec, so it should update the PVC spec again to 5G after step 4. And the STS controller will not interfere again, because it only updates the PVC if the `controller-revision-hash` label on the pod does not match.

A human operator should monitor the PVC status to ensure the operation is successful anyway.

This is hard to avoid, I think. I would like to take advantage of KEP-1790 to help recover from expansion failures, so I allow shrinking the PVC spec (not tested yet; not sure how they work together).
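To make the two candidate formulas concrete, here is the step-3 state from the scenario above, annotated with what each formula would patch (a sketch; the sizes come from the scenario, the Gi units are my assumption):

```yaml
# Step 3: template requests 4Gi, PVC spec requests 5Gi, PVC status shows 3Gi
spec:
  resources:
    requests:
      storage: 5Gi    # max(template, PVC spec)   = max(4Gi, 5Gi) = 5Gi:
                      # the user's in-flight expansion to 5Gi survives
      # storage: 4Gi  # max(template, PVC status) = max(4Gi, 3Gi) = 4Gi:
                      # the 5Gi request is overwritten until an external
                      # controller (e.g. pvc-autoresizer) re-reconciles it
status:
  capacity:
    storage: 3Gi      # resize still in progress
```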
Thank you for the explanation. I understood it, and I think it will work well now. When the alpha version is released, I will check to see if there are any cases I have overlooked.