H4HIP: Wait with kstatus #374
Conversation
Thanks for the HIP! I have been wanting to write this one myself for some time. I agree, kstatus is where the Kubernetes community has put significant effort into thinking about Kubernetes resource "readiness", and Helm would do well to reuse this effort.
I have left some comments. They are mostly centered on what noticeable (if any) behavioral differences users would see compared to the existing mechanism, and how to mitigate/manage those.
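For context, here is a minimal sketch of how kstatus computes readiness for an arbitrary object, using `status.Compute` from `sigs.k8s.io/cli-utils/pkg/kstatus/status`. The package name `readiness` and the helper `isReady` are illustrative only, not part of the HIP or of Helm itself.

```go
package readiness

import (
	"fmt"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"sigs.k8s.io/cli-utils/pkg/kstatus/status"
)

// isReady reports whether kstatus considers the object "Current", i.e. its
// observed status matches its spec and reconciliation has completed.
func isReady(obj *unstructured.Unstructured) (bool, error) {
	res, err := status.Compute(obj)
	if err != nil {
		return false, fmt.Errorf("computing status for %s: %w", obj.GetName(), err)
	}
	return res.Status == status.CurrentStatus, nil
}
```

The cli-utils poller/watcher machinery discussed later in this thread applies this kind of computation across all resources in a release; the snippet shows only the core readiness check.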
<!-- TODO: Decide if we want more than alphabetically, such as - The APIVersion/Kind of the resource will determine its priority for being logged. For example, the first log messages will always describe deployments. All deployments will be logged first. Once all deployments are in ready status, all stateful sets will be logged, and so forth. -->

## Backwards compatibility
Is there any situation where kstatus will not return ready, but existing logic would?
Besides the two called out here (kstatus waiting to return ready until reconciliation is complete, and waiting for CRDs), I am not thinking of any, but I am not 100% sure.
I suspect, to be on the safe side, we will want/need to support a fallback flag to allow users to fall back to the "legacy" wait/ready code. Ideally we would deprecate that flag in e.g. Helm 5 and remove the old code in a future version. I'm not sure what we would do if the legacy code provides some behavior that kstatus doesn't (ideally this doesn't happen, of course), but I think we should allow users the fallback when upgrading to Helm 4 just in case.
Kstatus seems to be pretty extensible; we can implement a custom status reader and override the behavior for any resource type. I don't think there will be any long-term issues.
Thanks for the HIP. I like the idea of using something from the Kubernetes community to know the status. When Helm's current code was built, nothing like this was available.
Thank you for the feedback. I am aiming to create a draft PR sometime next week so we can get a sense of what it will look like.
@mattfarina @gjenkins8 I created a draft PR with my implementation and updated this proposal with some of the finer details. LMK what feedback / questions you have. Draft PR - helm/helm#13604
hips/hip-0999.md
Outdated
`WaitAndGetCompletedPodPhase` is an exported function that is not called anywhere within the Helm repository. It will be removed.
I can't speak for helm maintainers, but it looks like this method is part of their public API, and not deprecated, so I'd be careful with removing it right away.
I believe that since this is targeted at Helm v4, breaking changes in the public API are acceptable.
I made a PR to get rid of this immediately: helm/helm#13665
hips/hip-0999.md
Outdated
`WaitAndGetCompletedPodPhase` is an exported function that is not called anywhere within the Helm repository. It will be removed.

`WatchUntilReady` is used only for hooks. It has custom wait logic different from the Helm 3 general logic. Ideally, this could be replaced with a regular `Wait()` call. If there is any historical context as to why this logic is the way it is, please share.
Again, similar comment: I'd be careful breaking the API. Either deprecate it or just wire the method to invoke the same underlying code.
Ditto with Helm 4 comment
I'm not sure why hooks have different logic. Hooks are typically jobs/pods, so perhaps the logic is very simple, i.e. check whether the pod has exited? Would need to go look.
That said, I have heard of folks using hooks for e.g. cluster pre-flight custom resources (@z4ce I think?).
If we could analyze and/or test whether hooks can use kstatus, that would be good. The same "legacy" fallback comments apply.
Worth mentioning too: the decision we make will remain in place for the lifetime of Helm 4 (i.e. we won't change the default once Helm 4 is released).
Took a look at the hook logic, and it waits for pods and jobs to be completed rather than just ready. I believe this can be done with kstatus, and I will verify later this week. I took a look at how Flux does it, and it seems like we could implement the same logic with kstatus; it will just involve custom code. Assuming my assumptions are correct, we will keep the `WatchUntilReady` function and move it into the `Waiter` interface.
Flux Job wait implementation - https://github.com/fluxcd/kustomize-controller/blob/main/internal/statusreaders/job.go
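As a rough sketch of the hook case discussed here: a custom status reader for Jobs ultimately needs to answer "has this Job finished, and did it fail?" from the Job's conditions, similar to what the Flux reader linked above does. The package name and helper below are illustrative only; the cli-utils wiring that would register such a reader with kstatus is omitted.

```go
package hookwait

import (
	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
)

// jobFinished reports whether a Job has finished and whether it failed, by
// inspecting its Complete/Failed conditions. A custom status reader for hooks
// could map "finished successfully" to kstatus's Current status and
// "finished with failure" to Failed, instead of the default readiness rules.
func jobFinished(job *batchv1.Job) (done bool, failed bool) {
	for _, c := range job.Status.Conditions {
		if c.Status != corev1.ConditionTrue {
			continue
		}
		switch c.Type {
		case batchv1.JobComplete:
			return true, false
		case batchv1.JobFailed:
			return true, true
		}
	}
	return false, false
}
```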
hips/hip-0999.md
Outdated
`WatchUntilReady` is used only for hooks. It has custom wait logic different from the Helm 3 general logic. Ideally, this could be replaced with a regular `Wait()` call. If there is any historical context as to why this logic is the way it is, please share.

The Helm CLI will always use the `statusWaiter` implementation. If this is found to be insufficient during Helm 4 testing, a new CLI flag `wait-strategy` will be introduced with options `status` and `legacy` to allow usage of the `HelmWaiter`. If the `statusWaiter` is found to be sufficient, the `HelmWaiter` will be deleted from the public SDK before Helm 4 is released.
I think we can use the existing `--wait` flag, e.g. `--wait=true|false|none|kstatus|legacy` (true & false are placeholders for kstatus and none respectively, to be compatible with existing users who might have `--wait=true`, if we decide this is warranted for Helm 4; else we just error for those users).
Up for suggestions on whether we use the name "kstatus" in the flag, or whether we consider this an "implementation detail". I suspect we will want to refer users to kstatus docs generally for behavioral reference, so it is okay to refer to kstatus directly.
The only hiccup I see with structuring the `--wait` flag like this is that Helm hooks always wait, regardless of whether the wait flag is set. So a user wouldn't be able to skip waiting while still using the old Helm hook logic. Still, based on my other comment, we will end up doing a custom implementation anyway, so we should be able to create relatively similar logic.
The other option would be introducing `--wait-strategy` and `--hook-wait-strategy` flags. IMO it's better to avoid the user complexity, but I'll leave the call to the maintainers.
What about `--wait=false`? That should handle not waiting, no?
> The only hiccup I see with structuring the `--wait` flag like this is that Helm hooks always wait, regardless of whether the wait flag is set. So a user wouldn't be able to skip waiting while still using the old Helm hook logic. Still, based on my other comment, we will end up doing a custom implementation anyway, so we should be able to create relatively similar logic.
Given the simplicity of the hook wait logic, perhaps it is possible to directly determine whether kstatus will be suitable?
For the main wait logic, I think a fallback to legacy is needed, as Helm's custom logic is fairly complex, and there is probably some scenario out there in which kstatus and Helm's custom logic are not equivalent.
For the relatively simple hook wait logic, can we be more confident that kstatus is a replacement?
> The other option would be introducing `--wait-strategy` and `--hook-wait-strategy` flags. IMO it's better to avoid the user complexity, but I'll leave the call to the maintainers.
It would be nice to avoid adding more options to the Helm CLI (unless really needed) IMHO. Every CLI option adds to the complexity of the interface Helm presents to users.
> For the relatively simple hook wait logic, can we be more confident that kstatus is a replacement?
Now that we've added custom status readers, IMO we can be much more confident that kstatus is a drop-in replacement. I've basically copied the logic over.
Thinking about `--wait` vs `--wait-strategy` a bit more. While more CLI flags do add more complexity, since it's likely that the new watcher will work in the vast majority of cases, I think it would be simpler for most users to only have to use `--wait`, rather than having to figure out which type of waiter they want to use.
Additionally, since `--wait` is currently a bool flag, a compatibility shim for `--wait=true` likely wouldn't work for most cases, as users will usually run `helm install --wait` without specifying true / false.
LMK what you think about using `--wait` and `--wait-strategy`. When only `--wait` is specified, the new watcher will be used.
The switch from a plain bool to a string value isn't that hard; we've done something like that transparently to users in kubernetes/kubernetes#87580.
Fair enough, thanks for the example. I updated the doc to reflect a `--wait` flag that accepts `true|false|none|watcher|legacy`.
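For illustration, here is a bare-bones sketch of how a string-valued `--wait` flag could keep accepting the old boolean-style usage via pflag's `NoOptDefVal`. The helper name, flag default, and value names mirror the proposal but are assumptions, not Helm's actual code.

```go
package waitflag

import "github.com/spf13/cobra"

// addWaitFlag registers a string-valued --wait flag. Setting NoOptDefVal lets
// a bare `--wait` (the old boolean usage) select the new watcher strategy,
// while explicit values such as `--wait=legacy` or `--wait=none` still work.
func addWaitFlag(cmd *cobra.Command) {
	f := cmd.Flags()
	f.String("wait", "none", "wait strategy: true|false|none|watcher|legacy")
	f.Lookup("wait").NoOptDefVal = "watcher"
}
```

With this shape, `helm install --wait` would behave like `--wait=watcher`, while omitting the flag entirely would keep the non-waiting default.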
hips/hip-0999.md
Outdated
The Helm CLI will always use the `statusWaiter` implementation. If this is found to be insufficient during Helm 4 testing, a new CLI flag `wait-strategy` will be introduced with options `status` and `legacy` to allow usage of the `HelmWaiter`. If the `statusWaiter` is found to be sufficient, the `HelmWaiter` will be deleted from the public SDK before Helm 4 is released.

The current Helm wait logic does not wait for paused deployments to be ready. This is because if `helm upgrade --wait` is run on a chart with paused deployments, they will never be ready; see [#5789](https://github.com/helm/helm/pull/5789).
nit: this seems more like motivation than specification
I'm not changing this in the PR, but I wanted to give the reference to the PR that explains why we do not wait for paused deployments. Let me make this line clearer.
Below is the minimal set needed to watch a deployment with the status watcher. This can be verified by following the instructions in this repo: https://github.com/AustinAbro321/kstatus-rbac-test.
I don't think this is extensive enough. I suspect kstatus will need watch and list permissions for every resource type in a chart (not just apps).
In particular, Helm only considered a few resource types previously.
This is fine, just want the HIP to be accurate.
This is great. A few comments/cleanups, but I think the HIP is close to being ready
@gjenkins8 Finally was able to get back to this and address all the comments. My PR is ready to review as well - helm/helm#13604. I just need to add the cobra logic for
Thank you for your HIP. I am OK with what is being proposed here. You handle the legacy behavior as well as a path going forward. I believe this HIP is fine for Helm v4.
This makes sense to me.
Proposal to replace the current wait logic in Helm with kstatus.