H4HIP: Wait with kstatus #374

AustinAbro321 · 2024-12-12T17:29:27Z

proposal to replace the current wait logic in Helm with kstatus

Signed-off-by: Austin Abro <[email protected]>

gjenkins8

Thanks for the HIP! I have been wanting to write this one myself for some time. I agree, kstatus is where the Kubernetes community has put significant effort into thinking about Kubernetes resource "readiness". And Helm would do well to reuse this effort.

I have left some comments. They mostly center on which behavior changes (if any) users would notice compared with the existing mechanism, and how to mitigate or manage those.

gjenkins8 · 2024-12-14T01:37:06Z

hips/hip-0999.md

+
+<!-- TODO: Decide if we want more than alphabetically, such as - The APIVersion/Kind of the resource will determine it's priority for being logged. For example, the first log messages will always describe deployments. All deployments will be logged first. Once all deployments are in ready status, all stateful sets will be logged, and so forth.  -->
+
+## Backwards compatibility


Is there any situation where kstatus will not return ready, but existing logic would?

Besides the two called out here (kstatus waits to return ready until reconciliation is complete, and it waits for CRDs), I can't think of any, but I am not 100% sure.

I suspect that, to be on the safe side, we will want/need to support a fallback flag that lets users fall back to the "legacy" wait/ready code. Ideally we would deprecate that flag in e.g. Helm 5 and remove the old code in a future version. I'm not sure what we would do if the legacy code provides some behavior that kstatus doesn't (ideally this doesn't happen, of course), but I think we should allow users the fallback when upgrading to Helm 4, just in case.

Kstatus seems to be pretty extensible: we can implement a custom status reader and override the behavior for any resource type. I don't think there will be any long-term issues.
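To make the "custom status reader" idea concrete, here is a minimal sketch of the dispatch pattern: custom readers are consulted first, and a default reader handles everything else. All types and names below (`Resource`, `StatusReader`, `conditionReader`, the `Backup` kind) are hypothetical stand-ins written for illustration; they are not the kstatus API.

```go
package main

import "fmt"

// Status is a simplified stand-in for a kstatus-style status value.
type Status string

const (
	InProgress Status = "InProgress"
	Current    Status = "Current"
)

// Resource is a hypothetical, minimal view of a cluster object.
type Resource struct {
	Kind       string
	Conditions map[string]bool // condition type -> whether it is "True"
}

// StatusReader is a hypothetical interface: a reader declares which kinds it
// handles and computes a status for them.
type StatusReader interface {
	Supports(kind string) bool
	ReadStatus(r Resource) Status
}

// conditionReader reports Current once a named condition is true.
// An empty kind means "any kind", which makes it usable as the default.
type conditionReader struct {
	kind      string
	condition string
}

func (c conditionReader) Supports(kind string) bool {
	return c.kind == "" || c.kind == kind
}

func (c conditionReader) ReadStatus(r Resource) Status {
	if r.Conditions[c.condition] {
		return Current
	}
	return InProgress
}

// computeStatus consults custom readers first and falls back to the default.
func computeStatus(custom []StatusReader, fallback StatusReader, r Resource) Status {
	for _, reader := range custom {
		if reader.Supports(r.Kind) {
			return reader.ReadStatus(r)
		}
	}
	return fallback.ReadStatus(r)
}

func main() {
	fallback := conditionReader{condition: "Ready"}
	custom := []StatusReader{conditionReader{kind: "Backup", condition: "Completed"}}

	backup := Resource{Kind: "Backup", Conditions: map[string]bool{"Ready": true}}
	fmt.Println(computeStatus(custom, fallback, backup)) // overridden: waits for "Completed"
	fmt.Println(computeStatus(nil, fallback, backup))    // default: "Ready" is enough
}
```

The point of the pattern is that an override is scoped to one resource kind, so a behavioral gap found later can be patched for that kind without touching the default logic.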

mattfarina

Thanks for the HIP. I like the idea of using something from the Kubernetes community to know the status. When Helm's current code was built, nothing like this was available.

Signed-off-by: Austin Abro <[email protected]>

AustinAbro321 · 2024-12-14T18:37:55Z

Thank you guys for the feedback. I am aiming to create a draft PR sometime next week so we can get a sense of what it will look like.

Signed-off-by: Austin Abro <[email protected]>

AustinAbro321 · 2025-01-06T17:34:06Z

@mattfarina @gjenkins8 I created a draft PR with my implementation and updated this proposal with some of the finer details. LMK what feedback / questions you have

Draft PR - helm/helm#13604

soltysh · 2025-01-08T12:23:00Z

hips/hip-0999.md

+}
+```
+
+`WaitAndGetCompletedPodPhase` is an exported function that is not called anywhere within the Helm repository. It will be removed. 


I can't speak for helm maintainers, but it looks like this method is part of their public API, and not deprecated, so I'd be careful with removing it right away.

I believe that, since this is targeted at Helm v4, breaking changes in the public API are acceptable.

I made a PR to get rid of this immediately: helm/helm#13665

soltysh · 2025-01-08T12:24:06Z

hips/hip-0999.md

+
+`WaitAndGetCompletedPodPhase` is an exported function that is not called anywhere within the Helm repository. It will be removed. 
+
+`WatchUntilReady` is used only for hooks. It has custom wait logic different from the Helm 3 general logic. Ideally, this could be replaced with a regular `Wait()` call. If there is any historical context as to why this logic is the way it is, please share. 


Again similar comment, I'd be careful breaking API. Either a deprecation or just wire the method to invoke the same underlying code.

Ditto with Helm 4 comment

I'm not sure why hooks have different logic. Hooks are typically jobs/pods, so perhaps the logic is very simple, i.e. check whether the pod has exited? I would need to go look.

That said, I have heard of folk using hooks for e.g. cluster pre-flight custom resources (@z4ce I think?)

If we could analyze and/or test whether hooks can use kstatus, that would be good. Same "legacy" fallback comments apply.

Worth mentioning too, the decision we make will remain in place for the lifetime of Helm 4 (ie. we won't change the default once Helm 4 is released)

I took a look at the hook logic, and it waits for pods and jobs to be completed rather than just ready. I believe this can be done with kstatus, and I will verify later this week. I looked at how Flux does it, and it seems we could implement the same logic with kstatus; it will just involve custom code. If my assumptions are correct, we will keep the WatchUntilReady function and move it into the Waiter interface.

Flux Job wait implementation - https://github.com/fluxcd/kustomize-controller/blob/main/internal/statusreaders/job.go
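As a rough illustration of "completed rather than just ready", the sketch below maps Job conditions and Pod phases onto a three-valued status. The condition types (`Complete`, `Failed`) and pod phases (`Succeeded`, `Failed`) are the standard Kubernetes values; the `Status` type and the `jobStatus` and `podStatus` helpers are hypothetical simplifications, not the Helm or Flux implementation.

```go
package main

import "fmt"

// Status is a simplified stand-in for a kstatus-style status value.
type Status string

const (
	InProgress Status = "InProgress"
	Current    Status = "Current"
	Failed     Status = "Failed"
)

// JobCondition keeps only the two fields of a batch/v1 Job condition used here.
type JobCondition struct {
	Type   string // e.g. "Complete" or "Failed"
	Status string // "True", "False", or "Unknown"
}

// jobStatus treats a Job as done only when it has completed, not merely
// when its pods are running. A true "Failed" condition wins over "Complete".
func jobStatus(conds []JobCondition) Status {
	complete := false
	for _, c := range conds {
		if c.Status != "True" {
			continue
		}
		switch c.Type {
		case "Failed":
			return Failed
		case "Complete":
			complete = true
		}
	}
	if complete {
		return Current
	}
	return InProgress
}

// podStatus applies the same idea to a bare hook Pod using its phase.
func podStatus(phase string) Status {
	switch phase {
	case "Succeeded":
		return Current
	case "Failed":
		return Failed
	default: // Pending, Running, Unknown
		return InProgress
	}
}

func main() {
	running := []JobCondition{}
	done := []JobCondition{{Type: "Complete", Status: "True"}}
	fmt.Println(jobStatus(running), jobStatus(done), podStatus("Running"))
}
```

A pod in the `Running` phase would count as ready under ordinary readiness rules, which is exactly the case where hook waiting has to differ.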

Signed-off-by: Austin Abro <[email protected]>

gjenkins8 · 2025-01-28T02:51:38Z

hips/hip-0999.md

+
+`WatchUntilReady` is used only for hooks. It has custom wait logic different from the Helm 3 general logic. Ideally, this could be replaced with a regular `Wait()` call. If there is any historical context as to why this logic is the way it is, please share. 
+
+The Helm CLI will always use the `statusWaiter` implementation, if this is found to be insufficient during Helm 4 testing a new cli flag `wait-strategy` will be introduced with options `status` and `legacy` to allow usage of the `HelmWaiter`. If the `statusWaiter` is found to be sufficient the `HelmWaiter` will be deleted from the public SDK before Helm 4 is released. 


I think we can use the existing --wait flag, e.g. --wait=true|false|none|kstatus|legacy. Here true and false are placeholders for kstatus and none respectively, to stay compatible with existing users who might pass --wait=true, if we decide this is warranted for Helm 4; otherwise we just error for those users.

Up for suggestions on whether we use the name "kstatus" in the flag. Or whether we consider this an "implementation detail". I suspect we will want to refer users to kstatus docs generally for behavioral reference, so it is okay to refer to kstatus directly

The only hiccup I see with structuring the --wait flag like this is that Helm hooks always wait regardless of if the wait flag is set. So a user wouldn't be able to not wait, and use the old Helm hook logic. Still, based on my other comment, we will end up doing a custom implementation anyway so we should be able to create relatively similar logic.

Other option would be introducing --wait-strategy and --hook-wait-strategy flags. IMO it's better to avoid the user complexity, but I'll leave the call to the maintainers.

What about --wait=false? That should handle not waiting, no?

> The only hiccup I see with structuring the --wait flag like this is that Helm hooks always wait regardless of if the wait flag is set. So a user wouldn't be able to not wait, and use the old Helm hook logic. Still, based on my other comment, we will end up doing a custom implementation anyway so we should be able to create relatively similar logic.

Given the simplicity of hook wait logic, perhaps it is possible to directly determine whether kstatus will be suitable?

For the main wait logic, I think a fallback to legacy is needed, as Helm's custom logic is fairly complex, and there is probably some scenario out there in which kstatus and Helm's custom logic are not equivalent.

For the relatively simple hook wait logic, can we be more confident kstatus is a replacement?

> Other option would be introducing --wait-strategy and --hook-wait-strategy flags. IMO it's better to avoid the user complexity, but I'll leave the call to the maintainers.

It would be nice to avoid adding more options to the Helm CLI (unless really needed) IMHO. Every CLI option adds to the complexity of the interface Helm presents to users.

> For the relatively simple hook wait logic, can we be more confident kstatus is a replacement?

Now that we've added custom status readers, IMO we can be much more confident that kstatus is a drop-in replacement. I've basically copied the logic over.

Thinking about --wait vs --wait-strategy a bit more: while more CLI flags do add complexity, the new watcher will likely work in the vast majority of cases, so I think it would be simpler for most users to only have to use --wait, rather than having to figure out which type of waiter they want.

Additionally since --wait is currently a bool flag, a compatibility shim for --wait=true likely wouldn't work for most cases as users will usually run helm install --wait without specifying true / false.

LMK what you think about using both --wait and --wait-strategy. When only --wait is specified, the new watcher will be used.

The switch from plain bool to a string value isn't that hard, we've done something like that transparently to users in kubernetes/kubernetes#87580.

Fair enough, thanks for the example. I updated the doc to reflect a --wait flag that accepts true|false|none|watcher|legacy
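The bool-to-string switch can be sketched with the standard library alone. The example below is an assumption-laden illustration, not the Helm implementation: `WaitStrategy` and `parseWait` are hypothetical names, and it uses Go's `flag` package, where a custom `flag.Value` that also implements `IsBoolFlag() bool` may be passed bare (`--wait`) or with a value (`--wait=legacy`). Helm itself uses cobra/pflag, where the `NoOptDefVal` field on a flag plays a similar role.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// WaitStrategy is a hypothetical name for the parsed value of --wait.
type WaitStrategy string

const (
	WaitNone    WaitStrategy = "none"
	WaitWatcher WaitStrategy = "watcher"
	WaitLegacy  WaitStrategy = "legacy"
)

// String implements flag.Value.
func (w *WaitStrategy) String() string {
	if w == nil {
		return ""
	}
	return string(*w)
}

// Set implements flag.Value and maps the old boolean spellings onto strategies.
func (w *WaitStrategy) Set(s string) error {
	switch s {
	case "true", "watcher":
		*w = WaitWatcher
	case "false", "none":
		*w = WaitNone
	case "legacy":
		*w = WaitLegacy
	default:
		return fmt.Errorf("invalid wait strategy %q: want true|false|none|watcher|legacy", s)
	}
	return nil
}

// IsBoolFlag tells the flag package that a bare --wait is allowed;
// the package then calls Set("true").
func (w *WaitStrategy) IsBoolFlag() bool { return true }

// parseWait parses args and returns the selected strategy.
// Omitting the flag keeps the pre-existing default of not waiting.
func parseWait(args []string) (WaitStrategy, error) {
	fs := flag.NewFlagSet("helm", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage output quiet in this sketch
	strategy := WaitNone
	fs.Var(&strategy, "wait", "true|false|none|watcher|legacy")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	return strategy, nil
}

func main() {
	for _, args := range [][]string{{}, {"--wait"}, {"--wait=legacy"}} {
		strategy, err := parseWait(args)
		fmt.Println(args, strategy, err)
	}
}
```

One consequence of this design is that the value must be attached with `=`: in `--wait legacy`, the word `legacy` is treated as a positional argument, because a flag that may appear bare cannot also consume the next token.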

gjenkins8 · 2025-01-28T02:52:18Z

hips/hip-0999.md

+
+The Helm CLI will always use the `statusWaiter` implementation, if this is found to be insufficient during Helm 4 testing a new cli flag `wait-strategy` will be introduced with options `status` and `legacy` to allow usage of the `HelmWaiter`. If the `statusWaiter` is found to be sufficient the `HelmWaiter` will be deleted from the public SDK before Helm 4 is released. 
+
+The current Helm wait logic does not wait for paused deployments to be ready. This is because if `helm upgrade --wait` is run on a chart with paused deployments they will never be ready, see [#5789](https://github.com/helm/helm/pull/5789).


nit: this seems more like motivation than specification

I'm not changing this in the PR, but I wanted to reference the PR that explains why we do not wait for paused deployments. Let me make this line clearer.
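For readers unfamiliar with the paused-deployment case, the behavior described in the quoted HIP text can be sketched as below. This is a deliberately simplified model: the `Deployment` struct and `allDeploymentsReady` are hypothetical, and the real readiness check considers more than a replica count.

```go
package main

import "fmt"

// Deployment is a hypothetical, minimal view of an apps/v1 Deployment.
type Deployment struct {
	Name          string
	Paused        bool
	Replicas      int32
	ReadyReplicas int32
}

// allDeploymentsReady skips paused deployments: their rollout will not
// progress, so blocking on them would make the wait run until it times out.
func allDeploymentsReady(deps []Deployment) bool {
	for _, d := range deps {
		if d.Paused {
			continue
		}
		if d.ReadyReplicas < d.Replicas {
			return false
		}
	}
	return true
}

func main() {
	deps := []Deployment{
		{Name: "web", Replicas: 3, ReadyReplicas: 3},
		{Name: "canary", Paused: true, Replicas: 2, ReadyReplicas: 0},
	}
	fmt.Println(allDeploymentsReady(deps))
}
```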

gjenkins8 · 2025-01-28T02:55:08Z

hips/hip-0999.md

+
+Below is the minimal set needed to watch a deployment with the status watcher. This can be verified by following instructions in this repo: https://github.com/AustinAbro321/kstatus-rbac-test.
+```yaml
+rules:


I don't think this is extensive enough. I suspect kstatus will need watch and list for every resource type in a chart (not just apps)

In particular, Helm only considered a few resource types previously.

This is fine, just want the HIP to be accurate
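To show what "watch and list for every resource type in a chart" implies, here is a small sketch that derives a de-duplicated rule set from the resource types a chart contains. `GroupResource`, `PolicyRule`, and `watchRules` are hypothetical simplified types written for this example (they only mirror the shape of RBAC rules), and the verb set is an assumption taken from the comment above.

```go
package main

import (
	"fmt"
	"sort"
)

// GroupResource identifies a resource type, e.g. {"apps", "deployments"}.
// The core API group is the empty string.
type GroupResource struct {
	Group    string
	Resource string
}

// PolicyRule mirrors the shape of an RBAC rule, reduced to three fields.
type PolicyRule struct {
	APIGroups []string
	Resources []string
	Verbs     []string
}

// watchRules returns one rule per API group, granting list and watch on
// every resource type that appears in objs. Output order is deterministic.
func watchRules(objs []GroupResource) []PolicyRule {
	byGroup := map[string]map[string]bool{}
	for _, o := range objs {
		if byGroup[o.Group] == nil {
			byGroup[o.Group] = map[string]bool{}
		}
		byGroup[o.Group][o.Resource] = true
	}

	groups := make([]string, 0, len(byGroup))
	for g := range byGroup {
		groups = append(groups, g)
	}
	sort.Strings(groups)

	rules := make([]PolicyRule, 0, len(groups))
	for _, g := range groups {
		resources := make([]string, 0, len(byGroup[g]))
		for r := range byGroup[g] {
			resources = append(resources, r)
		}
		sort.Strings(resources)
		rules = append(rules, PolicyRule{
			APIGroups: []string{g},
			Resources: resources,
			Verbs:     []string{"list", "watch"},
		})
	}
	return rules
}

func main() {
	chart := []GroupResource{
		{"apps", "deployments"},
		{"", "services"},
		{"apps", "statefulsets"},
		{"apps", "deployments"},
	}
	for _, rule := range watchRules(chart) {
		fmt.Printf("%q %v %v\n", rule.APIGroups, rule.Resources, rule.Verbs)
	}
}
```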

gjenkins8

This is great. A few comments/cleanups, but I think the HIP is close to being ready

Signed-off-by: Austin Abro <[email protected]>

AustinAbro321 · 2025-02-10T16:20:25Z

@gjenkins8 I was finally able to get back to this and address all the comments. My PR is ready to review as well: helm/helm#13604. I just need to add the cobra logic for --wait; I'll do that tomorrow.

Signed-off-by: Austin Abro <[email protected]>

robertsirc

Thank you for your HIP. I am OK with what is being proposed here. You handle the legacy way this should work and a path going forward. I believe this HIP is fine for Helm v4.

joejulian

This makes sense to me.

AustinAbro321 added 4 commits December 6, 2024 20:17

start helm hip

6d420e8

Signed-off-by: Austin Abro <[email protected]>

updates

39a043c

Signed-off-by: Austin Abro <[email protected]>

grammar

c9dbf03

Signed-off-by: Austin Abro <[email protected]>

hip

472f81a

Signed-off-by: Austin Abro <[email protected]>

pull-request-size bot added the size/M label Dec 12, 2024

banjoh mentioned this pull request Dec 13, 2024

H4HIP: Helm Sequencing Proposal #373

Merged

gjenkins8 reviewed Dec 14, 2024

gjenkins8 reviewed Dec 14, 2024

gjenkins8 reviewed Dec 14, 2024

mattfarina reviewed Dec 14, 2024

updates

dcf6c8b

Signed-off-by: Austin Abro <[email protected]>

AustinAbro321 added 2 commits January 6, 2025 15:25

updates to new architecture

c049e80

Signed-off-by: Austin Abro <[email protected]>

updates to new architecture

0c7da3f

Signed-off-by: Austin Abro <[email protected]>

pull-request-size bot added size/L and removed size/M labels Jan 6, 2025

AustinAbro321 mentioned this pull request Jan 6, 2025

Introduce kstatus watcher helm/helm#13604

Merged

3 tasks

soltysh reviewed Jan 8, 2025

AustinAbro321 added 3 commits January 8, 2025 13:50

mention watch and polling

467dde6

Signed-off-by: Austin Abro <[email protected]>

mention poller vs watcher

bb2b190

Signed-off-by: Austin Abro <[email protected]>

update why around watch

5275514

Signed-off-by: Austin Abro <[email protected]>

gjenkins8 mentioned this pull request Jan 28, 2025

chore: Remove unused WaitAndGetCompletedPodPhase helm/helm#13665

Merged

3 tasks

gjenkins8 reviewed Jan 28, 2025

gjenkins8 mentioned this pull request Feb 2, 2025

HIP 20 - wait for custom resource conditions #382

Open

gjenkins8 mentioned this pull request Feb 9, 2025

feat: finer grained wait proposal #289

Open

AustinAbro321 added 5 commits February 10, 2025 15:44

remove poller

1787ccf

Signed-off-by: Austin Abro <[email protected]>

changes based on comments

77fcc9f

Signed-off-by: Austin Abro <[email protected]>

clearer line

cf05550

Signed-off-by: Austin Abro <[email protected]>

move line

6eb9d2a

Signed-off-by: Austin Abro <[email protected]>

update

9649edd

Signed-off-by: Austin Abro <[email protected]>

AustinAbro321 added 3 commits February 12, 2025 13:57

wait & wait strategy

9753918

Signed-off-by: Austin Abro <[email protected]>

update wait flag

50cd142

Signed-off-by: Austin Abro <[email protected]>

wait flag

a9a5d17

Signed-off-by: Austin Abro <[email protected]>

robertsirc approved these changes Feb 19, 2025

mattfarina approved these changes Feb 19, 2025

mattfarina merged commit dcee115 into helm:main Feb 19, 2025
1 check passed

joejulian reviewed Feb 20, 2025
