-
Notifications
You must be signed in to change notification settings - Fork 192
H4HIP: Wait with kstatus #374
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
6d420e8
39a043c
c9dbf03
472f81a
dcf6c8b
c049e80
0c7da3f
467dde6
bb2b190
5275514
1787ccf
77fcc9f
cf05550
6eb9d2a
9649edd
9753918
50cd142
a9a5d17
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,115 @@ | ||
--- | ||
hip: 9999 | ||
title: "Wait With kstatus" | ||
authors: [ "@austinabro321" ] | ||
created: "2024-12-06" | ||
type: "feature" | ||
status: "draft" | ||
--- | ||
|
||
## Abstract | ||
|
||
Currently the `--wait` flag on `helm install` and `helm upgrade` does not wait for all resources to be fully reconciled, and does not wait for custom resources (CRs) at all. By replacing the wait logic with [kstatus](https://github.com/kubernetes-sigs/cli-utils/blob/master/pkg/kstatus/README.md), Helm will achieve more intuitive waits, while simplifying its code and documentation. | ||
|
||
## Motivation | ||
|
||
By introducing kstatus we will lower user friction with the `--wait` flag. | ||
|
||
Certain workflows require custom resources to be ready. There is no way to tell Helm to wait for custom resources to be ready, so anyone with this requirement must write their own logic to wait for their custom resources. Kstatus solves this by waiting for custom resources to have [the ready condition](https://github.com/kubernetes-sigs/cli-utils/blob/master/pkg/kstatus/README.md#the-ready-condition) set to true. | ||
|
||
Certain workflows requires resources to be fully reconciled, which does happen in the current `--wait` logic. For example, Helm waits for all new pods in an upgraded deployment to be ready. However, Helm does not wait for the previous pods in that deployment to be removed. Therefore, Helm will finish waiting even while old pods are still active. Since kstatus instead waits for all resources to be reconciled, the wait will not finish for a deployment until all of its new pods are ready and all of its old pods have been deleted. | ||
|
||
## Rationale | ||
|
||
Leveraging a existing status management library maintained by the Kubernetes team will simplify the code and documentation that Helm needs to maintain and improve the functionality of `--wait`. | ||
|
||
## Specification | ||
|
||
The Helm CLI will continue to use the existing `--wait` flag. The wait flag will be extended to accept `true|false|none|watcher|legacy`. When using `--wait` as flag with no argument, `--wait=true`, or `--wait=watcher` Helm will use the kstatus watcher described in this document. When using `--wait=legacy` Helm will use the current Helm 3 waiter. When using `--wait=false` or not using the `--wait` flag Helm not wait after deployments. | ||
|
||
Kstatus can be used with either a poller or a watcher. The poller runs on a specified interval and only requires "list" RBAC permissions for polled resources. The watcher reacts to [watch events](https://github.com/kubernetes/kubernetes/blob/90a45563ae1bab5868ee888432fec9aac2f7f8b1/staging/src/k8s.io/apimachinery/pkg/watch/watch.go#L55-L61) and requires "list" and "watch" RBAC permissions. This proposal uses the watcher as it responds slightly faster when all resources are ready, and it is very likely that users applying or deleting resources will have "watch" permissions on their resources. | ||
|
||
Any functions involving waits will be separated from the `kube.Interface` interface into the `kube.Waiter` interface. `kube.Waiter` will be embedded into `kube.Interface`. The client struct will embed the `Waiter` interface to allow calls to look like `client.Wait()` instead of `client.Waiter.Wait()`. `kube.New()` will accept a wait strategy to decide the wait implementation. There will be two implementation in the repo, `HelmWaiter` and `statusWaiter`. `HelmWaiter` is the legacy implementation. The `statusWaiter` will not be public so that if kstatus is ever deprecated or replaced a new implementation can be used without changing the public SDK. | ||
|
||
The new client will look like: | ||
```go | ||
type Client struct { | ||
Factory Factory | ||
Log func(string, ...interface{}) | ||
Namespace string | ||
kubeClient *kubernetes.Clientset | ||
Waiter | ||
} | ||
type WaitStrategy int | ||
const ( | ||
StatusWaiter WaitStrategy = iota | ||
LegacyWaiter | ||
) | ||
func New(getter genericclioptions.RESTClientGetter, ws WaitStrategy) (*Client, error) | ||
``` | ||
|
||
The waiter interface will look like: | ||
```go | ||
type Waiter interface { | ||
Wait(resources ResourceList, timeout time.Duration) error | ||
WaitWithJobs(resources ResourceList, timeout time.Duration) error | ||
WaitForDelete(resources ResourceList, timeout time.Duration) error | ||
WatchUntilReady(resources ResourceList, timeout time.Duration) error | ||
} | ||
``` | ||
|
||
`Wait` will wait for all resources to be ready. This will include jobs, but this function not wait for jobs to be complete. | ||
|
||
`WaitWithJobs` will wait for all resources to be ready and all jobs to be complete. | ||
|
||
`WatchUntilReady` only waits for Pods and Jobs to complete. It is used for Helm hooks. This logic will stay the same. | ||
|
||
Calls to `Wait` and `WaitWithJobs` will not wait for paused deployments. This is consistent with the current logic. This is done because otherwise `helm upgrade --wait` on a paused deployment will never be ready, see [#5789](https://github.com/helm/helm/pull/5789). | ||
|
||
Kstatus does not natively output any logs. After each event, Helm will output a log message of one resources that's not ready. Only one resource is logged at a time so Helm doesn't overwhelm users with output. Note that the logs are only sent when called as a library, the CLI uses a noop logger for Kube operations. | ||
|
||
## Backwards compatibility | ||
AustinAbro321 marked this conversation as resolved.
Show resolved
Hide resolved
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is there any situation where There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Besides the two called our here, where kstatus will wait to return ready until reconciliation is complete, and waiting for CRDs I am not thinking of any, but I am not 100% sure. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I suspect, to be on the safe side, we will want/need to support a fallback flag to allow users to fallback to the "legacy" wait/ready code. Ideally we would deprecate that flag e.g. Helm 5 and remove the old code in a future version. Not sure what we would do if the legacy code serves some behavior that kstatus doesn't (ideally this doesn't happen of course), but I think we should allow users the fallback when upgrading to Helm 4 just in case. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Kstatus seems to be pretty extensible, we can implement a custom status reader and override the behavior for any resource type. I don't think there will have any long term issues |
||
|
||
Waiting for custom resources and other previously not waited for resources could lead to charts timing out when using the new logic. | ||
|
||
The kstatus status watcher requires the "list" and "watch" RBAC permissions to watch a resource. The current Helm implementation only require "list" permissions for the resources they're watching. Kstatus and Helm require "list" permissions for some child resources. For instance, checking if a deployment is ready requires "list" permissions for the replicaset. There may be cases where the RBAC requirements for child resources differ between Kstatus and Helm, as an evaluation has not been conducted. | ||
|
||
Kstatus also watches more resources than Helm does. A user will need "list" and "watch" permissions to every resource that they are deploying. Currently, Helm only checks readiness of certain resources. See the [IsReady function](https://github.com/helm/helm/blob/0d66425d9a745d8a289b1a5ebb6ccc744436da95/pkg/kube/ready.go#L92) for details | ||
|
||
Below is the minimal set needed to watch a deployment with the status watcher. This can be verified by following instructions in this repo: https://github.com/AustinAbro321/kstatus-rbac-test. | ||
```yaml | ||
rules: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think this is extensive enough. I suspect kstatus will need watch and list for every resource type in a chart (not just apps) In particular, Helm only considered a few resources types previously. This is fine, just want the HIP to be accurate |
||
- apiGroups: ["apps"] | ||
resources: ["deployments"] | ||
verbs: ["list", "watch"] | ||
- apiGroups: ["apps"] | ||
resources: ["replicasets"] | ||
verbs: ["list"] | ||
``` | ||
|
||
|
||
## Security implications | ||
|
||
Users will now need "watch" permissions on resources in their chart if the `--wait` flag is used. They will also need "list" and "watch" permissions for all resources they are deploying rather than just the resources that Helm currently waits for. | ||
|
||
## How to teach this | ||
|
||
Replace the [existing wait documentation](https://helm.sh/docs/intro/using_helm/) by explaining that we use kstatus and pointing to the [kstatus documentation](https://github.com/kubernetes-sigs/cli-utils/blob/master/pkg/kstatus/README.md). This comes with the added benefit of not needing to maintain Helm specific wait documentation. | ||
|
||
## Reference implementation | ||
|
||
seen here - https://github.com/helm/helm/pull/13604 | ||
|
||
## Rejected ideas | ||
|
||
TBD | ||
|
||
## Open issues | ||
|
||
[8661](https://github.com/helm/helm/issues/8661) | ||
|
||
## References | ||
|
||
existing wait documentation - https://helm.sh/docs/intro/using_helm/ | ||
kstatus documentation - https://github.com/kubernetes-sigs/cli-utils/blob/master/pkg/kstatus/README.md | ||
Zarf kstatus implementation - https://github.com/zarf-dev/zarf/blob/main/src/internal/healthchecks/healthchecks.go |
Uh oh!
There was an error while loading. Please reload this page.