H4HIP: Wait with kstatus #374

Merged: 18 commits, Feb 19, 2025
115 changes: 115 additions & 0 deletions hips/hip-0999.md
---
hip: 9999
title: "Wait With kstatus"
authors: [ "@austinabro321" ]
created: "2024-12-06"
type: "feature"
status: "draft"
---

## Abstract

Currently the `--wait` flag on `helm install` and `helm upgrade` does not wait for all resources to be fully reconciled, and does not wait for custom resources (CRs) at all. By replacing the wait logic with [kstatus](https://github.com/kubernetes-sigs/cli-utils/blob/master/pkg/kstatus/README.md), Helm will achieve more intuitive waits, while simplifying its code and documentation.

## Motivation

By introducing kstatus we will lower user friction with the `--wait` flag.

Certain workflows require custom resources to be ready. There is no way to tell Helm to wait for custom resources to be ready, so anyone with this requirement must write their own logic to wait for their custom resources. Kstatus solves this by waiting for custom resources to have [the ready condition](https://github.com/kubernetes-sigs/cli-utils/blob/master/pkg/kstatus/README.md#the-ready-condition) set to true.

Certain workflows require resources to be fully reconciled, which does not happen with the current `--wait` logic. For example, Helm waits for all new pods in an upgraded deployment to be ready, but it does not wait for the previous pods in that deployment to be removed, so the wait can finish while old pods are still running. Because kstatus waits for all resources to be fully reconciled, the wait will not finish for a deployment until all of its new pods are ready and all of its old pods have been deleted.

## Rationale

Leveraging an existing status management library maintained by the Kubernetes team will simplify the code and documentation Helm needs to maintain while improving the functionality of `--wait`.

## Specification

The Helm CLI will continue to use the existing `--wait` flag, which will be extended to accept `true|false|none|watcher|legacy`. When `--wait` is passed with no argument, as `--wait=true`, or as `--wait=watcher`, Helm will use the kstatus watcher described in this document. When `--wait=legacy` is used, Helm will use the current Helm 3 waiter. When `--wait=false` is used or the flag is omitted, Helm will not wait after deploying resources.

Kstatus can be used with either a poller or a watcher. The poller runs on a specified interval and only requires "list" RBAC permissions for polled resources. The watcher reacts to [watch events](https://github.com/kubernetes/kubernetes/blob/90a45563ae1bab5868ee888432fec9aac2f7f8b1/staging/src/k8s.io/apimachinery/pkg/watch/watch.go#L55-L61) and requires "list" and "watch" RBAC permissions. This proposal uses the watcher as it responds slightly faster when all resources are ready, and it is very likely that users applying or deleting resources will have "watch" permissions on their resources.
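
For illustration, below is a minimal sketch of driving the kstatus status watcher, loosely following the Zarf implementation referenced at the end of this document. It is not the exact code proposed for Helm; the `dynamic.Interface`, `meta.RESTMapper`, and set of object identifiers are assumed to be supplied by the caller.

```go
import (
	"context"

	"k8s.io/apimachinery/pkg/api/meta"
	"k8s.io/client-go/dynamic"
	"sigs.k8s.io/cli-utils/pkg/kstatus/polling/aggregator"
	"sigs.k8s.io/cli-utils/pkg/kstatus/polling/collector"
	"sigs.k8s.io/cli-utils/pkg/kstatus/polling/event"
	"sigs.k8s.io/cli-utils/pkg/kstatus/status"
	"sigs.k8s.io/cli-utils/pkg/kstatus/watcher"
	"sigs.k8s.io/cli-utils/pkg/object"
)

// waitForReady blocks until every object reaches the kstatus "Current" status
// or the context is cancelled.
func waitForReady(ctx context.Context, dynamicClient dynamic.Interface, mapper meta.RESTMapper, objs object.ObjMetadataSet) error {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel()

	// The watcher emits an event whenever the status of one of the objects changes.
	sw := watcher.NewDefaultStatusWatcher(dynamicClient, mapper)
	eventCh := sw.Watch(ctx, objs, watcher.Options{})

	// Collect the latest status of each object and stop watching once the
	// aggregate status is Current, i.e. everything is reconciled.
	statusCollector := collector.NewResourceStatusCollector(objs)
	done := statusCollector.ListenWithObserver(eventCh, collector.ObserverFunc(
		func(c *collector.ResourceStatusCollector, _ event.Event) {
			var statuses []*event.ResourceStatus
			for _, rs := range c.ResourceStatuses {
				if rs != nil {
					statuses = append(statuses, rs)
				}
			}
			if aggregator.AggregateStatus(statuses, status.CurrentStatus) == status.CurrentStatus {
				cancel()
			}
		}))
	<-done
	return statusCollector.Error
}
```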

Any functions involving waits will be separated from the `kube.Interface` interface into a new `kube.Waiter` interface, and `kube.Waiter` will be embedded into `kube.Interface`. The client struct will embed the `Waiter` interface so that calls look like `client.Wait()` instead of `client.Waiter.Wait()`. `kube.New()` will accept a wait strategy that decides the wait implementation. There will be two implementations in the repo, `HelmWaiter` and `statusWaiter`. `HelmWaiter` is the legacy implementation. The `statusWaiter` will not be public so that, if kstatus is ever deprecated or replaced, a new implementation can be used without changing the public SDK.

The new client will look like:
```go
type Client struct {
	Factory    Factory
	Log        func(string, ...interface{})
	Namespace  string
	kubeClient *kubernetes.Clientset
	Waiter
}

type WaitStrategy int

const (
	StatusWaiter WaitStrategy = iota
	LegacyWaiter
)

func New(getter genericclioptions.RESTClientGetter, ws WaitStrategy) (*Client, error)
```
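
For illustration only, a hypothetical helper showing how the `--wait` flag value could be mapped onto a `WaitStrategy`. This helper is not part of the proposal and its error message is made up; `false`/`none` would simply skip waiting and are not represented as a strategy here.

```go
import "fmt"

// waitStrategyFromFlag is a hypothetical mapping from the --wait flag value
// to the WaitStrategy passed to kube.New.
func waitStrategyFromFlag(value string) (WaitStrategy, error) {
	switch value {
	case "", "true", "watcher":
		return StatusWaiter, nil
	case "legacy":
		return LegacyWaiter, nil
	default:
		return StatusWaiter, fmt.Errorf("invalid --wait value %q", value)
	}
}
```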

The waiter interface will look like:
```go
type Waiter interface {
	Wait(resources ResourceList, timeout time.Duration) error
	WaitWithJobs(resources ResourceList, timeout time.Duration) error
	WaitForDelete(resources ResourceList, timeout time.Duration) error
	WatchUntilReady(resources ResourceList, timeout time.Duration) error
}
```

`Wait` will wait for all resources to be ready. This includes jobs, but this function will not wait for jobs to complete.

`WaitWithJobs` will wait for all resources to be ready and all jobs to be complete.

`WatchUntilReady` only waits for Pods and Jobs to complete. It is used for Helm hooks. This logic will stay the same.

Calls to `Wait` and `WaitWithJobs` will not wait for paused deployments, which is consistent with the current logic. Otherwise, `helm upgrade --wait` on a paused deployment would never finish; see [#5789](https://github.com/helm/helm/pull/5789).
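
As a usage sketch of the proposed SDK surface (the import path assumes Helm 4's module path and the `waitForRelease` helper is illustrative, not part of the proposal):

```go
import (
	"time"

	"helm.sh/helm/v4/pkg/kube"
	"k8s.io/cli-runtime/pkg/genericclioptions"
)

// waitForRelease builds a client that uses the kstatus-based status waiter
// and blocks until the release's resources are ready or the timeout expires.
func waitForRelease(getter genericclioptions.RESTClientGetter, resources kube.ResourceList) error {
	client, err := kube.New(getter, kube.StatusWaiter)
	if err != nil {
		return err
	}
	return client.Wait(resources, 5*time.Minute)
}
```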

Kstatus does not natively output any logs. After each event, Helm will output a log message about one resource that is not ready. Only one resource is logged at a time so that Helm does not overwhelm users with output. Note that these logs are only emitted when Helm is called as a library; the CLI uses a no-op logger for Kube operations.
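
A sketch of how that single-resource log line might be derived from the kstatus collector state (the `logNotReady` helper and message format are illustrative; it reuses the `collector` and `status` packages imported in the watcher sketch above):

```go
// logNotReady logs at most one resource that has not yet reached the Current
// status, so users are not flooded with a line per resource on every event.
func logNotReady(logf func(string, ...interface{}), c *collector.ResourceStatusCollector) {
	for _, rs := range c.ResourceStatuses {
		if rs == nil || rs.Status == status.CurrentStatus {
			continue
		}
		logf("waiting for %s %s/%s to be ready: %s",
			rs.Identifier.GroupKind.Kind, rs.Identifier.Namespace, rs.Identifier.Name, rs.Message)
		return
	}
}
```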

## Backwards compatibility

Waiting for custom resources, and for other resources that Helm previously did not wait for, could cause charts to time out when using the new logic.

The kstatus status watcher requires the "list" and "watch" RBAC permissions to watch a resource, whereas the current Helm implementation only requires "list" permissions for the resources it watches. Both kstatus and Helm require "list" permissions for some child resources; for instance, checking whether a deployment is ready requires "list" permissions on its replicasets. There may be cases where the child-resource RBAC requirements differ between kstatus and Helm, as a full evaluation has not been conducted.

Kstatus also watches more resources than Helm does: a user will need "list" and "watch" permissions for every resource that they are deploying, whereas Helm currently only checks the readiness of certain resources. See the [IsReady function](https://github.com/helm/helm/blob/0d66425d9a745d8a289b1a5ebb6ccc744436da95/pkg/kube/ready.go#L92) for details.

Below is the minimal set of permissions needed to watch a deployment with the status watcher. This can be verified by following the instructions in this repo: https://github.com/AustinAbro321/kstatus-rbac-test.
```yaml
rules:
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["list", "watch"]
- apiGroups: ["apps"]
  resources: ["replicasets"]
  verbs: ["list"]
```


## Security implications

Users will now need "watch" permissions on resources in their chart if the `--wait` flag is used. They will also need "list" and "watch" permissions for all resources they are deploying rather than just the resources that Helm currently waits for.

## How to teach this

Replace the [existing wait documentation](https://helm.sh/docs/intro/using_helm/) with an explanation that Helm uses kstatus, pointing to the [kstatus documentation](https://github.com/kubernetes-sigs/cli-utils/blob/master/pkg/kstatus/README.md). This has the added benefit of not needing to maintain Helm-specific wait documentation.

## Reference implementation

A reference implementation can be seen at https://github.com/helm/helm/pull/13604.

## Rejected ideas

TBD

## Open issues

[#8661](https://github.com/helm/helm/issues/8661)

## References

- Existing wait documentation: https://helm.sh/docs/intro/using_helm/
- kstatus documentation: https://github.com/kubernetes-sigs/cli-utils/blob/master/pkg/kstatus/README.md
- Zarf kstatus implementation: https://github.com/zarf-dev/zarf/blob/main/src/internal/healthchecks/healthchecks.go