| hip | title | authors | created | type | status |
|------|-------------------|---------|------------|---------|--------|
| 9999 | Wait With kstatus |         | 2024-12-06 | feature | draft |
Currently the `--wait` flag on `helm install` and `helm upgrade` does not wait for all resources to be fully reconciled, and does not wait for custom resources (CRs) at all. By replacing the wait logic with kstatus, Helm will achieve more intuitive waits while simplifying its code and documentation.
By introducing kstatus we will lower user friction with the `--wait` flag.
Certain workflows require custom resources to be ready. There is no way to tell Helm to wait for custom resources to be ready, so anyone with this requirement must write their own logic to wait for their custom resources. Kstatus solves this by waiting for custom resources to have the ready condition set to true.
Certain workflows require resources to be fully reconciled, which does not happen with the current `--wait` logic. For example, Helm waits for all new pods in an upgraded deployment to be ready. However, Helm does not wait for the previous pods in that deployment to be removed. Therefore, Helm will finish waiting even while old pods are still active. Since kstatus instead waits for all resources to be reconciled, the wait will not finish for a deployment until all of its new pods are ready and all of its old pods have been deleted.
Leveraging an existing status management library maintained by the Kubernetes team will simplify the code and documentation that Helm needs to maintain, and improve the functionality of `--wait`.
The Helm CLI will continue to use the existing `--wait` flag. The wait flag will be extended to accept `true|false|none|watcher|legacy`. When using `--wait` as a flag with no argument, `--wait=true`, or `--wait=watcher`, Helm will use the kstatus watcher described in this document. When using `--wait=legacy`, Helm will use the current Helm 3 waiter. When using `--wait=false` or not using the `--wait` flag, Helm will not wait after deployments.
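As a rough illustration, these flag values could map onto the `WaitStrategy` type proposed later in this document. This is only a sketch: `waitStrategyFromFlag` and `errNoWait` are hypothetical names, and `none` is assumed to behave like `false` since the document does not spell it out.

```go
package main

import (
	"errors"
	"fmt"
)

// Mirrors the WaitStrategy type proposed later in this document.
type WaitStrategy int

const (
	StatusWaiter WaitStrategy = iota
	LegacyWaiter
)

// errNoWait signals that the caller should skip waiting entirely.
var errNoWait = errors.New("waiting disabled")

// waitStrategyFromFlag is a hypothetical helper mapping the extended
// --wait flag values onto a WaitStrategy.
func waitStrategyFromFlag(val string) (WaitStrategy, error) {
	switch val {
	case "", "true", "watcher":
		return StatusWaiter, nil // kstatus watcher described in this document
	case "legacy":
		return LegacyWaiter, nil // current Helm 3 waiter
	case "false", "none":
		return 0, errNoWait // "none" assumed to behave like "false"
	default:
		return 0, fmt.Errorf("invalid --wait value %q", val)
	}
}
```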
Kstatus can be used with either a poller or a watcher. The poller runs on a specified interval and only requires "list" RBAC permissions for polled resources. The watcher reacts to watch events and requires "list" and "watch" RBAC permissions. This proposal uses the watcher as it responds slightly faster when all resources are ready, and it is very likely that users applying or deleting resources will have "watch" permissions on their resources.
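For illustration, a wait built on the kstatus watcher could be wired up roughly as in the Zarf implementation linked in the references. This is a minimal sketch, assuming a dynamic client, REST mapper, and object set are already constructed; `waitForReady` is an illustrative name, not the proposed Helm API.

```go
package main

import (
	"context"
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/api/meta"
	"k8s.io/client-go/dynamic"
	"sigs.k8s.io/cli-utils/pkg/kstatus/polling/aggregator"
	"sigs.k8s.io/cli-utils/pkg/kstatus/polling/collector"
	"sigs.k8s.io/cli-utils/pkg/kstatus/polling/event"
	"sigs.k8s.io/cli-utils/pkg/kstatus/status"
	"sigs.k8s.io/cli-utils/pkg/kstatus/watcher"
	"sigs.k8s.io/cli-utils/pkg/object"
)

// waitForReady blocks until every object in objs reports the kstatus
// Current status, the timeout expires, or ctx is cancelled.
func waitForReady(ctx context.Context, dc dynamic.Interface, mapper meta.RESTMapper, objs object.ObjMetadataSet, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()

	sw := watcher.NewDefaultStatusWatcher(dc, mapper)
	eventCh := sw.Watch(ctx, objs, watcher.Options{})

	statusCollector := collector.NewResourceStatusCollector(objs)
	done := statusCollector.ListenWithObserver(eventCh, collector.ObserverFunc(
		func(c *collector.ResourceStatusCollector, _ event.Event) {
			var rss []*event.ResourceStatus
			for _, rs := range c.ResourceStatuses {
				if rs == nil {
					continue
				}
				rss = append(rss, rs)
			}
			// Stop watching once the aggregated status of every
			// resource is Current, i.e. fully reconciled.
			if aggregator.AggregateStatus(rss, status.CurrentStatus) == status.CurrentStatus {
				cancel()
			}
		}))
	<-done

	if ctx.Err() == context.DeadlineExceeded {
		return fmt.Errorf("timed out waiting for resources to become ready")
	}
	return statusCollector.Error
}
```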
Any functions involving waits will be separated from the `kube.Interface` interface into the `kube.Waiter` interface. `kube.Waiter` will be embedded into `kube.Interface`. The client struct will embed the `Waiter` interface to allow calls to look like `client.Wait()` instead of `client.Waiter.Wait()`. `kube.New()` will accept a wait strategy to decide the wait implementation. There will be two implementations in the repo, `HelmWaiter` and `statusWaiter`. `HelmWaiter` is the legacy implementation. The `statusWaiter` will not be public so that if kstatus is ever deprecated or replaced, a new implementation can be used without changing the public SDK.

The new client will look like:
```go
type Client struct {
	Factory   Factory
	Log       func(string, ...interface{})
	Namespace string

	kubeClient *kubernetes.Clientset

	Waiter
}

type WaitStrategy int

const (
	StatusWaiter WaitStrategy = iota
	LegacyWaiter
)

func New(getter genericclioptions.RESTClientGetter, ws WaitStrategy) (*Client, error)
```
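SDK callers could then construct a client with a chosen strategy along these lines. This is a usage sketch against the proposed signature; `getter` and `resources` are assumed to exist in the surrounding code.

```go
// Construct a kube client that waits using the kstatus status waiter.
// "getter" is an existing genericclioptions.RESTClientGetter.
client, err := kube.New(getter, kube.StatusWaiter)
if err != nil {
	return err
}
// Wait up to five minutes for all resources in the list to be ready.
if err := client.Wait(resources, 5*time.Minute); err != nil {
	return err
}
```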
The waiter interface will look like:
```go
type Waiter interface {
	Wait(resources ResourceList, timeout time.Duration) error
	WaitWithJobs(resources ResourceList, timeout time.Duration) error
	WaitForDelete(resources ResourceList, timeout time.Duration) error
	WatchUntilReady(resources ResourceList, timeout time.Duration) error
}
```
`Wait` will wait for all resources to be ready. This includes jobs, but this function will not wait for jobs to be complete.
`WaitWithJobs` will wait for all resources to be ready and all jobs to be complete.
`WatchUntilReady` only waits for Pods and Jobs to complete. It is used for Helm hooks. This logic will stay the same.
Calls to `Wait` and `WaitWithJobs` will not wait for paused deployments. This is consistent with the current logic, and is done because otherwise `helm upgrade --wait` on a paused deployment would never finish, see #5789.
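One way to honor this, assuming resources are handled as unstructured objects, could be to filter paused Deployments out of the wait set before watching. This is a sketch; `excludePaused` is an illustrative helper, not the proposed implementation.

```go
package main

import (
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
)

// excludePaused drops paused Deployments from the set of objects to
// wait on, since a paused Deployment never reconciles (#5789).
func excludePaused(objs []*unstructured.Unstructured) []*unstructured.Unstructured {
	var out []*unstructured.Unstructured
	for _, obj := range objs {
		if obj.GetKind() == "Deployment" {
			paused, found, err := unstructured.NestedBool(obj.Object, "spec", "paused")
			if err == nil && found && paused {
				continue
			}
		}
		out = append(out, obj)
	}
	return out
}
```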
Kstatus does not natively output any logs. After each event, Helm will output a log message for one resource that is not ready. Only one resource is logged at a time so Helm doesn't overwhelm users with output. Note that the logs are only sent when Helm is called as a library; the CLI uses a no-op logger for Kube operations.
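Under the collector-based approach sketched earlier (same imports), that could look roughly like the following; `logNotReady` and the log format are illustrative assumptions.

```go
// logNotReady reports a single not-yet-ready resource from the
// collector's current view, so each event produces at most one line.
func logNotReady(c *collector.ResourceStatusCollector, logFn func(string, ...interface{})) {
	for id, rs := range c.ResourceStatuses {
		if rs != nil && rs.Status != status.CurrentStatus {
			logFn("waiting for %s %s/%s to be ready: %s", id.GroupKind.Kind, id.Namespace, id.Name, rs.Status)
			return
		}
	}
}
```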
Waiting for custom resources, and for resources that Helm previously did not wait on, could cause charts to time out under the new logic.
The kstatus status watcher requires the "list" and "watch" RBAC permissions to watch a resource. The current Helm implementation only requires "list" permissions for the resources it watches. Both kstatus and Helm require "list" permissions for some child resources. For instance, checking whether a deployment is ready requires "list" permissions for its ReplicaSets. There may be cases where the RBAC requirements for child resources differ between kstatus and Helm, as an evaluation has not been conducted.
Kstatus also watches more resources than Helm does. A user will need "list" and "watch" permissions for every resource that they are deploying. Currently, Helm only checks readiness of certain resources; see the `IsReady` function for details.
Below is the minimal set of RBAC rules needed to watch a deployment with the status watcher. This can be verified by following the instructions in this repo: https://github.com/AustinAbro321/kstatus-rbac-test.
```yaml
rules:
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["list", "watch"]
  - apiGroups: ["apps"]
    resources: ["replicasets"]
    verbs: ["list"]
```
Users will now need "watch" permissions on resources in their chart if the `--wait` flag is used. They will also need "list" and "watch" permissions for all resources they are deploying, rather than just the resources that Helm currently waits for.
Replace the existing wait documentation by explaining that Helm uses kstatus and pointing to the kstatus documentation. This comes with the added benefit of not needing to maintain Helm-specific wait documentation.
A reference implementation can be seen here - helm/helm#13604
TBD
- existing wait documentation - https://helm.sh/docs/intro/using_helm/
- kstatus documentation - https://github.com/kubernetes-sigs/cli-utils/blob/master/pkg/kstatus/README.md
- Zarf kstatus implementation - https://github.com/zarf-dev/zarf/blob/main/src/internal/healthchecks/healthchecks.go