hip: 9999
title: Wait With kstatus
authors: @austinabro321
created: 2024-12-06
type: feature
status: draft

Abstract

Currently, the --wait flag on helm install and helm upgrade does not wait for all resources to be fully reconciled and does not wait for custom resources (CRs) at all. Replacing the wait logic with kstatus will make Helm's waits more intuitive while simplifying its code and documentation.

Motivation

Introducing kstatus will reduce user friction with the --wait flag.

Certain workflows require custom resources to be ready. There is currently no way to tell Helm to wait for custom resources, so anyone with this requirement must write their own wait logic. Kstatus solves this by waiting for custom resources to have the Ready condition set to true.

Certain workflows require resources to be fully reconciled, which does not always happen with the current --wait logic. For example, Helm waits for all new pods in an upgraded deployment to be ready, but it does not wait for the previous pods in that deployment to be removed. As a result, Helm can finish waiting while old pods are still running. Because kstatus waits for all resources to be reconciled, the wait for a deployment will not finish until all of its new pods are ready and all of its old pods have been deleted.

Rationale

Leveraging an existing status management library maintained by the Kubernetes community will simplify the code and documentation that Helm needs to maintain, and will improve the functionality of --wait.

Specification

The Helm CLI will continue to use the existing --wait flag, extended to accept true|false|none|watcher|legacy. When --wait is passed with no argument, or as --wait=true or --wait=watcher, Helm will use the kstatus watcher described in this document. With --wait=legacy, Helm will use the current Helm 3 waiter. With --wait=false, or when the --wait flag is not set, Helm will not wait after deployments.
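A minimal sketch of how the flag values above might map onto the WaitStrategy type proposed in this document. parseWaitFlag is a hypothetical helper, not part of Helm; the empty string stands in for --wait passed with no argument, and "none" is intentionally left unhandled (it falls through to the error case) because this proposal does not define its behavior.

```go
package main

import "fmt"

// WaitStrategy mirrors the type proposed in this HIP.
type WaitStrategy int

const (
	StatusWaiter WaitStrategy = iota
	LegacyWaiter
)

// parseWaitFlag is hypothetical. It reports whether Helm should wait and which
// waiter to use. The strategy is ignored when wait is false. "none" is not
// handled because the proposal does not describe what it does.
func parseWaitFlag(value string) (bool, WaitStrategy, error) {
	switch value {
	case "", "true", "watcher": // bare --wait, --wait=true, --wait=watcher
		return true, StatusWaiter, nil
	case "legacy":
		return true, LegacyWaiter, nil
	case "false": // also the behavior when the flag is absent
		return false, StatusWaiter, nil
	default:
		return false, StatusWaiter, fmt.Errorf("unsupported --wait value %q", value)
	}
}

func main() {
	for _, v := range []string{"", "true", "watcher", "legacy", "false"} {
		wait, strategy, err := parseWaitFlag(v)
		fmt.Printf("--wait=%q -> wait=%v strategy=%d err=%v\n", v, wait, strategy, err)
	}
}
```

In a real CLI the "no argument" case would come from the flag library's default-for-bare-flag setting rather than an empty string; the sketch only shows the mapping.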

Kstatus can be used with either a poller or a watcher. The poller runs on a specified interval and only requires "list" RBAC permissions for polled resources. The watcher reacts to watch events and requires "list" and "watch" RBAC permissions. This proposal uses the watcher as it responds slightly faster when all resources are ready, and it is very likely that users applying or deleting resources will have "watch" permissions on their resources.

Any functions involving waits will be moved out of the kube.Interface interface into a new kube.Waiter interface, which will be embedded in kube.Interface. The Client struct will embed the Waiter interface so that calls look like client.Wait() instead of client.Waiter.Wait(). kube.New() will accept a wait strategy that selects the wait implementation. There will be two implementations in the repo: HelmWaiter and statusWaiter. HelmWaiter is the legacy implementation. statusWaiter will be unexported so that if kstatus is ever deprecated or replaced, a new implementation can be swapped in without changing the public SDK.

The new client will look like:

```go
type Client struct {
	Factory    Factory
	Log        func(string, ...interface{})
	Namespace  string
	kubeClient *kubernetes.Clientset
	Waiter
}

type WaitStrategy int

const (
	StatusWaiter WaitStrategy = iota
	LegacyWaiter
)

// New creates a Client whose embedded Waiter is selected by ws (body omitted).
func New(getter genericclioptions.RESTClientGetter, ws WaitStrategy) (*Client, error)
```

The waiter interface will look like:

```go
type Waiter interface {
	Wait(resources ResourceList, timeout time.Duration) error
	WaitWithJobs(resources ResourceList, timeout time.Duration) error
	WaitForDelete(resources ResourceList, timeout time.Duration) error
	WatchUntilReady(resources ResourceList, timeout time.Duration) error
}
```

Wait will wait for all resources to be ready. This includes Jobs, but Wait will not wait for Jobs to complete.

WaitWithJobs will wait for all resources to be ready and all jobs to be complete.

WatchUntilReady only waits for Pods and Jobs to complete. It is used for Helm hooks, and its logic will stay the same.

Calls to Wait and WaitWithJobs will not wait for paused deployments, which is consistent with the current logic. Otherwise, helm upgrade --wait on a paused deployment would never finish, because a paused deployment never becomes ready (see #5789).

Kstatus does not natively output any logs. After each event, Helm will log a message about one resource that is not yet ready. Only one resource is logged at a time so that Helm does not overwhelm users with output. Note that these logs are only emitted when Helm is used as a library; the CLI uses a no-op logger for Kube operations.
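A hedged sketch of the "one resource per event" logging rule. logOneNotReady is hypothetical; it takes a logger with the same func(string, ...interface{}) shape as the Client's Log field and picks the alphabetically first not-ready resource so output is deterministic (the proposal does not say which resource is chosen). Passing a no-op function shows the CLI case.

```go
package main

import (
	"fmt"
	"sort"
)

// logOneNotReady logs at most one resource whose status is not "Current" and
// returns its ID, or "" if every resource is Current.
func logOneNotReady(statuses map[string]string, log func(string, ...interface{})) string {
	var notReady []string
	for id, s := range statuses {
		if s != "Current" {
			notReady = append(notReady, id)
		}
	}
	if len(notReady) == 0 {
		return ""
	}
	sort.Strings(notReady) // map iteration order is random; sort for stable output
	log("waiting for resource: %s, status: %s", notReady[0], statuses[notReady[0]])
	return notReady[0]
}

func main() {
	statuses := map[string]string{
		"Deployment/web": "InProgress",
		"Service/web":    "Current",
		"Job/migrate":    "InProgress",
	}
	printf := func(format string, args ...interface{}) { fmt.Printf(format+"\n", args...) }
	logOneNotReady(statuses, printf) // library caller: one line of output

	noop := func(string, ...interface{}) {}
	logOneNotReady(statuses, noop) // CLI: nothing is printed
}
```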

Backwards compatibility

Waiting for custom resources and other resources that were not previously waited for could cause charts to time out under the new logic.

The kstatus status watcher requires the "list" and "watch" RBAC permissions to watch a resource. The current Helm implementation only requires "list" permissions for the resources it watches. Both kstatus and Helm also require "list" permissions for some child resources; for instance, checking whether a Deployment is ready requires "list" permissions for ReplicaSets. The RBAC requirements for child resources may differ between kstatus and Helm in some cases, as an evaluation has not yet been conducted.

Kstatus also watches more resources than Helm does. A user will need "list" and "watch" permissions on every resource they are deploying, whereas Helm currently only checks the readiness of certain resources (see the IsReady function for details).

Below is the minimal set of rules needed to watch a Deployment with the status watcher. This can be verified by following the instructions in this repo: https://github.com/AustinAbro321/kstatus-rbac-test.

```yaml
rules:
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["list", "watch"]
  - apiGroups: ["apps"]
    resources: ["replicasets"]
    verbs: ["list"]
```

Security implications

Users will now need "watch" permissions on resources in their chart if the --wait flag is used. They will also need "list" and "watch" permissions for all resources they are deploying rather than just the resources that Helm currently waits for.

How to teach this

Replace the existing wait documentation with an explanation that Helm uses kstatus, along with a link to the kstatus documentation. This has the added benefit that Helm no longer needs to maintain its own wait documentation.

Reference implementation

See helm/helm#13604.

Rejected ideas

TBD

Open issues

8661

References

Existing wait documentation - https://helm.sh/docs/intro/using_helm/
kstatus documentation - https://github.com/kubernetes-sigs/cli-utils/blob/master/pkg/kstatus/README.md
Zarf kstatus implementation - https://github.com/zarf-dev/zarf/blob/main/src/internal/healthchecks/healthchecks.go