
H4HIP: Wait with kstatus #374

Merged
merged 18 commits into from
Feb 19, 2025
updates
Signed-off-by: Austin Abro <AustinAbro321@gmail.com>
AustinAbro321 committed Dec 14, 2024
commit dcf6c8b1fc0c4144f8c2116cd99d3ffd25f3eaca
27 changes: 23 additions & 4 deletions hips/hip-0999.md
@@ -13,9 +13,9 @@ Currently the `--wait` flag on `helm install` and `helm upgrade` does not wait f

## Motivation

Certain workflows require custom resources to be ready. There is no way to tell Helm to wait for custom resources to be ready, so anyone that has this requirement must write their own logic to wait for their custom resources.
Certain workflows require custom resources to be ready. There is no way to tell Helm to wait for custom resources to be ready, so anyone with this requirement must write their own logic to wait for their custom resources. Kstatus solves this by waiting for custom resources to have [the ready condition](https://github.com/kubernetes-sigs/cli-utils/blob/master/pkg/kstatus/README.md#the-ready-condition) set to true.
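For illustration, a custom resource that kstatus would consider ready might report status like the following (a hypothetical `Widget` resource; field values are purely illustrative). kstatus treats the object as current when `status.observedGeneration` matches `metadata.generation` and the `Ready` condition is `"True"`:

```yaml
apiVersion: example.com/v1   # hypothetical CRD
kind: Widget
metadata:
  name: my-widget
  generation: 1
status:
  observedGeneration: 1     # controller has seen the latest spec
  conditions:
  - type: Ready
    status: "True"
    reason: Reconciled
    message: Widget is fully reconciled
```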

Certain workflows requires resources to be fully reconciled. For example, Helm waits for all new pods in an upgraded deployment to be ready. However, Helm does not wait for the previous pods in that deployment to be removed.
Certain workflows require resources to be fully reconciled, which does not fully happen in the current `--wait` logic. For example, Helm waits for all new pods in an upgraded deployment to be ready. However, Helm does not wait for the previous pods in that deployment to be removed. Since kstatus waits for all resources to be reconciled, the wait will not finish for a deployment until all of its new pods are ready and all of its old pods have been deleted.

By introducing kstatus we will lower user friction with the `--wait` flag.

@@ -27,16 +27,35 @@ Leveraging a existing status management library maintained by the Kubernetes tea

From a CLI user's perspective there will be no changes in how waits are called; they will still use the `--wait` flag.

Kstatus does not output any logs. Helm will output a log message each second signalling that a resource is not ready. Helm will log one not ready resource at a time to not overwhelm users with output. This behavior will look similar to the current `--wait` logging. The resource will be chosen alphabetically by `metadata.name`.
Kstatus does not output any logs. Helm will output a log message each second signalling that a resource is not ready. Helm will log one of the resources it is waiting on at a time, so as not to overwhelm users with output. This behavior will look similar to the current `--wait` logging.

<!-- TODO: Decide the order in which resources are logged, such as alphabetical by name, or letting the APIVersion/Kind of the resource determine its priority for being logged. For example, the first log messages would always describe deployments. All deployments would be logged first. Once all deployments are in ready status, all stateful sets would be logged, and so forth. -->

A [dynamic rest mapper](https://github.com/kubernetes-sigs/controller-runtime/blob/aea2e32a936584b06ae6f7992f856fe7292b0297/pkg/client/apiutil/restmapper.go#L36) will be used so that if a chart installs a custom resource definition, kstatus will be able to pick up the custom resources during wait.

<!-- TODO: Decide if we want more than alphabetical ordering, such as letting the APIVersion/Kind of the resource determine its priority for being logged. For example, the first log messages will always describe deployments. All deployments will be logged first. Once all deployments are in ready status, all stateful sets will be logged, and so forth. -->
Kstatus will not be exposed by the Helm SDK; instead, an adapter/interface will be created. If kstatus is ever deprecated, unmaintained, or replaced, there should be no impact to the public API of the Helm SDK.

<!-- TODO design API / interface -->

## Backwards compatibility
Member

Is there any situation where kstatus will not return ready, but existing logic would?

Contributor Author

Besides the two called out here (kstatus waiting to return ready until reconciliation is complete, and waiting for CRDs), I am not thinking of any, but I am not 100% sure.

Member

I suspect, to be on the safe side, we will want/need to support a fallback flag to allow users to fall back to the "legacy" wait/ready code. Ideally we would deprecate that flag in e.g. Helm 5 and remove the old code in a future version. Not sure what we would do if the legacy code serves some behavior that kstatus doesn't (ideally this doesn't happen, of course), but I think we should allow users the fallback when upgrading to Helm 4 just in case.

Contributor Author

Kstatus seems to be pretty extensible; we can implement a custom status reader and override the behavior for any resource type. I don't think there will be any long-term issues.


Waiting for custom resources, and for reconciliation to complete for every resource, could lead to timeouts in charts that weren't timing out previously.

Kstatus does not require any general RBAC additions, such as events. This can be verified by following the instructions in this repo: https://github.com/AustinAbro321/kstatus-rbac-test.

Kstatus does require some RBAC permissions for child resources. For instance, checking if a deployment is ready requires "list" permissions for the replicaset. Currently, Helm's logic also necessitates RBAC permissions to "list" child replicasets, so this requirement is not new. However, there may be cases where the RBAC requirements for child resources differ between Kstatus and Helm, as an evaluation has not been conducted.
<!-- TODO: I need to see if Helm requires watch -->
```yaml
rules:
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["list", "watch"]
- apiGroups: ["apps"]
  resources: ["replicasets"]
  verbs: ["list"]
```

Member

I don't think this is extensive enough. I suspect kstatus will need watch and list for every resource type in a chart (not just apps).

In particular, Helm only considered a few resource types previously.

This is fine, just want the HIP to be accurate.


## Security implications

No security implications.