
H4HIP: CRD updating #379

Open
gjenkins8 wants to merge 1 commit into main from hip4_crd_appending

Conversation

gjenkins8
Member

No description provided.

@gjenkins8 force-pushed the hip4_crd_appending branch 2 times, most recently from eb8ecef to 2b5d9d7 on December 21, 2024 05:00
Signed-off-by: George Jenkins <[email protected]>
@kfox1111

It's pretty common these days for CRDs to be in templates so they can be upgraded too. We should probably keep this behavior and align it with the /crds directory.

Maybe:

  1. Render templates, pass 1: render out any CRDs found, ignoring the other templates (a rough sketch of the split is below).
  2. Treat the rendered CRDs as if they were in /crds and apply them.
  3. Render templates, pass 2: render everything except the CRDs as normal.
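
A minimal sketch of the CRD/non-CRD split this implies (illustrative only: the package name, `manifest` type, and splitting logic are assumptions for the example, not Helm internals):

```go
// Package render sketches the "two pass" idea: after templates are rendered,
// partition the documents into CRDs (to be applied first, as if in /crds)
// and everything else (applied as normal).
package render

import (
	"strings"

	"sigs.k8s.io/yaml"
)

// manifest holds one rendered template file; the field names are illustrative.
type manifest struct {
	Name    string
	Content string
}

type typeMeta struct {
	APIVersion string `json:"apiVersion"`
	Kind       string `json:"kind"`
}

// splitCRDs partitions rendered manifests into CRD documents and ordinary
// resources. Multi-document files are split naively on "---" separators.
func splitCRDs(rendered []manifest) (crds, rest []manifest) {
	for _, m := range rendered {
		for _, doc := range strings.Split(m.Content, "\n---\n") {
			var tm typeMeta
			if err := yaml.Unmarshal([]byte(doc), &tm); err != nil {
				continue // skip documents this sketch cannot parse
			}
			out := manifest{Name: m.Name, Content: doc}
			if tm.Kind == "CustomResourceDefinition" &&
				strings.HasPrefix(tm.APIVersion, "apiextensions.k8s.io/") {
				crds = append(crds, out)
			} else {
				rest = append(rest, out)
			}
		}
	}
	return crds, rest
}
```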

Contributor

@mattfarina left a comment


I've not worked through the specification yet, but I wanted to add thoughts on the material up front.

Comment on lines +52 to +53
1. CRDs are a cluster-wide resource.
Changes to cluster-wide resources can (more easily) break applications beyond the scope of the chart
Contributor


It's not just beyond the scope of the instance of a chart but beyond the scope of the user who is installing the chart. You can have two users of a cluster who do not have knowledge of each other or their work. This is where breaking can happen.

Member Author


Let me word users into here. I loosely mean that "cluster-wide" must, de facto, be presumed to be multi-user.

Comment on lines +57 to +59
For 1., it is thought that Helm should not treat CRDs specially here.
Helm will readily operate on many other cluster-wide resources today: cluster roles, priority classes, namespaces, etc.
The modification or removal of these could easily cause breakage outside of the chart's release.
Contributor


Helm is a package manager rather than a general purpose management tool for Kubernetes resources. It's a subtle but important difference. Here are some ways to think about it...

  • In the profiles we cover applications specifically. Cluster operations are specifically out of scope.
  • The definition (from Wikipedia) of a package manager:

    A package manager or package-management system is a collection of software tools that automates the process of installing, upgrading, configuring, and removing computer programs for a computer in a consistent manner.

Helm could not create namespaces when 3.0.0 came out. The philosophy is that applications should be installed in a namespace, but it's not up to Helm to manage those; they should be created first, possibly as part of configuration management. Namespace creation was added to provide backwards compatibility with Helm v2 (which had it) because we got so many issues filed about it.

We have not considered Helm the right tool to manage all resources since it's targeted at applications and Kubernetes resources go beyond that.

Comment on lines +61 to +64
In general, Helm as a package manager should not try to preempt unintended functional changes from a chart.
Validating functional changes is well beyond the scope of a package manager.
Helm should treat CRDs the same as other (cluster-wide) resources, where a Helm chart upgrade that causes unintended functional effects should be reverted (rolled back) by the user (chart operator).
As long as that rollback path exists, it is a suitable way for users to mitigate breaking functional changes.
Contributor


Two thoughts here....

  1. Chart authors create charts. Often, application operators are entirely different people, and they are the ones who install or upgrade charts. Application operators often do not have expertise in Kubernetes or the applications they are running. When an application operator has a problem, especially a severe one like data loss, they file issues in the Helm issue queue. We have experienced this in the past, which is one of the reasons Helm has been conservative. Responding to those issues and managing them is time consuming.
  2. Those who can install/update CRDs are sometimes not the same people who install/upgrade the chart. In the past there has been a separation; we should better understand the current state of this. Being able to extract the CRDs and send them to someone with access is useful. Being able to install/upgrade a chart when you don't have global resource access is helpful, or at least has been; we need to understand how this landscape has changed.

I remember meeting with Helm users who had tight controls on their clusters. They had many who could use their namespaces and few who could manage things at the cluster level. This shaped Helm v3 designs. For example, it's the reason Helm uses secrets instead of a CRD. Using a CRD would limit who could use it.

Comment on lines +66 to +70
For 2., data loss should be treated more carefully,
as data loss can be irrevocable or require significant effort to recover from.
And especially, an application/chart operator should not expect a chart upgrade to cause data loss.
Helm can prevent data loss by ensuring CRDs are append-only (with special exceptions in the specification below). In particular, appending allows a rollback to (effectively) restore existing cluster state.
(It should also be noted that Helm today will remove other resources whose removal may cause data loss, e.g. secrets, config maps, namespaces, etc. A special, hard-coded exception does exist for persistent volumes.)
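
For concreteness, a minimal sketch of what the append-only guarantee above could look like as a pre-upgrade check, assuming Helm compares the in-cluster CRD with the chart's proposed CRD before applying (the package, function name, and "no version may be removed" policy are assumptions for illustration, not the specification below):

```go
// Package crdcheck sketches an append-only guard for CRD upgrades: additions
// (new versions, new fields) pass, while removing a declared version is
// rejected, since dropping versions is what can orphan or delete stored CRs.
package crdcheck

import (
	"fmt"

	apiextensionsv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
)

// ensureAppendOnly returns an error if the proposed CRD drops a version that
// the existing, in-cluster CRD declares.
func ensureAppendOnly(existing, proposed *apiextensionsv1.CustomResourceDefinition) error {
	kept := make(map[string]bool, len(proposed.Spec.Versions))
	for _, v := range proposed.Spec.Versions {
		kept[v.Name] = true
	}
	for _, v := range existing.Spec.Versions {
		if !kept[v.Name] {
			return fmt.Errorf("CRD %s: upgrade removes version %q; an append-only policy would reject this", existing.Name, v.Name)
		}
	}
	return nil
}
```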
Contributor


Data loss is different between CRDs and something like secrets. If I remove a secret, it removes that one secret and impacts that single case. If a CRD is deleted, all of its CRs are deleted. For example, you have user A with an instance of a chart, and user B in the same cluster with a separate instance of the same chart. These two users do not know about each other. If user A deletes a CRD, even unintentionally, it will remove the CRs for user B and cause data loss for user B. This is a very different surface area from deleting a secret.

We also know that some don't have backups and will make changes in production. When things go wrong, Helm devs will get issues to triage and deal with.
