H4HIP: CRD updating #379
base: main
Conversation
It's pretty common these days for CRDs to be in templates/ so they can be upgraded too. We should probably keep this behavior and align it with the /crds directory too. Maybe:
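One hypothetical shape for this (a minimal sketch; the `crontabs.example.com` CRD and the `crds.install` value are invented for illustration, not part of the proposal): a chart that ships its CRD from templates/ behind a value, so operators without cluster-wide access can opt out:

```yaml
# templates/crontab-crd.yaml -- hypothetical layout: the CRD lives in
# templates/ so it is upgraded with the release, gated behind a value
# so operators without cluster-wide access can opt out.
{{- if .Values.crds.install }}
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: crontabs.example.com
spec:
  group: example.com
  scope: Namespaced
  names:
    plural: crontabs
    singular: crontab
    kind: CronTab
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
{{- end }}
```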
I've not worked through the specification yet, but I wanted to add some thoughts on the material up front.
> 1. CRDs are a cluster-wide resource. Changes to cluster-wide resources can (more easily) break applications beyond the scope of the chart.
It's not just beyond the scope of the instance of a chart but beyond the scope of the user who is installing the chart. You can have two users of a cluster who do not have knowledge of each other or their work. This is where breaking can happen.
Let me work "users" into here. I loosely mean that "cluster-wide" must, de facto, be presumed to be multi-user.
> For 1., it is thought that Helm should not treat CRDs specially here. Helm will readily operate on many other cluster-wide resources today (cluster roles, priority classes, namespaces, etc.) whose modification or removal could easily cause breakage outside the chart's release.
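As one concrete example (names assumed for illustration): charts routinely template cluster-wide RBAC like the below, and dropping or renaming it on upgrade breaks consumers outside the release just as a CRD change would:

```yaml
# templates/clusterrole.yaml -- hypothetical: a cluster-wide resource Helm
# manages like any other; dropping or renaming it on upgrade breaks every
# binding that references it, well outside this release.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: {{ .Release.Name }}-reader
rules:
  - apiGroups: [""]
    resources: ["pods", "configmaps"]
    verbs: ["get", "list", "watch"]
```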
Helm is a package manager rather than a general purpose management tool for Kubernetes resources. It's a subtle but important difference. Here are some ways to think about it...
- In the profiles we cover applications specifically; cluster operations are out of scope.
- The definition (from Wikipedia) of a package manager:

  > A package manager or package-management system is a collection of software tools that automates the process of installing, upgrading, configuring, and removing computer programs for a computer in a consistent manner.
Namespaces could not be created when 3.0.0 came out. The philosophy is that applications should be installed in a namespace, but it's not up to Helm to manage those; they should be created first, possibly as part of configuration management. Namespace creation was later added to provide backwards compatibility with Helm v2 (which had it) because we got so many issues filed about it (see the sketch after this comment).
We have not considered Helm the right tool to manage all resources since it's targeted at applications and Kubernetes resources go beyond that.
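For context, the backwards-compatibility behavior mentioned above is opt-in in Helm v3; the release namespace is only created when the operator explicitly asks (release, chart, and namespace names below are placeholders):

```shell
# Helm v3 does not create the release namespace by default; the operator
# must opt in (flag restored for Helm v2 compatibility).
helm install myapp ./mychart --namespace myapp-ns --create-namespace
```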
> In general, Helm as a package manager should not try to preempt unintended functional changes from a chart. Validating functional changes is well beyond the scope of a package manager. Helm should treat CRDs the same as other (cluster-wide) resources: a Helm chart upgrade that causes unintended functional effects should be reverted (rolled back) by the user (chart operator). As long as that rollback path exists, this is a suitable way for users to mitigate breaking functional changes.
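For concreteness, a sketch of the rollback path the text relies on (release name and revision numbers are placeholders):

```shell
# Inspect the release history, then revert to the last known-good revision.
helm history myapp
helm rollback myapp 4   # e.g. revision 5 introduced the breaking change
```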
Two thoughts here....
- Chart authors create charts. Often, application operators are entirely different people, and they are the ones who install or upgrade charts. Application operators often do not have expertise in k8s or in the applications they are running. When an application operator has a problem, especially a severe one like data loss, they file issues in the Helm issue queue. We have experienced this in the past, which is one of the reasons Helm has been conservative. Responding to those issues and managing them is time consuming.
- Those who can install/update CRDs are sometimes not the same people installing/upgrading the chart. In the past there has been a separation; we should better understand the current state of this. Being able to extract the CRDs and send them to someone with access is useful. Being able to install/upgrade a chart when you don't have global resource access is helpful, or has been. We need to understand how this landscape has changed.
I remember meeting with Helm users who had tight controls on their clusters. They had many who could use their namespaces and few who could manage things at the cluster level. This shaped Helm v3 designs. For example, it's the reason Helm uses secrets instead of a CRD. Using a CRD would limit who could use it.
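For reference, the extract-and-hand-off split described above maps onto existing CLI behavior (chart and release names are placeholders):

```shell
# Extract the chart's CRDs and hand them to a cluster admin to apply...
helm show crds ./mychart > crds.yaml
# ...then install with namespace-scoped access only, skipping CRDs.
helm install myapp ./mychart --skip-crds
```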
> For 2., data loss should be treated more carefully, as it can be irrevocable or require significant effort to restore. In particular, an application/chart operator should not expect a chart upgrade to cause data loss. Helm can prevent data loss by ensuring CRDs are "appended" only (with special exceptions in the specification below); appending, in particular, allows a rollback to (effectively) restore existing cluster state. (It should also be noted that Helm today will remove other resources whose removal may cause data loss, e.g. secrets, config maps, namespaces, etc. A special, hard-coded exception does exist for persistent volumes.)
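To illustrate what "appended" only could mean in practice (group, kind, and version names are made up): an upgrade may add a new version to an existing CRD but keeps the versions already present, so existing custom resources survive and a rollback effectively restores prior state:

```yaml
# crds/crontabs.yaml after an upgrade -- v2 is appended; v1 stays served,
# so existing CronTab objects survive and a rollback restores prior state.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: crontabs.example.com
spec:
  group: example.com
  scope: Namespaced
  names:
    plural: crontabs
    singular: crontab
    kind: CronTab
  versions:
    - name: v1          # pre-existing version: kept, never removed
      served: true
      storage: false
      schema:
        openAPIV3Schema:
          type: object
    - name: v2          # appended by the upgrade; now the storage version
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
```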
Data loss is different between CRDs and something like secrets. If I remove a secret, it removes that one secret; the impact is that single case. If a CRD is deleted, all of its CRs are deleted. For example, you have user A with an instance of a chart, and user B in that same cluster with a separate instance of the same chart. These two users do not know about each other. If user A deletes a CRD, even unintentionally, it will remove the CRs for user B and cause data loss for user B. This is a very different surface area from deleting a secret (see the sketch after this comment).
We also know that some don't have backups and will make changes in production. When things go wrong, Helm devs will get issues to triage and deal with.
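A sketch of why the blast radius differs (resource names are hypothetical): deleting a CRD cascades to every custom resource of that kind, across all namespaces and all users:

```shell
# Both users' CRs live under the one shared CRD...
kubectl get crontabs --all-namespaces
# ...so deleting the CRD garbage-collects every CronTab in the cluster,
# including user B's, with no per-namespace safeguard.
kubectl delete crd crontabs.example.com
```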