KEP 831-Kubeflow-Helm-Support: Support Helm as an Alternative for Kustomize #832

Open
wants to merge 8 commits into
base: master
from

Conversation

juliusvonkohout
Member

@juliusvonkohout juliusvonkohout commented Mar 6, 2025

Helm KEP from @varodrig @chasecadet @juliusvonkohout
Fixed branch from #830

Placeholder: #831
Implementation: kubeflow/manifests#2730


[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from juliusvonkohout. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@varodrig
Contributor

varodrig commented Mar 6, 2025

/lgtm

Member

@andreyvelich andreyvelich left a comment

I can see that you re-created the KEP.

/hold for community to review.
cc @kubeflow/wg-training-leads @kubeflow/wg-automl-leads @kubeflow/wg-manifests-leads @kubeflow/wg-data-leads @kubeflow/release-team @kubeflow/wg-pipeline-leads @kubeflow/wg-notebooks-leads @kubeflow/red-hat @kubeflow/wg-deployment-leads @kubeflow/kubeflow-steering-committee

@thesuperzapper
Member

Foundationally, this is still a discussion around if we have an "official" distribution or not.

This KEP proposes a single "mega" helm chart with all components inside; this is, by definition, an opinionated "distribution" of Kubeflow Platform.

The community has discussed and rejected having official distributions in the past, for reasons that are still applicable:

  1. The fundamental goal of the project is to make AI/ML tools for Kubernetes (Pipelines, Notebooks, Trainer, Feature Store, etc.)
  2. Our strength is that our tools can run on any Kubernetes cluster, with no preference for any deployment method, cloud vendor, or anything else.
  3. If we make opinionated deployment decisions, fewer people will be able to adopt our tools, or include them in downstream platforms.
  4. There is strong evidence that it's not possible to create a successful "generic" distribution. See the fact that multiple successful distributions exist today, each with different opinionated approaches.
  5. Those in the OWNERS file of an official distribution would have the ability to make decisions that affect the whole community. This likely means that 1-3 consulting/cloud/platform companies make decisions that benefit them, rather than the goal of making our tools the standard.

PS: I want to stress that your motivations about making Kubeflow easier to use are great, and I am sure some users would love a Kubeflow Distribution that looks like this. (In fact, there are at least 2 that I am aware of which are similar to your proposal already, so perhaps you can collaborate with them). However, it's critical to keep the project neutral and focused on the tools themselves.


Also, while it's clearly not the intention of this KEP, there is a separate discussion around if automatically-generated helm charts based on the existing component kustomize manifests would be useful for downstream distributions. But that would be a completely separate proposal.

@lburgazzoli

> there is a separate discussion around if automatically-generated helm charts based on the existing component kustomize manifests would be useful for downstream distributions.

@thesuperzapper is this discussion already happening somewhere ?

@shaikmoeed

Thanks for this great feature, it's going to be really useful for us! As a suggestion, it would be awesome to include some migration documentation on transitioning from Kustomize to Helm in the Goals. Also, any guidance on achieving a zero-downtime migration would be much appreciated. It will help users who are willing to adopt this.

Thanks again for all the great work!

@chasecadet
Contributor

> Thanks for this great feature, it's going to be really useful for us! As a suggestion, it would be awesome to include some migration documentation on transitioning from Kustomize to Helm in the Goals. Also, any guidance on achieving a zero-downtime migration would be much appreciated. It will help users who are willing to adopt this.
>
> Thanks again for all the great work!

Hey @shaikmoeed thanks for the feedback! One thing that would really help with this initiative is if you would be willing to tell us how Helm would help you and what you are looking for from these Helm charts. Your insights would be super awesome!

@shaikmoeed

> Hey @shaikmoeed thanks for the feedback! One thing that would really help with this initiative is if you would be willing to tell us how Helm would help you and what you are looking for from these Helm charts. Your insights would be super awesome!

@chasecadet Thanks for asking! We maintain a customized local version of kubeflow/manifests (with patches like Istio, OAuth2 and other fixes), which makes our upgrades challenging. As we manage most of our k8s services using Helm with ArgoCD, this initiative would simplify our process by letting us enable only the needed components and manage our patches more cleanly, eliminating the need to maintain local copies and keeping our upgrade PRs much smaller and easier to review.
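
For readers following the thread, here is a minimal sketch of the workflow described above: a small user-owned values file layered over chart defaults, instead of a patched local copy of the manifests. It only models the general idea that nested mappings are merged key by key while scalars and lists are replaced. The component names, keys, and hostnames are hypothetical and are not taken from the KEP or from any existing chart.

```python
from copy import deepcopy


def merge_values(defaults: dict, overrides: dict) -> dict:
    """Recursively merge user overrides into chart defaults.

    Nested mappings are merged key by key; any other value
    (scalar or list) from the overrides replaces the default.
    """
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_values(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def enabled_components(values: dict) -> list:
    """Return the sorted names of all components whose 'enabled' flag is true."""
    return sorted(name for name, cfg in values.items() if cfg.get("enabled"))


# Hypothetical chart defaults (illustration only).
DEFAULTS = {
    "pipelines": {"enabled": True, "replicas": 1},
    "katib": {"enabled": True},
    "notebooks": {"enabled": True},
    "istio": {"enabled": True, "gateway": {"host": "kubeflow.example.com"}},
}

# Hypothetical user overrides: the only file the user would keep in Git.
USER_VALUES = {
    "katib": {"enabled": False},
    "istio": {"gateway": {"host": "ml.internal.example.com"}},
}

effective = merge_values(DEFAULTS, USER_VALUES)
print(enabled_components(effective))
print(effective["istio"]["gateway"]["host"])
```

In this model an upgrade PR would only touch the chart version and the small override mapping, which is the reviewability benefit the comment describes.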

@chasecadet
Contributor

> > Hey @shaikmoeed thanks for the feedback! One thing that would really help with this initiative is if you would be willing to tell us how Helm would help you and what you are looking for from these Helm charts. Your insights would be super awesome!
>
> @chasecadet Thanks for asking! We maintain a customized local version of kubeflow/manifests (with patches like Istio, OAuth2 and other fixes), which makes our upgrades challenging. As we manage most of our k8s services using Helm with ArgoCD, this initiative would simplify our process by letting us enable only the needed components and manage our patches more cleanly, eliminating the need to maintain local copies and keeping our upgrade PRs much smaller and easier to review.

Good to know! Also, we'd love to know about how you use Kubeflow and your use cases, but probably for a different medium. Feel free to hit me up on the CNCF slack (chasecadet) if you'd like to share. Also let's ensure your org is on the adopters list so you get credit for being bleeding edge!

@juliusvonkohout
Member Author

juliusvonkohout commented Mar 8, 2025

> Foundationally, this is still a discussion around if we have an "official" distribution or not.
>
> This KEP proposes a single "mega" helm chart with all components inside; this is, by definition, an opinionated "distribution" of Kubeflow Platform.

Actually, that is not the case. It is not a distribution, since these would still be the community manifests, not something derived from or deviating from them.

But you are right in the sense that we should have Helm charts for the individual components and combine them the way we combine the kustomize manifests of the individual components. In the end we have a wonderfully heterogeneous user base, and they want both options. If possible, we should just combine the smaller charts in a meta Helm chart, similar to the kustomize overlay (https://github.com/kubeflow/manifests/blob/master/example/kustomization.yaml). In the end the goal is to have something Helm-based with a similar structure and goal as the current kustomize manifests. Please make constructive suggestions in the KEP on how we can emphasize this more.

> Also, while it's clearly not the intention of this KEP, there is a separate discussion around if automatically-generated helm charts based on the existing component kustomize manifests would be useful for downstream distributions. But that would be a completely separate proposal.

Here too I have to object: this is not a separate discussion. Automatically generated Helm charts are within the scope of this KEP, but that is an implementation detail of the single-source-of-truth requirement. CC @lburgazzoli
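
One possible reading of the "meta Helm chart" idea in this comment, as a rough sketch rather than the KEP's actual design: the umbrella chart carries no templates of its own and only lists the component charts as dependencies, each switchable through a `condition` key. The fields used (`apiVersion: v2`, `dependencies`, `condition`, `file://` repositories) are standard Helm `Chart.yaml` fields; the chart names, versions, and relative paths are hypothetical.

```python
import json


def umbrella_chart(name: str, version: str, components: list) -> dict:
    """Build Chart.yaml-style metadata for a meta chart.

    Each component chart becomes a dependency that can be toggled
    with '<component>.enabled' in the values of the meta chart.
    """
    return {
        "apiVersion": "v2",
        "name": name,
        "version": version,
        "dependencies": [
            {
                "name": component["name"],
                "version": component["version"],
                "repository": component["repository"],
                "condition": f"{component['name']}.enabled",
            }
            for component in components
        ],
    }


# Hypothetical component charts living next to the meta chart.
COMPONENTS = [
    {"name": "pipelines", "version": "0.1.0", "repository": "file://../pipelines"},
    {"name": "notebooks", "version": "0.1.0", "repository": "file://../notebooks"},
    {"name": "katib", "version": "0.1.0", "repository": "file://../katib"},
]

chart = umbrella_chart("kubeflow", "0.1.0", COMPONENTS)
# Printed as JSON for inspection; a real chart would keep this in Chart.yaml.
print(json.dumps(chart, indent=2))
```

This mirrors the role of the example kustomization: the individual charts stay usable on their own, and the meta chart only decides which of them are combined.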

@juliusvonkohout
Member Author

> > Hey @shaikmoeed thanks for the feedback! One thing that would really help with this initiative is if you would be willing to tell us how Helm would help you and what you are looking for from these Helm charts. Your insights would be super awesome!
>
> @chasecadet Thanks for asking! We maintain a customized local version of kubeflow/manifests (with patches like Istio, OAuth2 and other fixes), which makes our upgrades challenging. As we manage most of our k8s services using Helm with ArgoCD, this initiative would simplify our process by letting us enable only the needed components and manage our patches more cleanly, eliminating the need to maintain local copies and keeping our upgrade PRs much smaller and easier to review.

Hello, are you familiar with https://github.com/kubeflow/manifests?tab=readme-ov-file#upgrading-and-extending ? Maybe this can help you until the helm manifests are ready.

@juliusvonkohout
Member Author

@shaikmoeed https://github.com/kubeflow/community/blob/master/ADOPTERS.md is the file where you can add your company.

Member

@andreyvelich andreyvelich left a comment

Thank you for this effort team!
I left my initial comments.
I will review it again later this week.

@@ -0,0 +1,454 @@
# 649-Kubeflow-Helm-Support: Support Helm as an Alternative for Kustomize
Member

I think we should find a better name for this KEP, something like "Helm Charts for Kubeflow Projects", since we are not planning to stop supporting kustomize.

Member Author

It is not just for the individual projects, so maybe just "Installing Kubeflow with Helm"?

Member Author

done

Member

> It is not just for the individual projects

We haven't reached a final agreement on this yet, so I suggest keeping the name of this KEP as: Support Helm Charts for Kubeflow Ecosystem.

As a project, we must ensure that our Helm chart provides a quick and accessible way for users to deploy a complete Kubeflow platform and individual components, enabling them to manage their environments or adopt a vendor solution.


Simplifying Kubeflow deployment lowers the barrier to entry, increases adoption, and encourages contributions. Just as Kubernetes enabled a new wave of cloud-native startups, a neutral, accessible deployment path can empower AI/ML startups to leverage tools like the Training Operator or Katib without reinventing common patterns. If support becomes burdensome, teams can hire expertise or use a distribution—both of which drive demand for Kubeflow skills.
Member

As I mentioned before, I am not sure that Helm Charts will simplify Kubeflow installation, it just provides another method in addition to Kustomize.

Member Author

Simplify in the sense that it satisfies company policies and works with the existing setup of users and that they do not need to learn kustomize.

Member Author

I changed it to "Simplifying the Kubeflow deployment and customization ..."

As in the paragraph above, it is also sometimes plainly about enabling "many potential users and companies that require Helm charts due to company processes/policies"

Member

> that they do not need to learn kustomize.

But they still need to learn Helm, don't they?

I can see the value of Helm Charts for Kubeflow Projects in aligning with companies' processes and policies, but not in simplification, since Kustomize is simple enough and natively integrates with kubectl create -k



```
kubeflow/manifests/experimental/helm
```
Member

I am not sure that this structure is what we should do, looking at other projects like Argo, Grafana, etc.
A few questions:

  1. How are you planning to maintain centralized values for all charts?
  2. How are you planning to sync the underlying Helm Charts with these centralized charts?
  3. How are we going to define ownership for those charts?

Contributor

@andreyvelich do you have any opinions? I see questions and would love some solutions too! What are your thoughts?

Member Author

In the end we want the same for the Helm manifests as for the kustomize manifests: synchronize them from the individual projects. Only at the beginning, or as long as they are experimental, does it make sense to start in one place where the CI/CD etc. is available. You can easily check in additional CI/CD tests whether the difference between the Helm and kustomize output is zero.
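
A minimal sketch of what such a zero-difference check could look like, assuming the output of `helm template` and `kustomize build` has already been parsed into lists of Python dictionaries (for example with a YAML parser). The resource names below are hypothetical sample data, and a real check would probably also need to normalize tool-specific labels or annotations before comparing.

```python
def resource_key(resource: dict) -> tuple:
    """Identify a Kubernetes object by apiVersion, kind, namespace and name."""
    metadata = resource.get("metadata", {})
    return (
        resource.get("apiVersion"),
        resource.get("kind"),
        metadata.get("namespace"),
        metadata.get("name"),
    )


def diff_manifests(helm_docs: list, kustomize_docs: list) -> dict:
    """Compare two rendered manifest sets, ignoring document order."""
    helm = {resource_key(doc): doc for doc in helm_docs}
    kustomize = {resource_key(doc): doc for doc in kustomize_docs}
    shared = helm.keys() & kustomize.keys()
    return {
        "only_in_helm": sorted(helm.keys() - kustomize.keys(), key=str),
        "only_in_kustomize": sorted(kustomize.keys() - helm.keys(), key=str),
        "changed": sorted((k for k in shared if helm[k] != kustomize[k]), key=str),
    }


def is_equivalent(helm_docs: list, kustomize_docs: list) -> bool:
    """True when both tools rendered exactly the same set of objects."""
    return not any(diff_manifests(helm_docs, kustomize_docs).values())


# Hypothetical sample objects standing in for rendered output.
service = {
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {"name": "ml-pipeline", "namespace": "kubeflow"},
    "spec": {"ports": [{"port": 8888}]},
}
config = {
    "apiVersion": "v1",
    "kind": "ConfigMap",
    "metadata": {"name": "pipeline-config", "namespace": "kubeflow"},
    "data": {"logLevel": "info"},
}
drifted_config = {**config, "data": {"logLevel": "debug"}}

helm_output = [service, config]
kustomize_output = [config, service]  # same objects, different order

print(is_equivalent(helm_output, kustomize_output))
print(diff_manifests(helm_output, [service, drifted_config])["changed"])
```

A CI job could fail whenever `is_equivalent` returns false and print the three lists from `diff_manifests` to show where the two renderings drifted apart.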

Member Author

clarified in the text.

Member

Given that you propose this KEP, I would like to hear your perspective first @chasecadet.
I can comment my thoughts after it.

> Just at the beginning or as long as they are experimental it makes sense to start in one place with the ci/cd etc. available

What do you mean by experimental? The WG Manifests goals and charter don't say anything about this process: https://github.com/kubeflow/community/blob/master/wg-manifests/charter.md

Member Author

I have updated the charter now. "contrib" (renamed to "experimental") was even in the 5-year-old charter.

Member Author

@juliusvonkohout juliusvonkohout Mar 17, 2025

WG Manifests/Platform Charter

This charter adheres to the conventions, roles and organization management
outlined in [wg-governance].

This charter describes the working mode / reality / status quo of the last 5 years as of March 2025.

Scope

  • Enable users to install, extend and maintain Kubeflow as a platform for multiple users
  • This includes dependencies, security efforts and exemplary integration with popular tools and frameworks.
  • Synchronize the manifests (Helm, Kustomize) between working groups
  • We try to be compatible with the popular Kubernetes clusters (Kind, Rancher, AKS, EKS, GKE, ...)
  • We do not support a specific deployment tool (e.g., ArgoCD, Flux)
  • The default installation shall not contain deep integration with external cloud services or closed source solutions
  • We provide hints and experimental examples of how a user could integrate non-default external authentication (e.g. a company's identity provider) and popular services on their own
  • There is an evolving and non-exhaustive list of dependencies for a proper multi-tenant platform installation: Istio, KNative, Dex, Oauth2-proxy, Cert-Manager, ...
  • There is an evolving and non-exhaustive list of applications: KFP, Trainer, Dashboard, Workspaces / Notebooks, Kserve, Spark, ...

Communication Tasks

With Application Owners

  • Aid the application owner in creating manifests (Helm, Kustomize) for his application
  • Aid the application owner regarding security best practices
  • Communicate with the application owner regarding releases and versioning

With Distribution Owners

  • Distributions are strongly opinionated derivatives of Kubeflow platform / manifests, for example replacing all databases with closed source managed databases from AWS, GKE, Azure, ...
  • A distribution can be created by an arbitrary amount of users / companies in private or in public, see the definition above
  • Coordinate with "distribution owners" / users to take part in the testing of Kubeflow releases.

Member

I think, the scope of Manifests WG should be discussed separately and we should get consensus from all other WGs:
@kubeflow/kubeflow-steering-committee @kubeflow/wg-training-leads @kubeflow/wg-automl-leads @kubeflow/wg-pipeline-leads @kubeflow/wg-notebooks-leads @kubeflow/wg-manifests-leads

Check this: #356

Contributor

@lphiri would love your thoughts here.

Member

For reference, I see that @juliusvonkohout raised a proposal to update the Manifests WG charter:


Having a discussion around the scope of the Manifest WG (and coming to a community consensus) is probably critical before committing to expansion of the Manifests WG scope.

Especially given how controversial the initial formation of the Manifests WG was (see #434)

## Design Details


### Helm Chart Structure
Member

Contributor

@andreyvelich help me understand what specifically you are calling out.

Member Author

"Provide a catalog (centralized repository) of Kubeflow application manifests." Helm manifests fit that as well as kustomize manifests

Member Author

But yes we should update the charter in general and make it leaner.

Member Author

Nevertheless that might be for a separate PR.

Member

As you can see, here is the scope of Manifests WG @chasecadet.

> Maintain tooling to automate copying manifests from upstream app repos.

The manifests should just copy the code assets from the upstream app repos.
This is not what is proposed here.

Member Author

@juliusvonkohout juliusvonkohout Mar 13, 2025

And that is far far away from the last 5 years of reality. "Maintain tooling to automate copying manifests from upstream app repos." Is actually just one very small part (maybe 2 % of the total effort) and even the least important part and gives users almost nothing in value. It would render the repository, kubeflow releases and this KEP useless and obsolete. With only that Kubeflow would fail completely as a platform and we could archive the repository and kubeflow releases in general.

The value for users in kubeflow/manifests is created by combining and integrating the components with multi-tenancy, a service mesh, and the other shared platform components to provide a full platform installation. They get components that are glued together, work together and are tested together. That is what users use the repository for. Otherwise they can just go to the individual project repositories and have a bunch of independent single-tenant applications without unified authentication and authorization. That might be useful enough for some, but you would lose 50+% of the current user base. Please also see the updated scenarios in the KEP for clarification.

So yes I can try to update the charter by 5+ years and get it to the reality of 2025 and what the actual usage is in this PR here.

Member

@andreyvelich andreyvelich Mar 13, 2025

> With only that Kubeflow would fail completely as a platform and we could archive the repository and kubeflow releases in general.

Can you elaborate on this more, please? From my point of view, the value of the Kubeflow ecosystem is not in multi-tenancy or auth, but rather in AI/ML capabilities.
Yes, we can provide our "opinionated" solution for auth with Dex and Istio, but this is not mandatory for folks to integrate with Kubeflow projects.

> They get components that are glued together, work together and are tested together. That is what users use the repository for.

Can we actually gather feedback from real users who are using kubeflow/manifests today and complaining about Kustomize?
I would like to understand how the proposed Helm Chart in this KEP would enhance their experience within the Kubeflow Ecosystem.

As we've discussed multiple times with @johnugeorge and other members of @kubeflow/kubeflow-steering-committee, kubeflow/manifests is a good starting point to explore Kubeflow's capabilities.
I believe Kustomize works well for this purpose.

Member Author

@juliusvonkohout juliusvonkohout Mar 17, 2025

"Can you elaborate on this more please ? From my point of view, the value of Kubeflow ecosystem is not in multi-tenancy or auth, but rather in AI/ML capabilities."
That might not be relevant for you. For me, for example, Katib and Trainer are not relevant so far, but Kubeflow Platform and KFP are. That does not mean that it is not relevant for other users. We are just two different users.

Most companies I know want to have one isolated namespace per user, where the user can log in and run their JupyterLabs, pipelines, inference services, model registries etc. Then come users 2-999, who also need their own JupyterLab etc. and must have hard isolation from the other users, or the CISO will just say no to Kubeflow.
So authentication and authorization, network policies, and pod security standards are mandatory to even get this basic setup with one isolated namespace per user and access to KFP, JupyterLabs etc. running. I talked to hospitals, banks, insurers, and many other companies who use Kubeflow that way; that is also the reason they install from kubeflow/manifests in the first place. Otherwise you have to set up one Kubeflow per user, so hundreds of single-tenant Kubeflows with insane resource overhead (terabytes of memory).

"Yes, we can provide our "opinionated" solution for auth with Dex and Istio, but this is not mandatory for folks to integrate with Kubeflow projects." Oauth2-proxy and Dex are not really opinionated. They are just the two endpoints where companies connect to the company's internal IdP for single sign-on. They do not build the authentication in Kubeflow themselves (way too expensive). They just hook in via single sign-on. This is where it becomes opinionated, based on the company's preferences.

"Can we actually gather a feedback from real users who are using kubeflow/manifests today and complaining about Kustomize" it is one of the most requested features and you can take a look at the kubeflow-helm channel. That is what you typically hear at conferences and from companies (requiring it to even start with Kubeflow) as well. That is also why there are 10 different community efforts for helm charts. CC @chasecadet

Please have a look at the current Kubeflow architecture. This is from 2023, but still has the most important parts. Now imagine 1000 user namespaces and that within these components there is authentication and authorization to check which user can access which namespace and services.

[Image: Kubeflow architecture diagram (2023)]

This KEP is about Helm manifests, not changing the Platform architecture.

Signed-off-by: juliusvonkohout <[email protected]>
@google-oss-prow google-oss-prow bot removed the lgtm label Mar 11, 2025

New changes are detected. LGTM label has been removed.

Signed-off-by: juliusvonkohout <[email protected]>
Signed-off-by: Julius von Kohout <[email protected]>
@google-oss-prow google-oss-prow bot added size/XL and removed size/L labels Mar 17, 2025
Signed-off-by: Julius von Kohout <[email protected]>
Signed-off-by: Julius von Kohout <[email protected]>
@chasecadet
Contributor

chasecadet commented Mar 17, 2025

@andreyvelich, let's try a well-known technique: Observation, Feeling, Need, Request. This approach can help us unpack details rather than relying on "certainty in correctness," which is something I constantly challenge myself on while staying curious. I appreciate all the great insights you're sharing here.
Observation

I noticed you said:

"The manifests should just copy the code assets from the upstream app repos. This is not what is proposed here."
"kubeflow/manifests is a good starting point to explore Kubeflow's capabilities. I believe Kustomize works well for this purpose."

Feeling

I'm feeling a bit confused because I’ve seen community members express concerns that Kustomize is too complex and instead request a Helm chart. I suspect this stems from a misalignment in who Kustomize is best suited for. My perspective is that Kustomize works well for experts like you—SMEs in manifests who deploy custom components at a high level. However, many users seem to prefer Helm because it abstracts away complexity with its templating engine.
re @juliusvonkohout

"Can we actually gather a feedback from real users who are using kubeflow/manifests today and complaining about Kustomize" it is one of the most requested features and you can take a look at the kubeflow-helm channel. That is what you typically hear at conferences and from companies (requiring it to even start with Kubeflow) as well. That is also why there are 10 different community efforts for helm charts. CC @chasecadet.

Need

I want to better understand what makes you uncomfortable or unsatisfied with this request.

  • What scenarios are you considering?
  • How do you view the trade-offs between Kustomize and Helm?
  • How do you think this KEP might impact your work or the project’s direction?

Request

Would you be open to unpacking your full perspective so I can understand it more clearly? Overcommunication here would really help me.

One additional thought: Do we need to explore a new approach?
re:

"Yes, we can provide our "opinionated" solution for auth with Dex and Istio, but this is not mandatory for folks to integrate with Kubeflow projects." Oauth2-proxy and Dex is not really opinionated. They are just the two endpoints where companies connect to the companies internal IDP for the single-sign-on. They do not build the authentication in Kubeflow themselves

Are we dealing with two separate deployment needs—one for standalone, callable components (e.g., the Training Operator) and another for a full platform experience where users have isolated namespaces with authentication?
When you deploy individual components, do you deploy one per namespace?
How do you handle security and isolation?

Looking forward to your thoughts! 😊

@thesuperzapper
Member

First, I want to thank everyone for their passion for Kubeflow. I especially want to thank @juliusvonkohout for his work on kubeflow/manifests that enables many distributions and custom deployments of Kubeflow that make our tools available to end users.

My understanding is that this KEP comes from discussions with your customers that want to deploy and manage a Kubeflow Platform with Helm and ArgoCD.

You make a compelling case why some users want Helm+ArgoCD, however, I'm trying to understand why this deployment method needs to be officially developed under the Kubeflow organization?

Was the alternative of making a Kubeflow Distribution considered?

@juliusvonkohout
Member Author

juliusvonkohout commented Mar 17, 2025

> First, I want to thank everyone for their passion for Kubeflow. I especially want to thank @juliusvonkohout for his work on kubeflow/manifests that enables many distributions and custom deployments of Kubeflow that make our tools available to end users.

Thank you. The goal of this KEP is to make platform/manifests more attractive for end users, whether it is a one-person private distribution or a large company's public distribution. I think we need to improve at a high technical speed to stay relevant as the platform for orchestrating ML workflows on Kubernetes.

> My understanding is that this KEP comes from discussions with your customers that want to deploy and manage a Kubeflow Platform with Helm and ArgoCD.
>
> You make a compelling case why some users want Helm+ArgoCD, however, I'm trying to understand why this deployment method needs to be officially developed under the Kubeflow organization?
>
> Was the alternative of making a Kubeflow Distribution considered?

Regarding "My understanding is that this KEP comes from discussions with your customers that want to deploy and manage a Kubeflow Platform with Helm and ArgoCD." and "You make a compelling case why some users want Helm+ArgoCD; however, I'm trying to understand why this deployment method needs to be officially developed under the Kubeflow organization?" I have to politely say no. You can see here that I even want to write the opposite:

"""
This platform/manifests charter describes the working mode / reality / status quo of the last 5 years as of March 2025.
It tries to be as lean as possible and balance community and commercial interests.

Scope

  • Enable users / distributions to install, extend and maintain Kubeflow as a multi-tenant platform for multiple users
  • This includes dependencies, security efforts and exemplary integration with popular tools and frameworks
  • Synchronize the manifests (Helm, Kustomize) between working groups
  • We try to be compatible with the popular Kubernetes clusters (Kind, Rancher, AKS, EKS, GKE, ...)
  • We do not support a specific deployment tool (e.g., ArgoCD, Flux)
  • The default installation shall not contain deep integration with external cloud services or closed source solutions; instead we aim for Kubernetes-native solutions and light authentication and authorization integration with external IDPs
  • We provide hints and experimental examples of how a user / distribution could integrate non-default external authentication (e.g. a company's identity provider) and popular non-default services on their own
  • There is an evolving and non-exhaustive list of dependencies for a proper multi-tenant platform installation: Istio, Knative, Dex, OAuth2-Proxy, Cert-Manager, ...
  • There is an evolving and non-exhaustive list of applications: KFP, Trainer, Dashboard, Workspaces / Notebooks, KServe, Spark, ...

Communication Tasks

With Application Owners

  • Aid the application owner in creating manifests (Helm, Kustomize) for their application
  • Aid the application owner regarding security best practices
  • Communicate with the application owner regarding releases and versioning

With Users / Distribution Owners

  • Distributions are strongly opinionated derivatives of Kubeflow platform/manifests, for example replacing all databases with closed source managed databases from AWS, GCP, Azure, ...
  • A distribution can be created by any number of users / companies, in private or in public, by deriving from Kubeflow platform/manifests; see the definition above
  • Coordinate with "distribution owners" / users to take part in the testing of Kubeflow releases.
    ...

"""

"Was the alternative of making a Kubeflow Distribution considered?" I have my personal distribution; I do not need another one. As I said, this is not for me, and I personally do not need Helm, because I mostly build the helm templating functionality myself on top of kustomize, but many end users want Helm for that functionality.

The question is rather, how do we not miss out on the users/companies that require Helm manifests? How do we make it simple to modify, maintain, and extend Kubeflow for full multi-user installations to keep it attractive? How do we consolidate the ten different Helm approaches with synergy to avoid spending ten times the amount of time on maintaining manifests and reinventing the wheel?

How can we get distributions to work more together? How can we get them to upstream some changes so that the platform as a whole progresses and each distribution has reduced maintenance overhead? In the end, we need to have a good product that offers ML as a platform without requiring every adopter to reinvent the wheel. We are failing the ML platform goal if only third-party derivative projects offer essential features like Helm manifests or robust multi-tenancy and enforced security best practices.

I (subjectively) see the multiple single-tenant approach as a lower-value offering, or as an offering only for a special edge case or subset of the user base, because it requires significant additional integration effort from the end user for each separate component. The multi-tenant platform approach is (again subjectively) more valuable because it needs less integration and maintenance effort. But no one should be blocked from building individually from scratch, and no one should be blocked from experiencing and contributing to an integrated end-to-end platform. Let people choose how they want to build and contribute. We, or at least I, do not have the time or the priority in the short term to redesign each component from scratch as an individual mini platform with its own authentication and multi-tenancy, along with a proper upgrade path from our current architecture.
I still want to keep reducing the complexity step by step over the long term, but this will take at least a year. I think we have already reduced it a lot: for example, we are now largely decoupled from the Kubernetes version and support a wider range of dependency versions, with much better documentation. There is also lower-hanging fruit that I would personally prioritize higher, for example a new Kubernetes-native object storage and the security work needed for your CISO office to even allow you to run Kubeflow. Maybe someone else has different priorities and volunteers their personal time for this, per WG topic, sooner than I can. However, my estimate is that we will not make much progress in the next 12 months, given the ratio of discussion to implementation.
In the end we are volunteers, so I can only expect others to work on what is interesting and a priority for them, not on what I think should be. If we push too hard to stop people from working on what interests them, they will simply stop contributing. I offered to mentor this not just because of my knowledge of manifests/platform and the testing infrastructure, but to keep Kubeflow as a platform maintainable for end users and distributions. The first installation is only a fraction of the effort, while adjusting, templating, etc. is the major effort. Helm with its integrated templating engine could help a lot in this regard.

This is an effort I am pushing for the community, not for myself. But it is getting too broad, because this clear-cut Helm manifests discussion keeps getting tangled up with architecture restructuring and other topics. For other topics, such as architectural redesigns of authentication, everyone is free to create a separate non-Helm KEP to keep this one focused. The discussion also often deteriorates into a broad political governance debate rather than a constructive search for in-scope solutions (the status quo plus Helm for the GSoC PoC). This amount of derailing and debating, instead of constructive in-scope code or text suggestions and direct improvements, is slowing us down and might cost us platform users and synergies.

@chasecadet, I tried to push for community synergies and Kubeflow platform Helm user adoption, but I can only do so much with a limited amount of time.

Signed-off-by: Julius von Kohout <[email protected]>
@juliusvonkohout
Member Author

To have a more focused discussion I have moved the charter into #837.

@juliusvonkohout
Member Author

From the long discussion on Slack: "Another interesting thing is that more and more manifests come in Helm form (Spark, Istio, etc.) and right now we render them out to be usable with Kustomize. At some point it could be more feasible to just call them directly from Kustomize with the right values, since Kustomize has some form of Helm integration. I recently tried including a Helm manifest with parameters via Kustomize. This way we could replace the Kustomize parts step by step with templatable Helm manifests and keep the same CI/CD to make sure that nothing breaks and the output stays the same."

See https://github.com/kubernetes-sigs/kustomize/blob/master/examples/chart.md for examples.
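
A minimal sketch of the "output stays the same" check from the Slack quote, assuming the old and the new manifests have already been rendered to strings (for example with `kustomize build` for the current manifests and `kustomize build --enable-helm` for a variant that pulls in a Helm chart). The function names are hypothetical and not part of any Kubeflow tooling. The comparison is purely textual, so a different key order or indentation inside a resource counts as a change; a real CI job would probably parse the YAML instead.

```python
"""Sketch: check that two rendered manifest streams contain the same resources."""
import hashlib


def split_documents(rendered: str) -> list[str]:
    """Split a multi-document YAML stream on '---' separator lines."""
    documents, current = [], []
    for line in rendered.splitlines():
        if line.strip() == "---":
            documents.append("\n".join(current))
            current = []
        else:
            # Ignore trailing whitespace so it does not show up as a diff.
            current.append(line.rstrip())
    documents.append("\n".join(current))
    # Drop empty documents caused by leading or trailing separators.
    return [doc.strip("\n") for doc in documents if doc.strip()]


def fingerprints(rendered: str) -> list[str]:
    """Return sorted SHA-256 hashes, one per document, so order is ignored."""
    return sorted(
        hashlib.sha256(doc.encode("utf-8")).hexdigest()
        for doc in split_documents(rendered)
    )


def same_output(before: str, after: str) -> bool:
    """True if both streams contain the same documents, in any order."""
    return fingerprints(before) == fingerprints(after)
```

In a pipeline, `before` would be the output of the existing Kustomize-only build and `after` the output of the Helm-backed build; the job fails when `same_output` returns `False`.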

@juliusvonkohout
Member Author

For example, see this proof of concept from the proposal of one of the GSoC students: https://github.com/akagami-harsh/manifests/blob/helm-charts/kustomize-helm-poc/README.md It uses the Trainer Helm chart via Kustomize. @akagami-harsh

@Mohamed-ben-khemis

> See for example this example from the proposal from one of the GSOC students https://github.com/akagami-harsh/manifests/blob/helm-charts/kustomize-helm-poc/README.md It uses the trainer helm chart via kustomize @akagami-harsh

Thanks for sharing this! And big thanks to @akagami-harsh for the effort 🙌
@juliusvonkohout One potential concern is the added complexity: contributors would need to understand both Helm and Kustomize to make changes or debug issues. As we scale to more Kubeflow components, maintaining consistency across this hybrid model could become increasingly challenging.

@TheCodingSheikh

Hey, I made a Helm chart to install Kubeflow.
It doesn't require modification; helm install works out of the box. It is based on the manifests repo and Argo.
It is highly customizable, and there is an example that exposes it with an ingress and integrates Keycloak.

Check it out; I'm open to feedback.

https://github.com/TheCodingSheikh/helm-charts/tree/main/charts/kubeflow

@chasecadet
Contributor

> Hey, i made a helm chart to install kubeflow. Doesnt require modification, helm install will work out of the box, it is based on the manifets repo and argo. Highly customizable, there is an example to expose with ingress and integrate keycloak.
>
> Check it out and open to feedback
>
> https://github.com/TheCodingSheikh/helm-charts/tree/main/charts/kubeflow

This is awesome! I think what the KSC is working on determining is whether this is a "distribution" or something we move into the main Kubeflow repos, that is, where the boundary lies between core Kubeflow and the component manifests we support. Right now you can, in theory, install everything with a single kustomize command, but using best-of-breed cloud services or other configurations requires effort. Then there are the questions of whether you template the patches and how you tie that into a CI/CD system nicely, or whether you just use ArgoCD. This is a great contribution, and we eagerly await the KSC's decision so we can figure out the best path forward. Please continue to give feedback here if you have it!

@juliusvonkohout
Member Author

From the @kubeflow/kubeflow-steering-committee meeting today:

All four KSC members attending the meeting voted for:
"Kubeflow Working Groups (WGs) are allowed, but not required, to provide standalone helm charts for their projects."

No more, no less, so please do not read too much into it. This is not supposed to answer all questions and possibilities, but to allow us to move forward with the GSoC project and see how far we get.

@chasecadet

@rareddy
Contributor

rareddy commented Apr 25, 2025

Good news! Thank you for sharing the information.
