Adjust the platform/manifests charter according to the last 5 years and the state of 2025 #837
…nd the state of 2025 Signed-off-by: Julius von Kohout <[email protected]>
Signed-off-by: Julius von Kohout <[email protected]>
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Thank you for working on this @juliusvonkohout.
I will take a look at these changes later this week.
/hold for WGs to review
cc @kubeflow/wg-training-leads @kubeflow/wg-notebooks-leads @kubeflow/wg-manifests-leads @kubeflow/wg-pipeline-leads @kubeflow/wg-data-leads @kubeflow/wg-automl-leads @kubeflow/kubeflow-steering-committee @kubeflow/wg-deployment-leads @kubeflow/release-team
It is also interesting that part of this is even in the CNCF graduation criteria, see #834 (comment).
#832 (comment) also provides a lot of context and argumentation.
I have left some comments.
However I am quite concerned that we don't explicitly define what the actual code products of this new working group are. The scope has expanded over the years due to a lack of clarity on this issue.
If we're going to rewrite the charter, we should consider scoping it very explicitly to only aggregating the application manifests (as originally intended).
All other tasks listed might be delegated to other groups, or even downstream distributions, as this ensures the community is more focused on creating the AI/ML tools, which actually make up Kubeflow.
- Enable users / distributions to install, extend and maintain Kubeflow as a multi-tenant platform for multiple users
- This includes dependencies, security efforts and exemplary integration with popular tools and frameworks.
- Synchronize the manifests (Helm, Kustomize) between working groups
- We try to be compatible with the popular Kubernetes clusters (Kind, Rancher, AKS, EKS, GKE, ...)
Why is this necessary for the manifest working group?
We intentionally excluded this goal from the original manifest wg charter to prevent unnecessary focus on vendor-specific issues.
In practice, users are going to have different Kubernetes layers (Kind, Rancher, AKS, EKS, GKE, ...), but this only covers Kubernetes, not AWS managed databases or similar services. We definitely try to be compatible with the most popular ones, although we cannot guarantee it. Right now it works on Kind, Rancher, AKS, EKS and GKE for me, and this is also what most users expect. So it is a "soft goal" that we pursue for our users, but we do not guarantee it.
In the end this is done by volunteers, and that is what we want to work on. This is where we see the value in contributing to Kubeflow. If someone else wants to focus on something else, they are free to do whatever is sustainable and valuable for them. No one is forced to work on that.
I thought the `kubeflow/manifests` were only meant to be a "minimum viable deployment" for testing purposes on Kind clusters?
Should we say that instead?
wg-manifests/charter.md
Outdated
- Distributions are strongly opinionated derivatives of Kubeflow platform/manifests, for example replacing all databases with closed source managed databases from AWS, GKE, Azure, ...
- A distribution can be created by an arbitrary amount of users / companies in private or in public by deriving from Kubeflow platform/manifests, see the definition above
This is not the current definition of a distribution; there is no requirement for it to be "strongly opinionated".
If we are to include a definition here, we must agree as a community.
It might be better to just define it somewhere else rather than in this charter.
Well, I can remove the word "strongly", but I think that is a good definition and right where it belongs.
Regardless of where the term "Kubeflow Distribution" is defined, changing it will require a wider community discussion, because we already have a definition in the original KEP.
https://github.com/kubeflow/community/blob/master/proposals/434-kubeflow-distribution/README.md

### With Application Owners

- Aid the application owner in creating manifests (Helm, Kustomize) for his application
I'm not sure that requiring the manifests WG to support the upstream manifests is sustainable.
But obviously, it is something that the individuals who are participating might also choose to do if they are so inclined.
You always have to keep in mind that we are volunteers. All of this is best-effort. We try it. Sometimes other working groups need help to understand, for example, the securityContext of a pod, since they are rather focused on the source code. Or we help them to fix the Kustomize 5 warnings.
I get that many contributors are volunteers, but either way, the WG charters are governance documents.
It's important for the health of the WG (and project) that we set reasonable expectations for the working group members.
I am not sure it's sustainable to include the expectation of upstream manifest maintenance; this is why the original charter focused only on "aggregating manifests".
Over the last five years it has been very sustainable and we have made great progress.
wg-manifests/charter.md
Outdated
- The default installation shall not contain deep integration with external cloud services or closed source solutions, instead we aim for Kubernetes-native solutions and light authentication and authorization integration with external IDPs
- We provide hints and experimental examples how a user / distribution could integrate non-default external authentication (e.g. companies Identity Provider) and popular non-default services on his own
I'm trying to understand how this relates to the core goal of the manifest working group, which is to enable the creation of distributions, as a lot of these things seem distribution-specific.
Based on "A distribution can be created by an arbitrary amount of users / companies in private or in public by deriving from Kubeflow platform/manifests, see the definition above", it covers the full userbase. It can be a private one-person distribution or a large in-house enterprise distribution. It can even be a public distribution or one that is for sale. There are probably several thousand.
So we want to restrict it via "- The default installation shall not contain deep integration with external cloud services or closed source solutions, instead we aim for Kubernetes-native solutions and light authentication and authorization integration with external IDPs"
but not block people from exchanging non-default ideas and examples for basic needs: "- We provide hints and experimental examples how a user / distribution could integrate non-default external authentication (e.g. companies Identity Provider) and popular non-default services on his own". It does not mean that we support or help with such examples; it just means "hey, you can connect Kubeflow to your IDP with low effort. Here is the dex and oauth2 architecture and documentation, feel free to try it out on your own or check out our get support page if you need help".
This is done by volunteers, and that is what we want to work on. This is where we see the value in contributing to Kubeflow. If someone else wants to focus on something else, they are free to do whatever is sustainable and valuable for them.
As an opinionated distribution derived from Kubeflow/manifests you can cater exactly to your userbase and make such company-specific things supported and enabled by default, but that is not what we are doing here.
"If we're going to rewrite the charter, we should consider scoping it very explicitly to only aggregating the application manifests (as originally intended)." is rather useless and can be done by a robot. Right now, or at least for the last 5 years, our focus has been 95+ % platform, while the manifests make up less than 5 % of the work. Just copying manifests can be done by a robot and does not need a WG of its own. This is not what this WG is about. We as a WG have focused on the platform for 5 years, and that is what we want to work on. No one is forced to work on that and no one should be blocked from working on that.
Signed-off-by: Julius von Kohout <[email protected]>
Alright, I updated it to satisfy the comments and the old structure. I also hope to have clarified what is owned and maintained by the WG. "In that case, do we need to completely remove concepts of "Kubeflow Distribution"?" We could, but then users would just find a new word for derivatives of platform/manifests. I think the definition of a derivative fits quite well with how the term distribution is commonly used, also outside of Kubeflow.
I have left some more comments.
as Subproject Owners, Tech Leads and Chairs. This is done to ensure we have a
simple enough model to start that people can understand and get used to. So for
the Manifests WG we only have Manifests WG Leads, which are the root approvers.

The following sections will aim to define the requirements for someone to become
a reviewer and an approver in the root OWNERS file (Manifests WG Lead).

### Manifests WG Lead Requirements
### Platform/Manifests WG Lead Requirements
Let's keep the name of the WG the same, as discussed above.
1. Being involved with the release team, since the [release process](https://github.com/kubeflow/community/tree/master/releases) is tightly intertwined with the manifests/platform repository
2. Testing methodologies (GitHub Actions)
3. Processes regarding the [experimental](https://github.com/kubeflow/manifests/blob/master/experimental) components
4. [Platform manifests](https://github.com/kubeflow/manifests/tree/master/common) maintained irectly by Manifests/Platform WG (Istio, Knative, Cert Manager etc.)
There is a typo, and also let's keep the name as `Manifests WG`.
5. Community and health of the project

Root approvers, or Manifests WG Leads, are expected to have expertise and be able
Root approvers, or Manifests/Platform WG Leads, are expected to have expertise and be able
Let's keep the name as `Manifests WG`.
We simply (automatically) synchronize the application and dependencies manifests to then elaborately combine (configure)them for full platform experience.
Providing a consistent and tested end-to-end multi-tenant experience is the most important task of the platform/manifests WG.
To achieve this we maintain an extensive testing suite that covers most basic scenarios users would expect from a Platform for ML orchestration.
We also provide the documentation regarding, but not limited to installation, extension, security and architecture to enable users to run their own ML Platform on Kubernetes.
Users may choose to derive from platform/manifests to create so called distributions, which are opinionated to satisfy individual requirements.
Users may also choose to install individual components without the benefits of the platform.
Can we format this as a list of bullet points, and combine the ones which express the same idea?
This makes it easier to have discussions about each specific element of the scope, as some new elements are being proposed.
- This includes dependencies, security efforts and exemplary integration with popular tools and frameworks.
- Users can also install individual components without the benefits of the platform, but then they could also just directly fetch them from the WG releases.
- Synchronize the manifests between working groups and make sure via integration tests that the components work end-to-end together as multi-tenant platform
- Release tested releases of the Kubeflow platform for downstream consumption
We need to clarify what is meant by "Kubeflow Platform", because this is not defined, or just not use that term.
- Distributions are opinionated derivatives of Kubeflow platform/manifests, for example replacing all databases with closed source managed databases from AWS, GKE, Azure, ...
- A distribution can be created by an arbitrary amount of users / companies in private or in public by deriving from Kubeflow platform/manifests, see the definition above
If we are going to define "distribution" here, let's be as generic as possible:
- distributions are all downstream derivatives of the kubeflow manifests which are not maintained by the kubeflow community
We could also define it, and other terms at the top of the document.
- Maintain a catalog that will allow users to install Kubeflow apps and
common services easily on Kubernetes, either on the cloud or on-prem, without
depending on external cloud services or closed source solutions. Those
manifests are deployed using `kubectl` and `kustomize` and include:
We also need to ensure that `kustomize`/`kubectl` remains the primary "scope" of the manifests repo, as this is what all existing distributions/users are based on.
Also, are we allowing other deployment tools beyond `kubectl` and `kustomize` (e.g. `helm`, `argo cd`, `flux cd`), because this is a big scope change if so?
- Enable users / distributions to install, extend and maintain Kubeflow as a end-to-end multi-tenant platform for multiple users
- This includes dependencies, security efforts and exemplary integration with popular tools and frameworks.
- Users can also install individual components without the benefits of the platform, but then they could also just directly fetch them from the WG releases.
- Synchronize the manifests between working groups and make sure via integration tests that the components work end-to-end together as multi-tenant platform
- Release tested releases of the Kubeflow platform for downstream consumption
- We try to be compatible with the popular Kubernetes clusters (Kind, Rancher, AKS, EKS, GKE, ...)
- We provide hints and experimental examples how a user / distribution could integrate non-default external authentication (e.g. companies Identity Provider) and popular non-default services on his own
- We in general document the installation of Kubeflow as a platform and / or individual components including common problems and architectural overviews.
- There is the evolving and not exhaustive list of dependencies for a proper multi-tenant platform installation: Istio, KNative, Dex, Oauth2-proxy, Cert-Manager, ...
- There is the evolving and not exhaustive list of applications: KFP, Trainer, Dashboard, Workspaces / Noteboks, Kserve, Spark, ...
Let's try to list these as a specific list of "responsibilities" (like the current ones).
Words like "enable", "hints", and "experimental examples" are not very clear.
- Decide which applications to include in Kubeflow.
- Decide which variant of an application to include (e.g., KFP Standalone vs
KFP with Istio).
- Create and maintain one or more Kubeflow distributions.
(Opening a new thread because the old one was marked as resolved)
It is critical that we still explicitly exclude the creation of a distribution by the manifest working group, as this would create a massive conflict of interest.
Thanks for taking the time to update this @juliusvonkohout! I'm taking a detailed look as well. From a quick skim, it seems to me we have these 2 discussions going on in parallel:
IMO we first need to discuss whether the Manifests WG should be a Platform WG and include multi-tenancy, as well as what processes to have around such a WG, whose decisions can affect other WGs. I will add more comments in the proposal as well, in that direction.
@kimwnasptd for lgtm since these changes are usually and historically done by the manifests WG leads (@kimwnasptd and @juliusvonkohout)
CC @kubeflow/kubeflow-steering-committee
#832 (comment) and #834 (comment) also provide a lot of context and argumentation.
"
WG Manifests Charter
This charter adheres to the conventions, roles and organization management
outlined in [wg-governance].
Scope
We simply (automatically) synchronize the application and dependencies manifests to then elaborately combine (configure)them for full platform experience.
Providing a consistent and tested end-to-end multi-tenant experience is the most important task of the platform/manifests WG.
To achieve this we maintain an extensive testing suite that covers most basic scenarios users would expect from a Platform for ML orchestration.
We also provide the documentation regarding, but not limited to installation, extension, security and architecture to enable users to run their own ML Platform on Kubernetes.
Users may choose to derive from platform/manifests to create so called distributions, which are opinionated to satisfy individual requirements.
Users may also choose to install individual components without the benefits of the platform.
In scope
Code, Binaries and Services
Cross-cutting and Externally Facing Processes
With Application Owners
With Users / Distribution Owners
Out of scope
...
"
This diagram is from 2023 and might be outdated (oauth2-proxy, model-registry, spark etc. are missing), but it captures a lot of essentials which are also mandatory parts of the kubeflow/notebooks and kubeflow/dashboard repositories. Just imagine 2-2000 users (Statistics Canada recently reported around 1000 users). For more details I can provide hours of KubeCon architecture presentations with slides. The grey box is just one user.
