feat(backend): isolate artifacts per namespace/profile/user using only one bucket #7725


Open · wants to merge 28 commits into base: master
Conversation

juliusvonkohout
Member

@juliusvonkohout juliusvonkohout commented May 13, 2022

Fixes #4649

Due to newly discovered functionality in Argo, we will not use multiple buckets, but just multiple folders and the IAM policy below. Furthermore, AWS for example has a limit of 100 buckets, so we should definitely stick with one bucket. This is a follow-up of #7406.

By the way I am open to change the folder structure etc. as you prefer.

One bucket, multiple folders: namespace isolation

  1. Change the workflow-controller configmap by adding workflow.namespace to keyFormat: "private-artifacts/{{workflow.namespace}}/{{workflow.name}}/{{workflow.creationTimestamp.Y}}/{{workflow.creationTimestamp.m}}/{{workflow.creationTimestamp.d}}/{{pod.name}}"
    We no longer need the per-namespace artifactrepositories configmaps, but users can still use them to override the default artifact storage defined above.
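For illustration, the keyFormat from step 1 would land in the workflow-controller ConfigMap roughly like this (bucket, endpoint and secret names here are assumptions from a default Kubeflow/MinIO install, not part of this PR):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: workflow-controller-configmap
  namespace: kubeflow
data:
  artifactRepository: |
    archiveLogs: true
    s3:
      bucket: mlpipeline
      keyFormat: "private-artifacts/{{workflow.namespace}}/{{workflow.name}}/{{workflow.creationTimestamp.Y}}/{{workflow.creationTimestamp.m}}/{{workflow.creationTimestamp.d}}/{{pod.name}}"
      endpoint: minio-service.kubeflow:9000
      insecure: true
      accessKeySecret:
        name: mlpipeline-minio-artifact
        key: accesskey
      secretKeySecret:
        name: mlpipeline-minio-artifact
        key: secretkey
```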

  2. We still need to create a MinIO user in sync.py and save the individual credentials to the per-namespace secret. But we now need only one policy, although this policy must still be attached to every user, so only some small changes in sync.py are needed. We no longer need to create additional buckets.

"Resource": [
    "arn:aws:s3:::%s/artifacts/*"  % shared_bucket_name, # old shared artifacts for backwards compatibility
    "arn:aws:s3:::%s/private-artifacts/${aws:username}/*"  % shared_bucket_name, # private artifacts
    "arn:aws:s3:::%s/private/${aws:username}/*"  % shared_bucket_name, # private storage
    "arn:aws:s3:::%s/shared/*"  % shared_bucket_name # shared storage for collaboration
]

We exclude "arn:aws:s3:::%s/pipelines/*" % shared_bucket_name (shared pipelines), because the apiserver, not the user, should manage pipeline definitions on MinIO. Hopefully @zijianjoy will remove them entirely from MinIO soon.
So changing the environment variable

pipelinePath = "MINIO_PIPELINE_PATH"
and splitting the storage into shared and private pipelines becomes unnecessary.
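The sync.py change from step 2 could build the shared per-user policy roughly like this (a sketch only; the function name and the surrounding code are assumptions, not the actual sync.py — only the Resource list comes from this PR):

```python
import json


def make_user_policy(shared_bucket_name):
    """Build the single policy document shared by all profile users.

    MinIO substitutes ${aws:username} at evaluation time, so one and the
    same document can be attached to every per-namespace user.
    """
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
            "Resource": [
                # old shared artifacts, kept for backwards compatibility
                "arn:aws:s3:::%s/artifacts/*" % shared_bucket_name,
                # private artifacts written via the new keyFormat
                "arn:aws:s3:::%s/private-artifacts/${aws:username}/*" % shared_bucket_name,
                # private storage
                "arn:aws:s3:::%s/private/${aws:username}/*" % shared_bucket_name,
                # shared storage for collaboration
                "arn:aws:s3:::%s/shared/*" % shared_bucket_name,
            ],
        }],
    })


policy_json = make_user_policy("mlpipeline")
```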

  1. We can revert to the upstream images for apiserver, persistenceagent, etc. Only the ml-pipeline-ui changes from WIP namespace isolation for artifacts and pipeline definitions #7406 must be kept for managing namespaced pipeline definitions in the UI, but this can be done in another PR. The pipeline-ui from PR #7406 does not need to be changed, since it uses the apiserver properly according to Gabor's changes in frontend/src/pages/PipelineList.tsx. The proposal from Arrikto, or feat(frontend, sdk): towards namespaced pipelines. Part of #4197 #7447, could also be used.

  2. I removed the deprecated (https://www.kubeflow.org/docs/components/pipelines/sdk/python-based-visualizations/) visualization server (ml-pipeline-visualizationserver). Please use the supported output viewers instead: https://www.kubeflow.org/docs/components/pipelines/sdk/output-viewer/

  3. I added a network policy that complements https://github.com/kubeflow/manifests/tree/master/contrib/networkpolicies

  4. I added a fix for [bug] pod is forbidden: failed quota: kf-resource-quota: must specify cpu,memory #7699

  5. I added a PodDefault such that on JupyterLab creation one can enable easy authentication to KFP according to https://www.kubeflow.org/docs/components/pipelines/sdk/connect-api/#multi-user-mode
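For reference, the PodDefault from the linked documentation looks roughly like this (the namespace placeholder must be replaced with the user's profile namespace; details may differ from the manifest in this PR):

```yaml
apiVersion: kubeflow.org/v1alpha1
kind: PodDefault
metadata:
  name: access-ml-pipeline
  namespace: "<YOUR_USER_PROFILE_NAMESPACE>"
spec:
  desc: Allow access to Kubeflow Pipelines
  selector:
    matchLabels:
      access-ml-pipeline: "true"
  volumes:
    - name: volume-kf-pipeline-token
      projected:
        sources:
          - serviceAccountToken:
              path: token
              expirationSeconds: 7200
              audience: pipelines.kubeflow.org
  volumeMounts:
    - mountPath: /var/run/secrets/kubeflow/pipelines
      name: volume-kf-pipeline-token
      readOnly: true
  env:
    - name: KF_PIPELINES_SA_TOKEN_PATH
      value: /var/run/secrets/kubeflow/pipelines/token
```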

  6. I added a Kubernetes NetworkAttachmentDefinition for istio-cni. This will be necessary to run Kubeflow completely rootless (as I am already doing with istio-cni and without the NET_ADMIN and NET_RAW capabilities) on OpenShift: https://istio.io/latest/docs/setup/platform-setup/openshift/#additional-requirements-for-the-application-namespace . Vanilla Kubernetes might not need this, but it does no harm.
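The NetworkAttachmentDefinition from the linked Istio documentation is tiny; it merely has to exist in each application namespace:

```yaml
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: istio-cni
```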

  7. There is also a security issue in the UI server: I can read other people's artifacts just by removing the namespace=xxx query parameter from the end of the artifact URL. Admittedly, an attacker still has to guess the right file names.
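To illustrate the missing check (the real UI server is written in Node.js; this Python sketch only shows the idea, with prefixes taken from the policy above): the artifact proxy must derive the namespace from the authenticated user, not from an optional query parameter, and reject keys outside the caller's prefixes.

```python
def authorize_artifact_request(artifact_key, authenticated_namespace):
    """Illustrative server-side check: serve an artifact only when its
    object key lies inside a prefix the caller may read. Deriving the
    namespace from an optional URL parameter lets anyone drop that
    parameter and read foreign artifacts.
    """
    private_prefix = "private-artifacts/%s/" % authenticated_namespace
    shared_prefix = "shared/"
    legacy_shared_prefix = "artifacts/"  # old shared artifacts, readable by all
    return artifact_key.startswith((private_prefix, shared_prefix, legacy_shared_prefix))
```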

  8. The lifecycle policy that I have added works wonderfully with the MinIO storage backend and proper cache settings (feat(frontend): caching may now be disabled when starting pipeline runs #8177) to delete old cache artifacts after 30 days.
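As an illustration, a 30-day expiration rule could look like the following S3 lifecycle configuration, which can be applied to the bucket with the mc ilm tooling (the prefix here is an assumption; the actual rule in this PR may differ):

```json
{
  "Rules": [
    {
      "ID": "expire-old-cache-artifacts",
      "Status": "Enabled",
      "Filter": { "Prefix": "private-artifacts/" },
      "Expiration": { "Days": 30 }
    }
  ]
}
```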


@google-oss-prow google-oss-prow bot added size/L and removed size/XS labels May 13, 2022
@juliusvonkohout juliusvonkohout changed the title WIP Separate artifacts per namespace/profile/user using only one bucket Separate artifacts per namespace/profile/user using only one bucket May 14, 2022
@juliusvonkohout
Member Author

@chensun @zijianjoy @jacobmalmberg @ca-scribner I finally have something that is worth merging for namespace isolation. It is also quite easy to test on your own instances, because I only changed four configuration files.

@juliusvonkohout
Member Author

/assign @zijianjoy

@juliusvonkohout juliusvonkohout changed the title Separate artifacts per namespace/profile/user using only one bucket feat(backend): separate artifacts per namespace/profile/user using only one bucket May 14, 2022
@juliusvonkohout
Member Author

@chensun can you provide some feedback? I will happily implement any changes if needed.

@AndersBennedsgaard

Yes, I am part of the new Slack channels. We will most likely not adopt Apache Ozone in our project since we already use MinIO in other projects, so I will probably not be able to contribute much in that regard. But I might take a look at some of the other stuff if I get approval on my end.

@AndersBennedsgaard

@juliusvonkohout I've created #10905 to discuss which components can be removed, and how. I would like to get some feedback on this before I create a PR for it.
I've also created #10904 and #10903 to discuss some of my other points from above.

@juliusvonkohout
Member Author

/reopen
/lifecycle frozen
for the time being

@google-oss-prow google-oss-prow bot reopened this Sep 4, 2024

@juliusvonkohout: Reopened this PR.

In response to this:

/reopen
/lifecycle frozen
for the time being

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.


@juliusvonkohout: The lifecycle/frozen label cannot be applied to Pull Requests.

In response to this:

/reopen
/lifecycle frozen
for the time being



[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from zijianjoy. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@juliusvonkohout
Member Author

/ok-to-test

@juliusvonkohout
Member Author

juliusvonkohout commented Dec 11, 2024

/reopen

until SeaweedFS is merged, but the CI/CD will likely break

@google-oss-prow google-oss-prow bot reopened this Dec 11, 2024

@juliusvonkohout: Reopened this PR.

In response to this:

/reopen

until seaweedfs is merged



Successfully merging this pull request may close these issues.

[Multi User] Support separate artifact repository for each namespace
7 participants