This repository was archived by the owner on Oct 15, 2025. It is now read-only.

Even when specifying different release names for the Helm chart, some cluster-wide objects prevent deployment #331

@maugustosilva

Description

Component

I don't know

Describe the bug

Single cluster, same model, two different users, two different Helm chart release names. The second attempted deployment fails because of the way the ClusterRoleBinding for the "endpoint-picker" is named.
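The stale ownership can be confirmed by inspecting the Helm annotations on the existing cluster-scoped object. A sketch of the check (the object name is taken from the error below; the annotation keys are the standard Helm ownership annotations):

```
# Show which release/namespace Helm believes owns the ClusterRoleBinding
kubectl get clusterrolebinding meta-llama-llama-3-2-3b-instruct-endpoint-picker \
  -o jsonpath='{.metadata.annotations.meta\.helm\.sh/release-name}{"\n"}{.metadata.annotations.meta\.helm\.sh/release-namespace}{"\n"}'
```

If this prints a release name other than the one being installed, the second install will refuse to adopt the object.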

Steps to reproduce

Here is an attempted deployment with llm-d-benchmark (which uses the llm-d-deployer, in particular llmd-installer.sh) on a cluster where the same model (`llama-3-2-3b-instruct`) was already deployed.

==> Thu Jun 19 09:43:33 EDT 2025 - ./setup/standup.sh - 🚀 Calling llm-d-deployer with options "--skip-infra --release llm-d-marcio --namespace llmdmarcio --storage-class nfs-client-pokprod
 --storage-size 300Gi --values-file /var/folders/1v/c_4rpq6176xb3bqhbvtwbx6r0000gn/T/auto-standupXXX.4ZIs34KabZ/setup/yamls/07_deployer_values.yaml --context /var/folders/1v/c_4rpq6176xb3bq
hbvtwbx6r0000gn/T/auto-standupXXX.4ZIs34KabZ/environment/context.ctx"...
ℹ️  📂 Setting up script environment...
ℹ️  kubectl can reach to a running Kubernetes cluster.
✅ HF_TOKEN validated
ℹ️  📦 Creating namespace llmdmarcio...
namespace/llmdmarcio configured
Context "admin" modified.
✅ Namespace ready
ℹ️  🔹 Using merged values: /var/folders/1v/c_4rpq6176xb3bqhbvtwbx6r0000gn/T/tmp.Xbs2rfHOym
ℹ️  🔐 Creating/updating HF token secret...
secret "llm-d-hf-token" deleted
secret/llm-d-hf-token created
✅ HF token secret created
ℹ️  Fetching OCP proxy UID...
✅ Derived PROXY_UID=1000870001
ℹ️  📜 Applying modelservice CRD...
customresourcedefinition.apiextensions.k8s.io/modelservices.llm-d.ai unchanged
✅ ModelService CRD applied
ℹ️  ⏭️ Model download to PVC skipped: `--download-model` arg not set, assuming PVC model-pvc exists and contains model at path: `models/meta-llama/Llama-3.2-3B-Instruct`.
"bitnami" already exists with the same configuration, skipping
ℹ️  🛠️ Building Helm chart dependencies...
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "nfs-subdir-external-provisioner" chart repository
...Successfully got an update from the "autopilot" chart repository
...Successfully got an update from the "grafana" chart repository
...Successfully got an update from the "rh-ecosystem-edge" chart repository
...Successfully got an update from the "prometheus-community" chart repository
...Successfully got an update from the "mobb" chart repository
...Successfully got an update from the "bitnami" chart repository
Update Complete. ⎈Happy Helming!⎈
Saving 2 charts
Downloading common from repo https://charts.bitnami.com/bitnami
Downloading redis from repo https://charts.bitnami.com/bitnami
Pulled: registry-1.docker.io/bitnamicharts/redis:20.13.4
Digest: sha256:6a389e13237e8e639ec0d445e785aa246b57bfce711b087033a196a291d5c8d7
Deleting outdated charts
✅ Dependencies built
ℹ️  🔍 Checking for ServiceMonitor CRD (monitoring.coreos.com)...
✅ ServiceMonitor CRD (monitoring.coreos.com/v1) found
ℹ️  🔍 Checking OpenShift user workload monitoring configuration...
✅ OpenShift user workload monitoring is properly configured
ℹ️  Using OpenShift's built-in monitoring stack
ℹ️  Metrics collection disabled by user request
ℹ️  🚚 Deploying llm-d chart with /var/folders/1v/c_4rpq6176xb3bqhbvtwbx6r0000gn/T/tmp.Xbs2rfHOym...
Release "llm-d-marcio" does not exist. Installing it now.
Error: Unable to continue with install: ClusterRoleBinding "meta-llama-llama-3-2-3b-instruct-endpoint-picker" in namespace "" exists and cannot be imported into the current release: invalid ownership metadata; annotation validation error: key "meta.helm.sh/release-name" must equal "llm-d-marcio": current value is "nam-release"; annotation validation error: key "meta.helm.sh/release-namespace" must equal "llmdmarcio": current value is "nam-test"
ERROR while executing command "cd /tmp/llm-d-deployer/quickstart; export HF_TOKEN=***; ./llmd-installer.sh --skip-infra --release llm-d-marcio --namespace llmdmarcio --storage-class nfs-client-pokprod --storage-size 300Gi --values-file /var/folders/1v/c_4rpq6176xb3bqhbvtwbx6r0000gn/T/auto-standupXXX.4ZIs34KabZ/setup/yamls/07_deployer_values.yaml --context /var/folders/1v/c_4rpq6176xb3bqhbvtwbx6r0000gn/T/auto-standupXXX.4ZIs34KabZ/environment/context.ctx"

cat: /var/folders/1v/c_4rpq6176xb3bqhbvtwbx6r0000gn/T/autostandupXXX.4ZIs34KabZ/setup/commands/1750340613759310000_stderr.log: No such file or directory
msilva@marcios-ibm-mbp llm-d-benchmark % ./setup/standup.sh -c ocp_H100_deployer_llama-17b
❌ Scenario file "/Users/msilva/repos/llm-d/llm-d-benchmark/scenarios/ocp_H100_deployer_llama-17b.sh" could not be found.
msilva@marcios-ibm-mbp llm-d-benchmark % ./setup/teardown.sh -c ocp_H100_deployer_llama-3b -d
WARNING: environment variable LLMDBENCH_CLUSTER_URL=auto. Will attempt to use current context "admin".

Additional context or screenshots

So, if we rename the ClusterRoleBinding for the "endpoint-picker" to take the release name into account, we should be able to avoid the error:

Error: Unable to continue with install: ClusterRoleBinding "meta-llama-llama-3-2-3b-instruct-endpoint-picker" in namespace "" exists and cannot be imported into the current release: invalid ownership metadata; annotation validation error: key "meta.helm.sh/release-name" must equal "llm-d-marcio": current value is "nam-release"; annotation validation error: key "meta.helm.sh/release-namespace" must equal "llmdmarcio": current value is "nam-test"
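One way to make the object release-unique is to fold `.Release.Name` into the ClusterRoleBinding's name in the chart template. A minimal sketch, assuming a template roughly like the following exists in the chart (the file path, the `.Values` keys, and the subject details are placeholders, not the chart's actual layout):

```
# templates/endpoint-picker-clusterrolebinding.yaml (hypothetical path)
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  # Prefixing with the release name means two releases of the same
  # model on one cluster no longer collide on this cluster-scoped object.
  name: {{ .Release.Name }}-{{ .Values.modelName }}-endpoint-picker
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: {{ .Release.Name }}-{{ .Values.modelName }}-endpoint-picker
subjects:
  - kind: ServiceAccount
    name: endpoint-picker
    namespace: {{ .Release.Namespace }}
```

The same prefixing would need to be applied to the ClusterRole (and any other cluster-scoped objects the chart creates), since Helm validates ownership annotations on each of them.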
