fix: add EPP RBAC for InferenceModelRewrite (required by v0.7.0)#997

Open
kahilam wants to merge 3 commits into main from fix/epp-rbac-inferencemodelrewrite

Conversation

@kahilam
Collaborator

@kahilam kahilam commented Apr 13, 2026

Summary

  • EPP v0.7.0 (built on gateway-api-inference-extension v1.3.0) always watches InferenceModelRewrite resources on startup via InferenceModelRewriteReconciler. There is no feature gate to disable this.
  • The inferencepool Helm chart v1.0.1 (shipped with llm-d v0.3.0) does not include inferencemodelrewrites in the EPP Role, causing the EPP to crash-loop when its image is patched to v0.7.0:
    Failed to watch *v1alpha2.InferenceModelRewrite:
    inferencemodelrewrites.inference.networking.x-k8s.io is forbidden:
    User "system:serviceaccount:...:gaie-inference-scheduling-epp"
    cannot list resource "inferencemodelrewrites"
    
  • Adds a supplemental Role + RoleBinding granting the EPP service account read-only (get, list, watch) access to inferencemodelrewrites, applied alongside the existing image and ConfigMap patches in deploy/lib/infra_llmd.sh.
  • The upstream chart v1.3.0 already includes this permission; this backfills it for the older chart version.
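
The supplemental RBAC objects described above would look roughly like the following. This is an illustrative sketch, not the exact manifest from deploy/lib/infra_llmd.sh: the Role/RoleBinding names are made up, and the namespace is omitted; only the service account name is taken from the error message above.

```yaml
# Sketch of a read-only Role + RoleBinding for inferencemodelrewrites.
# Object names here are hypothetical; the service account name comes
# from the forbidden error quoted in the summary.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: epp-inferencemodelrewrite-reader
rules:
  - apiGroups: ["inference.networking.x-k8s.io"]
    resources: ["inferencemodelrewrites"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: epp-inferencemodelrewrite-reader
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: epp-inferencemodelrewrite-reader
subjects:
  - kind: ServiceAccount
    name: gaie-inference-scheduling-epp
```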

Test plan

  • Trigger /benchmark openshift on a PR to verify EPP no longer crash-loops
  • Verify EPP pod reaches Ready state and Gateway returns HTTP 200
  • Confirm benchmark runs end-to-end through Gateway connectivity check

Made with Cursor

EPP v0.7.0 (built on gateway-api-inference-extension v1.3.0) watches
InferenceModelRewrite resources on startup. The inferencepool Helm
chart v1.0.1 (shipped with llm-d v0.3.0) does not include this
permission, causing the EPP to crash-loop with a forbidden error
when its image is patched to v0.7.0.

Add a supplemental Role and RoleBinding granting the EPP service
account read-only access to inferencemodelrewrites, applied alongside
the existing image and ConfigMap patches in deploy/lib/infra_llmd.sh.

Made-with: Cursor
@github-actions
Contributor

GPU Pre-flight Check ✅

GPUs are available for e2e-openshift tests. Proceeding with deployment.

| Resource | Total | Allocated | Available |
| --- | --- | --- | --- |
| GPUs | 50 | 39 | 11 |

| Cluster | Value |
| --- | --- |
| Nodes | 16 (7 with GPUs) |
| Total CPU | 993 cores |
| Total Memory | 10383 Gi |
| GPUs required | 4 (min) / 6 (recommended) |

@kahilam
Collaborator Author

kahilam commented Apr 13, 2026

/benchmark openshift

@github-actions
Contributor

🚀 Benchmark (OpenShift) triggered by /benchmark openshift

View the benchmark workflow run

@github-actions
Contributor

WVA Benchmark Results (OpenShift)

Environment
  • Cluster: OpenShift (Real GPUs)
  • Model: unsloth/Meta-Llama-3.1-8B
  • Accelerator: H100
  • Commit: 97bd432
  • Scaler: prometheus-adapter
  • Workflow run

The benchmark workflow was using the default llm-d v0.3.0, which deploys
the inferencepool chart v1.0.1 with the alpha API group
(inference.networking.x-k8s.io/v1alpha2). Istio 1.29+ with
ENABLE_GATEWAY_API_INFERENCE_EXTENSION=true expects the GA API group
(inference.networking.k8s.io/v1) and does not configure ext_proc for the
alpha InferencePool, so the Gateway returns HTTP 500.

This aligns the benchmark with the e2e workflow which already uses
LLM_D_RELEASE=main (inferencepool chart v1.2.1 with GA API support).
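
For context, the API-group difference amounts to the `apiVersion` on the InferencePool resource. A minimal sketch, showing only the fields relevant to the mismatch:

```yaml
# Alpha API group (chart v1.0.1) — not wired up for ext_proc by Istio 1.29+
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferencePool
---
# GA API group, which Istio 1.29+ with
# ENABLE_GATEWAY_API_INFERENCE_EXTENSION=true expects
apiVersion: inference.networking.k8s.io/v1
kind: InferencePool
```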

Made-with: Cursor
@kahilam
Collaborator Author

kahilam commented Apr 13, 2026

/benchmark openshift

@github-actions
Contributor

🚀 Benchmark (OpenShift) triggered by /benchmark openshift

View the benchmark workflow run

@github-actions
Contributor

GPU Pre-flight Check ✅

GPUs are available for e2e-openshift tests. Proceeding with deployment.

| Resource | Total | Allocated | Available |
| --- | --- | --- | --- |
| GPUs | 50 | 35 | 15 |

| Cluster | Value |
| --- | --- |
| Nodes | 16 (7 with GPUs) |
| Total CPU | 993 cores |
| Total Memory | 10383 Gi |
| GPUs required | 4 (min) / 6 (recommended) |

@github-actions
Contributor

WVA Benchmark Results (OpenShift)

Environment
  • Cluster: OpenShift (Real GPUs)
  • Model: unsloth/Meta-Llama-3.1-8B
  • Accelerator: H100
  • Commit: 08b532e
  • Scaler: prometheus-adapter
  • Workflow run

The previous attempt set LLM_D_RELEASE in ci-benchmark.yaml, but
issue_comment-triggered workflows always use the workflow YAML from the
default branch — not the PR branch. The env var never took effect.

Change the default in deploy/install.sh instead (v0.3.0 → main), which
IS executed from the PR checkout. llm-d main uses inferencepool chart
v1.4.0 which creates InferencePool with the GA API group
(inference.networking.k8s.io/v1) that Istio 1.29+ expects.

The old v0.3.0 default used chart v1.0.1 with the alpha API group
(inference.networking.x-k8s.io/v1alpha2), which Istio ignores —
causing Gateway HTTP 500 errors.
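
The kind of default-with-override the commit describes can be sketched as follows. The variable name is taken from the text above; the exact line in deploy/install.sh may differ.

```shell
# Default LLM_D_RELEASE to "main" unless the caller exports an override.
# Illustrative sketch of the deploy/install.sh change described above.
LLM_D_RELEASE="${LLM_D_RELEASE:-main}"
echo "llm-d release: ${LLM_D_RELEASE}"
```

Because the default lives in the script that is executed from the PR checkout, the PR branch controls it, unlike env vars set in the workflow YAML, which `issue_comment` triggers always read from the default branch.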

Made-with: Cursor
@kahilam
Collaborator Author

kahilam commented Apr 13, 2026

/benchmark openshift

@github-actions
Contributor

🚀 Benchmark (OpenShift) triggered by /benchmark openshift

View the benchmark workflow run

@github-actions
Contributor

GPU Pre-flight Check ✅

GPUs are available for e2e-openshift tests. Proceeding with deployment.

| Resource | Total | Allocated | Available |
| --- | --- | --- | --- |
| GPUs | 50 | 36 | 14 |

| Cluster | Value |
| --- | --- |
| Nodes | 16 (7 with GPUs) |
| Total CPU | 993 cores |
| Total Memory | 10383 Gi |
| GPUs required | 4 (min) / 6 (recommended) |

@github-actions
Contributor

WVA Benchmark Results (OpenShift)

Environment
  • Cluster: OpenShift (Real GPUs)
  • Model: unsloth/Meta-Llama-3.1-8B
  • Accelerator: H100
  • Commit: 55be615
  • Scaler: prometheus-adapter
  • Workflow run

@github-actions
Contributor

WVA Benchmark Results (OpenShift)

Environment
  • Cluster: OpenShift (Real GPUs)
  • Model: Qwen/Qwen3-0.6B
  • Accelerator: H100
  • Commit: 55be615
  • Scaler: prometheus-adapter
  • Workflow run
