fix: add EPP RBAC for InferenceModelRewrite (required by v0.7.0)#997

Open
kahilam wants to merge 3 commits into main from fix/epp-rbac-inferencemodelrewrite

Conversation

@kahilam
Collaborator

@kahilam kahilam commented Apr 13, 2026

Summary

  • EPP v0.7.0 (built on gateway-api-inference-extension v1.3.0) always watches InferenceModelRewrite resources on startup via InferenceModelRewriteReconciler. There is no feature gate to disable this.
  • The inferencepool Helm chart v1.0.1 (shipped with llm-d v0.3.0) does not include inferencemodelrewrites in the EPP Role, causing the EPP to crash-loop when its image is patched to v0.7.0:
    Failed to watch *v1alpha2.InferenceModelRewrite:
    inferencemodelrewrites.inference.networking.x-k8s.io is forbidden:
    User "system:serviceaccount:...:gaie-inference-scheduling-epp"
    cannot list resource "inferencemodelrewrites"
    
  • Adds a supplemental Role + RoleBinding granting the EPP service account read-only (get, list, watch) access to inferencemodelrewrites, applied alongside the existing image and ConfigMap patches in deploy/lib/infra_llmd.sh.
  • The upstream chart v1.3.0 already includes this permission; this backfills it for the older chart version.
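
The supplemental RBAC objects described above would look roughly like the following. This is an illustrative sketch, not the exact manifest from deploy/lib/infra_llmd.sh: the Role/RoleBinding names are made up, and the namespace is omitted; only the service account name is taken from the error message above.

```yaml
# Sketch of a read-only Role + RoleBinding for inferencemodelrewrites.
# Object names here are hypothetical; the service account name comes
# from the forbidden error quoted in the summary.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: epp-inferencemodelrewrite-reader
rules:
  - apiGroups: ["inference.networking.x-k8s.io"]
    resources: ["inferencemodelrewrites"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: epp-inferencemodelrewrite-reader
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: epp-inferencemodelrewrite-reader
subjects:
  - kind: ServiceAccount
    name: gaie-inference-scheduling-epp
```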

Test plan

  • Trigger /benchmark openshift on a PR to verify EPP no longer crash-loops
  • Verify EPP pod reaches Ready state and Gateway returns HTTP 200
  • Confirm benchmark runs end-to-end through Gateway connectivity check

Made with Cursor

EPP v0.7.0 (built on gateway-api-inference-extension v1.3.0) watches
InferenceModelRewrite resources on startup. The inferencepool Helm
chart v1.0.1 (shipped with llm-d v0.3.0) does not include this
permission, causing the EPP to crash-loop with a forbidden error
when its image is patched to v0.7.0.

Add a supplemental Role and RoleBinding granting the EPP service
account read-only access to inferencemodelrewrites, applied alongside
the existing image and ConfigMap patches in deploy/lib/infra_llmd.sh.

Made-with: Cursor
@github-actions
Contributor

GPU Pre-flight Check ✅

GPUs are available for e2e-openshift tests. Proceeding with deployment.

| Resource | Total | Allocated | Available |
| --- | --- | --- | --- |
| GPUs | 50 | 39 | 11 |

| Cluster | Value |
| --- | --- |
| Nodes | 16 (7 with GPUs) |
| Total CPU | 993 cores |
| Total Memory | 10383 Gi |
| GPUs required | 4 (min) / 6 (recommended) |

@kahilam
Collaborator Author

kahilam commented Apr 13, 2026

/benchmark openshift

@github-actions
Contributor

🚀 Benchmark (OpenShift) triggered by /benchmark openshift

View the benchmark workflow run

@github-actions
Contributor

WVA Benchmark Results (OpenShift)

Environment
  • Cluster: OpenShift (Real GPUs)
  • Model: unsloth/Meta-Llama-3.1-8B
  • Accelerator: H100
  • Commit: 97bd432
  • Scaler: prometheus-adapter
  • Workflow run

The benchmark workflow was using the default llm-d v0.3.0, which deploys
the inferencepool chart v1.0.1 with the alpha API group
(inference.networking.x-k8s.io/v1alpha2). Istio 1.29+ with
ENABLE_GATEWAY_API_INFERENCE_EXTENSION=true expects the GA API group
(inference.networking.k8s.io/v1) and does not configure ext_proc for the
alpha InferencePool, so the Gateway returns HTTP 500.

This aligns the benchmark with the e2e workflow which already uses
LLM_D_RELEASE=main (inferencepool chart v1.2.1 with GA API support).
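
For context, the API-group difference amounts to the `apiVersion` on the InferencePool resource. A minimal sketch, showing only the fields relevant to the mismatch:

```yaml
# Alpha API group (chart v1.0.1) — not wired up for ext_proc by Istio 1.29+
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferencePool
---
# GA API group, which Istio 1.29+ with
# ENABLE_GATEWAY_API_INFERENCE_EXTENSION=true expects
apiVersion: inference.networking.k8s.io/v1
kind: InferencePool
```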

Made-with: Cursor
@kahilam
Collaborator Author

kahilam commented Apr 13, 2026

/benchmark openshift

@github-actions
Contributor

🚀 Benchmark (OpenShift) triggered by /benchmark openshift

View the benchmark workflow run

@github-actions
Contributor

GPU Pre-flight Check ✅

GPUs are available for e2e-openshift tests. Proceeding with deployment.

| Resource | Total | Allocated | Available |
| --- | --- | --- | --- |
| GPUs | 50 | 35 | 15 |

| Cluster | Value |
| --- | --- |
| Nodes | 16 (7 with GPUs) |
| Total CPU | 993 cores |
| Total Memory | 10383 Gi |
| GPUs required | 4 (min) / 6 (recommended) |

@github-actions
Contributor

WVA Benchmark Results (OpenShift)

Environment
  • Cluster: OpenShift (Real GPUs)
  • Model: unsloth/Meta-Llama-3.1-8B
  • Accelerator: H100
  • Commit: 08b532e
  • Scaler: prometheus-adapter
  • Workflow run

The previous attempt set LLM_D_RELEASE in ci-benchmark.yaml, but
issue_comment-triggered workflows always use the workflow YAML from the
default branch — not the PR branch. The env var never took effect.

Change the default in deploy/install.sh instead (v0.3.0 → main), which
IS executed from the PR checkout. llm-d main uses inferencepool chart
v1.4.0 which creates InferencePool with the GA API group
(inference.networking.k8s.io/v1) that Istio 1.29+ expects.

The old v0.3.0 default used chart v1.0.1 with the alpha API group
(inference.networking.x-k8s.io/v1alpha2), which Istio ignores —
causing Gateway HTTP 500 errors.
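
The kind of default-with-override the commit describes can be sketched as follows. The variable name is taken from the text above; the exact line in deploy/install.sh may differ.

```shell
# Default LLM_D_RELEASE to "main" unless the caller exports an override.
# Illustrative sketch of the deploy/install.sh change described above.
LLM_D_RELEASE="${LLM_D_RELEASE:-main}"
echo "llm-d release: ${LLM_D_RELEASE}"
```

Because the default lives in the script that is executed from the PR checkout, the PR branch controls it, unlike env vars set in the workflow YAML, which `issue_comment` triggers always read from the default branch.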

Made-with: Cursor
@kahilam
Collaborator Author

kahilam commented Apr 13, 2026

/benchmark openshift

@github-actions
Contributor

🚀 Benchmark (OpenShift) triggered by /benchmark openshift

View the benchmark workflow run

@github-actions
Contributor

GPU Pre-flight Check ✅

GPUs are available for e2e-openshift tests. Proceeding with deployment.

| Resource | Total | Allocated | Available |
| --- | --- | --- | --- |
| GPUs | 50 | 36 | 14 |

| Cluster | Value |
| --- | --- |
| Nodes | 16 (7 with GPUs) |
| Total CPU | 993 cores |
| Total Memory | 10383 Gi |
| GPUs required | 4 (min) / 6 (recommended) |

@github-actions
Contributor

WVA Benchmark Results (OpenShift)

Environment
  • Cluster: OpenShift (Real GPUs)
  • Model: unsloth/Meta-Llama-3.1-8B
  • Accelerator: H100
  • Commit: 55be615
  • Scaler: prometheus-adapter
  • Workflow run

@github-actions
Contributor

WVA Benchmark Results (OpenShift)

Environment
  • Cluster: OpenShift (Real GPUs)
  • Model: Qwen/Qwen3-0.6B
  • Accelerator: H100
  • Commit: 55be615
  • Scaler: prometheus-adapter
  • Workflow run
