fix: add EPP RBAC for InferenceModelRewrite (required by v0.7.0) #997
Open
Conversation
EPP v0.7.0 (built on gateway-api-inference-extension v1.3.0) watches InferenceModelRewrite resources on startup. The inferencepool Helm chart v1.0.1 (shipped with llm-d v0.3.0) does not include this permission, causing the EPP to crash-loop with a forbidden error when its image is patched to v0.7.0. Add a supplemental Role and RoleBinding granting the EPP service account read-only access to inferencemodelrewrites, applied alongside the existing image and ConfigMap patches in deploy/lib/infra_llmd.sh.

Made-with: Cursor
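A minimal sketch of the supplemental RBAC objects. All names, the namespace, the service-account name, and the apiGroup are placeholders/assumptions; the actual manifest applied by deploy/lib/infra_llmd.sh may differ, and the apiGroup should be verified against the installed InferenceModelRewrite CRD.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: epp-inferencemodelrewrite-reader   # hypothetical name
  namespace: llm-d                         # hypothetical namespace
rules:
- apiGroups: ["inference.networking.x-k8s.io"]  # assumed group; check the CRD
  resources: ["inferencemodelrewrites"]
  verbs: ["get", "list", "watch"]          # read-only, matching the reconciler's watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: epp-inferencemodelrewrite-reader
  namespace: llm-d
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: epp-inferencemodelrewrite-reader
subjects:
- kind: ServiceAccount
  name: epp                                # hypothetical EPP service-account name
  namespace: llm-d
```

A namespaced Role (rather than a ClusterRole) is sufficient here because the EPP only needs to watch resources in its own namespace.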
Contributor
GPU Pre-flight Check ✅ GPUs are available for e2e-openshift tests. Proceeding with deployment.
Collaborator
Author
/benchmark openshift
Contributor
🚀 Benchmark (OpenShift) triggered by
Contributor
WVA Benchmark Results (OpenShift)
Environment
The benchmark workflow was using the default llm-d v0.3.0, which deploys inferencepool chart v1.0.1 with the alpha API group (inference.networking.x-k8s.io/v1alpha2). Istio 1.29+ with ENABLE_GATEWAY_API_INFERENCE_EXTENSION=true expects the GA API group (inference.networking.k8s.io/v1) and does not configure ext_proc for the alpha InferencePool, causing the Gateway to return HTTP 500. This aligns the benchmark with the e2e workflow, which already uses LLM_D_RELEASE=main (inferencepool chart v1.2.1 with GA API support).

Made-with: Cursor
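For reference, the API-group difference described above can be sketched as two InferencePool stubs (spec fields elided; only the apiVersion values come from the comment):

```yaml
# Alpha InferencePool (chart v1.0.1) — not wired up for ext_proc by Istio 1.29+
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferencePool
---
# GA InferencePool (chart v1.2.1+) — what Istio 1.29+ expects when
# ENABLE_GATEWAY_API_INFERENCE_EXTENSION=true
apiVersion: inference.networking.k8s.io/v1
kind: InferencePool
```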
Collaborator
Author
/benchmark openshift
Contributor
🚀 Benchmark (OpenShift) triggered by
Contributor
GPU Pre-flight Check ✅ GPUs are available for e2e-openshift tests. Proceeding with deployment.
Contributor
WVA Benchmark Results (OpenShift)
Environment
The previous attempt set LLM_D_RELEASE in ci-benchmark.yaml, but issue_comment-triggered workflows always use the workflow YAML from the default branch, not the PR branch, so the env var never took effect. Change the default in deploy/install.sh instead (v0.3.0 → main), which is executed from the PR checkout. llm-d main uses inferencepool chart v1.4.0, which creates InferencePool with the GA API group (inference.networking.k8s.io/v1) that Istio 1.29+ expects. The old v0.3.0 default used chart v1.0.1 with the alpha API group (inference.networking.x-k8s.io/v1alpha2), which Istio ignores, causing Gateway HTTP 500 errors.

Made-with: Cursor
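The default-value change in deploy/install.sh can be sketched with standard shell parameter expansion (the variable name comes from the comment; the surrounding script logic is omitted):

```shell
#!/usr/bin/env sh
# Use the caller-supplied LLM_D_RELEASE if set and non-empty,
# otherwise fall back to "main" (the old default was "v0.3.0").
LLM_D_RELEASE="${LLM_D_RELEASE:-main}"
echo "$LLM_D_RELEASE"
```

Because install.sh runs from the PR checkout, changing the default here takes effect even though issue_comment-triggered workflows read their workflow YAML from the default branch.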
Collaborator
Author
/benchmark openshift
Contributor
🚀 Benchmark (OpenShift) triggered by
Contributor
GPU Pre-flight Check ✅ GPUs are available for e2e-openshift tests. Proceeding with deployment.
Contributor
WVA Benchmark Results (OpenShift)
Environment
Summary

EPP v0.7.0 (built on gateway-api-inference-extension v1.3.0) always watches InferenceModelRewrite resources on startup via InferenceModelRewriteReconciler; there is no feature gate to disable this. The inferencepool Helm chart v1.0.1 (shipped with llm-d v0.3.0) does not include inferencemodelrewrites in the EPP Role, causing the EPP to crash-loop when its image is patched to v0.7.0. Add a supplemental Role and RoleBinding granting the EPP service account read-only (get, list, watch) access to inferencemodelrewrites, applied alongside the existing image and ConfigMap patches in deploy/lib/infra_llmd.sh.

Test plan

- Run /benchmark openshift on a PR to verify the EPP no longer crash-loops
- Confirm the EPP pod reaches the Ready state and the Gateway returns HTTP 200

Made with Cursor