Skip to content

Commit 6c1b4d3

Browse files
authored
Replace model-not-found Service with Envoy direct_response EnvoyFilter (#60)
- charts/modelharness: add envoyfilter-not-found.yaml that patches a catch-all 404 (OpenAI-compatible JSON) onto each per-namespace Gateway via Envoy direct_response. - charts/modelharness: delete httproute-not-found.yaml and referencegrant.yaml; remove modelNotFound block from values.yaml. - hack/e2e/scripts/install-components.sh: drop install_model_not_found function, its phase1-base invocation, and the unused MANIFESTS_DIR. - hack/e2e/scripts/validate-components.sh: drop the cluster-shared model-not-found Pod readiness check. - test/e2e/utils: remove ModelNotFoundNamespace/PodLabel constants and the unused HTTPRouteGVK/ReferenceGrantGVK; refresh setup.go and helm.go doc comments. - test/e2e/model_routing_test.go: rewrite 'Model-specific route wins over catch-all' to assert HTTP 200 + matching response body model (the old nginx access-log probe no longer applies); drop unused countNginxAccessLogs helper and kubernetes import. - test/e2e/gpu_mocker_test.go: refresh comments for the unknown-model 404 spec to describe the EnvoyFilter direct_response. - docs: update test/e2e/README.md and production-stack-E2E-test-scenarios.md to reflect the new design.
1 parent f2c9d9b commit 6c1b4d3

16 files changed

Lines changed: 126 additions & 254 deletions

README.md

Lines changed: 0 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -54,7 +54,6 @@ namespaces and are shared by every model deployment:
5454
| BBR (Body-Based Router) | `istio-system` | `BBR_VERSION` (v1.3.1) | helm | Installed in Istio's rootNamespace so its EnvoyFilter applies cluster-wide; injects `X-Gateway-Model-Name`. |
5555
| `llm-gateway-auth` ([`kaito-project/llm-gateway-auth`](https://github.com/kaito-project/llm-gateway-auth)) | `llm-gateway-auth` | `LLM_GATEWAY_AUTH_VERSION` | helm | API-key ext_authz for the `inference-gateway`. Installs the `APIKey` CRD, the `apikey-operator` (reconciles `APIKey` → per-namespace Secret), and the `apikey-authz` ext_authz dataplane wired into Istio via `MeshConfig` + `AuthorizationPolicy`. |
5656
| KEDA + KEDA Kaito Scaler ([`kaito-project/keda-kaito-scaler`](https://github.com/kaito-project/keda-kaito-scaler), optional) | `keda` | `KEDA_VERSION` (v2.19.0), `KEDA_KAITO_SCALER_VERSION` (v0.4.1) | helm | Workload-metric autoscaling. |
57-
| `model-not-found` (Deployment + ConfigMap + Service) | `default` | repo `HEAD` ([`hack/e2e/manifests/model-not-found.yaml`](hack/e2e/manifests/model-not-found.yaml)) | kubectl | Cluster-shared nginx-backed Service that returns OpenAI-compatible `404 model_not_found` JSON. Referenced cross-namespace by every workload namespace's catch-all `HTTPRoute` (authorised via a `ReferenceGrant` rendered by `charts/modelharness`). |
5857

5958
### Step 2. modelharness (one-time per workload namespace)
6059

Lines changed: 62 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,62 @@
1+
{{/*
2+
Catch-all "model not found" responder, implemented as an Envoy
3+
direct_response on the per-namespace Gateway. Replaces the previous
4+
HTTPRoute → cluster-shared `model-not-found` Service design.
5+
6+
Why an EnvoyFilter direct_response instead of a backed HTTPRoute:
7+
- Zero backend Pod / Service / cross-namespace ReferenceGrant.
8+
- Response body is generated by Envoy itself (no extra hop).
9+
10+
Why a catch-all is REQUIRED (and not just a UX nicety):
11+
Istio's CUSTOM AuthorizationPolicy is implemented as a paired
12+
`envoy.filters.http.rbac` (shadow) + `envoy.filters.http.ext_authz`
13+
filter — ext_authz is gated on metadata that the RBAC shadow filter
14+
writes during decodeHeaders. When Envoy's router fails to match any
15+
HTTPRoute it returns a local 404 BEFORE the RBAC shadow has finished
16+
evaluating + writing that metadata, which means ext_authz is never
17+
invoked and unknown-model requests SILENTLY BYPASS API-key auth.
18+
Keeping a catch-all route that always matches preserves the full
19+
filter-chain run and ensures auth runs on every request, regardless
20+
of model name. Removing this template re-opens that bypass.
21+
22+
The patch is anchored to BBR's filter name as a `subFilter` so it
23+
attaches to the same HCM that `install_bbr` injects BBR into. The
24+
`workloadSelector` scopes it to this namespace's Gateway pod only.
25+
*/}}
26+
apiVersion: networking.istio.io/v1alpha3
27+
kind: EnvoyFilter
28+
metadata:
29+
name: model-not-found-direct
30+
namespace: {{ include "modelharness.namespace" . }}
31+
labels:
32+
{{- include "modelharness.labels" . | nindent 4 }}
33+
spec:
34+
workloadSelector:
35+
labels:
36+
gateway.networking.k8s.io/gateway-name: {{ include "modelharness.gatewayName" . | quote }}
37+
configPatches:
38+
- applyTo: VIRTUAL_HOST
39+
match:
40+
context: GATEWAY
41+
routeConfiguration:
42+
vhost:
43+
name: ""
44+
patch:
45+
operation: MERGE
46+
value:
47+
routes:
48+
# Appended last; deployment-specific HTTPRoute matches on
49+
# X-Gateway-Model-Name win first, this rule catches the rest.
50+
- name: model-not-found-fallback
51+
match:
52+
prefix: /
53+
direct_response:
54+
status: 404
55+
body:
56+
inline_string: |
57+
{"error":{"message":"The model does not exist.","type":"invalid_request_error","param":"model","code":"model_not_found"}}
58+
response_headers_to_add:
59+
- header:
60+
key: content-type
61+
value: application/json
62+
append_action: OVERWRITE_IF_EXISTS_OR_ADD

charts/modelharness/templates/httproute-not-found.yaml

Lines changed: 0 additions & 34 deletions
This file was deleted.

charts/modelharness/templates/referencegrant.yaml

Lines changed: 0 additions & 24 deletions
This file was deleted.

charts/modelharness/values.yaml

Lines changed: 7 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -19,15 +19,13 @@ gatewayName: ""
1919
# gatewayPort is the HTTP listener port on the Gateway.
2020
gatewayPort: 80
2121

22-
# modelNotFound configures the cross-namespace reference to the
23-
# cluster-shared model-not-found Service that the catch-all HTTPRoute
24-
# forwards unmatched requests to. The Service itself is installed once
25-
# per cluster in `modelNotFound.namespace` (typically `default`) by the
26-
# E2E install script — this chart only renders the catch-all HTTPRoute
27-
# and the ReferenceGrant authorising the cross-namespace backendRef.
28-
modelNotFound:
29-
namespace: "default"
30-
serviceName: "model-not-found"
22+
# Catch-all "model not found" responses are now produced by an Envoy
23+
# direct_response patched onto the Gateway's HCM via the
24+
# `model-not-found-direct` EnvoyFilter (see
25+
# templates/envoyfilter-not-found.yaml). No backend Pod / Service /
26+
# ReferenceGrant is required, so the previous `modelNotFound` config
27+
# (which pointed at a cluster-shared `default/model-not-found` Service)
28+
# has been removed.
3129

3230
# auth toggles the per-namespace API-key authentication artifacts. When
3331
# enabled, the chart renders:

hack/e2e/manifests/model-not-found.yaml

Lines changed: 0 additions & 58 deletions
This file was deleted.

hack/e2e/scripts/install-components.sh

Lines changed: 18 additions & 19 deletions
Original file line numberDiff line numberDiff line change
@@ -21,9 +21,11 @@
2121
# CRD is not yet served, so kubelet retries
2222
# until KAITO finishes installing it)
2323
# - BBR chart prefetch (git clone fork repo only)
24-
# - Cluster-shared model-not-found Service in `default` (consumed by
25-
# every workload namespace's catch-all HTTPRoute via a
26-
# ReferenceGrant rendered by charts/modelharness).
24+
#
25+
# (Catch-all 404 handling is now provided by an EnvoyFilter
26+
# direct_response rendered per-namespace by charts/modelharness — no
27+
# cluster-shared Service is required, so install_model_not_found has
28+
# been removed from this script.)
2729
#
2830
# Phase 2 (parallel, depends on Phase 1):
2931
# - Istio (after Gateway API CRDs)
@@ -52,7 +54,6 @@
5254
set -euo pipefail
5355

5456
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
55-
MANIFESTS_DIR="${SCRIPT_DIR}/../manifests"
5657

5758
# Validate required version variables are set.
5859
: "${ISTIO_VERSION:?ISTIO_VERSION is not set. Source versions.env or export it before calling this script.}"
@@ -265,8 +266,19 @@ install_gateway_api_crds() {
265266
}
266267

267268
install_gwie_crds() {
269+
# Use server-side apply (--server-side --force-conflicts) instead of the
270+
# default client-side apply. install_gwie_crds runs in parallel with
271+
# install_kaito in phase1-base, and the KAITO chart bundles the same
272+
# GWIE CRDs (inferencepools / inferenceobjectives in both
273+
# inference.networking.k8s.io and inference.networking.x-k8s.io groups).
274+
# Client-side apply does GET → CREATE-if-missing, which races with KAITO
275+
# creating the CRD between the GET and the CREATE and fails with
276+
# `AlreadyExists`. Server-side apply is a single atomic POST with a
277+
# field manager: if the object already exists it is merged in place
278+
# (with --force-conflicts taking ownership of any fields KAITO set).
268279
echo "=== Installing GWIE CRDs ==="
269-
kubectl apply -f "https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/latest/download/manifests.yaml"
280+
kubectl apply --server-side --force-conflicts \
281+
-f "https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/latest/download/manifests.yaml"
270282
}
271283

272284
install_keda() {
@@ -430,18 +442,6 @@ install_llm_gateway_auth() {
430442
kubectl -n llm-gateway-auth rollout status deployment/apikey-authz --timeout=180s || true
431443
}
432444

433-
install_model_not_found() {
434-
# Cluster-shared catch-all 404 Service in `default`. Every workload
435-
# namespace's modelharness release renders a catch-all HTTPRoute that
436-
# forwards unmatched requests to this Service across namespaces,
437-
# authorised by a per-namespace ReferenceGrant.
438-
echo "=== Deploying cluster-shared model-not-found Service in default ==="
439-
kubectl apply -f "${MANIFESTS_DIR}/model-not-found.yaml"
440-
441-
echo "⏳ Waiting for model-not-found service..."
442-
kubectl -n default rollout status deployment/model-not-found --timeout=120s || true
443-
}
444-
445445
# ── Phased execution ──────────────────────────────────────────────────────
446446
#
447447
# Per-namespace shared resources (Gateway, catch-all HTTPRoute,
@@ -456,8 +456,7 @@ run_phase phase1-base \
456456
install_keda \
457457
install_keda_kaito_scaler \
458458
install_gpu_mocker \
459-
prefetch_bbr_chart \
460-
install_model_not_found
459+
prefetch_bbr_chart
461460

462461
run_phase phase2-istio \
463462
install_istio

hack/e2e/scripts/validate-components.sh

Lines changed: 3 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -79,22 +79,9 @@ fi
7979
kubectl -n istio-system get pods -l app=body-based-router 2>/dev/null || true
8080
echo ""
8181

82-
# ── Cluster-shared model-not-found backend ──────────────────────────────
83-
# After the modelharness refactor, per-namespace Istio Gateways
84-
# ("<namespace>-gw") are provisioned at test time by EnsureNamespace
85-
# (charts/modelharness), so no `inference-gateway` Gateway pod exists in
86-
# `default` to validate at install time. The only namespace-tier
87-
# component install-components.sh still pre-installs is the
88-
# cluster-shared 404 Service that every workload namespace's catch-all
89-
# HTTPRoute references via a ReferenceGrant — validate that here.
90-
echo "=== model-not-found (cluster-shared 404 backend) ==="
91-
if kubectl -n default wait --for=condition=ready pod -l app=model-not-found --timeout="${TIMEOUT}" >/dev/null 2>&1; then
92-
pass "model-not-found pod is Running"
93-
else
94-
fail "model-not-found pod is NOT Running"
95-
fi
96-
kubectl -n default get pods -l app=model-not-found 2>/dev/null || true
97-
echo ""
82+
# (Catch-all 404 handling is now produced by an EnvoyFilter
83+
# direct_response rendered per-namespace by charts/modelharness — no
84+
# cluster-shared Service exists to validate.)
9885

9986
# ── KEDA ─────────────────────────────────────────────────────────────────
10087
echo "=== KEDA (namespace: ${KEDA_NAMESPACE}, provider: ${E2E_PROVIDER}) ==="

test/e2e/README.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -15,7 +15,7 @@ Single source of truth: [`cases.go`](cases.go) → `CaseDeployments`. Each entry
1515

1616
`Name` is unique cluster-wide and is the value matched by `X-Gateway-Model-Name` (i.e. the `model` field clients send in OpenAI-compatible requests). `Model` is the KAITO preset only — multiple deployments may share a preset under different `Name`s.
1717

18-
Inference tests target the case's **`caseGatewayURL`**. Each case namespace gets its own Gateway, catch-all `model-not-found` route, and (when enabled) API-key auth artifacts via the [`charts/modelharness`](../../charts/modelharness) chart installed by `EnsureNamespace`.
18+
Inference tests target the case's **`caseGatewayURL`**. Each case namespace gets its own Gateway, catch-all `model-not-found-direct` EnvoyFilter (Envoy `direct_response` 404), and (when enabled) API-key auth artifacts via the [`charts/modelharness`](../../charts/modelharness) chart installed by `EnsureNamespace`.
1919

2020
## Helpers
2121

@@ -159,7 +159,7 @@ var GinkgoLabelMyFeature = ginkgo.Label("MyFeature")
159159

160160
### 5. Add per-namespace resources (rare)
161161

162-
If your case needs additional cluster-side resources beyond what the [`charts/modelharness`](../../charts/modelharness) chart already provisions (Gateway, catch-all `model-not-found` Service + HTTPRoute, optional `AuthorizationPolicy` + `APIKey`), add them as templates in `charts/modelharness` so every workload namespace picks them up consistently.
162+
If your case needs additional cluster-side resources beyond what the [`charts/modelharness`](../../charts/modelharness) chart already provisions (Gateway, catch-all `model-not-found-direct` EnvoyFilter, optional `AuthorizationPolicy` + `APIKey`), add them as templates in `charts/modelharness` so every workload namespace picks them up consistently.
163163

164164
### 6. Validate
165165

test/e2e/gpu_mocker_test.go

Lines changed: 8 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -330,14 +330,14 @@ var _ = Describe("GPU Mocker E2E", Ordered, func() {
330330

331331
Context("Non-existent model request", func() {
332332
It("should return 404 with an OpenAI-compatible error for an unknown model", func() {
333-
// The catch-all model-not-found HTTPRoute is provisioned
334-
// per-namespace by the modelharness chart (installed via
335-
// EnsureNamespace) and forwards unmatched requests across
336-
// namespaces to the cluster-shared `default/model-not-found`
337-
// Service (authorised by a ReferenceGrant). The gpu-mocker
338-
// case has AuthAPIKeyEnabled=false, so no
339-
// AuthorizationPolicy is rendered and the probe needs no
340-
// bearer token.
333+
// The catch-all `model-not-found-direct` EnvoyFilter is
334+
// provisioned per-namespace by the modelharness chart
335+
// (installed via EnsureNamespace) and patches an Envoy
336+
// `direct_response` (status 404 + OpenAI-compatible JSON) onto
337+
// the Gateway's virtual host as a catch-all route. No backend
338+
// Pod / Service is involved. The gpu-mocker case has
339+
// AuthAPIKeyEnabled=false, so no AuthorizationPolicy is
340+
// rendered and the probe needs no bearer token.
341341
resp, err := utils.SendChatCompletion(caseGatewayURL, "non-existent-model-xyz")
342342
Expect(err).NotTo(HaveOccurred())
343343
Expect(resp.StatusCode).To(Equal(http.StatusNotFound))

0 commit comments

Comments
 (0)