Skip to content

Commit b06c1b7

Browse files
authored
Add Gateway BBR demo with two models and BYO HTTPRoutes (#102)
1 parent 8f805f0 commit b06c1b7

6 files changed

Lines changed: 798 additions & 0 deletions

File tree

demos/gateway-bbr/README.md

Lines changed: 180 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,180 @@
1+
# Gateway Body-Based Routing (BBR) Demo
2+
3+
This demo deploys **two models** behind a **single Gateway** and validates that
4+
[Body-Based Routing (BBR)](https://gateway-api-inference-extension.sigs.k8s.io/guides/serving-multiple-inference-pools-latest/)
5+
correctly routes requests to the right model based on the `"model"` field in the
6+
JSON request body.
7+
8+
Each `ModelDeployment` uses a **bring-your-own (BYO) HTTPRoute** — meaning you
9+
create the HTTPRoute yourself and reference it via `spec.gateway.httpRouteRef`.
10+
This prevents the controller from auto-creating routes and gives you full control
11+
over routing rules, which is important when multiple models share one gateway.
12+
13+
## Architecture
14+
15+
```
16+
┌───────────────────────────────────────────────────────┐
17+
│ Kubernetes (Kind) │
18+
│ │
19+
┌────────┐ │ ┌─────────┐ ┌──────────────┐ │
20+
│ Client │───────▶│ │ Gateway │────▶│ BBR (parses │ │
21+
│ │ │ │ (Istio) │ │ request body│ │
22+
└────────┘ │ └─────────┘ │ → sets header│ │
23+
│ └──────┬───────┘ │
24+
│ │ X-Gateway-Base-Model-Name │
25+
│ ┌────────────┴────────────┐ │
26+
│ ▼ ▼ │
27+
│ ┌──────────────┐ ┌──────────────┐ │
28+
│ │ HTTPRoute │ │ HTTPRoute │ │
29+
│ │ (model-a) │ │ (model-b) │ │
30+
│ └──────┬───────┘ └──────┬───────┘ │
31+
│ ▼ ▼ │
32+
│ ┌──────────────┐ ┌──────────────┐ │
33+
│ │InferencePool │ │InferencePool │ │
34+
│ │ + EPP │ │ + EPP │ │
35+
│ └──────┬───────┘ └──────┬───────┘ │
36+
│ ▼ ▼ │
37+
│ ┌──────────────┐ ┌──────────────┐ │
38+
│ │ Model A Pod │ │ Model B Pod │ │
39+
│ │ (llama3.2 1B)│ │ (gemma2 2B) │ │
40+
│ └──────────────┘ └──────────────┘ │
41+
└───────────────────────────────────────────────────────┘
42+
```
43+
44+
**Request flow:** Client → Gateway → BBR → HTTPRoute (matched by header) → InferencePool → EPP → Model Pod
45+
46+
## What This Demo Shows
47+
48+
1. **Two ModelDeployments** running behind a single inference Gateway
49+
2. **BYO HTTPRoutes** — user-managed routes referenced via `spec.gateway.httpRouteRef`
50+
3. **Body-Based Routing** — BBR parses the `"model"` field from the request body and sets the `X-Gateway-Base-Model-Name` header so the correct HTTPRoute matches
51+
4. **End-to-end inference** — curl requests with different model names are routed to the correct model
52+
53+
## Prerequisites
54+
55+
- **Docker** — for building images and running Kind
56+
- **Go 1.25+** — for installing Kind and cloud-provider-kind
57+
- **kubectl** — Kubernetes CLI
58+
- **helm** — for installing KAITO and BBR
59+
- **make** — GNU Make
60+
- **kustomize** — for deploying the controller/provider
61+
- **curl** and **jq** — for testing
62+
63+
## Quick Start
64+
65+
```bash
66+
# From the repo root:
67+
./demos/gateway-bbr/demo.sh
68+
```
69+
70+
The script takes ~15–20 minutes end-to-end (most time is spent waiting for model
71+
pods to pull images and start serving).
72+
73+
## Configuration
74+
75+
| Environment Variable | Default | Description |
76+
|---|---|---|
77+
| `CLUSTER_NAME` | `kubeairunway-bbr-demo` | Kind cluster name |
78+
| `CONTROLLER_IMG` | `kubeairunway-controller:demo` | Controller image tag |
79+
| `KAITO_PROVIDER_IMG` | `kaito-provider:demo` | KAITO provider image tag |
80+
| `SKIP_BUILD` | _(unset)_ | Set to `1` to skip Docker image builds (useful for re-runs) |
81+
| `CLEANUP_ONLY` | _(unset)_ | Set to `1` to only delete the Kind cluster |
82+
83+
## What Gets Created
84+
85+
| Resource | Name | Description |
86+
|---|---|---|
87+
| Kind cluster | `kubeairunway-bbr-demo` | Local Kubernetes cluster |
88+
| Gateway | `inference-gateway` | Istio-backed inference gateway |
89+
| ModelDeployment | `model-a` | First model (Llama 3.2 1B via KAITO) |
90+
| ModelDeployment | `model-b` | Second model (Gemma 2 2B via KAITO) |
91+
| HTTPRoute | `model-a-route` | BYO route for model-a |
92+
| HTTPRoute | `model-b-route` | BYO route for model-b |
93+
| InferencePool | `model-a` | Auto-created by controller |
94+
| InferencePool | `model-b` | Auto-created by controller |
95+
| Deployment | `model-a-epp` | Endpoint Picker Proxy for model-a |
96+
| Deployment | `model-b-epp` | Endpoint Picker Proxy for model-b |
97+
98+
## BYO HTTPRoute Explained
99+
100+
By default, the KubeAIRunway controller auto-creates an HTTPRoute per
101+
`ModelDeployment`. When two models share one gateway, this can cause route
102+
conflicts. The **BYO HTTPRoute** pattern solves this:
103+
104+
1. **You create the HTTPRoutes** (see [manifests/httproutes.yaml](manifests/httproutes.yaml))
105+
2. **Reference them** in the `ModelDeployment` spec:
106+
```yaml
107+
spec:
108+
gateway:
109+
enabled: true
110+
modelName: "llama-3.2-1b-instruct"
111+
httpRouteRef: "model-a-route" # ← tells controller to skip auto-creation
112+
```
113+
3. The controller still creates the `InferencePool` and `EPP`, but skips HTTPRoute
114+
creation/deletion for that deployment
115+
116+
Each BYO HTTPRoute matches on the `X-Gateway-Base-Model-Name` header (set by BBR)
117+
so only the correct model's route is matched:
118+
119+
```yaml
120+
rules:
121+
- matches:
122+
- headers:
123+
- type: Exact
124+
name: X-Gateway-Base-Model-Name
125+
value: llama-3.2-1b-instruct # ← BBR sets this from the request body
126+
backendRefs:
127+
- group: inference.networking.k8s.io
128+
kind: InferencePool
129+
name: model-a
130+
```
131+
132+
> **Important:** Each HTTPRoute should match **only** on its specific model's
133+
> header value. Do NOT add a bare path fallback match (without header) to any
134+
> route — Istio may evaluate the fallback as a valid match for all requests,
135+
> causing cross-model misrouting even when a more specific header match exists
136+
> on another HTTPRoute.
137+
138+
## Cleanup
139+
140+
```bash
141+
CLEANUP_ONLY=1 ./demos/gateway-bbr/demo.sh
142+
```
143+
144+
Or manually:
145+
146+
```bash
147+
kind delete cluster --name kubeairunway-bbr-demo
148+
```
149+
150+
## Troubleshooting
151+
152+
**Models stuck in Pending/Deploying phase:**
153+
```bash
154+
kubectl get modeldeployments -o wide
155+
kubectl get workspaces
156+
kubectl describe workspace model-a
157+
kubectl get pods -l kubeairunway.ai/model-deployment
158+
```
159+
160+
**Gateway not routing correctly:**
161+
```bash
162+
# Check Gateway status
163+
kubectl get gateway inference-gateway -o yaml
164+
165+
# Check HTTPRoutes are accepted
166+
kubectl get httproutes -o wide
167+
168+
# Check InferencePools
169+
kubectl get inferencepools -o wide
170+
171+
# Check EPP logs
172+
kubectl logs -l app.kubernetes.io/name=model-a-epp
173+
kubectl logs -l app.kubernetes.io/name=model-b-epp
174+
175+
# Check BBR logs
176+
kubectl logs -l app.kubernetes.io/name=body-based-routing
177+
178+
# Check Istio logs
179+
kubectl logs -n istio-system -l app=istiod --tail=50
180+
```

0 commit comments

Comments
 (0)