|
| 1 | +# Gateway Body-Based Routing (BBR) Demo |
| 2 | + |
| 3 | +This demo deploys **two models** behind a **single Gateway** and validates that |
| 4 | +[Body-Based Routing (BBR)](https://gateway-api-inference-extension.sigs.k8s.io/guides/serving-multiple-inference-pools-latest/) |
| 5 | +correctly routes requests to the right model based on the `"model"` field in the |
| 6 | +JSON request body. |
| 7 | + |
| 8 | +Each `ModelDeployment` uses a **bring-your-own (BYO) HTTPRoute** — meaning you |
| 9 | +create the HTTPRoute yourself and reference it via `spec.gateway.httpRouteRef`. |
| 10 | +This prevents the controller from auto-creating routes and gives you full control |
| 11 | +over routing rules, which is important when multiple models share one gateway. |
| 12 | + |
| 13 | +## Architecture |
| 14 | + |
| 15 | +``` |
| 16 | + ┌───────────────────────────────────────────────────────┐ |
| 17 | + │ Kubernetes (Kind) │ |
| 18 | + │ │ |
| 19 | + ┌────────┐ │ ┌─────────┐ ┌──────────────┐ │ |
| 20 | + │ Client │───────▶│ │ Gateway │────▶│ BBR (parses │ │ |
| 21 | + │ │ │ │ (Istio) │ │ request body│ │ |
| 22 | + └────────┘ │ └─────────┘ │ → sets header│ │ |
| 23 | + │ └──────┬───────┘ │ |
| 24 | + │ │ X-Gateway-Base-Model-Name │ |
| 25 | + │ ┌────────────┴────────────┐ │ |
| 26 | + │ ▼ ▼ │ |
| 27 | + │ ┌──────────────┐ ┌──────────────┐ │ |
| 28 | + │ │ HTTPRoute │ │ HTTPRoute │ │ |
| 29 | + │ │ (model-a) │ │ (model-b) │ │ |
| 30 | + │ └──────┬───────┘ └──────┬───────┘ │ |
| 31 | + │ ▼ ▼ │ |
| 32 | + │ ┌──────────────┐ ┌──────────────┐ │ |
| 33 | + │ │InferencePool │ │InferencePool │ │ |
| 34 | + │ │ + EPP │ │ + EPP │ │ |
| 35 | + │ └──────┬───────┘ └──────┬───────┘ │ |
| 36 | + │ ▼ ▼ │ |
| 37 | + │ ┌──────────────┐ ┌──────────────┐ │ |
| 38 | + │ │ Model A Pod │ │ Model B Pod │ │ |
| 39 | + │ │ (llama3.2 1B)│ │ (gemma2 2B) │ │ |
| 40 | + │ └──────────────┘ └──────────────┘ │ |
| 41 | + └───────────────────────────────────────────────────────┘ |
| 42 | +``` |
| 43 | + |
| 44 | +**Request flow:** Client → Gateway → BBR → HTTPRoute (matched by header) → InferencePool → EPP → Model Pod |
| 45 | + |
| 46 | +## What This Demo Shows |
| 47 | + |
| 48 | +1. **Two ModelDeployments** running behind a single inference Gateway |
| 49 | +2. **BYO HTTPRoutes** — user-managed routes referenced via `spec.gateway.httpRouteRef` |
| 50 | +3. **Body-Based Routing** — BBR parses the `"model"` field from the request body and sets the `X-Gateway-Base-Model-Name` header so the correct HTTPRoute matches |
| 51 | +4. **End-to-end inference** — curl requests with different model names are routed to the correct model |
| 52 | + |
| 53 | +## Prerequisites |
| 54 | + |
| 55 | +- **Docker** — for building images and running Kind |
| 56 | +- **Go 1.25+** — for installing Kind and cloud-provider-kind |
| 57 | +- **kubectl** — Kubernetes CLI |
| 58 | +- **helm** — for installing KAITO and BBR |
| 59 | +- **make** — GNU Make |
| 60 | +- **kustomize** — for deploying the controller/provider |
| 61 | +- **curl** and **jq** — for testing |
| 62 | + |
| 63 | +## Quick Start |
| 64 | + |
| 65 | +```bash |
| 66 | +# From the repo root: |
| 67 | +./demos/gateway-bbr/demo.sh |
| 68 | +``` |
| 69 | + |
| 70 | +The script takes ~15–20 minutes end-to-end (most time is spent waiting for model |
| 71 | +pods to pull images and start serving). |
| 72 | + |
| 73 | +## Configuration |
| 74 | + |
| 75 | +| Environment Variable | Default | Description | |
| 76 | +|---|---|---| |
| 77 | +| `CLUSTER_NAME` | `kubeairunway-bbr-demo` | Kind cluster name | |
| 78 | +| `CONTROLLER_IMG` | `kubeairunway-controller:demo` | Controller image tag | |
| 79 | +| `KAITO_PROVIDER_IMG` | `kaito-provider:demo` | KAITO provider image tag | |
| 80 | +| `SKIP_BUILD` | _(unset)_ | Set to `1` to skip Docker image builds (useful for re-runs) | |
| 81 | +| `CLEANUP_ONLY` | _(unset)_ | Set to `1` to only delete the Kind cluster | |
| 82 | + |
| 83 | +## What Gets Created |
| 84 | + |
| 85 | +| Resource | Name | Description | |
| 86 | +|---|---|---| |
| 87 | +| Kind cluster | `kubeairunway-bbr-demo` | Local Kubernetes cluster | |
| 88 | +| Gateway | `inference-gateway` | Istio-backed inference gateway | |
| 89 | +| ModelDeployment | `model-a` | First model (Llama 3.2 1B via KAITO) | |
| 90 | +| ModelDeployment | `model-b` | Second model (Gemma 2 2B via KAITO) | |
| 91 | +| HTTPRoute | `model-a-route` | BYO route for model-a | |
| 92 | +| HTTPRoute | `model-b-route` | BYO route for model-b | |
| 93 | +| InferencePool | `model-a` | Auto-created by controller | |
| 94 | +| InferencePool | `model-b` | Auto-created by controller | |
| 95 | +| Deployment | `model-a-epp` | Endpoint Picker Proxy for model-a | |
| 96 | +| Deployment | `model-b-epp` | Endpoint Picker Proxy for model-b | |
| 97 | + |
| 98 | +## BYO HTTPRoute Explained |
| 99 | + |
| 100 | +By default, the KubeAIRunway controller auto-creates an HTTPRoute per |
| 101 | +`ModelDeployment`. When two models share one gateway, this can cause route |
| 102 | +conflicts. The **BYO HTTPRoute** pattern solves this: |
| 103 | + |
| 104 | +1. **You create the HTTPRoutes** (see [manifests/httproutes.yaml](manifests/httproutes.yaml)) |
| 105 | +2. **Reference them** in the `ModelDeployment` spec: |
| 106 | + ```yaml |
| 107 | + spec: |
| 108 | + gateway: |
| 109 | + enabled: true |
| 110 | + modelName: "llama-3.2-1b-instruct" |
| 111 | + httpRouteRef: "model-a-route" # ← tells controller to skip auto-creation |
| 112 | + ``` |
| 113 | +3. The controller still creates the `InferencePool` and `EPP`, but skips HTTPRoute |
| 114 | + creation/deletion for that deployment |
| 115 | + |
| 116 | +Each BYO HTTPRoute matches on the `X-Gateway-Base-Model-Name` header (set by BBR) |
| 117 | +so only the correct model's route is matched: |
| 118 | + |
| 119 | +```yaml |
| 120 | +rules: |
| 121 | + - matches: |
| 122 | + - headers: |
| 123 | + - type: Exact |
| 124 | + name: X-Gateway-Base-Model-Name |
| 125 | + value: llama-3.2-1b-instruct # ← BBR sets this from the request body |
| 126 | + backendRefs: |
| 127 | + - group: inference.networking.k8s.io |
| 128 | + kind: InferencePool |
| 129 | + name: model-a |
| 130 | +``` |
| 131 | + |
| 132 | +> **Important:** Each HTTPRoute should match **only** on its specific model's |
| 133 | +> header value. Do NOT add a bare path fallback match (without header) to any |
| 134 | +> route — Istio may evaluate the fallback as a valid match for all requests, |
| 135 | +> causing cross-model misrouting even when a more specific header match exists |
| 136 | +> on another HTTPRoute. |
| 137 | + |
| 138 | +## Cleanup |
| 139 | + |
| 140 | +```bash |
| 141 | +CLEANUP_ONLY=1 ./demos/gateway-bbr/demo.sh |
| 142 | +``` |
| 143 | + |
| 144 | +Or manually: |
| 145 | + |
| 146 | +```bash |
| 147 | +kind delete cluster --name kubeairunway-bbr-demo |
| 148 | +``` |
| 149 | + |
| 150 | +## Troubleshooting |
| 151 | + |
| 152 | +**Models stuck in Pending/Deploying phase:** |
| 153 | +```bash |
| 154 | +kubectl get modeldeployments -o wide |
| 155 | +kubectl get workspaces |
| 156 | +kubectl describe workspace model-a |
| 157 | +kubectl get pods -l kubeairunway.ai/model-deployment |
| 158 | +``` |
| 159 | + |
| 160 | +**Gateway not routing correctly:** |
| 161 | +```bash |
| 162 | +# Check Gateway status |
| 163 | +kubectl get gateway inference-gateway -o yaml |
| 164 | +
|
| 165 | +# Check HTTPRoutes are accepted |
| 166 | +kubectl get httproutes -o wide |
| 167 | +
|
| 168 | +# Check InferencePools |
| 169 | +kubectl get inferencepools -o wide |
| 170 | +
|
| 171 | +# Check EPP logs |
| 172 | +kubectl logs -l app.kubernetes.io/name=model-a-epp |
| 173 | +kubectl logs -l app.kubernetes.io/name=model-b-epp |
| 174 | +
|
| 175 | +# Check BBR logs |
| 176 | +kubectl logs -l app.kubernetes.io/name=body-based-routing |
| 177 | +
|
| 178 | +# Check Istio logs |
| 179 | +kubectl logs -n istio-system -l app=istiod --tail=50 |
| 180 | +``` |
0 commit comments