
Multi-Model Routing

Route requests to different AI models based on the `x-ai-eg-model` header.

Prerequisites

Deploy common infrastructure first:

cd ../
kubectl apply -f gateway.yaml

Deploy

kubectl apply -f multi-model-routing/ai-gateway-route.yaml
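The contents of ai-gateway-route.yaml are not shown here; based on Envoy AI Gateway conventions for header-based routing, it likely resembles the sketch below. The backend names are illustrative assumptions; consult the actual file for the real values.

```yaml
# Hypothetical sketch of multi-model-routing/ai-gateway-route.yaml.
# Backend names (gpt-oss-backend, qwen-backend) are assumptions.
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: multi-model-route
spec:
  targetRefs:
    - name: ai-gateway
      kind: Gateway
      group: gateway.networking.k8s.io
  schema:
    name: OpenAI
  rules:
    - matches:
        - headers:
            - type: Exact
              name: x-ai-eg-model
              value: openai/gpt-oss-20b
      backendRefs:
        - name: gpt-oss-backend
    - matches:
        - headers:
            - type: Exact
              name: x-ai-eg-model
              value: Qwen/Qwen3-1.7B
      backendRefs:
        - name: qwen-backend
```

Each rule matches the `x-ai-eg-model` header exactly and forwards to the corresponding backend, which is what makes the manual curl tests below work.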

Test

python3 multi-model-routing/client.py
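client.py is not reproduced here; a minimal equivalent, assuming the gateway address is exported as GATEWAY_URL, could look like the following sketch (function names are illustrative, not taken from the actual script):

```python
import json
import os
import urllib.request

# Assumes GATEWAY_URL was exported as in the Manual Testing section.
GATEWAY_URL = os.environ.get("GATEWAY_URL", "localhost")


def build_request(model: str, prompt: str, max_tokens: int = 50):
    """Build the URL, headers, and JSON body for a header-routed chat completion."""
    url = f"http://{GATEWAY_URL}/v1/chat/completions"
    headers = {
        "Content-Type": "application/json",
        # The gateway routes on this header, independent of the body.
        "x-ai-eg-model": model,
    }
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return url, headers, json.dumps(body).encode("utf-8")


def chat(model: str, prompt: str) -> dict:
    """Send one chat completion request through the gateway and return the parsed JSON."""
    url, headers, data = build_request(model, prompt)
    req = urllib.request.Request(url, data=data, headers=headers, method="POST")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Calling `chat("openai/gpt-oss-20b", "Hello!")` and `chat("Qwen/Qwen3-1.7B", "Hello!")` exercises both routes, mirroring the curl commands below.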

Manual Testing

# Grab the Gateway URL
export GATEWAY_URL=$(kubectl get gateway ai-gateway -o jsonpath='{.status.addresses[0].value}')

# Test gpt-oss-20b
curl -X POST http://$GATEWAY_URL/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-ai-eg-model: openai/gpt-oss-20b" \
  -d '{"model": "openai/gpt-oss-20b", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 50}'

# Test Qwen3-1.7B (routed by the header alone; no "model" field in the body)
curl -X POST http://$GATEWAY_URL/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-ai-eg-model: Qwen/Qwen3-1.7B" \
  -d '{
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
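Either way, the gateway returns a response in the OpenAI chat completions format. A quick sketch for pulling out the assistant's reply (the sample response below is illustrative, not captured from a real gateway):

```python
# Extract the assistant's text from an OpenAI-format chat completion response.
# sample_response is a made-up example of the schema, not real gateway output.
sample_response = {
    "choices": [
        {"message": {"role": "assistant", "content": "Hello! How can I help?"}}
    ]
}


def completion_text(response: dict) -> str:
    """Return the message content of the first choice."""
    return response["choices"][0]["message"]["content"]


print(completion_text(sample_response))
```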

Troubleshooting

  • 404 errors: Check that ai-gateway-route.yaml has been applied
  • Model not responding: Verify that the backend services are running
  • Model invocations timing out: Verify that the Envoy AI gateway deployment has scaled up and is running at least one pod. If not, restart it:

kubectl rollout restart deployment ai-gateway-controller -n envoy-ai-gateway-system
# Check route status
kubectl get aigatewayroute multi-model-route -o yaml

# Check backend services
kubectl get svc | grep -E "(qwen|gpt-oss)"