Draft
85 commits
180dcf0
docs: add Gateway API Inference Extension integration guide
sozercan Feb 13, 2026
8ee91c3
feat: integrate Gateway API Inference Extension for unified inference…
sozercan Feb 18, 2026
c83ed8e
fix: correct GAIE API group, add EndpointPickerRef, resolve gateway e…
sozercan Feb 19, 2026
56e0433
feat: auto-discover model name from running server for gateway routing
sozercan Feb 19, 2026
ad83deb
docs/test: add model name auto-discovery tests and update docs
sozercan Feb 19, 2026
a3877f9
docs: fix gateway overview link to point to repo instead of GEP
sozercan Feb 19, 2026
cb4f9d9
docs: remove status column from gateway implementations table
sozercan Feb 19, 2026
82f3435
docs: clarify gateway implementations are BYO
sozercan Feb 19, 2026
eaba4f4
docs: move Istio note to setup, remove from troubleshooting
sozercan Feb 19, 2026
f92187d
fix: clean up gateway resources on phase transition and set GatewayRe…
sozercan Feb 19, 2026
026348a
fix: validate gateway flags and add TTL to CRD detection cache
sozercan Feb 19, 2026
7b4807a
docs: show gateway.enabled in deploy example
sozercan Feb 19, 2026
7878e1a
test: add e2e gateway tests with Istio
sozercan Feb 19, 2026
9052735
fix: add retry loop for GatewayReady condition in e2e test
sozercan Feb 19, 2026
9a31449
fix: e2e gateway test - set model.id, test direct inference
sozercan Feb 19, 2026
dd876fe
fix: resolve service port for model name auto-discovery
sozercan Feb 19, 2026
935ca77
fix: add RBAC for services and resolve service port for auto-discovery
sozercan Feb 19, 2026
9e7b6cc
test: add EPP deployment and route traffic through gateway in e2e
sozercan Feb 19, 2026
ec66ce9
feat: auto-deploy EPP alongside InferencePool
sozercan Feb 19, 2026
0a471a3
fix: add pods and leases RBAC for EPP role creation
sozercan Feb 19, 2026
e71b53a
fix: add retry loop for HTTPRoute existence check in e2e
sozercan Feb 19, 2026
79596f8
fix: controller labels model pods for InferencePool selector
sozercan Feb 19, 2026
ffba6bb
fix: add retry loop for InferencePool existence check in e2e
sozercan Feb 19, 2026
50cd84a
fix: add x-k8s.io RBAC for EPP (inferenceobjectives, inferencemodelre…
sozercan Feb 19, 2026
5c53ef0
fix: add x-k8s.io RBAC to controller SA to avoid escalation
sozercan Feb 19, 2026
75e18c8
fix: add Istio DestinationRule for EPP in e2e test
sozercan Feb 19, 2026
146a728
fix: use NodePort for Istio gateway in Kind e2e
sozercan Feb 19, 2026
3c7d5bd
fix: use NodePort service for gateway inference test in Kind
sozercan Feb 19, 2026
d271089
debug: add HTTP status code to gateway inference test output
sozercan Feb 19, 2026
aad04c3
fix: use container target port for InferencePool, not service port
sozercan Feb 19, 2026
4e92b93
fix: correct Istio env var to ENABLE_GATEWAY_API_INFERENCE_EXTENSION
sozercan Feb 19, 2026
4fcbe89
fix: add both required Istio env vars for inference extension
sozercan Feb 19, 2026
c46da52
fix: remove non-existent SUPPORT_ flag, add debug for env var verific…
sozercan Feb 19, 2026
6245ffe
debug: add gateway proxy logs, DestinationRules to debug output
sozercan Feb 20, 2026
9188aff
debug: restart gateway proxy after DestinationRule to pick up config
sozercan Feb 20, 2026
bed76b4
fix: add appProtocol h2c to EPP service for Istio protocol detection
sozercan Feb 20, 2026
b6db0c0
fix: add path match and timeout to HTTPRoute for gateway routing
sozercan Feb 20, 2026
1c2513a
debug: add shadow service, endpoints, and proxy config dump
sozercan Feb 20, 2026
ee8c72f
fix: disable mTLS for EPP and remove gateway restart
sozercan Feb 20, 2026
6336301
fix: add tls.mode DISABLE to DestinationRule for EPP
sozercan Feb 20, 2026
e23a0c1
fix: set mesh-wide PERMISSIVE mTLS for EPP reachability
sozercan Feb 20, 2026
e98f631
debug: parse ext_proc filter config from Envoy config dump
sozercan Feb 20, 2026
2f0e507
fix: YAML syntax error in e2e workflow from Python heredoc
sozercan Feb 20, 2026
77a87fd
fix: disable auto mTLS globally for EPP connectivity
sozercan Feb 20, 2026
ee3098f
fix: inject Istio sidecar into EPP for mTLS with gateway proxy
sozercan Feb 20, 2026
5e621f6
fix: exclude health check port from Istio sidecar interception
sozercan Feb 20, 2026
ce23f31
feat: switch e2e from Istio to Envoy Gateway
sozercan Feb 20, 2026
bb5906d
test: finalize e2e gateway tests with resource verification
sozercan Feb 20, 2026
be7e670
chore: remove dead ResolvedGatewayModelName, unexport defaultLlamaCpp…
sozercan Feb 20, 2026
0766bb7
docs: add Envoy Gateway setup note to gateway.md
sozercan Feb 20, 2026
ebb9daa
fix: per-ModelDeployment EPP names, cleanup EPP on disable, restore l…
sozercan Feb 20, 2026
d881c40
test: enable Envoy Gateway InferencePool support and add traffic rout…
sozercan Feb 20, 2026
c3f5611
fix: use values file for Envoy Gateway helm install
sozercan Feb 20, 2026
3756a28
fix: install Envoy Gateway first, then patch config for InferencePool
sozercan Feb 20, 2026
f767f05
fix: fail if gateway proxy service not found, improve service discovery
sozercan Feb 20, 2026
c84a2d9
fix: use printf for EG values file, add detailed install debugging
sozercan Feb 20, 2026
2adb894
fix: use Envoy Gateway v1.7.0 which supports backendResources
sozercan Feb 20, 2026
e77dfac
fix: install Envoy Gateway without extensionManager config
sozercan Feb 20, 2026
c0b3bef
test: try Envoy Gateway v0.0.0-latest (dev build) for backendResources
sozercan Feb 20, 2026
761d8d8
test: finalize e2e with resource verification, defer traffic routing
sozercan Feb 20, 2026
15ba8be
test: switch e2e to Istio + cloud-provider-kind for LoadBalancer
sozercan Feb 20, 2026
3f4f4b7
fix: enable Istio sidecar injection for EPP mTLS
sozercan Feb 20, 2026
891415b
fix: add includeInboundPorts annotation for EPP sidecar
sozercan Feb 20, 2026
661bdfd
fix: disable auto mTLS and sidecar injection, connect directly to EPP
sozercan Feb 20, 2026
f49a059
fix: use SIMPLE TLS with insecureSkipVerify for EPP DestinationRule
sozercan Feb 20, 2026
cb85bd4
feat: support BYO HTTPRoute via spec.gateway.httpRouteRef
sozercan Feb 20, 2026
8bd3d59
refactor: remove ready bool from GatewayStatus, use conditions only
sozercan Feb 20, 2026
6c8b601
docs: add cross-namespace Gateway setup with ReferenceGrant
sozercan Feb 20, 2026
b37eefa
fix: refresh CRD detection cache on resource creation failure
sozercan Feb 20, 2026
71843a9
test: add isNoMatchError test cases
sozercan Feb 20, 2026
b5f693f
docs: remove port-forwarding mention from gateway overview
sozercan Feb 20, 2026
d61d0ea
chore: pin GAIE to v1.3.1, update Go dependency
sozercan Feb 20, 2026
ab3cc9e
chore: use official EPP image from registry.k8s.io pinned to v1.3.1
sozercan Feb 20, 2026
aab1422
fix: warn when multiple gateways have inference label
sozercan Feb 20, 2026
f1c41e7
docs: clarify BBR is BYO for multi-model setups
sozercan Feb 20, 2026
5ba855e
docs: use registry.k8s.io for BBR chart
sozercan Feb 20, 2026
117a944
docs: add version matching note with go.mod link for BBR chart
sozercan Feb 20, 2026
da05ae2
test: install BBR in e2e for multi-model readiness
sozercan Feb 20, 2026
cdd9350
feat: add X-Gateway-Base-Model-Name header match to HTTPRoute
sozercan Feb 20, 2026
0591604
chore: centralize GAIE version in Makefile and Go constant
sozercan Feb 20, 2026
7924b79
fix: add fallback path-only match for single-model setups
sozercan Feb 20, 2026
6a02408
fix: remove duplicate DeploymentConfig, fix gw.ready, restore aikit t…
sozercan Feb 20, 2026
685e3b8
merge: resolve conflicts with main
sozercan Feb 20, 2026
8423dc8
Add Dynamo provider LoRA adapter support
sozercan Feb 21, 2026
8d736c3
feat: add LoRA adapter support for ModelDeployment CRD
sozercan Feb 23, 2026
373 changes: 373 additions & 0 deletions .github/workflows/e2e-gateway.yml
@@ -0,0 +1,373 @@
name: E2E Gateway Tests

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
  workflow_dispatch:

jobs:
  e2e-gateway:
    runs-on: ubuntu-latest-16-cores
    timeout-minutes: 45

    steps:
      - name: Checkout repository
        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v4

      - name: Setup Go
        uses: actions/setup-go@d35c59abb061a4a6fb18e82ac0862c26744d6ab5 # v5
        with:
          go-version: "1.25"
          cache-dependency-path: controller/go.sum

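      - name: Setup Kind
        run: |
          go install sigs.k8s.io/kind@latest
          kind create cluster --name kubeairunway-gw-e2e --wait 120s
          # Remove the exclusion label so the single control-plane node can serve as a LoadBalancer backend
          kubectl label node kubeairunway-gw-e2e-control-plane node.kubernetes.io/exclude-from-external-load-balancers- 2>/dev/null || true

      - name: Install cloud-provider-kind
        run: |
          go install sigs.k8s.io/cloud-provider-kind@latest
          cloud-provider-kind > /tmp/cloud-provider-kind.log 2>&1 &
          CPK_PID=$!
          sleep 5
          # Fail fast if the background process has already exited
          if ! kill -0 "$CPK_PID" 2>/dev/null; then
            echo "❌ cloud-provider-kind exited early"
            cat /tmp/cloud-provider-kind.log
            exit 1
          fi
          echo "✅ cloud-provider-kind running"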

      - name: Install Gateway API CRDs
        run: |
          kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/latest/download/standard-install.yaml

      - name: Install Gateway API Inference Extension CRDs
        run: |
          kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/download/v1.3.1/manifests.yaml

      - name: Install Istio with Inference Extension support
        run: |
          curl -L https://istio.io/downloadIstio | sh -
          cd istio-*/bin
          ./istioctl install --set profile=minimal \
            --set values.pilot.env.ENABLE_GATEWAY_API_INFERENCE_EXTENSION=true -y
          kubectl wait --for=condition=Available deployment/istiod -n istio-system --timeout=120s
          echo "✅ Istio installed"

      - name: Install KAITO operator
        run: |
          helm repo add kaito https://kaito-project.github.io/kaito/charts/kaito
          helm install kaito-workspace kaito/workspace \
            --namespace kaito-workspace \
            --create-namespace \
            --set featureGates.disableNodeAutoProvisioning=true
          kubectl wait --for=condition=Available deployment -n kaito-workspace -l app.kubernetes.io/name=workspace --timeout=120s

      - name: Build and deploy controller
        run: |
          make controller-docker-build CONTROLLER_IMG=kubeairunway-controller:e2e
          kind load docker-image kubeairunway-controller:e2e --name kubeairunway-gw-e2e
          make controller-deploy CONTROLLER_IMG=kubeairunway-controller:e2e
          kubectl wait --for=condition=Available deployment -n kubeairunway-system -l control-plane=controller-manager --timeout=120s

      - name: Build and deploy KAITO provider
        run: |
          make kaito-provider-docker-build KAITO_PROVIDER_IMG=kaito-provider:e2e
          kind load docker-image kaito-provider:e2e --name kubeairunway-gw-e2e
          make kaito-provider-deploy KAITO_PROVIDER_IMG=kaito-provider:e2e
          kubectl wait --for=condition=Available deployment -n kubeairunway-system -l control-plane=kaito-provider --timeout=120s

      - name: Wait for provider registration
        run: |
          kubectl wait --for=jsonpath='{.status.ready}'=true inferenceproviderconfig/kaito --timeout=120s

      - name: Create Gateway resource
        run: |
          kubectl apply -f controller/test/e2e/testdata/gateway.yaml
          echo "Waiting for Gateway to be programmed..."
          for i in $(seq 1 30); do
            PROGRAMMED=$(kubectl get gateway inference-gateway -o jsonpath='{.status.conditions[?(@.type=="Programmed")].status}' 2>/dev/null || echo "")
            if [ "$PROGRAMMED" = "True" ]; then
              echo "✅ Gateway is programmed"
              break
            fi
            echo "Attempt $i/30: programmed=$PROGRAMMED"
            if [ "$i" = "30" ]; then
              echo "⚠️ Gateway not programmed after 30 attempts, continuing anyway (Kind may not support LoadBalancer)"
            fi
            sleep 5
          done

      - name: Create ModelDeployment with gateway enabled
        run: |
          kubectl apply -f controller/test/e2e/testdata/gateway-modeldeployment.yaml

      - name: Wait for ModelDeployment to reach Running phase
        run: |
          kubectl wait --for=condition=WorkspaceSucceeded workspace/llama-gw-e2e -n default --timeout=600s 2>/dev/null || true

          echo "Waiting for ModelDeployment to reach Running phase..."
          for i in $(seq 1 60); do
            PHASE=$(kubectl get modeldeployment llama-gw-e2e -o jsonpath='{.status.phase}' 2>/dev/null || echo "")
            echo "Attempt $i/60: phase=$PHASE"
            if [ "$PHASE" = "Running" ]; then
              echo "✅ ModelDeployment is Running"
              exit 0
            fi
            sleep 10
          done
          echo "❌ Timed out waiting for ModelDeployment to reach Running phase"
          exit 1

      - name: Verify InferencePool created
        run: |
          echo "Waiting for InferencePool..."
          for i in $(seq 1 30); do
            if kubectl get inferencepool llama-gw-e2e -n default > /dev/null 2>&1; then
              echo "✅ InferencePool found"
              break
            fi
            echo "Attempt $i/30: InferencePool not found yet"
            if [ "$i" = "30" ]; then
              echo "❌ Timed out waiting for InferencePool"
              exit 1
            fi
            sleep 5
          done

          # Verify selector label
          SELECTOR=$(kubectl get inferencepool llama-gw-e2e -n default \
            -o jsonpath='{.spec.selector.matchLabels.kubeairunway\.ai/model-deployment}')
          if [ "$SELECTOR" != "llama-gw-e2e" ]; then
            echo "❌ InferencePool selector mismatch: expected 'llama-gw-e2e', got '$SELECTOR'"
            exit 1
          fi
          echo "✅ InferencePool selector correct"

          # Verify endpointPickerRef
          EPP_NAME=$(kubectl get inferencepool llama-gw-e2e -n default \
            -o jsonpath='{.spec.endpointPickerRef.name}')
          if [ -z "$EPP_NAME" ]; then
            echo "❌ InferencePool missing endpointPickerRef"
            exit 1
          fi
          echo "✅ InferencePool endpointPickerRef set: $EPP_NAME"

      - name: Verify HTTPRoute created
        run: |
          echo "Waiting for HTTPRoute..."
          for i in $(seq 1 30); do
            if kubectl get httproute llama-gw-e2e -n default > /dev/null 2>&1; then
              echo "✅ HTTPRoute found"
              break
            fi
            echo "Attempt $i/30: HTTPRoute not found yet"
            if [ "$i" = "30" ]; then
              echo "❌ Timed out waiting for HTTPRoute"
              exit 1
            fi
            sleep 5
          done

          # Verify parent ref points to gateway
          PARENT=$(kubectl get httproute llama-gw-e2e -n default \
            -o jsonpath='{.spec.parentRefs[0].name}')
          if [ "$PARENT" != "inference-gateway" ]; then
            echo "❌ HTTPRoute parent mismatch: expected 'inference-gateway', got '$PARENT'"
            exit 1
          fi
          echo "✅ HTTPRoute parent ref correct"

          # Verify backend ref points to InferencePool
          BACKEND_GROUP=$(kubectl get httproute llama-gw-e2e -n default \
            -o jsonpath='{.spec.rules[0].backendRefs[0].group}')
          BACKEND_KIND=$(kubectl get httproute llama-gw-e2e -n default \
            -o jsonpath='{.spec.rules[0].backendRefs[0].kind}')
          if [ "$BACKEND_GROUP" != "inference.networking.k8s.io" ] || [ "$BACKEND_KIND" != "InferencePool" ]; then
            echo "❌ HTTPRoute backend ref mismatch: group=$BACKEND_GROUP kind=$BACKEND_KIND"
            exit 1
          fi
          echo "✅ HTTPRoute backend ref correct"

      - name: Verify gateway status and model name auto-discovery
        run: |
          echo "Waiting for GatewayReady condition..."
          for i in $(seq 1 30); do
            GW_READY=$(kubectl get modeldeployment llama-gw-e2e -n default \
              -o jsonpath='{.status.conditions[?(@.type=="GatewayReady")].status}' 2>/dev/null || echo "")
            if [ "$GW_READY" = "True" ]; then
              echo "✅ GatewayReady condition is True"
              break
            fi
            echo "Attempt $i/30: GatewayReady=$GW_READY"
            if [ "$i" = "30" ]; then
              echo "❌ Timed out waiting for GatewayReady condition"
              exit 1
            fi
            sleep 5
          done

          # Check auto-discovered model name
          MODEL_NAME=$(kubectl get modeldeployment llama-gw-e2e -n default \
            -o jsonpath='{.status.gateway.modelName}')
          if [ -z "$MODEL_NAME" ]; then
            echo "❌ Gateway model name is empty"
            exit 1
          fi
          echo "✅ Gateway model name auto-discovered: $MODEL_NAME"

      - name: Wait for EPP to be ready
        run: |
          echo "Waiting for EPP deployment..."
          for i in $(seq 1 30); do
            READY=$(kubectl get deployment llama-gw-e2e-epp -n default -o jsonpath='{.status.readyReplicas}' 2>/dev/null || echo "0")
            if [ "$READY" = "1" ]; then
              echo "✅ EPP is ready"
              break
            fi
            echo "Attempt $i/30: EPP readyReplicas=$READY"
            if [ "$i" = "30" ]; then
              echo "❌ EPP not ready"
              exit 1
            fi
            sleep 10
          done

      - name: Configure Istio DestinationRule for EPP
        run: |
          kubectl apply -f - <<'DREOF'
          apiVersion: networking.istio.io/v1beta1
          kind: DestinationRule
          metadata:
            name: llama-gw-e2e-epp
            namespace: default
          spec:
            host: llama-gw-e2e-epp.default.svc.cluster.local
            trafficPolicy:
              tls:
                mode: SIMPLE
                insecureSkipVerify: true
          DREOF
          echo "✅ Istio DestinationRule created for EPP"

      - name: Install Body-Based Router (BBR)
        run: |
          helm install body-based-router \
            --set provider.name=istio \
            --version v1.3.1 \
            oci://registry.k8s.io/gateway-api-inference-extension/charts/body-based-routing \
            --wait --timeout 120s
          echo "✅ BBR installed"

      - name: Test inference through gateway
        run: |
          MODEL_NAME=$(kubectl get modeldeployment llama-gw-e2e -n default \
            -o jsonpath='{.status.gateway.modelName}')
          echo "Model name: $MODEL_NAME"

          # Get the Gateway LoadBalancer IP (provided by cloud-provider-kind)
          GW_IP=""
          for i in $(seq 1 30); do
            GW_IP=$(kubectl get gateway inference-gateway -o jsonpath='{.status.addresses[0].value}' 2>/dev/null || echo "")
            if [ -n "$GW_IP" ]; then
              echo "Gateway IP: $GW_IP"
              break
            fi
            echo "Waiting for Gateway IP... attempt $i/30"
            sleep 5
          done

          if [ -z "$GW_IP" ]; then
            echo "❌ Gateway IP not assigned"
            exit 1
          fi

          echo "Sending inference request through gateway at http://${GW_IP}..."
          for i in $(seq 1 18); do
            HTTP_CODE=$(curl -s -o /tmp/response.json -w '%{http_code}' --max-time 30 \
              http://${GW_IP}/v1/chat/completions \
              -H "Content-Type: application/json" \
              -d "{
                \"model\": \"$MODEL_NAME\",
                \"messages\": [{\"role\": \"user\", \"content\": \"Say hello in one word.\"}],
                \"max_tokens\": 10
              }" 2>&1 || true)
            RESPONSE=$(cat /tmp/response.json 2>/dev/null || echo "")

            if [ "$HTTP_CODE" = "200" ] && echo "$RESPONSE" | jq -e '.choices[0].message.content' > /dev/null 2>&1; then
              echo "Response: $RESPONSE"
              echo "✅ Inference through gateway succeeded"
              exit 0
            fi
            # Quote the body so whitespace and glob characters in the response are printed verbatim
            echo "Attempt $i/18: HTTP=$HTTP_CODE body=$(printf '%s' "$RESPONSE" | head -c 200)"
            sleep 10
          done
          echo "❌ Inference through gateway failed"
          exit 1

      - name: Test gateway disable and cleanup
        run: |
          # Disable gateway
          kubectl patch modeldeployment llama-gw-e2e -n default \
            --type=merge -p '{"spec":{"gateway":{"enabled":false}}}'

          echo "Waiting for gateway resources to be cleaned up..."
          # Poll instead of sleeping a fixed 15s; the explicit checks below report any failure
          kubectl wait --for=delete inferencepool/llama-gw-e2e -n default --timeout=60s || true
          kubectl wait --for=delete httproute/llama-gw-e2e -n default --timeout=60s || true
          kubectl wait --for=condition=GatewayReady=false modeldeployment/llama-gw-e2e \
            -n default --timeout=60s || true

          # Verify InferencePool deleted
          if kubectl get inferencepool llama-gw-e2e -n default 2>/dev/null; then
            echo "❌ InferencePool should have been deleted"
            exit 1
          fi
          echo "✅ InferencePool cleaned up"

          # Verify HTTPRoute deleted
          if kubectl get httproute llama-gw-e2e -n default 2>/dev/null; then
            echo "❌ HTTPRoute should have been deleted"
            exit 1
          fi
          echo "✅ HTTPRoute cleaned up"

          # Verify GatewayReady condition is False
          GW_READY=$(kubectl get modeldeployment llama-gw-e2e -n default \
            -o jsonpath='{.status.conditions[?(@.type=="GatewayReady")].status}')
          if [ "$GW_READY" != "False" ]; then
            echo "❌ GatewayReady condition should be False after disable: $GW_READY"
            exit 1
          fi
          echo "✅ GatewayReady condition is False after disable"

      - name: Collect debug info
        if: failure()
        run: |
          echo "=== ModelDeployments ==="
          kubectl get modeldeployments -A -o yaml
          echo "=== InferencePools ==="
          kubectl get inferencepools -A -o yaml 2>/dev/null || echo "No InferencePools"
          echo "=== HTTPRoutes ==="
          kubectl get httproutes -A -o yaml 2>/dev/null || echo "No HTTPRoutes"
          echo "=== Gateways ==="
          kubectl get gateways -A -o yaml 2>/dev/null || echo "No Gateways"
          echo "=== Workspaces ==="
          kubectl get workspaces -A -o yaml
          echo "=== Controller Logs ==="
          kubectl logs -n kubeairunway-system -l control-plane=controller-manager --tail=200
          echo "=== KAITO Provider Logs ==="
          kubectl logs -n kubeairunway-system -l control-plane=kaito-provider --tail=100
          echo "=== EPP Logs ==="
          kubectl logs -n default -l app.kubernetes.io/name=llama-gw-e2e-epp --tail=100 2>/dev/null || echo "No EPP logs"
          echo "=== Istio Logs ==="
          kubectl logs -n istio-system -l app=istiod --tail=100 2>/dev/null || echo "No Istio logs"
          echo "=== Gateway Proxy Logs ==="
          GW_POD=$(kubectl get pods -n default -l gateway.networking.k8s.io/gateway-name=inference-gateway -o jsonpath='{.items[0].metadata.name}' 2>/dev/null || echo "")
          [ -n "$GW_POD" ] && kubectl logs "$GW_POD" -n default --tail=50 2>/dev/null || echo "No gateway proxy logs"
          echo "=== Gateway Pods ==="
          kubectl get pods -n default -l gateway.networking.k8s.io/gateway-name=inference-gateway -o yaml
          echo "=== Events ==="
          kubectl get events -A --sort-by=.lastTimestamp
          echo "=== Pods ==="
          kubectl get pods -A

      - name: Cleanup
        if: always()
        run: |
          kind delete cluster --name kubeairunway-gw-e2e
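
The polling steps in this workflow (Gateway programmed, InferencePool, HTTPRoute, GatewayReady, EPP readiness) all repeat the same attempt-and-sleep loop. As a rough sketch of how they might share one helper, the function below retries an arbitrary command a bounded number of times. The name `retry`, its argument order, and the `retry_*` variables are hypothetical and not part of this PR.

```shell
# Hypothetical helper (not part of this PR): run a command until it succeeds
# or the attempt budget is exhausted.
# Usage: retry <max_attempts> <delay_seconds> <command> [args...]
retry() {
  retry_max="$1"
  retry_delay="$2"
  shift 2
  retry_i=1
  while [ "$retry_i" -le "$retry_max" ]; do
    # Success on any attempt ends the loop immediately
    if "$@"; then
      return 0
    fi
    echo "Attempt $retry_i/$retry_max failed: $*" >&2
    # Skip the sleep after the final attempt
    if [ "$retry_i" -lt "$retry_max" ]; then
      sleep "$retry_delay"
    fi
    retry_i=$((retry_i + 1))
  done
  return 1
}
```

Under these assumptions, an existence check such as the InferencePool one could shrink to `retry 30 5 kubectl get inferencepool llama-gw-e2e -n default`, with the step failing when the helper returns non-zero.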
3 changes: 3 additions & 0 deletions Makefile
@@ -7,6 +7,9 @@
# Controller image
CONTROLLER_IMG ?= ghcr.io/kaito-project/kubeairunway-controller:latest

# Gateway API Inference Extension version
GAIE_VERSION ?= v1.3.1

# Provider images
KAITO_PROVIDER_IMG ?= ghcr.io/kaito-project/kaito-provider:latest
DYNAMO_PROVIDER_IMG ?= ghcr.io/kaito-project/dynamo-provider:latest
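
The Makefile now pins `GAIE_VERSION`, while the e2e workflow still spells out `v1.3.1` in both the CRD manifest URL and the BBR chart version. A minimal sketch of deriving the manifest URL from a single variable is shown below. It assumes `GAIE_VERSION` is exported into the shell by the caller, and `gaie_manifest_url` is a hypothetical function name; the URL pattern itself is the one the workflow already uses.

```shell
# Assumption: GAIE_VERSION is exported by the caller (for example from the
# Makefile default); otherwise fall back to the version pinned in this PR.
GAIE_VERSION="${GAIE_VERSION:-v1.3.1}"

# Hypothetical helper: print the CRD manifest URL for a given GAIE release tag.
gaie_manifest_url() {
  printf 'https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/download/%s/manifests.yaml\n' "$1"
}

gaie_manifest_url "$GAIE_VERSION"
```

A workflow step could then run `kubectl apply -f "$(gaie_manifest_url "$GAIE_VERSION")"` and pass `--version "$GAIE_VERSION"` to the BBR `helm install`, so a version bump touches one place.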
2 changes: 2 additions & 0 deletions README.md
@@ -16,6 +16,7 @@ KubeAIRunway gives you a web UI and a unified Kubernetes CRD (`ModelDeployment`)
- 🔧 **Multiple Engines** — [vLLM](https://github.com/vllm-project/vllm), [SGLang](https://github.com/sgl-project/sglang), [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM), [llama.cpp](https://github.com/ggml-org/llama.cpp)
- 📈 **Live Monitoring** — Real-time status, logs, and Prometheus metrics
- 💰 **Cost Estimation** — GPU pricing and capacity guidance
- 🌐 **Gateway API Integration** — Unified inference endpoint via [Gateway API Inference Extension](https://gateway-api.sigs.k8s.io/geps/gep-3567/) with auto-detected setup
- 🔌 **Headlamp Plugin** — Full-featured [Headlamp](https://headlamp.dev/) dashboard plugin

## Supported Providers
@@ -97,6 +98,7 @@ The controller automatically selects the best engine and provider, creates provi
| Observability | [docs/observability.md](docs/observability.md) |
| Development | [docs/development.md](docs/development.md) |
| Kubernetes Deployment | [deploy/kubernetes/README.md](deploy/kubernetes/README.md) |
| Gateway Integration | [docs/gateway.md](docs/gateway.md) |
| Headlamp Plugin | [docs/headlamp-plugin.md](docs/headlamp-plugin.md) |

## Contributing