This directory contains the Helm chart for deploying Semantic Router on Kubernetes.
deploy/helm/
├── MIGRATION.md # Migration guide from Kustomize to Helm
├── validate-chart.sh # Chart validation script
└── semantic-router/ # Helm chart
    ├── Chart.yaml # Chart metadata
    ├── values.yaml # Default configuration values
    ├── values-dev.yaml # Development environment values
    ├── values-prod.yaml # Production environment values
    ├── README.md # Comprehensive chart documentation
    ├── .helmignore # Helm ignore patterns
    └── templates/ # Kubernetes resource templates
        ├── _helpers.tpl # Template helpers
        ├── namespace.yaml # Namespace resource
        ├── serviceaccount.yaml # Service account
        ├── configmap.yaml # Configuration
        ├── pvc.yaml # Persistent volume claim
        ├── deployment.yaml # Main deployment
        ├── dashboard-deployment.yaml # Dashboard deployment (optional)
        ├── dashboard-service.yaml # Dashboard service (optional)
        ├── service.yaml # Services (gRPC, API, metrics)
        ├── ingress.yaml # Ingress (optional)
        ├── hpa.yaml # Horizontal Pod Autoscaler (optional)
        └── NOTES.txt # Post-installation notes
- Kubernetes 1.19+
- Helm 3.2.0+
- kubectl configured to access your cluster
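The version requirements above can be checked before installing. The following is a minimal sketch and not part of the chart: `version_ge` is a hypothetical helper, and it assumes GNU `sort -V` is available for version ordering.

```shell
# Hypothetical helper: succeeds when version $1 is greater than or equal to $2.
# Assumes GNU sort with the -V (version sort) option.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Example check against the Helm 3.2.0+ requirement.
# `helm version --template '{{.Version}}'` prints a string such as v3.14.0.
if command -v helm >/dev/null 2>&1; then
  helm_version="$(helm version --template '{{.Version}}' 2>/dev/null)"
  helm_version="${helm_version#v}"
  if version_ge "$helm_version" "3.2.0"; then
    echo "Helm $helm_version satisfies the 3.2.0+ requirement"
  else
    echo "Helm $helm_version is older than 3.2.0" >&2
  fi
else
  echo "helm not found on PATH" >&2
fi
```

The same helper could be pointed at the Kubernetes server version reported by `kubectl version` to check the 1.19+ requirement.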
The vllm-sr CLI provides a unified deployment experience for both Docker and
Kubernetes. Use the same config.yaml you already use for local Docker
development:
# Deploy to Kubernetes with dev profile
vllm-sr serve --target k8s --profile dev --config config.yaml
# Deploy to a specific namespace and context
vllm-sr serve --target k8s --namespace production --context prod-cluster --profile prod
# Check status
vllm-sr status --target k8s
# Stream router logs
vllm-sr logs router --target k8s -f
# Tear down
vllm-sr stop --target k8s
The CLI translates your config.yaml into Helm values and runs
helm upgrade --install under the hood.
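For orientation, a manual command of roughly the same shape might look like the sketch below. This is an illustration only, not the CLI's actual implementation: `generated-values.yaml` is a hypothetical file name, and the release name and chart path are taken from the Helm commands later in this document. `helm upgrade --install` installs the release if it does not exist and upgrades it otherwise, so the same command can be rerun safely.

```shell
# Sketch only: prints a Helm command comparable to what a wrapper might run.
# generated-values.yaml is a hypothetical file name, not the CLI's real output.
print_helm_command() {
  namespace="$1"
  values_file="$2"
  printf 'helm upgrade --install semantic-router ./deploy/helm/semantic-router'
  printf ' -f %s --namespace %s --create-namespace\n' "$values_file" "$namespace"
}

print_helm_command vllm-semantic-router-system generated-values.yaml
```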
Sensitive environment variables (HF_TOKEN, OPENAI_API_KEY,
ANTHROPIC_API_KEY) are never written as plain-text Helm values.
Instead, the CLI:
- Creates a Kubernetes Secret named vllm-sr-env-secrets containing only the sensitive keys.
- References the secret via envFrom/secretRef in the Deployment so the pod receives the values at runtime.
- Passes non-sensitive variables (HF_ENDPOINT, HF_HOME, etc.) as standard env entries in the Helm values override.
The secret is recreated on every deploy (idempotent) and removed on
vllm-sr stop --target k8s.
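The split between sensitive and non-sensitive variables can be approximated by hand. The sketch below is an assumption about one reasonable way to do it, not the CLI's code: the function names are hypothetical, and only the key list and the secret name come from this document. Rendering the Secret with `--dry-run=client -o yaml` and piping it to `kubectl apply` makes the step safe to repeat.

```shell
# Sketch only: mirrors the sensitive/non-sensitive split described above.
# The key list comes from this document; the function names are hypothetical.
SENSITIVE_KEYS="HF_TOKEN OPENAI_API_KEY ANTHROPIC_API_KEY"

# Print the names of the sensitive variables that are currently set.
sensitive_keys_set() {
  for key in $SENSITIVE_KEYS; do
    eval "value=\${$key:-}"
    if [ -n "$value" ]; then
      echo "$key"
    fi
  done
}

# Create or update the Secret idempotently: render it client-side, then apply.
apply_env_secret() {
  namespace="$1"
  set --
  for key in $(sensitive_keys_set); do
    eval "value=\${$key}"
    set -- "$@" "--from-literal=$key=$value"
  done
  if [ "$#" -eq 0 ]; then
    echo "no sensitive variables set; nothing to do" >&2
    return 0
  fi
  kubectl create secret generic vllm-sr-env-secrets \
    --namespace "$namespace" "$@" \
    --dry-run=client -o yaml | kubectl apply -f -
}
```

With credentials exported, calling `apply_env_secret vllm-semantic-router-system` would create or refresh the Secret in that namespace.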
To provide credentials, export them before running the CLI:
export HF_TOKEN=hf_xxx
vllm-sr serve --target k8s --profile dev
If you deploy with Helm directly (bypassing the CLI), create the secret manually:
kubectl create secret generic vllm-sr-env-secrets \
--namespace vllm-semantic-router-system \
--from-literal=HF_TOKEN=hf_xxx
Then reference it in your values file:
envFromSecrets:
  - vllm-sr-env-secrets
# Using Make (recommended)
make helm-install
# Or with Helm directly
helm install semantic-router ./deploy/helm/semantic-router \
--namespace vllm-semantic-router-system \
--create-namespace
Need a registry mirror/proxy (e.g., in China)? Append
--set global.imageRegistry=<your-registry> to any Helm install/upgrade command.
# Check Helm release status
make helm-status
# Check pods
kubectl get pods -n vllm-semantic-router-system
# View logs
make helm-logs
# Port forward API
make helm-port-forward-api
# Test the API
curl http://localhost:8080/health
For local development with reduced resources:
make helm-dev
# Or manually:
helm install semantic-router ./deploy/helm/semantic-router \
-f ./deploy/helm/semantic-router/values-dev.yaml \
--namespace vllm-semantic-router-system \
--create-namespace
Features:
- Reduced resource requests (1Gi RAM, 500m CPU)
- Smaller storage (5Gi)
- Dashboard enabled
- Observability stack enabled (Jaeger, Prometheus, Grafana)
- Faster probes
For production deployment with high availability:
make helm-prod
# Or manually:
helm install semantic-router ./deploy/helm/semantic-router \
-f ./deploy/helm/semantic-router/values-prod.yaml \
--namespace production \
--create-namespace
Features:
- Multiple replicas (2 minimum, auto-scaling to 10)
- High resource allocation (8Gi RAM, 4 CPU)
- Auto-scaling enabled (70% CPU target)
- Security hardening (runAsNonRoot, no privilege escalation)
- Prometheus and Grafana enabled, Jaeger disabled
- Production-grade storage (20Gi)
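To see how the replica bounds and the 70% CPU target interact, the standard Horizontal Pod Autoscaler rule can be worked through by hand. The sketch below is a simplification under stated assumptions: it ignores the autoscaler's tolerance band and stabilization windows, `desired_replicas` is a hypothetical helper, and the utilization figures are made-up inputs rather than measurements.

```shell
# Sketch of the HPA scaling rule, ignoring tolerance and stabilization:
# desired = ceil(current * metric / target), clamped to [min, max].
# All arguments are integers; metric and target are percentages.
desired_replicas() {
  current="$1" metric="$2" target="$3" min="$4" max="$5"
  # Integer ceiling division.
  desired=$(( (current * metric + target - 1) / target ))
  if [ "$desired" -lt "$min" ]; then desired="$min"; fi
  if [ "$desired" -gt "$max" ]; then desired="$max"; fi
  echo "$desired"
}

# 2 replicas averaging 140% CPU against a 70% target.
desired_replicas 2 140 70 2 10
# 8 replicas averaging 105% CPU: the raw result of 12 is capped at 10.
desired_replicas 8 105 70 2 10
```

The first call prints 4 and the second prints 10, which shows why the maximum should be sized for peak load rather than average load.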
Create your own values file:
# my-values.yaml
replicaCount: 2
resources:
  limits:
    memory: "8Gi"
    cpu: "2"
config:
  providers:
    defaults:
      default_model: "my-model"
    models:
      - name: "my-model"
        provider_model_id: "my-model"
        backend_refs:
          - name: "primary"
            endpoint: "my-vllm.default.svc.cluster.local:8000"
            protocol: "http"
            weight: 1
  routing:
    modelCards:
      - name: "my-model"
    decisions:
      - name: "default-route"
        priority: 100
        rules:
          operator: "AND"
          conditions: []
        modelRefs:
          - model: "my-model"
            use_reasoning: false
ingress:
  enabled: true
  hosts:
    - host: semantic-router.mydomain.com
      paths:
        - path: /
          pathType: Prefix
          servicePort: 8080
Then install:
helm install semantic-router ./deploy/helm/semantic-router \
-f my-values.yaml \
--namespace my-namespace \
--create-namespace
The project includes convenient Make targets for Helm operations:
make helm-install # Install the chart
make helm-upgrade # Upgrade the release
make helm-uninstall # Uninstall the release
make helm-status # Show release status
make helm-list # List all releases
make helm-lint # Lint the chart
make helm-template # Template the chart
make helm-dev # Deploy with dev config
make helm-prod # Deploy with prod config
make helm-package # Package the chart
make helm-test # Test the deployment
make helm-logs # Show logs
make helm-values # Show computed values
make helm-manifest # Show deployed manifest
make helm-port-forward-api # Port forward API (8080)
make helm-port-forward-grpc # Port forward gRPC (50051)
make helm-port-forward-metrics # Port forward metrics (9190)
make helm-rollback # Rollback to previous version
make helm-history # Show release history
make helm-clean # Complete cleanup
make help-helm # Show Helm help
Before deploying, validate the Helm chart:
# Run validation script
./deploy/helm/validate-chart.sh
# Or manually:
make helm-lint
make helm-template
# Upgrade with new values
helm upgrade semantic-router ./deploy/helm/semantic-router \
-f my-updated-values.yaml \
--namespace vllm-semantic-router-system
# Or using Make:
make helm-upgrade HELM_VALUES_FILE=my-updated-values.yaml
If an upgrade fails:
# Rollback to previous version
make helm-rollback
# Or rollback to specific revision
helm rollback semantic-router 1 --namespace vllm-semantic-router-system
config:
  providers:
    defaults:
      default_model: "my-model"
    models:
      - name: "my-model"
        provider_model_id: "my-model"
        backend_refs:
          - name: "endpoint-1"
            endpoint: "10.0.1.10:8000"
            protocol: "http"
            weight: 2
          - name: "endpoint-2"
            endpoint: "10.0.1.11:8000"
            protocol: "http"
            weight: 1
  routing:
    modelCards:
      - name: "my-model"
    decisions:
      - name: "default-route"
        priority: 100
        rules:
          operator: "AND"
          conditions: []
        modelRefs:
          - model: "my-model"
            use_reasoning: false
ingress:
  enabled: true
  className: "nginx"
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
  hosts:
    - host: semantic-router.example.com
      paths:
        - path: /
          pathType: Prefix
          servicePort: 8080
  tls:
    - secretName: semantic-router-tls
      hosts:
        - semantic-router.example.com
autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70
  targetMemoryUtilizationPercentage: 80
podSecurityContext:
  runAsNonRoot: true
  runAsUser: 1000
  fsGroup: 1000
securityContext:
  runAsNonRoot: true
  runAsUser: 1000
  allowPrivilegeEscalation: false
  capabilities:
    drop:
      - ALL
If you're currently using the Kustomize deployment, see MIGRATION.md for detailed migration instructions.
# Check events
kubectl describe pod -n vllm-semantic-router-system
# Common causes:
# - Insufficient resources
# - PVC not binding
# - Image pull errors
# Solution: Reduce resources
helm upgrade semantic-router ./deploy/helm/semantic-router \
-f ./deploy/helm/semantic-router/values-dev.yaml \
--namespace vllm-semantic-router-system
# Models are downloaded automatically by the router at startup.
# Check router logs for model download progress:
kubectl logs <pod-name> -n vllm-semantic-router-system
# Common causes:
# - HuggingFace rate limits (missing HF_TOKEN)
# - Network issues
# - Insufficient storage
# - OOMKilled (increase memory limits)
# Verify the HF_TOKEN secret exists:
kubectl get secret vllm-sr-env-secrets -n vllm-semantic-router-system
# Verify the pod sees the token (value is masked):
kubectl logs <pod-name> -n vllm-semantic-router-system | grep HF_TOKEN
# Check PVC and storage:
kubectl get pvc -n vllm-semantic-router-system
If model downloads are throttled, make sure HF_TOKEN is exported before
deploying via the CLI, or that the vllm-sr-env-secrets secret exists
when deploying with Helm directly. See the Credential Handling section
above.
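The secret check can be made slightly stricter by confirming that the HF_TOKEN key itself is populated, not just that the Secret exists. This is a minimal sketch: `check_hf_token_secret` is a hypothetical helper name, and the default namespace is the one used throughout this document.

```shell
# Sketch only: check_hf_token_secret is a hypothetical helper name.
# Succeeds when the Secret exists and has a non-empty HF_TOKEN key.
check_hf_token_secret() {
  namespace="${1:-vllm-semantic-router-system}"
  # jsonpath returns the base64-encoded value, or nothing if the key is absent.
  encoded="$(kubectl get secret vllm-sr-env-secrets \
    --namespace "$namespace" \
    -o jsonpath='{.data.HF_TOKEN}' 2>/dev/null)" || return 1
  [ -n "$encoded" ]
}

if check_hf_token_secret; then
  echo "HF_TOKEN is present in vllm-sr-env-secrets"
else
  echo "HF_TOKEN is missing; model downloads may be rate limited" >&2
fi
```

The helper never prints the token, only whether it is set.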
# Check service
kubectl get svc -n vllm-semantic-router-system
# Check endpoints
kubectl get endpoints -n vllm-semantic-router-system
# Test internally
kubectl run -it --rm debug --image=curlimages/curl --restart=Never -- \
curl http://semantic-router.vllm-semantic-router-system:8080/health
- Use Version Control: Keep your values.yaml files in version control
- Environment Separation: Use different namespaces and values files for different environments
- Resource Limits: Always set appropriate resource limits based on your workload
- Monitoring: Enable metrics and set up monitoring
- Security: Use security contexts and network policies
- Backups: Regularly backup your PVC data
- Testing: Test upgrades in dev/staging before production
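One way to act on the testing advice is to gate every upgrade on a lint and a local render of the chart with the exact values file that will be deployed. The sketch below assumes nothing beyond the standard `helm lint` and `helm template` commands; `predeploy_check` is a hypothetical helper name.

```shell
# Sketch only: predeploy_check is a hypothetical helper.
# Fails fast if the chart does not lint or render with the given values file.
predeploy_check() {
  chart="$1"
  values="$2"
  helm lint "$chart" -f "$values" || return 1
  # Render locally; discard the manifests, keep only the exit status.
  helm template semantic-router "$chart" -f "$values" >/dev/null || return 1
  echo "chart renders cleanly with $values"
}
```

For example, `predeploy_check ./deploy/helm/semantic-router my-values.yaml && make helm-upgrade HELM_VALUES_FILE=my-values.yaml` would only upgrade after both checks pass.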
- name: Deploy with Helm
  run: |
    helm upgrade --install semantic-router ./deploy/helm/semantic-router \
      -f values-prod.yaml \
      --namespace production \
      --create-namespace \
      --wait \
      --timeout 10m
deploy:
  script:
    - helm upgrade --install semantic-router ./deploy/helm/semantic-router
      -f values-prod.yaml
      --namespace production
      --create-namespace
      --wait
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: semantic-router
spec:
  project: default
  source:
    repoURL: https://github.com/vllm-project/semantic-router
    targetRevision: main
    path: deploy/helm/semantic-router
    helm:
      valueFiles:
        - values-prod.yaml
  destination:
    server: https://kubernetes.default.svc
    namespace: production
- Chart README - Detailed chart documentation
- Migration Guide - Kustomize to Helm migration
- Project Documentation - Main project documentation
- Helm Documentation - Official Helm docs
For issues and questions:
- GitHub Issues: https://github.com/vllm-project/semantic-router/issues
- Documentation: https://semantic-router.io
- Chart Issues: Tag with helm label