Umbrella Helm chart that installs all cluster-level components and
CRDs required by the Kaito
production stack. It is designed to be consumed as a Helm
dependency of the top-level kaito chart so a single
helm install kaito ... pulls in every shared piece of infrastructure
the inference data plane relies on.
| Subchart | Purpose | Toggle | Default install namespace |
|---|---|---|---|
body-based-routing |
Cluster-wide singleton Body-Based Router (BBR) ext_proc workload (Deployment + Service + cluster RBAC). The Istio EnvoyFilter that injects BBR into each Gateway is rendered per-namespace by charts/modelharness, not here. |
always installed (mandatory) | kaito-system (umbrella release namespace) |
keda-kaito-scaler |
KEDA external scaler that aggregates vLLM / InferenceSet metrics for workload-aware autoscaling. |
always installed (mandatory) | keda |
llm-gateway-apikey (upstream OCI dep, 0.0.11-alpha) |
API-key auth control plane for the inference Gateway: installs the APIKey CRD, the apikey-operator (reconciles APIKey CRs into per-namespace llm-api-key Secrets), and the apikey-authz ext_authz gRPC dataplane (single cluster-wide Service). ext_authz wiring on inference Gateway pods is rendered per-namespace by charts/modelharness — not by this subchart. The upstream chart's own cluster-wide EnvoyFilter is neutralised via a sentinel gatewaySelector so per-namespace EnvoyFilters in modelharness are the sole source of ext_authz wiring. The upstream chart's STRICT PeerAuthentication is disabled by default (llm-gateway-apikey.istio.strictMTLS: false) to avoid blocking control-plane probes during rollout on mixed-mTLS clusters; flip it back on once the cluster is fully meshed. |
llm-gateway-apikey.enabled (default true) |
llm-gateway-auth |
The three subcharts each expose a namespaceOverride value pinning
all of their namespaced resources to a specific namespace,
independent of the Helm release namespace passed via --namespace.
The first two are in-tree forks; the third (llm-gateway-apikey) is
pulled as a Helm dependency directly from
oci://mcr.microsoft.com/aks/kaito/helm and has supported
namespaceOverride since upstream chart version 0.0.8-alpha. See
Per-subchart install namespace for
details.
Additional cluster-level components will be added as new subcharts
under charts/ with a matching <name>.enabled toggle.
charts/productionstack/
├── Chart.yaml # umbrella chart metadata + dependencies
├── values.yaml # top-level toggles + subchart value pass-through
├── templates/
│ └── NOTES.txt # post-install summary (no resources of its own)
└── charts/
├── body-based-routing/ # BBR ext_proc workload only (in-tree fork)
├── keda-kaito-scaler/ # KEDA external scaler (in-tree fork)
└── llm-gateway-apikey-*.tgz # vendored at `helm dependency update` time
# from oci://mcr.microsoft.com/aks/kaito/helm
The umbrella chart itself ships no Kubernetes resources — every manifest is rendered by a subchart. This keeps the umbrella thin and avoids accidental coupling between components.
# keda must already exist (Helm only auto-creates the release
# namespace).
kubectl create namespace keda --dry-run=client -o yaml | kubectl apply -f -
# Vendor the upstream llm-gateway-apikey tarball into ./charts/ from
# the OCI registry declared in Chart.yaml. Required once after a fresh
# clone (and after bumping the dependency version).
helm dependency update charts/productionstack
helm upgrade --install productionstack charts/productionstack \
--namespace kaito-system \
--create-namespace \
--waitWith the default values BBR lands in kaito-system (the release
namespace), the KEDA scaler in keda, and the upstream
llm-gateway-apikey control plane in llm-gateway-auth (the
namespaceOverride defaults). The umbrella release itself can live in
any namespace.
productionstack only installs the control plane for API-key auth
(APIKey CRD + apikey-operator + apikey-authz gRPC Service). The
actual EnvoyFilter that splices envoy.filters.http.ext_authz into
a Gateway pod is rendered per workload namespace by
charts/modelharness under
templates/envoyfilter-ext-authz.yaml, gated by auth.enabled: true.
Each modelharness release attaches ext_authz to its own <namespace>-gw
Gateway pod and points it at the cluster-wide apikey-authz Service
installed by this umbrella. To enforce auth in a workload namespace:
# my-modelharness-values.yaml
auth:
enabled: trueThe 0.0.11-alpha bump of the upstream llm-gateway-apikey chart
replaced its historic MeshConfig.extensionProviders + CUSTOM
AuthorizationPolicy flow (which required a post-install hook Job to
mutate the cluster-shared istio ConfigMap — impossible on the AKS
Istio add-on) with a templated cluster-wide EnvoyFilter. Because
auth scope is per workload namespace, we deliberately neutralise
that cluster-wide EnvoyFilter (sentinel gatewaySelector that no
Gateway pod carries, namespace pinned to the LGA control plane) and
own the wiring in modelharness instead.
Helm itself only accepts a single --namespace per release. To
install each subchart into its own namespace inside a single
helm install, every subchart in this umbrella exposes a
namespaceOverride value. When set, all of that subchart's
namespaced resources are rendered into the override namespace
instead of the release namespace; cluster-scoped resources
(ClusterRole, ClusterRoleBinding, CustomResourceDefinition,
ValidatingWebhookConfiguration) are unaffected.
The upstream llm-gateway-apikey dependency added the same
namespaceOverride knob in chart version 0.0.8-alpha; below that
version every namespaced resource it ships always rendered into the
umbrella's release namespace.
# my-values.yaml
body-based-routing:
namespaceOverride: "" # inherit release namespace (kaito-system)
keda-kaito-scaler:
namespaceOverride: keda
llm-gateway-apikey:
namespaceOverride: llm-gateway-authCaveats:
- Target namespaces must already exist.
helm install --create-namespaceonly creates the release namespace, not the override targets. - The Helm release metadata Secret is stored in the release
namespace, but
helm uninstallstill cleans up resources in every override namespace because Helm tracks the rendered manifest. - Set
namespaceOverride: ""to opt out and inherit the release namespace (standard Helm behavior).
body-based-routing and keda-kaito-scaler are mandatory cluster
infrastructure and are always installed. Optional components can be
disabled via their enabled toggle, e.g. to skip the API-key auth
stack:
helm upgrade --install productionstack charts/productionstack \
--namespace kaito-system \
--set llm-gateway-apikey.enabled=falseNest the override under the subchart's name (which matches the
subchart's Chart.yaml name: field). For example, to run two BBR
replicas and bump the keda-kaito-scaler log level:
# my-values.yaml
body-based-routing:
enabled: true
bbr:
replicas: 2
keda-kaito-scaler:
enabled: true
log:
level: 3
# Pass-through to the upstream OCI dependency. Key MUST match the
# dependency `name:` in Chart.yaml (`llm-gateway-apikey`).
llm-gateway-apikey:
enabled: true
authz:
replicaCount: 3helm upgrade --install productionstack charts/productionstack \
--namespace kaito-system \
-f my-values.yamlSee each subchart's own README / values.yaml for the full set of supported keys:
charts/body-based-routing/README.mdcharts/keda-kaito-scaler/values.yaml- Upstream
llm-gateway-apikeychart sources:kaito-project/llm-gateway-auth/chart/llm-gateway-apikey/values.yaml
Add an entry to the parent chart's Chart.yaml:
# charts/kaito/Chart.yaml
dependencies:
- name: productionstack
version: 0.1.0
repository: "file://../productionstack" # or an OCI / HTTP repo URL
condition: productionstack.enabledThen forward overrides in the parent's values.yaml:
# charts/kaito/values.yaml
productionstack:
enabled: true
body-based-routing:
enabled: true
keda-kaito-scaler:
enabled: true
llm-gateway-apikey:
enabled: trueRun helm dependency update charts/kaito to vendor this chart into
the parent's charts/ directory.
- Drop the component's chart directory under
charts/(e.g.charts/llm-gateway-auth/). - Append a new entry to
dependencies:inChart.yamlwithcondition: <name>.enabled. - Add a
<name>.enabledtoggle (and any documented overrides) tovalues.yaml. - Mention it in the table above and in
templates/NOTES.txt.