Per-namespace shared resources for production-stack workloads. One modelharness release per workload namespace provisions everything every model deployment in that namespace shares:
- the Istio
Gatewaythat fronts the namespace, - the catch-all
EnvoyFilter(model-not-found-direct) that returns an OpenAI-compatible404 model_not_founddirectly from Envoy for any path not matched by a deployment-specificHTTPRoute, - the per-namespace
EnvoyFilterthat injects BBR's ext_proc into the Gateway HCM, - the unified-error
local_replyfilter that maps fail-closed BBR / ext_authz outages and any other>= 500reply onto a consistent OpenAI-compatible error envelope, - (when
auth.enabled) theAuthorizationPolicyandAPIKeyCR that wire the Gateway into the cluster-wideapikey-ext-authzCUSTOM provider, - (when
networkPolicy.enabled) aCiliumNetworkPolicythat locks down East-West ingress to inference workloads in the namespace.
A namespace may host one or more modeldeployment releases, all of which share its Gateway.
| Resource (Kind) | Group / Version | Toggle | Purpose |
|---|---|---|---|
Gateway |
gateway.networking.k8s.io/v1 |
always | Public entry point for the namespace; gatewayClassName: istio, HTTP/80 by default. |
EnvoyFilter bbr-ext-proc |
networking.istio.io/v1alpha3 |
always | Injects the cluster-wide BBR ext_proc into the Gateway HCM, scoped via workloadSelector. BBR itself ships in productionstack / body-based-routing. |
EnvoyFilter model-not-found-direct |
networking.istio.io/v1alpha3 |
always | Patches a direct_response onto the Gateway HCM; returns an OpenAI 404 for any unknown-model path. Required to keep API-key ext_authz running on unknown-model requests. |
EnvoyFilter gateway-filter-outage-local-reply |
networking.istio.io/v1alpha3 |
always | Single namespace-scoped local_reply that maps fail-closed BBR / ext_authz outages, gateway data-plane health failures, and any remaining 5xx onto the unified error envelope. |
EnvoyFilter apikey-ext-authz |
networking.istio.io/v1alpha3 |
auth.enabled |
Splices envoy.filters.http.ext_authz into the Gateway pod and points it at the cluster-wide apikey-authz gRPC Service installed by productionstack/llm-gateway-apikey. |
AuthorizationPolicy apikey-gateway-ext-authz |
security.istio.io/v1 |
auth.enabled |
Per-namespace CUSTOM policy that gates traffic through the cluster-wide apikey-ext-authz extension provider on the Gateway pod. |
APIKey default |
apikeys.kaito.sh/v1alpha1 |
auth.enabled |
Triggers the apikey-operator to reconcile a Secret (llm-api-key) holding the bearer token clients send. |
CiliumNetworkPolicy inference-pods-ingress |
cilium.io/v2 |
networkPolicy.enabled |
Selects pods labelled kaito.sh/owned-by: modeldeployment and admits ingress from the same namespace plus networkPolicy.allowedIngressNamespaces; all other ingress is dropped. |
Namespace |
v1 |
manageNamespace (default true) |
Stamped with productionstack.kaito.sh/managed-by: modelharness so the status reporter can discover it. Skipped when the release namespace is the workload namespace. |
The chart renders cilium.io/v2 CiliumNetworkPolicy (CNP) rather than networking.k8s.io/v1 NetworkPolicy, because on the AKS Azure-CNI overlay dataplane a vanilla K8s NetworkPolicy that selects a pod created before the policy is installed is intermittently never compiled into the per-endpoint BPF map (the CiliumEndpoint reports status.policy.ingress.enforcing=false and traffic flows unrestricted). Native CNPs bypass that translation step and reconcile through Cilium's own path, becoming enforcing within a single tick.
Production deployments should therefore run on a cluster with the Cilium dataplane enabled — for AKS this means --network-dataplane cilium --network-policy cilium (see hack/e2e/scripts/setup-cluster.sh). If you cannot run Cilium, set networkPolicy.enabled=false and apply equivalent ingress isolation out-of-band.
OCI (recommended):
helm install modelharness oci://ghcr.io/kaito-project/modelharness \
--version <X.Y.Z> \
--namespace my-models \
--create-namespaceIn-tree:
helm install modelharness ./charts/modelharness \
--namespace my-models \
--create-namespaceEnable API-key auth and customize the allowed ingress namespaces:
helm install modelharness oci://ghcr.io/kaito-project/modelharness \
--version <X.Y.Z> \
--namespace my-models \
--create-namespace \
--set auth.enabled=true \
--set 'networkPolicy.allowedIngressNamespaces={keda,kaito-system,kube-system,monitoring}'Top-level values (see values.yaml for the full schema, defaults, and inline documentation; cross-field constraints that JSON Schema cannot express are enforced by values.schema.json at install time).
| Key | Default | Description |
|---|---|---|
namespace |
.Release.Namespace |
Workload namespace that owns the rendered resources. |
manageNamespace |
true |
Render the workload Namespace object (stamped with the discovery label). Skipped when the release namespace is the workload namespace. |
gatewayName |
<namespace>-gw |
Per-namespace Istio Gateway name. The Service follows Istio's <gatewayName>-istio convention. |
gatewayClassName |
istio |
GatewayClass the Gateway binds to (AKS Istio add-on with App Routing: approuting-istio). |
gatewayPort |
80 |
HTTP listener port on the Gateway. |
bbr.name / bbr.namespace / bbr.port |
body-based-router / kaito-system / 9004 |
Cluster FQDN coordinates of the BBR Service installed by productionstack/body-based-routing. Must match wherever BBR was installed. |
bbr.envoyFilter.operation / anchorSubFilter |
INSERT_BEFORE / envoy.filters.http.ext_proc |
Filter-chain placement so BBR injects X-Gateway-Model-Name before the InferencePool / EPP ext_proc and the HTTPRoute match run. |
bbr.outlierDetection.* |
see values.yaml |
Passive outlier detection on the BBR ext_proc upstream cluster so an erroring replica is ejected before tripping the fail-closed 502 bbr_unavailable path. |
auth.enabled |
false |
Render the per-namespace API-key auth artifacts (AuthorizationPolicy, APIKey, apikey-ext-authz EnvoyFilter). |
auth.apiKeyName |
default |
Name of the rendered APIKey CR. |
auth.extAuthz.* |
apikey-authz / llm-gateway-auth / 9001 / 5s |
Cluster-wide apikey-authz gRPC Service coordinates the per-namespace EnvoyFilter targets. Defaults match the llm-gateway-apikey subchart. |
networkPolicy.enabled |
true |
Render the per-namespace CiliumNetworkPolicy. Requires the Cilium dataplane (see above). Disable when running on a non-Cilium cluster. |
networkPolicy.allowedIngressNamespaces |
[keda, kaito-system, kube-system, monitoring] |
Cross-namespace ingress allowlist for inference pods. Each entry renders a fromEndpoints clause keyed off k8s:io.kubernetes.pod.namespace. Empty = strict per-namespace isolation. |
charts/productionstack— cluster-level components (BBR data plane, KEDA Kaito scaler, llm-gateway-apikey control plane) thatmodelharnessreferences.charts/modeldeployment— installed per model into a namespace whose Gateway / harness this chart provisioned.