WIP: feat: add LoRA adapter support for ModelDeployment CRD by sozercan · Pull Request #84 · kaito-project/airunway

sozercan · 2026-02-23T18:27:53Z

Merge after #73

LoRA Adapter Support

Adds a unified LoRA adapter abstraction to the ModelDeployment CRD, with support across all three providers (KAITO, KubeRay, and Dynamo).

Changes

CRD: spec.adapters[] with name and source (hf:// URI scheme)
InferenceProviderConfig: loraSupport capability for auto-selection filtering
Webhook: blocks adapters on llamacpp, enforces unique adapter names, and requires the hf:// scheme for sources
Controller: provider auto-selection filters by LoRA support, InferenceObjective per adapter for gateway routing
KAITO: maps adapters → inference.adapters on Workspace CRD
KubeRay: injects --enable-lora + --lora-modules into VLLM_ENGINE_ARGS
Dynamo: --enable-lora, LoRA env vars, DynamoModel CRDs, init container for HF download, modelRef for endpoint discovery, updated to 0.9.0 runtime images
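
The webhook rules listed above (llamacpp blocking, unique names, hf:// scheme) can be sketched as follows. This is only an illustration in Python, not the project's Go webhook: the `Adapter` type, the `engine` parameter, and the error strings are hypothetical names chosen for the example.

```python
from dataclasses import dataclass

HF_SCHEME = "hf://"


@dataclass
class Adapter:
    """Hypothetical stand-in for one entry of spec.adapters[]."""
    name: str
    source: str


def validate_adapters(engine: str, adapters: list[Adapter]) -> list[str]:
    """Return validation errors; an empty list means the spec is accepted."""
    errors: list[str] = []
    if not adapters:
        return errors
    # Assumption: the engine is identified by a plain string such as "llamacpp".
    if engine == "llamacpp":
        errors.append("adapters are not supported with the llamacpp engine")
    seen: set[str] = set()
    for i, adapter in enumerate(adapters):
        if adapter.name in seen:
            errors.append(f"adapters[{i}].name: duplicate name {adapter.name!r}")
        seen.add(adapter.name)
        # Require the hf:// prefix and a non-empty remainder after it.
        if not adapter.source.startswith(HF_SCHEME) or adapter.source == HF_SCHEME:
            errors.append(f"adapters[{i}].source: must use the hf:// scheme")
    return errors
```

Returning every error at once, rather than stopping at the first, is a design choice for the sketch; the real webhook may report differently.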

Testing

Unit tests for all provider transformers and webhook validation
E2E tested on an AKS GPU cluster with the Dynamo provider, the unsloth/Qwen3-0.6B model, and the lucylq/qwen3_06B_lora_math adapter

Known Issues

The Dynamo DynamoModel operator has a race condition: it tries to load the LoRA adapter before vLLM finishes initializing. Workaround: delete and recreate the DynamoModel after the worker is ready.
Dynamo's hf:// download via DynamoModel is asynchronous and may fail silently. Loading from a file:// local path works reliably once the init container has pre-downloaded the adapter.

TODO

Fix Dynamo race condition (wait for worker readiness before creating DynamoModel)
Consider direct /v1/loras API call from provider controller instead of DynamoModel CRD for hf:// sources
Test with KubeRay provider
Test with KAITO provider (needs KAITO preset model with LoRA support)

Create docs/gateway.md covering architecture, prerequisites, compatible gateway implementations, setup steps, configuration options (auto-detection, explicit flags, per-deployment overrides), usage examples (curl and Python), and troubleshooting.

Update docs/architecture.md with a Gateway API Integration section and link to the new guide. Update README.md with a Gateway API Integration highlight and doc link.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

… routing

Add support for the Gateway API Inference Extension (inference.networking.k8s.io/v1) to provide a single unified inference gateway endpoint across all providers. When Gateway API CRDs are detected in the cluster, the controller automatically creates InferencePool and HTTPRoute resources for each ModelDeployment.

Controller changes:
- Add gateway-api and gateway-api-inference-extension Go dependencies
- Add GatewaySpec (spec.gateway) and GatewayStatus to ModelDeployment CRD
- Implement gateway reconciler for InferencePool and HTTPRoute lifecycle
- Add gateway auto-detection with CRD availability caching
- Support explicit --gateway-name/--gateway-namespace flags
- Add RBAC for inferencepools, httproutes, and gateways
- Inject kubeairunway.ai/model-deployment label in all providers (KAITO, Dynamo, KubeRay)

Backend/frontend changes:
- Add GET /gateway/status and GET /gateway/models API routes
- Add gateway status to deployment detail responses
- Add GatewayStatus, GatewayInfo, GatewayModelInfo shared types
- Add gateway API client methods in frontend

Tests and docs:
- Add gateway reconciler tests (11 tests) and detection tests (7 tests)
- Add docs/gateway.md with architecture, setup, and usage guide
- Update docs/architecture.md, crd-reference.md, controller-architecture.md, api.md

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…ndpoint

- Fix backend API group from inference.networking.x-k8s.io/v1alpha2 to inference.networking.k8s.io/v1 to match upstream stable API
- Add required EndpointPickerRef to InferencePool with configurable --epp-service-name and --epp-service-port controller flags
- Resolve gateway endpoint from Gateway.status.addresses instead of constructing invalid DNS name
- Add Istio setup notes and EPP configuration docs to gateway.md
- Add test for endpoint resolution from Gateway status

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Probe the model server's /v1/models endpoint to resolve the actual served model name when no explicit spec.gateway.modelName or spec.model.servedName is set. This fixes gateway routing for baked-in model images where the served name differs from spec.model.id.

Resolution priority:
1. spec.gateway.modelName (explicit override)
2. spec.model.servedName (user-specified)
3. Auto-discovered from /v1/models on running server
4. spec.model.id (fallback)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
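
The four-level resolution priority above can be sketched as a simple fallthrough. This is a minimal Python illustration, not the controller's Go code: the function name, its parameters, and the `discover` callback (standing in for the /v1/models probe) are assumptions made for the example.

```python
from typing import Callable, Optional


def resolve_model_name(
    gateway_model_name: str,
    served_name: str,
    model_id: str,
    discover: Callable[[], Optional[str]],
) -> str:
    """Pick the model name used for gateway routing, highest priority first."""
    if gateway_model_name:      # 1. spec.gateway.modelName (explicit override)
        return gateway_model_name
    if served_name:             # 2. spec.model.servedName (user-specified)
        return served_name
    try:
        discovered = discover()  # 3. probe of /v1/models on the running server
    except Exception:
        # Assumption: an unreachable server falls through instead of failing.
        discovered = None
    if discovered:
        return discovered
    return model_id             # 4. spec.model.id (fallback)
```

Passing the probe as a callback keeps the priority logic testable without a running model server.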

- Add tests for resolveModelName priority chain: explicit override, served name, unreachable server fallback, no endpoint fallback
- Update gateway.md with model name resolution section documenting the 4-level priority chain including auto-discovery
- Fix stale comment in modeldeployment_types.go

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…ady=False

- cleanupGatewayResources now sets GatewayReady condition to False so conditions stay consistent when gateway resources are removed
- When deployment leaves Running phase (Failed, Terminating, etc.), gateway resources are cleaned up if they previously existed
- Add test for phase transition cleanup and condition verification

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Fail fast at startup if only one of --gateway-name/--gateway-namespace is set, preventing silent fallback to auto-detection
- Add 60s TTL for negative CRD detection results so gateway integration self-enables if CRDs are installed after controller startup. Positive results remain cached permanently.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
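
The asymmetric caching described above (negative results expire after 60s, positive results are kept) might look roughly like this. It is a Python sketch under assumptions: the class name, the `probe` callback, and the injectable `clock` are hypothetical and only serve to show the TTL logic.

```python
import time
from typing import Callable, Optional


class CRDDetectionCache:
    """Caches CRD detection: positives permanently, negatives for a short TTL."""

    def __init__(
        self,
        probe: Callable[[], bool],
        negative_ttl: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._probe = probe              # hypothetical: asks the API server for the CRDs
        self._negative_ttl = negative_ttl
        self._clock = clock
        self._available: Optional[bool] = None  # None means "never probed"
        self._checked_at = 0.0

    def available(self) -> bool:
        # A positive result is never re-checked.
        if self._available:
            return True
        now = self._clock()
        # A negative result is trusted only until the TTL expires.
        if self._available is False and now - self._checked_at < self._negative_ttl:
            return False
        self._available = bool(self._probe())
        self._checked_at = now
        return self._available
```

With this shape, installing the CRDs after startup is picked up on the first call made at least 60 seconds after the last failed probe.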

Tests the full Gateway API Inference Extension integration:
- Installs Gateway API CRDs, Inference Extension CRDs, and Istio
- Creates Gateway resource and deploys a CPU model
- Verifies InferencePool created with correct selector and EPP ref
- Verifies HTTPRoute created with correct backend ref
- Verifies model name auto-discovery from /v1/models
- Tests actual inference routing through the Istio gateway
- Tests gateway disable and resource cleanup

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The gateway reconciliation may need an extra reconcile cycle after the deployment transitions to Running phase. Add a 30-attempt retry loop with 5s intervals instead of checking once. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
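
A bounded retry of the kind described above (30 attempts, 5s apart) can be sketched as below. The e2e test itself is not written this way; `wait_for`, its `check` callback, and the injectable `sleep` are hypothetical names used only for illustration.

```python
import time
from typing import Callable


def wait_for(
    check: Callable[[], bool],
    attempts: int = 30,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll check() up to `attempts` times, pausing `interval` seconds between tries."""
    for attempt in range(attempts):
        if check():
            return True
        # No pause after the final attempt: the caller should fail promptly.
        if attempt < attempts - 1:
            sleep(interval)
    return False
```

With the defaults, the worst case waits 29 × 5s = 145s before giving up.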

- Set model.id in test fixture so fallback model name is non-empty
- Replace gateway-routed inference test with direct service test (gateway routing requires EPP which isn't deployed in e2e)
- Keep gateway resource verification (InferencePool, HTTPRoute, status, conditions) as the GAIE integration test

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The auto-discovery probes /v1/models on the model service, but status.endpoint.port may contain the container port (e.g. 5000) while the service exposes port 80. Look up the actual service port first, falling back to status.endpoint.port if unavailable. This specifically fixes aikit/llamacpp models where KAITO reports container port 5000 but the service maps 80→5000. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The controller needs permission to read Services to look up the actual service port for model name auto-discovery. Without this, the probe used the container port (e.g. 5000) instead of the service port (80), causing discovery to fail. Also adds resolveServicePort() which looks up the service's HTTP port, preferring ports named 'http' or on 80/8080. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
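
The port preference described above can be sketched as follows. This is an illustrative Python version, not the actual resolveServicePort(): the `ServicePort` type is a hypothetical stand-in, and returning the first port when nothing matches is an assumption the commit message does not confirm.

```python
from dataclasses import dataclass


@dataclass
class ServicePort:
    """Hypothetical stand-in for one entry of a Service's spec.ports."""
    name: str
    port: int


def resolve_service_port(ports: list[ServicePort], fallback: int) -> int:
    """Pick the Service port to probe, preferring ones that look like HTTP."""
    if not ports:
        # Service lookup failed or has no ports: use status.endpoint.port.
        return fallback
    for p in ports:
        if p.name == "http":
            return p.port
    for p in ports:
        if p.port in (80, 8080):
            return p.port
    # Assumption: with no HTTP-looking port, take the first one listed.
    return ports[0].port
```

For the aikit/llamacpp case mentioned earlier, a Service exposing port 80 resolves to 80 even though the reported container port is 5000.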

Install the upstream inferencepool helm chart to deploy the EPP (Endpoint Picker Proxy), then test actual inference routing through the Istio gateway instead of direct service port-forward. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The controller now automatically creates the Endpoint Picker Proxy (EPP) deployment, service, RBAC, and config when gateway integration is enabled. Users no longer need to install the EPP separately.

Resources created per ModelDeployment:
- ServiceAccount, Role, RoleBinding for EPP RBAC
- ConfigMap with default plugins config
- Deployment running the upstream EPP image
- Service exposing gRPC port 9002

All resources are owned by the ModelDeployment and cleaned up automatically. EPP image is configurable via --epp-image flag.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The controller needs pods get/watch/list and leases create/get/update permissions on its own service account to avoid RBAC escalation errors when creating the EPP Role (Kubernetes prevents granting permissions the creator doesn't hold). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Pods created by providers may not have the kubeairunway.ai/model-deployment label. The controller now discovers pods via the model service's selector and patches the label onto them, provider-agnostically. Also adds pod patch RBAC and fixes EPP log label in e2e debug. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…writes) The EPP watches these experimental resources even when unused. Without RBAC for them, the cache sync fails and health check returns NOT_SERVING. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The controller deploys the EPP (Deployment + Service + RBAC), but Istio-specific wiring (DestinationRule with h2c upgrade) is BYO. Apply it directly in the e2e test since this is implementation-specific. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Kind doesn't support LoadBalancer, so the Gateway never becomes Programmed. Use networking.istio.io/service-type: NodePort annotation to get a NodePort service that works in Kind. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Port-forwarding to the gateway pod bypasses ext_proc. Use the NodePort service endpoint instead, accessing the node's internal IP. Also remove exclude-from-external-load-balancers label on Kind node. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

InferencePool targetPorts routes directly to pods, so it needs the container port (e.g. 5000), not the service port (e.g. 80). Look up the service's targetPort to get the actual container port. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
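
The service-port to container-port lookup described above can be sketched like this. It is a simplified Python illustration with hypothetical names; it handles numeric targetPort values only (Kubernetes also allows named target ports), and the fallback behavior is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServicePort:
    """Hypothetical stand-in for one entry of a Service's spec.ports."""
    port: int
    target_port: Optional[int] = None  # unset defaults to `port` in Kubernetes


def resolve_target_port(ports: list[ServicePort], service_port: int, fallback: int) -> int:
    """Map a Service port to the container port the InferencePool should target."""
    for p in ports:
        if p.port == service_port:
            return p.target_port if p.target_port is not None else p.port
    # Assumption: if the Service port is not found, keep the caller's value.
    return fallback
```

For a Service that maps 80→5000, `resolve_target_port([ServicePort(80, 5000)], 80, 80)` yields the container port 5000.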

Traffic routing through the gateway requires either:
- Envoy AI Gateway controller (for backendResources support)
- Istio with working ext_proc/mTLS (connection_termination in Kind)

Neither works in a basic Kind cluster. The e2e tests verify all controller-side logic comprehensively. Traffic routing was validated manually on AKS.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Revert from Envoy Gateway to Istio. Add cloud-provider-kind to provide LoadBalancer IP assignment in Kind, which should fix the Gateway Programmed=Unknown issue. Also restores the traffic routing test using the Gateway's LoadBalancer IP directly. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

cloud-provider-kind provides LoadBalancer IP, Gateway is Programmed, but Istio's ext_proc can't connect to EPP without mTLS. Enable sidecar injection on default namespace so EPP gets Istio proxy. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

With enableAutoMtls=false, the gateway proxy should connect to the EPP using plaintext gRPC without mTLS. No sidecar needed on the EPP pod. The ext_proc cluster should use h2c based on the service port name (grpc-ext-proc) and appProtocol. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Per upstream GAIE chart (inferencepool/templates/istio.yaml), Istio needs tls.mode=SIMPLE with insecureSkipVerify=true to connect to the EPP. The previous h2UpgradePolicy approach was wrong. Also adds cloud-provider-kind for LoadBalancer IP in Kind. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

When httpRouteRef is set, the controller skips auto-creating the HTTPRoute and uses the user-provided one. This enables custom routing logic like LoRA adapter selection, traffic splitting across model versions, and custom payload processors. The controller still auto-creates InferencePool + EPP regardless. Cleanup also respects httpRouteRef — won't delete user-provided routes. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Per Gateway API conventions, readiness shouldn't be a single bool. The GatewayReady condition with reason/message already captures this with proper granularity. Users should check the condition or refer to Gateway API resource status directly. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

If gateway reconciliation fails with a CRD-not-found error (e.g. CRDs were removed), refresh the detection cache so subsequent reconciles skip gateway integration gracefully. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Log a warning when multiple Gateways are labeled with kubeairunway.ai/inference-gateway=true, suggesting gatewayRef for explicit selection. Uses the first labeled one. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

BBR (Body-Based Router) is a separate deployment needed only for multi-model setups. Updated architecture diagram, added BBR section with helm install instructions pinned to v1.3.1, and clarified that single-model setups don't need BBR. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

For multi-model setups with BBR, each HTTPRoute needs a header match on X-Gateway-Base-Model-Name to route to the correct InferencePool. BBR sets this header from the request body's model field. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Define GAIE_VERSION in Makefile (v1.3.1) and DefaultGAIEVersion constant in gateway package. EPP image tag defaults to this version in both cmd/main.go and gateway_reconciler.go. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The header match (X-Gateway-Base-Model-Name) only works when BBR is deployed. Add a fallback PathPrefix / match so single-model setups work without BBR. With BBR, the header match takes priority. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
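
The two matches described above (header match for BBR, PathPrefix / as fallback) could be expressed in an HTTPRoute rule roughly as follows. This Python sketch only builds the rule as a dictionary; the function name and the choice to put both matches in a single rule are assumptions, not a copy of what the controller generates.

```python
def build_route_rules(pool_name: str, model_name: str) -> list[dict]:
    """Build HTTPRoute rules that send matching traffic to an InferencePool."""
    return [
        {
            "matches": [
                # Used when BBR is deployed: BBR copies the request body's
                # model field into this header.
                {
                    "headers": [
                        {
                            "type": "Exact",
                            "name": "X-Gateway-Base-Model-Name",
                            "value": model_name,
                        }
                    ]
                },
                # Fallback so single-model setups work without BBR.
                {"path": {"type": "PathPrefix", "value": "/"}},
            ],
            "backendRefs": [
                {
                    "group": "inference.networking.k8s.io",
                    "kind": "InferencePool",
                    "name": pool_name,
                }
            ],
        }
    ]
```

Matches within a rule are ORed, so a request is routed if it either carries the header or matches the path prefix.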

…ypes

1. Remove duplicate DeploymentConfig interface (incompatible properties broke TypeScript build — pre-existing issue also on main)
2. Derive gateway model readiness from GatewayReady condition instead of removed status.gateway.ready field
3. Restore shared/types/aikit.ts re-export file and barrel export

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Add --enable-lora engine arg when adapters are specified
- Add loraEnvVars helper for Dynamo LoRA env vars (DYN_LORA_ENABLED, DYN_SYSTEM_ENABLED, DYN_SYSTEM_PORT, DYN_LORA_PATH)
- Inject LoRA env vars into aggregated, prefill, and decode workers
- Add reconcileAdapters to create/update DynamoModel CRDs per adapter
- Add cleanupOrphanedDynamoModels for adapter lifecycle management
- Add DynamoModel cleanup on ModelDeployment deletion
- Add RBAC marker for DynamoModel resources
- Set LoRASupport: true in provider capabilities

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Add LoRAAdapterSpec and AdapterStatus types to ModelDeployment
- Add LoRASupport capability to InferenceProviderConfig
- Webhook validation: block llamacpp+adapters, unique names, hf:// scheme
- Provider auto-selection filters by LoRA support
- KAITO: map adapters to inference.adapters on Workspace
- KubeRay: inject --enable-lora + --lora-modules into VLLM_ENGINE_ARGS
- Dynamo: --enable-lora, LoRA env vars, DynamoModel CRDs, init container for HF adapter download, modelRef for endpoint discovery
- Gateway: auto-create InferenceObjective per adapter
- Update Dynamo runtime images to 0.9.0
- Add unit tests for all providers and webhook
- Add docs/lora-adapters.md user guide
- Add sample YAML with chess LoRA adapter
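
The KubeRay item above (injecting --enable-lora and --lora-modules) can be illustrated with a small Python sketch. vLLM's --lora-modules flag accepts name=path pairs; everything else here is an assumption for the example, including the function name, stripping the hf:// prefix to obtain a repository ID, and how the arguments end up serialized into VLLM_ENGINE_ARGS.

```python
def lora_engine_args(adapters: list[tuple[str, str]]) -> list[str]:
    """Build vLLM engine arguments for (name, source) adapter pairs."""
    if not adapters:
        # No adapters: leave the engine arguments untouched.
        return []
    args = ["--enable-lora", "--lora-modules"]
    for name, source in adapters:
        # Assumption: an hf:// source maps to the bare Hugging Face repo ID.
        prefix = "hf://"
        path = source[len(prefix):] if source.startswith(prefix) else source
        args.append(f"{name}={path}")
    return args
```

For the adapter used in the E2E test, `lora_engine_args([("math", "hf://lucylq/qwen3_06B_lora_math")])` produces `--enable-lora --lora-modules math=lucylq/qwen3_06B_lora_math` when joined with spaces.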

sozercan and others added 30 commits February 13, 2026 14:23

docs: fix gateway overview link to point to repo instead of GEP

a3877f9

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

docs: remove status column from gateway implementations table

cb4f9d9

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

docs: clarify gateway implementations are BYO

82f3435

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

docs: move Istio note to setup, remove from troubleshooting

eaba4f4

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

docs: show gateway.enabled in deploy example

7b4807a

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

fix: add retry loop for HTTPRoute existence check in e2e

e71b53a

The HTTPRoute may be created in the same reconcile cycle as the verification step runs. Add a retry loop to wait for it. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

fix: add retry loop for InferencePool existence check in e2e

ffba6bb

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

fix: add x-k8s.io RBAC to controller SA to avoid escalation

5c53ef0

The controller needs the same permissions it grants to the EPP Role, otherwise Kubernetes blocks the Role creation as RBAC escalation. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

debug: add HTTP status code to gateway inference test output

d271089

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

sozercan and others added 27 commits February 19, 2026 22:27

test: try Envoy Gateway v0.0.0-latest (dev build) for backendResources

c0b3bef

The validation fix for extensionManager.backendResources without hooks may only be on main. Try the latest dev build. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

fix: add includeInboundPorts annotation for EPP sidecar

891415b

Explicitly tell Istio sidecar to intercept port 9002 for ext_proc gRPC traffic. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

docs: add cross-namespace Gateway setup with ReferenceGrant

6c8b601

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

test: add isNoMatchError test cases

71843a9

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

docs: remove port-forwarding mention from gateway overview

b5f693f

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

chore: pin GAIE to v1.3.1, update Go dependency

d61d0ea

Pin Gateway API Inference Extension CRDs to v1.3.1 instead of latest. Update Go module dependency to match. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

chore: use official EPP image from registry.k8s.io pinned to v1.3.1

ab3cc9e

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

docs: use registry.k8s.io for BBR chart

5ba855e

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

docs: add version matching note with go.mod link for BBR chart

117a944

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

test: install BBR in e2e for multi-model readiness

da05ae2

Install the upstream body-based-routing helm chart with Istio provider in the e2e test. Validates the full GAIE stack. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

merge: resolve conflicts with main

685e3b8

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

sozercan linked an issue Feb 24, 2026 that may be closed by this pull request

support for lora adapters #10

Open

sozercan closed this Apr 28, 2026

sozercan reopened this Apr 28, 2026

sozercan wants to merge 85 commits into main from lora