feat: add direct vLLM provider support by sozercan · Pull Request #265 · kaito-project/airunway

sozercan · 2026-05-04T16:17:01Z

Description

Adds a first-class Direct vLLM inference provider to AI Runway. Direct vLLM renders a ModelDeployment straight into native Kubernetes Deployment + Service objects (no upstream operator/CRD required), pins the container image to an immutable registry digest, and integrates with a new vLLM "recipe" system that imports tuned launch arguments from recipes.vllm.ai. It also introduces the shared CRD plumbing (spec.engine.image, spec.engine.extraArgs, status.image) that this and other providers consume.

AI Prompt (Optional)

🤖 AI Prompt Used

N/A - Manual implementation (with AI-assisted multi-agent code review and rebase/CI fixups)

AI Tool: Claude

Type of Change

✨ New feature (non-breaking change that adds functionality)
📚 Documentation update
🎨 UI/UX improvement
🧪 Test update
🔧 Build/CI configuration
🐛 Bug fix (non-breaking change that fixes an issue)
💥 Breaking change (fix or feature that would cause existing functionality to change)
♻️ Refactoring (no functional changes)

Related Issues

Relates to the Direct vLLM provider work (PR #265).

Changes Made

New providers/vllm controller — a standalone Kubebuilder controller (controller.go, transformer.go, config.go, status.go, image_resolver.go, cmd/main.go) that transforms a ModelDeployment into an apps/v1 Deployment + Service. Supports tensor-parallel sizing from resources.gpu.count, a memory-backed /dev/shm volume for multi-GPU, spec.model.storage PVC mounts, HF-token secret injection, and reserved host/port arg protection. Ships Dockerfile, Makefile, RBAC, and deploy/vllm.yaml shim manifests.
Image digest resolution + provenance — RemoteImageResolver (via go-containerregistry) pins tag-based images to digests and records status.image (Requested/Resolved/Digest/Source/InNightly/Verified/…). Resolution is reused once a digest is cached. The default image is pinned for reproducibility.
CRD API additions (controller/api/v1alpha1) — new spec.engine.image (preferred) and spec.engine.extraArgs fields, a status.image (ImageStatus) block, and an ImageResolved condition. ValidateImageFields() rejects conflicting spec.image vs spec.engine.image, and ImageOverride() centralizes precedence (engine image wins over the legacy top-level image). Wired into the core reconciler and validating webhook; CRD/deepcopy regenerated.
Cross-provider spec.engine.image adoption — dynamo, kaito, kuberay, and llmd transformers now read ImageOverride() so the new engine-image field is honored consistently (previously only spec.image was read).
vLLM recipe backend — new backend/src/services/vllmRecipesClient.ts and vllmRecipeResolver.ts plus the backend/src/routes/vllmRecipes.ts routes (GET /vllm/recipes, GET /:org/:model, POST /resolve). Includes strict HF model-ID validation (rejects path traversal), HTTPS-only + origin/path-prefix pinning for recipe references, an AbortController fetch timeout, a TTL in-memory cache (stale-on-error), a response-size bound, and typed errors mapped to 400/502/504.
Frontend integration — DeployPage.tsx/DeploymentForm.tsx add the Direct vLLM deployment method (nightly/stable/custom launch images, recipe apply flow, FP8 precision controls), deploymentDisplay.ts centralizes engine/provider display names, and ModelCard/HfModelCard/DeploymentList/DeploymentDetailsPage surface the new provider and engine labels.
Shared types/API — shared/types/vllmRecipes.ts, shared/api/vllmRecipes.ts, and shared/types/deployment.ts add recipe types, engine.image/extraArgs, recipeProvenance annotations, and env→EnvVar[] conversion.
Docs — new docs/providers/vllm.md, plus updates to README.md, docs/api.md, docs/architecture.md, docs/crd-reference.md, docs/providers.md, docs/versioning-upgrades.md, and agents.md.
Provider behavior decisions — Direct vLLM is explicit-only (SelectionRules: nil, never auto-selected) and advertises aggregated serving only (validateCompatibility rejects disaggregated).
Build/CI & rebase fixups — Makefile wiring for the new provider, resolution of stale-rebase build blockers (duplicate keys, stray brace, validateSpec arity), and CI lint/test fixes.
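The reserved host/port guard can be sketched as a small Go helper. It is hypothetical and written for illustration only. The function names and the `reservedFlags` set are assumptions, and it checks a flat `[]string` of CLI tokens, which is the shape `spec.engine.extraArgs` is assumed to have. It catches both the `--port 9000` and `--port=9000` spellings, the two forms the review-feedback commit calls out.

```go
package main

import (
	"fmt"
	"strings"
)

// reservedFlags are provider-owned: the generated Service has to target the
// address vLLM binds to (assumption: exactly host and port, as the PR states).
var reservedFlags = map[string]bool{"host": true, "port": true}

// flagName returns "key" for "--key" or "--key=value", and "" for any other token.
func flagName(arg string) string {
	if !strings.HasPrefix(arg, "--") {
		return ""
	}
	name := strings.TrimPrefix(arg, "--")
	if i := strings.IndexByte(name, '='); i >= 0 {
		name = name[:i]
	}
	return name
}

// checkReservedArgs rejects user-supplied CLI tokens that would override a provider-owned flag.
func checkReservedArgs(args []string) error {
	for _, a := range args {
		if reservedFlags[flagName(a)] {
			return fmt.Errorf("engine arg %q is reserved by the provider", a)
		}
	}
	return nil
}
```

Values such as `0.0.0.0` or `9000` are not flags, so `flagName` returns `""` for them and they are never mistaken for reserved keys.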

Testing

Unit tests pass (bun run test)
Manual testing performed
Tested with a Kubernetes cluster

Test coverage added across the stack: providers/vllm/*_test.go (transformer args/dedup, storage mounts, reserved-arg guard, image status, controller happy-path/ownership-conflict/deletion, real-resolver guards), controller webhook/validation tests, backend vllmRecipesClient/vllmRecipeResolver/deployments/shared-deployment tests, frontend DeploymentForm tests, and updated dynamo/kaito/llmd transformer tests.

cd controller && go test ./...
cd providers/vllm && go test ./...
cd providers/llmd && go test ./...
bun run test

Checklist

My code follows the project's style guidelines
I have run bun run lint
I have added tests that prove my fix/feature works
New and existing unit tests pass locally
I have updated documentation if needed
My changes generate no new warnings

Screenshots

N/A

Additional Notes

Branch scope: this branch is 11 commits ahead of main (merge-base c5a4422). It spans the initial feature, several rounds of review feedback, a rebase onto updated main, and CI fixups, for a total of 79 files changed (+10,472/−175).
docs/plans/vllm-provider-full-plan.md describes forward-looking features (cosign verification, image catalog, broader disaggregated support) that are not all implemented in this PR — treat it as a roadmap, not shipped scope.
All previously-raised review threads have been addressed and resolved.

Copilot

Pull request overview

Adds end-to-end “Direct vLLM” support across the Airunway stack (CRD/schema + controller/provider + backend recipe resolution APIs + frontend deploy UX), including image provenance / resolution status and recipe-derived deployment settings.

Changes:

Introduces vLLM Recipes APIs (shared types + backend routes/resolver + frontend client/mocks/tests).
Extends the ModelDeployment API surface for Direct vLLM: spec.engine.image, spec.engine.extraArgs, recipe provenance annotations, and status.image.
Adds a new providers/vllm provider controller + deployment manifests, and updates llm-d compatibility to prefer spec.engine.image + support extraArgs.
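The precedence rule for the two image fields can be expressed with two small helpers. This Go sketch uses hypothetical, trimmed-down versions of the CRD structs. It assumes that "conflicting" means both fields are set to different values, so identical values are allowed through. The real `ValidateImageFields()` may be stricter.

```go
package main

import "fmt"

// EngineSpec and ModelDeploymentSpec are hypothetical, trimmed-down versions of the CRD types.
type EngineSpec struct {
	Image     string   // spec.engine.image (preferred)
	ExtraArgs []string // spec.engine.extraArgs
}

type ModelDeploymentSpec struct {
	Image  string // legacy top-level spec.image
	Engine EngineSpec
}

// ValidateImageFields rejects a spec whose two image fields disagree.
// Assumption: the same value in both fields is not treated as a conflict.
func (s *ModelDeploymentSpec) ValidateImageFields() error {
	if s.Image != "" && s.Engine.Image != "" && s.Image != s.Engine.Image {
		return fmt.Errorf("spec.image %q conflicts with spec.engine.image %q; set only one", s.Image, s.Engine.Image)
	}
	return nil
}

// ImageOverride returns the image a provider should run, or "" when neither
// field is set and the provider default applies. spec.engine.image wins.
func (s *ModelDeploymentSpec) ImageOverride() string {
	if s.Engine.Image != "" {
		return s.Engine.Image
	}
	return s.Image
}
```

A transformer would call `ImageOverride()` and fall back to its own default image when it returns an empty string. This is the pattern the cross-provider adoption for dynamo, kaito, kuberay, and llm-d describes.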

File	Description
shared/types/vllmRecipes.ts	Adds shared TS contract for listing/fetching/resolving vLLM recipes.
shared/types/index.ts	Re-exports new vLLM recipe types.
shared/types/deployment.ts	Adds recipe provenance + engine image/extraArgs + image status; updates manifest/spec conversion.
shared/api/vllmRecipes.ts	Adds shared API client wrapper for vLLM recipes endpoints.
shared/api/index.ts	Wires vLLM recipes API into the shared API client.
providers/vllm/transformer.go	Implements Direct vLLM resource generation (Deployments/Services) from ModelDeployment.
providers/vllm/status.go	Translates upstream Deployment status into ModelDeployment provider status.
providers/vllm/status_test.go	Unit tests for Deployment→phase/status translation.
providers/vllm/Makefile	Adds build/deploy helpers for the vLLM provider.
providers/vllm/image_status_test.go	Tests for image resolution/provenance status behavior.
providers/vllm/image_resolver.go	Implements remote digest resolution + best-effort OCI provenance extraction.
providers/vllm/go.mod	New Go module for the vLLM provider.
providers/vllm/Dockerfile	Builds and packages the vLLM provider controller image.
providers/vllm/deploy/vllm.yaml	Generated deploy manifest for installing the vLLM provider.
providers/vllm/controller.go	Core vLLM provider reconciler (SSA apply, image resolution status, finalizer).
providers/vllm/controller_test.go	Unit tests for compatibility checks and reconcile early-exit behavior.
providers/vllm/config/rbac/service_account.yaml	ServiceAccount for vLLM provider installation.
providers/vllm/config/rbac/role.yaml	ClusterRole for vLLM provider controller permissions.
providers/vllm/config/rbac/role_binding.yaml	ClusterRoleBinding for vLLM provider controller.
providers/vllm/config/rbac/kustomization.yaml	Kustomize RBAC bundle.
providers/vllm/config/manager/manager.yaml	Manager Deployment template for provider install.
providers/vllm/config/manager/kustomization.yaml	Kustomize image override for provider manager.
providers/vllm/config/default/kustomization.yaml	Default kustomize bundle wiring RBAC + manager.
providers/vllm/config.go	Self-registration/heartbeat for InferenceProviderConfig + install info.
providers/vllm/config_test.go	Tests for provider config spec + installation info.
providers/vllm/cmd/main.go	Provider controller main entrypoint (controller-runtime manager).
providers/llmd/transformer.go	Adds `engine.extraArgs` support + prefers `ImageOverride()` for image selection.
providers/llmd/transformer_test.go	Tests engine image precedence + extraArgs ordering.
Makefile	Includes vLLM provider in provider test target.
frontend/src/test/mocks/handlers.ts	Adds MSW handlers for vLLM recipes endpoints.
frontend/src/pages/DeployPage.tsx	Uses shared engine display naming for badges.
frontend/src/pages/DeploymentDetailsPage.tsx	Improves provider/engine display labels and naming.
frontend/src/lib/deploymentDisplay.ts	Adds provider/engine display name helpers.
frontend/src/lib/api.ts	Adds frontend vLLM recipes API wrapper and exports shared recipe types.
frontend/src/components/models/ModelCard.tsx	Uses engine display naming helper.
frontend/src/components/models/HfModelCard.tsx	Uses engine display naming helper.
frontend/src/components/deployments/DeploymentList.tsx	Uses provider/engine display naming helpers.
frontend/src/components/deployments/DeploymentForm.test.tsx	Adds Direct vLLM deploy flow coverage (launch image + recipe apply + submission).
docs/versioning-upgrades.md	Updates provider compatibility matrix with llm-d / Direct vLLM entries.
docs/providers.md	Updates provider selection docs and capability matrix; adds Direct vLLM row.
docs/crd-reference.md	Documents `spec.engine.image` + `spec.engine.extraArgs` and Direct vLLM usage.
docs/architecture.md	Updates architecture narrative for provider/runtime registration.
docs/api.md	Updates REST API docs for engine image/extraArgs and Direct vLLM semantics.
deploy/controller.yaml	Updates published CRD schema (engine.image/extraArgs, provider name, image status).
controller/internal/webhook/v1alpha1/modeldeployment_webhook.go	Adds webhook validation for conflicting image override fields.
controller/internal/webhook/v1alpha1/modeldeployment_webhook_test.go	Tests webhook rejection/admission for image override conflicts.
controller/internal/controller/modeldeployment_validation_test.go	Adds reconciliation-time validation tests for image override conflicts.
controller/internal/controller/modeldeployment_controller.go	Enforces image override conflict validation before selection and during validation.
controller/internal/controller/gateway_reconciler.go	Adds shared helper functions for label merging/setting in gateway reconciler.
controller/config/crd/bases/airunway.ai_modeldeployments.yaml	CRD base schema updated for new engine/image fields + image status.
controller/api/v1alpha1/zz_generated.deepcopy.go	Deepcopy updates for new fields (engine.extraArgs, status.image).
controller/api/v1alpha1/modeldeployment_validation.go	Adds `ValidateImageFields()` and `ImageOverride()` helpers.
controller/api/v1alpha1/modeldeployment_types.go	Adds EngineSpec.image/extraArgs, ImageStatus, and new condition constants.
backend/src/shared-deployment.test.ts	Tests shared manifest conversion for vLLM image mapping, env, extraArgs, recipe provenance annotations.
backend/src/services/vllmRecipesClient.ts	Fetches recipe index/raw payloads from recipes.vllm.ai (configurable base URL).
backend/src/services/vllmRecipeResolver.ts	Resolves recipes into engine args/resources/image/env/annotations + provenance/warnings.
backend/src/services/vllmRecipeResolver.test.ts	Unit tests for recipe materialization behavior.
backend/src/services/kubernetes.ts	Improves provider display names for runtime status reporting.
backend/src/routes/vllmRecipes.ts	Adds `/api/vllm/recipes` endpoints (list/get/resolve).
backend/src/routes/index.ts	Exports vLLM recipes routes.
backend/src/routes/deployments.ts	Extends create schema to accept recipe provenance, env, engineExtraArgs.
backend/src/routes/deployments.test.ts	Adds preview/create tests for env + Direct vLLM recipe provenance materialization.
backend/src/hono-app.ts	Registers vLLM recipes routes.
backend/scripts/embed-assets.ts	Adds `@ts-nocheck` to generated embed module header for Bun file imports.
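The resolver's GPU-count rule is small enough to show directly: GPUs per pod = tensor-parallel × pipeline-parallel, with data parallelism left out because it scales replicas. The actual resolver is TypeScript, and this Go version only illustrates the arithmetic. The `Parallelism` type is hypothetical. So is `needsShm`, which mirrors the memory-backed `/dev/shm` volume added for multi-GPU pods.

```go
package main

// Parallelism is a hypothetical view of the parallel-size flags found in a recipe.
type Parallelism struct {
	TensorParallel   int // --tensor-parallel-size
	PipelineParallel int // --pipeline-parallel-size
	DataParallel     int // --data-parallel-size: scales replicas, so not counted per pod
}

// atLeastOne treats an unset (zero) size as 1.
func atLeastOne(n int) int {
	if n < 1 {
		return 1
	}
	return n
}

// gpusPerPod is tensor-parallel × pipeline-parallel; data parallelism is ignored.
func gpusPerPod(p Parallelism) int {
	return atLeastOne(p.TensorParallel) * atLeastOne(p.PipelineParallel)
}

// needsShm reports whether a pod should get the memory-backed /dev/shm volume.
func needsShm(gpus int) bool {
	return gpus > 1
}
```

For example, under this rule a recipe with `--tensor-parallel-size 4 --pipeline-parallel-size 2 --data-parallel-size 8` requests 8 GPUs per pod, not 64.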

Copilot's findings

Files reviewed: 67/68 changed files
Comments generated: 9

Copilot

Copilot's findings

Files reviewed: 80/81 changed files
Comments generated: 3

Copilot

Copilot's findings

Files reviewed: 80/81 changed files
Comments generated: 6

Copilot

Copilot's findings

Files not reviewed (1)

controller/api/v1alpha1/zz_generated.deepcopy.go: Language not supported

Files reviewed: 76/78 changed files
Comments generated: 2

Copilot

Pull request overview

Copilot reviewed 76 out of 78 changed files in this pull request and generated no new comments.

Files not reviewed (1)

controller/api/v1alpha1/zz_generated.deepcopy.go: Generated file

- **Recipe client SSRF / path-traversal hardening** (`backend/src/services/vllmRecipesClient.ts`, `backend/src/routes/vllmRecipes.ts`): validate Hugging Face model IDs as exactly `<org>/<model>` and `encodeURIComponent` each segment, restrict the `/:org/:model` route to a single path segment, require `https:` for recipe references, and add a 10s `AbortController` timeout to `fetchJson`
- **Make Direct vLLM explicit-only** (`providers/vllm/config.go`): remove the selection rule so the provider is never auto-selected, and migrate capabilities to the per-engine `EngineCapability` shape
- **Reject disaggregated serving** (`providers/vllm/controller.go`): `validateCompatibility` now rejects `disaggregated` mode to match the advertised aggregated-only capability
- **KubeRay honors `spec.engine.image`** (`providers/kuberay/transformer.go`): use `ImageOverride()` so the engine image field is not silently ignored
- **Drop empty recipe-provenance annotations** (`shared/types/deployment.ts`): trim string values and skip empty strings/arrays so blank provenance no longer emits `airunway.ai/recipe.*` annotations or a false `generated-by` marker
- Update docs (`docs/providers.md`, `docs/providers/vllm.md`) and the related backend/provider tests to match the above

Signed-off-by: Suraj Deshmukh <suraj.deshmukh@microsoft.com>

- **`SettingsPage.tsx`**: remove a stray extra `}` after `selectDefaultRuntimeId` that broke the entire frontend build (`TS1128`)
- **`DeploymentForm.tsx`**: drop the duplicate `vllm` keys in `RUNTIME_INFO` and `RUNTIME_ENGINES` (`TS1117`); the canonical `Direct vLLM` entries now win instead of the stale `vLLM`/native ones
- **`kubernetes.ts`**: delete the local `getProviderDisplayName` redeclaration that shadowed the import from `../lib/providers` (`TS2440`)
- **`modeldeployment_validation_test.go`**: update the `validateSpec` call to the current 5-arg signature so the controller test package compiles
- **`DeploymentList.tsx`, `DeploymentDetailsPage.tsx`**: remove unused `generateAynaUrl` and `MessageSquare` imports that failed lint under `--max-warnings 0`

Signed-off-by: Suraj Deshmukh <suraj.deshmukh@microsoft.com>

Copilot

Pull request overview

Copilot reviewed 76 out of 78 changed files in this pull request and generated 1 comment.

Files not reviewed (1)

controller/api/v1alpha1/zz_generated.deepcopy.go: Generated file

- **`deployments.test.ts`**: type `capturedConfig` as `DeploymentConfig | undefined` instead of `any` (`@typescript-eslint/no-explicit-any`)
- **`vllmRecipeResolver.ts`**: remove the unused `findRecordAtPath` helper, and use `const` for the never-reassigned `result` in `applyExplicitFeatureSelection` (`prefer-const`)
- **`DeploymentForm.test.tsx`**: update the vLLM runtime card assertions to expect the current `Direct vLLM` description text instead of the stale `native vLLM provider` copy

Signed-off-by: Suraj Deshmukh <suraj.deshmukh@microsoft.com>

- **`transformer.go`**: skip a derived flag (`--tensor-parallel-size`, `--model`, `--max-model-len`, etc.) when its key is already set in `spec.engine.args`, so an edited GPU count can no longer emit a conflicting duplicate `--tensor-parallel-size`
- **`transformer.go`**: render `--enforce-eager` and `--enable-prefix-caching` (the `spec.engine` toggles were silently dropped)
- **`transformer.go`**: mount `spec.model.storage` PVC volumes (`volumes` + `volumeMounts`) alongside `/dev/shm`
- **`transformer.go`**: reject reserved `host`/`port` engine args in the `--key` and `--key=value` forms, and guard nil `Decode.GPU`/`Prefill.GPU` in `transformDisaggregated`
- **`vllmRecipesClient.ts`**: add typed errors (validation/timeout/upstream), a TTL in-memory cache with stale-on-error fallback, and a 5 MiB response-size bound
- **`vllmRecipes.ts`**: map recipe errors to `400`/`504`/`502` instead of a blanket `502`
- **`controller.go`**: name the conflicting owner in `resourceConflictError`
- Add controller and transformer tests for the above, and document the registry-coupling / nightly-digest behavior in `docs/providers/vllm.md`
- Note the new `spec.engine.image`/`extraArgs` fields and `providers/vllm` in `agents.md`
- Bump `providers/vllm` dependencies (`go.mod`/`go.sum`)

Signed-off-by: Suraj Deshmukh <suraj.deshmukh@microsoft.com>

Copilot

Pull request overview

Copilot reviewed 77 out of 79 changed files in this pull request and generated no new comments.

Files not reviewed (1)

controller/api/v1alpha1/zz_generated.deepcopy.go: Generated file

- **`vllmRecipesClient.ts`**: cap the per-model cache with LRU eviction (`MAX_CACHE_ENTRIES`) so the unauthenticated recipe route cannot grow it without bound, and stream-bound the response body via `readBoundedBody` so a chunked / no-`content-length` reply is aborted at the 5 MiB cap instead of being fully buffered first
- **`vllmRecipeResolver.ts`**: compute GPUs-per-pod as `tensor-parallel × pipeline-parallel` only (data-parallel/decode-context scale replicas, not GPUs), and stop `stripVllmServePrefix` from dropping a leading `--model` flag as if it were the positional model id
- **`providers/vllm/controller.go`**: enforce the finalizer timeout even when the owned Deployment is stuck Terminating (Delete returns nil), and skip the `Deploying` phase downgrade when `syncStatus` failed so a transient API error cannot flip a `Running` deployment
- **`providers/vllm/image_resolver.go`**: bound the registry resolve with a `context.WithTimeout` so a hung registry cannot stall the reconcile worker
- **`providers/vllm/transformer.go`**: extend derived-flag dedup to `spec.engine.extraArgs` (not just `engine.args`), and drop a user-supplied `HF_TOKEN` from `spec.env` when the token secret is injected to avoid a duplicate env entry
- **`website/docusaurus.config.js`**: exclude `docs/plans/**` from the published site so internal planning docs are not rendered as public pages
- **`docs/providers/vllm.md`**: document the `status.image.source` classification and the `spec.provider.overrides` trust boundary
- Add tests covering cache eviction, the streaming size bound, extraArgs dedup, HF_TOKEN dedup, the finalizer-timeout path, GPU-per-pod derivation, the `--model` guard, and disaggregated-mode detection

Signed-off-by: Suraj Deshmukh <suraj.deshmukh@microsoft.com>

- **`versions.env`** / **`shared/types/versions.generated.ts`**: add `VLLM_VERSION` (`cu130-nightly`) as the single source of truth for the Direct vLLM default image tag, and regenerate the TS export
- **`providers/vllm/transformer.go`**: make `VLLMVersion` an ldflags-injectable `var` and compute `DefaultVLLMImage` from `officialVLLMImageRepository` + `VLLMVersion` so the default tracks `versions.env`
- **`providers/vllm/Makefile`**: include `versions.env`, inject `VLLMVersion` via `-ldflags`, and add the missing `verify-versions`/`vet`/`test` targets to match the dynamo/kaito providers
- **`providers/vllm/Dockerfile`**: require a `VLLM_VERSION` build-arg and inject it via `-ldflags` so the in-image default cannot drift from `versions.env`
- **`Makefile`**: add a `verify-versions` check asserting the `transformer.go` `VLLMVersion` fallback literal matches `versions.env`
- **`docs/providers/vllm.md`**: document `make controller-deploy` + `make -C providers/vllm deploy` as the in-repo install path alongside the published-manifest `kubectl apply` path

Signed-off-by: Suraj Deshmukh <suraj.deshmukh@microsoft.com>
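The ldflags pattern relies on two facts. First, `-X` can set a package-level string variable that is uninitialized or initialized to a constant string. Second, a variable computed from it during package initialization sees the injected value. Below is a minimal sketch. It assumes `vllm/vllm-openai` as the value of `officialVLLMImageRepository` (the PR does not show it) and places the vars in `package main` for simplicity.

```go
package main

import "fmt"

// officialVLLMImageRepository: assumed value, for illustration only.
const officialVLLMImageRepository = "vllm/vllm-openai"

// VLLMVersion must be a string var (not a const) so the linker's -X flag can
// overwrite it; the literal is only the fallback for builds that skip the flag.
var VLLMVersion = "cu130-nightly"

// DefaultVLLMImage is computed by a function call during package
// initialization, after the linker has applied -X, so it follows the injected tag.
var DefaultVLLMImage = fmt.Sprintf("%s:%s", officialVLLMImageRepository, VLLMVersion)
```

Building with `go build -ldflags "-X main.VLLMVersion=<tag>"` changes both variables. In the real provider, the symbol would be qualified by the package's import path rather than `main`.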

- **`Makefile`**: replace the hardcoded `versions in sync` echo with an `awk` line that lists every `KEY=VALUE` from `versions.env`, so the summary stays current automatically as keys are added (and now prints all versions, not just three)
- **`hack/test-verify-versions.sh`**: add a `providers/vllm/transformer.go` mutation case so the guard self-test also exercises the `VLLMVersion` check

Signed-off-by: Suraj Deshmukh <suraj.deshmukh@microsoft.com>

Copilot

Pull request overview

Copilot reviewed 81 out of 83 changed files in this pull request and generated no new comments.

Files not reviewed (1)

controller/api/v1alpha1/zz_generated.deepcopy.go: Generated file

Copilot AI review requested due to automatic review settings May 4, 2026 16:17

sozercan requested a review from a team as a code owner May 4, 2026 16:17

Copilot started reviewing on behalf of sozercan May 4, 2026 16:17

Copilot AI reviewed May 4, 2026

robert-cronin reviewed May 7, 2026

Copilot AI review requested due to automatic review settings May 8, 2026 04:27

Copilot started reviewing on behalf of sozercan May 8, 2026 04:28

sozercan requested a review from robert-cronin May 8, 2026 04:30

Copilot AI reviewed May 8, 2026

Comment thread on shared/types/deployment.ts (outdated)

Comment thread on backend/src/services/vllmRecipesClient.ts (outdated)

Comment thread on backend/src/services/vllmRecipesClient.ts (outdated)

Copilot AI review requested due to automatic review settings May 8, 2026 05:26

Copilot started reviewing on behalf of sozercan May 8, 2026 05:26

Copilot AI reviewed May 8, 2026

Copilot AI review requested due to automatic review settings May 12, 2026 05:53

Copilot started reviewing on behalf of sozercan May 12, 2026 05:54

Copilot AI reviewed May 12, 2026

Comment thread on backend/src/services/vllmRecipesClient.ts (outdated)

Comment thread on backend/src/routes/vllmRecipes.ts (outdated)

sozercan added this to the 0.7.0 milestone May 19, 2026

surajssd self-assigned this Jun 10, 2026

surajssd force-pushed the vllm-nightly branch from 5088c53 to c0c3109 June 11, 2026 00:04

Copilot AI review requested due to automatic review settings June 12, 2026 00:27

surajssd force-pushed the vllm-nightly branch from c0c3109 to 3cca8e8 June 12, 2026 00:27

Copilot started reviewing on behalf of surajssd June 12, 2026 00:28

Copilot AI reviewed Jun 12, 2026

sozercan and others added 8 commits June 15, 2026 12:03

a259c2c feat: add direct vllm provider
857a41e fix: refine direct vllm search and deployment ux
098a0d9 fix: address direct vllm review feedback
fbe8e68 docs: add direct vllm provider guide
a073a31 fix: stop advertising direct vllm disaggregated
46a1103 feat(vllm): add production deployment defaults
33d35b9 fix: address vllm review cleanup

surajssd force-pushed the vllm-nightly branch from 3cca8e8 to dbd7deb June 15, 2026 19:08

surajssd requested a review from Copilot June 15, 2026 19:08

Copilot started reviewing on behalf of surajssd June 15, 2026 19:09