feat: add direct vLLM provider support #265
Open
sozercan wants to merge 14 commits into
Pull request overview
Adds end-to-end “Direct vLLM” support across the Airunway stack (CRD/schema + controller/provider + backend recipe resolution APIs + frontend deploy UX), including image provenance / resolution status and recipe-derived deployment settings.
Changes:
- Introduces vLLM Recipes APIs (shared types + backend routes/resolver + frontend client/mocks/tests).
- Extends the ModelDeployment API surface for Direct vLLM: `spec.engine.image`, `spec.engine.extraArgs`, recipe provenance annotations, and `status.image`.
- Adds a new `providers/vllm` provider controller + deployment manifests, and updates llm-d compatibility to prefer `spec.engine.image` + support `extraArgs`.
| File | Description |
|---|---|
| shared/types/vllmRecipes.ts | Adds shared TS contract for listing/fetching/resolving vLLM recipes. |
| shared/types/index.ts | Re-exports new vLLM recipe types. |
| shared/types/deployment.ts | Adds recipe provenance + engine image/extraArgs + image status; updates manifest/spec conversion. |
| shared/api/vllmRecipes.ts | Adds shared API client wrapper for vLLM recipes endpoints. |
| shared/api/index.ts | Wires vLLM recipes API into the shared API client. |
| providers/vllm/transformer.go | Implements Direct vLLM resource generation (Deployments/Services) from ModelDeployment. |
| providers/vllm/status.go | Translates upstream Deployment status into ModelDeployment provider status. |
| providers/vllm/status_test.go | Unit tests for Deployment→phase/status translation. |
| providers/vllm/Makefile | Adds build/deploy helpers for the vLLM provider. |
| providers/vllm/image_status_test.go | Tests for image resolution/provenance status behavior. |
| providers/vllm/image_resolver.go | Implements remote digest resolution + best-effort OCI provenance extraction. |
| providers/vllm/go.mod | New Go module for the vLLM provider. |
| providers/vllm/Dockerfile | Builds and packages the vLLM provider controller image. |
| providers/vllm/deploy/vllm.yaml | Generated deploy manifest for installing the vLLM provider. |
| providers/vllm/controller.go | Core vLLM provider reconciler (SSA apply, image resolution status, finalizer). |
| providers/vllm/controller_test.go | Unit tests for compatibility checks and reconcile early-exit behavior. |
| providers/vllm/config/rbac/service_account.yaml | ServiceAccount for vLLM provider installation. |
| providers/vllm/config/rbac/role.yaml | ClusterRole for vLLM provider controller permissions. |
| providers/vllm/config/rbac/role_binding.yaml | ClusterRoleBinding for vLLM provider controller. |
| providers/vllm/config/rbac/kustomization.yaml | Kustomize RBAC bundle. |
| providers/vllm/config/manager/manager.yaml | Manager Deployment template for provider install. |
| providers/vllm/config/manager/kustomization.yaml | Kustomize image override for provider manager. |
| providers/vllm/config/default/kustomization.yaml | Default kustomize bundle wiring RBAC + manager. |
| providers/vllm/config.go | Self-registration/heartbeat for InferenceProviderConfig + install info. |
| providers/vllm/config_test.go | Tests for provider config spec + installation info. |
| providers/vllm/cmd/main.go | Provider controller main entrypoint (controller-runtime manager). |
| providers/llmd/transformer.go | Adds engine.extraArgs support + prefers ImageOverride() for image selection. |
| providers/llmd/transformer_test.go | Tests engine image precedence + extraArgs ordering. |
| Makefile | Includes vLLM provider in provider test target. |
| frontend/src/test/mocks/handlers.ts | Adds MSW handlers for vLLM recipes endpoints. |
| frontend/src/pages/DeployPage.tsx | Uses shared engine display naming for badges. |
| frontend/src/pages/DeploymentDetailsPage.tsx | Improves provider/engine display labels and naming. |
| frontend/src/lib/deploymentDisplay.ts | Adds provider/engine display name helpers. |
| frontend/src/lib/api.ts | Adds frontend vLLM recipes API wrapper and exports shared recipe types. |
| frontend/src/components/models/ModelCard.tsx | Uses engine display naming helper. |
| frontend/src/components/models/HfModelCard.tsx | Uses engine display naming helper. |
| frontend/src/components/deployments/DeploymentList.tsx | Uses provider/engine display naming helpers. |
| frontend/src/components/deployments/DeploymentForm.test.tsx | Adds Direct vLLM deploy flow coverage (launch image + recipe apply + submission). |
| docs/versioning-upgrades.md | Updates provider compatibility matrix with llm-d / Direct vLLM entries. |
| docs/providers.md | Updates provider selection docs and capability matrix; adds Direct vLLM row. |
| docs/crd-reference.md | Documents spec.engine.image + spec.engine.extraArgs and Direct vLLM usage. |
| docs/architecture.md | Updates architecture narrative for provider/runtime registration. |
| docs/api.md | Updates REST API docs for engine image/extraArgs and Direct vLLM semantics. |
| deploy/controller.yaml | Updates published CRD schema (engine.image/extraArgs, provider name, image status). |
| controller/internal/webhook/v1alpha1/modeldeployment_webhook.go | Adds webhook validation for conflicting image override fields. |
| controller/internal/webhook/v1alpha1/modeldeployment_webhook_test.go | Tests webhook rejection/admission for image override conflicts. |
| controller/internal/controller/modeldeployment_validation_test.go | Adds reconciliation-time validation tests for image override conflicts. |
| controller/internal/controller/modeldeployment_controller.go | Enforces image override conflict validation before selection and during validation. |
| controller/internal/controller/gateway_reconciler.go | Adds shared helper functions for label merging/setting in gateway reconciler. |
| controller/config/crd/bases/airunway.ai_modeldeployments.yaml | CRD base schema updated for new engine/image fields + image status. |
| controller/api/v1alpha1/zz_generated.deepcopy.go | Deepcopy updates for new fields (engine.extraArgs, status.image). |
| controller/api/v1alpha1/modeldeployment_validation.go | Adds ValidateImageFields() and ImageOverride() helpers. |
| controller/api/v1alpha1/modeldeployment_types.go | Adds EngineSpec.image/extraArgs, ImageStatus, and new condition constants. |
| backend/src/shared-deployment.test.ts | Tests shared manifest conversion for vLLM image mapping, env, extraArgs, recipe provenance annotations. |
| backend/src/services/vllmRecipesClient.ts | Fetches recipe index/raw payloads from recipes.vllm.ai (configurable base URL). |
| backend/src/services/vllmRecipeResolver.ts | Resolves recipes into engine args/resources/image/env/annotations + provenance/warnings. |
| backend/src/services/vllmRecipeResolver.test.ts | Unit tests for recipe materialization behavior. |
| backend/src/services/kubernetes.ts | Improves provider display names for runtime status reporting. |
| backend/src/routes/vllmRecipes.ts | Adds /api/vllm/recipes endpoints (list/get/resolve). |
| backend/src/routes/index.ts | Exports vLLM recipes routes. |
| backend/src/routes/deployments.ts | Extends create schema to accept recipe provenance, env, engineExtraArgs. |
| backend/src/routes/deployments.test.ts | Adds preview/create tests for env + Direct vLLM recipe provenance materialization. |
| backend/src/hono-app.ts | Registers vLLM recipes routes. |
| backend/scripts/embed-assets.ts | Adds @ts-nocheck to generated embed module header for Bun file imports. |
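
The `ValidateImageFields()` and `ImageOverride()` helpers listed in the table are Go code; the precedence this PR describes (the engine image wins over the legacy top-level `image`) might look roughly like the TypeScript sketch below. The `ImageSpec` type, the camelCase function names, and the rule that a conflict means both fields are set to different values are assumptions made for illustration, not the PR's actual implementation.

```typescript
// Hypothetical TypeScript mirror of the Go helpers; field names follow the CRD paths.
interface ImageSpec {
  image?: string; // legacy top-level spec.image
  engine?: { image?: string }; // preferred spec.engine.image
}

// Assumption: a conflict means both fields are set and disagree.
function validateImageFields(spec: ImageSpec): string | null {
  const legacy = spec.image?.trim();
  const engine = spec.engine?.image?.trim();
  if (legacy && engine && legacy !== engine) {
    return "spec.image and spec.engine.image are both set and differ";
  }
  return null;
}

// The engine image wins; the legacy field is only a fallback.
function imageOverride(spec: ImageSpec): string | undefined {
  return spec.engine?.image?.trim() || spec.image?.trim() || undefined;
}
```

Keeping the precedence in one helper is what lets every provider transformer agree on which image to use.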
Copilot's findings
- Files reviewed: 67/68 changed files
- Comments generated: 9
- **Recipe client SSRF / path-traversal hardening** (`backend/src/services/vllmRecipesClient.ts`, `backend/src/routes/vllmRecipes.ts`): validate Hugging Face model IDs as exactly `<org>/<model>` and `encodeURIComponent` each segment, restrict the `/:org/:model` route to a single path segment, require `https:` for recipe references, and add a 10s `AbortController` timeout to `fetchJson`
- **Make Direct vLLM explicit-only** (`providers/vllm/config.go`): remove the selection rule so the provider is never auto-selected, and migrate capabilities to the per-engine `EngineCapability` shape
- **Reject disaggregated serving** (`providers/vllm/controller.go`): `validateCompatibility` now rejects `disaggregated` mode to match the advertised aggregated-only capability
- **KubeRay honors `spec.engine.image`** (`providers/kuberay/transformer.go`): use `ImageOverride()` so the engine image field is not silently ignored
- **Drop empty recipe-provenance annotations** (`shared/types/deployment.ts`): trim string values and skip empty strings/arrays so blank provenance no longer emits `airunway.ai/recipe.*` annotations or a false `generated-by` marker
- Update docs (`docs/providers.md`, `docs/providers/vllm.md`) and the related backend/provider tests to match the above

Signed-off-by: Suraj Deshmukh <suraj.deshmukh@microsoft.com>
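
The model-ID hardening described in this commit can be sketched as follows. This is a minimal illustration under stated assumptions: the allowed character set in `SEGMENT`, the `<base>/<org>/<model>` URL layout, and the names `parseModelId` and `recipeUrl` are hypothetical and are not taken from `vllmRecipesClient.ts`.

```typescript
// Assumed default; the PR only says the recipes base URL is configurable.
const RECIPES_BASE_URL = "https://recipes.vllm.ai";

// Assumed segment grammar: starts alphanumeric, then letters, digits, '.', '_', '-'.
const SEGMENT = /^[A-Za-z0-9][A-Za-z0-9._-]*$/;

// Accept exactly "<org>/<model>" and reject anything that could escape the path.
function parseModelId(modelId: string): [string, string] {
  const parts = modelId.split("/");
  if (parts.length !== 2) {
    throw new Error(`model id must be exactly <org>/<model>: ${modelId}`);
  }
  const [org, model] = parts as [string, string];
  for (const segment of [org, model]) {
    if (!SEGMENT.test(segment) || segment.includes("..")) {
      throw new Error(`invalid model id segment: ${segment}`);
    }
  }
  return [org, model];
}

// Build the upstream URL from encoded segments and refuse non-HTTPS bases.
function recipeUrl(modelId: string, baseUrl: string = RECIPES_BASE_URL): string {
  if (!baseUrl.startsWith("https://")) {
    throw new Error("recipe references must use https:");
  }
  const [org, model] = parseModelId(modelId);
  const base = baseUrl.replace(/\/+$/, "");
  return `${base}/${encodeURIComponent(org)}/${encodeURIComponent(model)}`;
}
```

Validating before encoding means a value such as `../etc` fails outright instead of being percent-encoded and forwarded upstream.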
- **`SettingsPage.tsx`**: remove a stray extra `}` after `selectDefaultRuntimeId` that broke the entire frontend build (`TS1128`)
- **`DeploymentForm.tsx`**: drop the duplicate `vllm` keys in `RUNTIME_INFO` and `RUNTIME_ENGINES` (`TS1117`); the canonical `Direct vLLM` entries now win instead of the stale `vLLM`/native ones
- **`kubernetes.ts`**: delete the local `getProviderDisplayName` redeclaration that shadowed the import from `../lib/providers` (`TS2440`)
- **`modeldeployment_validation_test.go`**: update the `validateSpec` call to the current 5-arg signature so the controller test package compiles
- **`DeploymentList.tsx`, `DeploymentDetailsPage.tsx`**: remove unused `generateAynaUrl` and `MessageSquare` imports that failed lint under `--max-warnings 0`

Signed-off-by: Suraj Deshmukh <suraj.deshmukh@microsoft.com>
- **`deployments.test.ts`**: type `capturedConfig` as `DeploymentConfig | undefined` instead of `any` (`@typescript-eslint/no-explicit-any`)
- **`vllmRecipeResolver.ts`**: remove the unused `findRecordAtPath` helper, and use `const` for the never-reassigned `result` in `applyExplicitFeatureSelection` (`prefer-const`)
- **`DeploymentForm.test.tsx`**: update the vLLM runtime card assertions to expect the current `Direct vLLM` description text instead of the stale `native vLLM provider` copy

Signed-off-by: Suraj Deshmukh <suraj.deshmukh@microsoft.com>
- **`transformer.go`**: skip a derived flag (`--tensor-parallel-size`, `--model`, `--max-model-len`, etc.) when its key is already set in `spec.engine.args`, so an edited GPU count can no longer emit a conflicting duplicate `--tensor-parallel-size`
- **`transformer.go`**: render `--enforce-eager` and `--enable-prefix-caching` (the `spec.engine` toggles were silently dropped)
- **`transformer.go`**: mount `spec.model.storage` PVC volumes (`volumes` + `volumeMounts`) alongside `/dev/shm`
- **`transformer.go`**: reject reserved `host`/`port` engine args in the `--key` and `--key=value` forms, and guard nil `Decode.GPU`/`Prefill.GPU` in `transformDisaggregated`
- **`vllmRecipesClient.ts`**: add typed errors (validation/timeout/upstream), a TTL in-memory cache with stale-on-error fallback, and a 5 MiB response-size bound
- **`vllmRecipes.ts`**: map recipe errors to `400`/`504`/`502` instead of a blanket `502`
- **`controller.go`**: name the conflicting owner in `resourceConflictError`
- Add controller and transformer tests for the above, and document the registry-coupling / nightly-digest behavior in `docs/providers/vllm.md`
- Note the new `spec.engine.image`/`extraArgs` fields and `providers/vllm` in `agents.md`
- Bump `providers/vllm` dependencies (`go.mod`/`go.sum`)

Signed-off-by: Suraj Deshmukh <suraj.deshmukh@microsoft.com>
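
The derived-flag dedup in this commit lives in Go (`transformer.go`); the idea can be sketched in TypeScript as below. The shapes are assumptions: user args are modeled as a key-to-value map and derived flags as an ordered list, and `buildArgs`, `normalizeKey`, and `DerivedFlag` are hypothetical names rather than the provider's real API.

```typescript
// Assumed shapes: user args as a key -> value map, derived flags as an ordered list.
type UserArgs = Record<string, string>;
type DerivedFlag = { key: string; value?: string };

// Treat "--tensor-parallel-size" and "tensor-parallel-size" as the same key.
function normalizeKey(key: string): string {
  return key.replace(/^--/, "");
}

function buildArgs(derived: DerivedFlag[], userArgs: UserArgs): string[] {
  const userKeys = new Set(Object.keys(userArgs).map(normalizeKey));
  const out: string[] = [];
  for (const flag of derived) {
    // A user-supplied key replaces the derived flag instead of duplicating it.
    if (userKeys.has(normalizeKey(flag.key))) continue;
    out.push(`--${normalizeKey(flag.key)}`);
    if (flag.value !== undefined) out.push(flag.value);
  }
  for (const key of Object.keys(userArgs)) {
    const value = userArgs[key];
    out.push(`--${normalizeKey(key)}`);
    // An empty value is rendered as a bare boolean-style flag.
    if (value !== undefined && value !== "") out.push(value);
  }
  return out;
}
```

With a derived `--tensor-parallel-size 4` and a user-set `tensor-parallel-size: "2"`, only the user value is emitted, which is the duplicate-flag case the commit message calls out.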
- **`vllmRecipesClient.ts`**: cap the per-model cache with LRU eviction (`MAX_CACHE_ENTRIES`) so the unauthenticated recipe route cannot grow it without bound, and stream-bound the response body via `readBoundedBody` so a chunked / no-`content-length` reply is aborted at the 5 MiB cap instead of being fully buffered first
- **`vllmRecipeResolver.ts`**: compute GPUs-per-pod as `tensor-parallel × pipeline-parallel` only (data-parallel/decode-context scale replicas, not GPUs), and stop `stripVllmServePrefix` from dropping a leading `--model` flag as if it were the positional model id
- **`providers/vllm/controller.go`**: enforce the finalizer timeout even when the owned Deployment is stuck Terminating (Delete returns nil), and skip the `Deploying` phase downgrade when `syncStatus` failed so a transient API error cannot flip a `Running` deployment
- **`providers/vllm/image_resolver.go`**: bound the registry resolve with a `context.WithTimeout` so a hung registry cannot stall the reconcile worker
- **`providers/vllm/transformer.go`**: extend derived-flag dedup to `spec.engine.extraArgs` (not just `engine.args`), and drop a user-supplied `HF_TOKEN` from `spec.env` when the token secret is injected to avoid a duplicate env entry
- **`website/docusaurus.config.js`**: exclude `docs/plans/**` from the published site so internal planning docs are not rendered as public pages
- **`docs/providers/vllm.md`**: document the `status.image.source` classification and the `spec.provider.overrides` trust boundary
- Add tests covering cache eviction, the streaming size bound, extraArgs dedup, HF_TOKEN dedup, the finalizer-timeout path, GPU-per-pod derivation, the `--model` guard, and disaggregated-mode detection

Signed-off-by: Suraj Deshmukh <suraj.deshmukh@microsoft.com>
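
A capped cache with LRU eviction, as described for the per-model recipe cache, can be sketched with a plain `Map`, which iterates in insertion order. The `LruCache` class and the cap of 3 are assumptions for illustration; the PR names `MAX_CACHE_ENTRIES` but does not state its value here, and the real cache also carries TTL and stale-on-error logic that this sketch omits.

```typescript
// Hypothetical cap; the actual MAX_CACHE_ENTRIES value is not given in this PR text.
const MAX_CACHE_ENTRIES = 3;

class LruCache<V> {
  private readonly entries = new Map<string, V>();
  private readonly maxEntries: number;

  constructor(maxEntries: number = MAX_CACHE_ENTRIES) {
    this.maxEntries = maxEntries;
  }

  get(key: string): V | undefined {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key) as V;
    // Re-insert so the key becomes the most recently used entry.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    while (this.entries.size > this.maxEntries) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value;
      if (oldest === undefined) break;
      this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}
```

Because the route is unauthenticated, the cap bounds memory no matter how many distinct model IDs a caller requests.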
- **`versions.env`** / **`shared/types/versions.generated.ts`**: add `VLLM_VERSION` (`cu130-nightly`) as the single source of truth for the Direct vLLM default image tag, and regenerate the TS export
- **`providers/vllm/transformer.go`**: make `VLLMVersion` an ldflags-injectable `var` and compute `DefaultVLLMImage` from `officialVLLMImageRepository` + `VLLMVersion` so the default tracks `versions.env`
- **`providers/vllm/Makefile`**: include `versions.env`, inject `VLLMVersion` via `-ldflags`, and add the missing `verify-versions`/`vet`/`test` targets to match the dynamo/kaito providers
- **`providers/vllm/Dockerfile`**: require a `VLLM_VERSION` build-arg and inject it via `-ldflags` so the in-image default cannot drift from `versions.env`
- **`Makefile`**: add a `verify-versions` check asserting the `transformer.go` `VLLMVersion` fallback literal matches `versions.env`
- **`docs/providers/vllm.md`**: document `make controller-deploy` + `make -C providers/vllm deploy` as the in-repo install path alongside the published-manifest `kubectl apply` path

Signed-off-by: Suraj Deshmukh <suraj.deshmukh@microsoft.com>
- **`Makefile`**: replace the hardcoded `versions in sync` echo with an `awk` line that lists every `KEY=VALUE` from `versions.env`, so the summary stays current automatically as keys are added (and now prints all versions, not just three)
- **`hack/test-verify-versions.sh`**: add a `providers/vllm/transformer.go` mutation case so the guard self-test also exercises the `VLLMVersion` check

Signed-off-by: Suraj Deshmukh <suraj.deshmukh@microsoft.com>
Description
Adds a first-class Direct vLLM inference provider to AI Runway. Direct vLLM renders a `ModelDeployment` straight into native Kubernetes `Deployment` + `Service` objects (no upstream operator/CRD required), pins the container image to an immutable registry digest, and integrates with a new vLLM "recipe" system that imports tuned launch arguments from `recipes.vllm.ai`. It also introduces the shared CRD plumbing (`spec.engine.image`, `spec.engine.extraArgs`, `status.image`) that this and other providers consume.
AI Prompt (Optional)
🤖 AI Prompt Used
AI Tool: Claude
Type of Change
Related Issues
Relates to the Direct vLLM provider work (PR #265).
Changes Made
- `providers/vllm` controller: a standalone Kubebuilder controller (`controller.go`, `transformer.go`, `config.go`, `status.go`, `image_resolver.go`, `cmd/main.go`) that transforms a `ModelDeployment` into an `apps/v1` `Deployment` + `Service`. Supports tensor-parallel sizing from `resources.gpu.count`, a memory-backed `/dev/shm` volume for multi-GPU, `spec.model.storage` PVC mounts, HF-token secret injection, and reserved host/port arg protection. Ships `Dockerfile`, `Makefile`, RBAC, and `deploy/vllm.yaml` shim manifests.
- `RemoteImageResolver` (via `go-containerregistry`) pins tag-based images to digests and records `status.image` (`Requested`/`Resolved`/`Digest`/`Source`/`InNightly`/`Verified`/…). Resolution is reused once a digest is cached. The default image is pinned for reproducibility.
- `controller/api/v1alpha1`: new `spec.engine.image` (preferred) and `spec.engine.extraArgs` fields, a `status.image` (`ImageStatus`) block, and an `ImageResolved` condition. `ValidateImageFields()` rejects conflicting `spec.image` vs `spec.engine.image`, and `ImageOverride()` centralizes precedence (engine image wins over the legacy top-level `image`). Wired into the core reconciler and validating webhook; CRD/deepcopy regenerated.
- `spec.engine.image` adoption: `dynamo`, `kaito`, `kuberay`, and `llmd` transformers now read `ImageOverride()` so the new engine-image field is honored consistently (previously only `spec.image` was read).
- `backend/src/services/vllmRecipesClient.ts` and `vllmRecipeResolver.ts` plus the `backend/src/routes/vllmRecipes.ts` routes (`GET /vllm/recipes`, `GET /:org/:model`, `POST /resolve`). Includes strict HF model-ID validation (rejects path traversal), HTTPS-only + origin/path-prefix pinning for recipe references, an `AbortController` fetch timeout, a TTL in-memory cache (stale-on-error), a response-size bound, and typed errors mapped to `400`/`502`/`504`.
- `DeployPage.tsx`/`DeploymentForm.tsx` add the Direct vLLM deployment method (nightly/stable/custom launch images, recipe apply flow, FP8 precision controls), `deploymentDisplay.ts` centralizes engine/provider display names, and `ModelCard`/`HfModelCard`/`DeploymentList`/`DeploymentDetailsPage` surface the new provider and engine labels.
- `shared/types/vllmRecipes.ts`, `shared/api/vllmRecipes.ts`, and `shared/types/deployment.ts` add recipe types, `engine.image`/`extraArgs`, `recipeProvenance` annotations, and `env` → `EnvVar[]` conversion.
- `docs/providers/vllm.md`, plus updates to `README.md`, `docs/api.md`, `docs/architecture.md`, `docs/crd-reference.md`, `docs/providers.md`, `docs/versioning-upgrades.md`, and `agents.md`.
- Direct vLLM is explicit-only (`SelectionRules: nil`, never auto-selected) and advertises aggregated serving only (`validateCompatibility` rejects `disaggregated`).
- `Makefile` wiring for the new provider, resolution of stale-rebase build blockers (duplicate keys, stray brace, `validateSpec` arity), and CI lint/test fixes.
Testing
- `bun run test`
Test coverage added across the stack: `providers/vllm/*_test.go` (transformer args/dedup, storage mounts, reserved-arg guard, image status, controller happy-path/ownership-conflict/deletion, real-resolver guards), controller webhook/validation tests, backend `vllmRecipesClient`/`vllmRecipeResolver`/`deployments`/`shared-deployment` tests, frontend `DeploymentForm` tests, and updated `dynamo`/`kaito`/`llmd` transformer tests.
Checklist
- `bun run lint`
Screenshots
N/A
Additional Notes
- `main` (merge-base `c5a4422`), spanning the initial feature, several review-feedback rounds, a rebase onto updated `main`, and CI fixups, totaling 79 files (+10,472/−175).
- `docs/plans/vllm-provider-full-plan.md` describes forward-looking features (cosign verification, image catalog, broader disaggregated support) that are not all implemented in this PR; treat it as a roadmap, not shipped scope.