Commit 40593eb

sozercan and surajssd authored
feat: add direct vLLM provider support (#265)
Signed-off-by: Suraj Deshmukh <suraj.deshmukh@microsoft.com>
Co-authored-by: Suraj Deshmukh <suraj.deshmukh@microsoft.com>
1 parent c5a4422 commit 40593eb

86 files changed

Lines changed: 11500 additions & 182 deletions

.github/workflows/test.yml

Lines changed: 1 addition & 1 deletion
@@ -121,7 +121,7 @@ jobs:
     strategy:
       fail-fast: false
       matrix:
-        provider: [dynamo, kaito, kuberay, llmd]
+        provider: [dynamo, kaito, kuberay, llmd, vllm]

     steps:
       - name: Checkout repository

Makefile

Lines changed: 9 additions & 2 deletions
@@ -197,6 +197,7 @@ providers-test: verify-versions
 	cd providers/kaito && go test ./...
 	cd providers/kuberay && go test ./...
 	cd providers/llmd && go test ./...
+	cd providers/vllm && go test ./...
 	@echo "✅ Provider tests completed"

 # Generate deploy manifests for controller and dashboard
@@ -279,6 +280,7 @@ cleanup-gateway:
 GAIE_VERSION_RE := $(subst .,\.,$(GAIE_VERSION))
 DYNAMO_VERSION_RE := $(subst .,\.,$(DYNAMO_VERSION))
 KAITO_VERSION_RE := $(subst .,\.,$(KAITO_VERSION))
+VLLM_VERSION_RE := $(subst .,\.,$(VLLM_VERSION))

 verify-versions:
 	@# 1. controller/go.mod must pin GAIE_VERSION
@@ -296,7 +298,10 @@ verify-versions:
 	@# 5. providers/kaito/config.go install Command --version arg must match KAITO_VERSION
 	@grep -qE -- '--version $(KAITO_VERSION_RE) ' providers/kaito/config.go || \
 		{ echo "❌ providers/kaito/config.go install Command --version != $(KAITO_VERSION) (from versions.env)"; exit 1; }
-	@# 6. generated TS must be in sync with versions.env.
+	@# 6. providers/vllm/transformer.go fallback literal must match VLLM_VERSION
+	@grep -qE '^var VLLMVersion = "$(VLLM_VERSION_RE)"$$' providers/vllm/transformer.go || \
+		{ echo "❌ providers/vllm/transformer.go VLLMVersion fallback != $(VLLM_VERSION) (from versions.env)"; exit 1; }
+	@# 7. generated TS must be in sync with versions.env.
 	@# Generate to a temp file and diff against the working-tree copy so
 	@# that synced uncommitted edits pass (the local-dev case) while
 	@# stale committed files still fail (the CI case — CI's working
@@ -314,7 +319,9 @@ verify-versions:
 		echo "❌ shared/types/versions.generated.ts is stale — run 'cd shared && bun run generate-versions' and commit the result"; \
 		exit 1; \
 	}
-	@echo "✅ versions in sync (GAIE_VERSION=$(GAIE_VERSION), DYNAMO_VERSION=$(DYNAMO_VERSION), KAITO_VERSION=$(KAITO_VERSION))"
+	@# Print the versions straight from versions.env so this summary stays in
+	@# sync automatically as keys are added (no hardcoded list to maintain).
+	@printf '✅ versions in sync (%s)\n' "$$(awk -F= '/^[A-Z][A-Z0-9_]*=/ { printf "%s%s=%s", sep, $$1, $$2; sep=", " }' versions.env)"

 # Test the verify-versions guard itself by deliberately breaking each
 # input it inspects and asserting the target exits non-zero.
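
The new `verify-versions` lines do three things: escape the dots in a version so it is safe inside a regex, grep for an anchored `var VLLMVersion = "..."` literal, and print every `KEY=VALUE` pair from `versions.env`. A minimal TypeScript sketch of the same logic, for readers less familiar with Make and awk. The helper names are hypothetical and not part of the commit, and any version strings passed in are illustrative only.

```typescript
// Hypothetical helpers mirroring the Makefile logic; not part of the commit.

// Equivalent of $(subst .,\.,$(VLLM_VERSION)): escape dots only, so that
// "." matches a literal dot instead of any character.
function escapeDots(version: string): string {
  return version.split('.').join('\\.');
}

// Equivalent of check 6: some line must be exactly
// `var VLLMVersion = "<version>"` (the "m" flag anchors ^ and $ per line,
// as grep does).
function fallbackMatches(goSource: string, version: string): boolean {
  const re = new RegExp(`^var VLLMVersion = "${escapeDots(version)}"$`, 'm');
  return re.test(goSource);
}

// Equivalent of the awk summary: keep lines that start with an upper-case
// key followed by "=", then join the first two "="-separated fields.
function summarizeVersions(envFile: string): string {
  return envFile
    .split('\n')
    .filter((line) => /^[A-Z][A-Z0-9_]*=/.test(line))
    .map((line) => {
      const [key, value] = line.split('=');
      return `${key}=${value}`;
    })
    .join(', ');
}
```

Without the dot escaping, a fallback literal that differs only in its separator characters would still pass the check, which is why each `*_VERSION_RE` variable exists alongside the raw version.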

README.md

Lines changed: 1 addition & 0 deletions
@@ -31,6 +31,7 @@ AI Runway gives you a web UI and a unified Kubernetes CRD (`ModelDeployment`) to
 | [**KubeRay**](https://github.com/ray-project/kuberay) | Ray-based distributed inference | [kuberay.yaml](providers/kuberay/deploy/kuberay.yaml) |
 | [**KAITO**](https://github.com/kaito-project/kaito) | vLLM (GPU) and llama.cpp (CPU/GPU) support | [kaito.yaml](providers/kaito/deploy/kaito.yaml) |
 | [**LLM-D**](https://github.com/llm-d/llm-d) | vLLM (GPU) with aggregated or disaggregated serving | [llmd.yaml](providers/llmd/deploy/llmd.yaml) |
+| [**Direct vLLM**](docs/providers/vllm.md) | Direct OpenAI-compatible vLLM Deployments for newest model support | [vllm.yaml](providers/vllm/deploy/vllm.yaml) |

 ## Quick Start

agents.md

Lines changed: 5 additions & 2 deletions
@@ -2,7 +2,7 @@

 ## WHY: Project Purpose

-**AI Runway** is a platform for deploying and managing machine learning models on Kubernetes. It provides a unified CRD abstraction (`ModelDeployment`) that works across multiple inference providers (KAITO, Dynamo, KubeRay, llm-d, etc.).
+**AI Runway** is a platform for deploying and managing machine learning models on Kubernetes. It provides a unified CRD abstraction (`ModelDeployment`) that works across multiple inference providers (KAITO, Dynamo, KubeRay, llm-d, Direct vLLM, etc.).

 ## WHAT: Tech Stack & Structure

@@ -18,6 +18,7 @@
 - `controller/config/` - Kustomize manifests for CRDs/RBAC
 - `frontend/src/` - React components, hooks, pages
 - `backend/src/` - Hono app, providers, services
+- `providers/` - Standalone provider controllers/shims (`dynamo`, `kaito`, `kuberay`, `llmd`, `vllm`); each renders `ModelDeployment` into its upstream resource. `providers/vllm` is the in-repo Direct vLLM provider (renders native `Deployment`+`Service`, selected via `provider.name: vllm`).
 - `shared/types/` - Shared TypeScript definitions
 - `plugins/headlamp/` - Headlamp dashboard plugin
 - `docs/` - Detailed documentation (read as needed; also the source rendered on the website)
@@ -103,7 +104,9 @@ Unified API for deploying ML models. Key fields:
 - `spec.model.id` - HuggingFace model ID or custom identifier
 - `spec.model.source` - `huggingface` or `custom`
 - `spec.engine.type` - `vllm`, `sglang`, `trtllm`, or `llamacpp` (optional, auto-selected from provider capabilities)
-- `spec.provider.name` - Optional explicit provider selection
+- `spec.engine.image` - Optional engine-specific container image override (preferred over legacy top-level `spec.image`; used by Direct vLLM/custom images)
+- `spec.engine.extraArgs` - Optional list of raw engine flags appended verbatim
+- `spec.provider.name` - Optional explicit provider selection (`kaito`, `dynamo`, `kuberay`, `llmd`, `vllm`)
 - `spec.serving.mode` - `aggregated` (default) or `disaggregated`
 - `spec.resources.gpu.count` - GPU count for aggregated mode
 - `spec.scaling.prefill/decode` - Component scaling for disaggregated mode
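
To make the field list above concrete, here is a rough TypeScript sketch of a `ModelDeployment` spec that selects the Direct vLLM provider. Only the field paths and enum values come from the list; the interface name, the helper, and the image reference are hypothetical placeholders, and only a subset of the spec is modeled.

```typescript
// Hypothetical typing of a subset of the spec fields listed in agents.md.
interface ModelDeploymentSpec {
  model: { id: string; source: 'huggingface' | 'custom' };
  engine?: {
    type?: 'vllm' | 'sglang' | 'trtllm' | 'llamacpp';
    image?: string; // preferred over the legacy top-level spec.image
    extraArgs?: string[]; // raw engine flags, appended verbatim
  };
  provider?: { name?: 'kaito' | 'dynamo' | 'kuberay' | 'llmd' | 'vllm' };
  serving?: { mode?: 'aggregated' | 'disaggregated' };
  resources?: { gpu?: { count?: number } };
}

const directVllmSpec: ModelDeploymentSpec = {
  model: { id: 'meta-llama/Llama-3.1-8B-Instruct', source: 'huggingface' },
  engine: {
    type: 'vllm',
    image: 'registry.example/vllm-openai:custom', // placeholder image reference
    extraArgs: ['--enable-auto-tool-choice'],
  },
  provider: { name: 'vllm' }, // explicit provider selection
  serving: { mode: 'aggregated' },
  resources: { gpu: { count: 1 } },
};

// Hypothetical helper: true when the spec explicitly asks for Direct vLLM.
function selectsDirectVllm(spec: ModelDeploymentSpec): boolean {
  return spec.provider?.name === 'vllm';
}
```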

backend/scripts/embed-assets.ts

Lines changed: 1 addition & 0 deletions
@@ -77,6 +77,7 @@ function collectFiles(dir: string, prefix: string = ''): AssetInfo[] {
 function generateModule(assets: AssetInfo[]): string {
   const lines: string[] = [
     '// AUTO-GENERATED FILE - DO NOT EDIT',
+    '// @ts-nocheck - Bun file imports are validated by the bundler, not tsc',
     '// Generated by scripts/embed-assets.ts',
     '// Run "bun run embed" to regenerate',
     '//',

backend/src/hono-app.ts

Lines changed: 8 additions & 0 deletions
@@ -1,6 +1,7 @@
 import { Hono } from 'hono';
 import { cors } from 'hono/cors';
 import { compress } from 'hono/compress';
+import { trimTrailingSlash } from 'hono/trailing-slash';
 import { HTTPException } from 'hono/http-exception';

 import { authService } from './services/auth';
@@ -30,6 +31,7 @@ import {
   aiconfigurator,
   costs,
   gateway,
+  vllmRecipes,
 } from './routes';

 // Load static files at startup
@@ -105,6 +107,11 @@ const app = new Hono<AppEnv>();

 // Global middleware
 app.use('*', compress());
+// Treat a trailing slash as equivalent to no slash: Hono routes strictly, so
+// "/api/vllm/recipes/" would otherwise 404 while "/api/vllm/recipes" works.
+// This only acts on a would-be 404 GET/HEAD, 301-redirecting to the no-slash
+// path, so it never changes the outcome of an already-matched route.
+app.use('*', trimTrailingSlash());
 app.use(
   '*',
   cors({
@@ -203,6 +210,7 @@ app.route('/api/aikit', aikit);
 app.route('/api/aiconfigurator', aiconfigurator);
 app.route('/api/costs', costs);
 app.route('/api/gateway', gateway);
+app.route('/api/vllm/recipes', vllmRecipes);

 // Static file serving middleware - uses Bun.file() for zero-copy serving
 app.use('*', async (c, next) => {
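
The comment on `trimTrailingSlash()` describes a narrow rule: redirect only when the request is a GET or HEAD, the path ends in a slash, and routing would otherwise return 404. A small pure-function model of that rule, as a sketch of the described behavior rather than Hono's actual implementation; the function name and signature are hypothetical.

```typescript
// Hypothetical model of the behavior described in the comment above;
// this is not Hono's implementation.
function trailingSlashRedirect(
  method: string,
  path: string,
  routeStatus: number,
): string | null {
  const isSafeMethod = method === 'GET' || method === 'HEAD';
  const hasTrailingSlash = path !== '/' && path.endsWith('/');
  if (routeStatus === 404 && isSafeMethod && hasTrailingSlash) {
    return path.slice(0, -1); // target of the 301 redirect
  }
  return null; // matched routes and other methods are left untouched
}
```

Under this model, `GET /api/vllm/recipes/` that would 404 is redirected to `/api/vllm/recipes`, while a POST to the same path, or any request that already matched a route, passes through unchanged.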

backend/src/routes/deployments.test.ts

Lines changed: 132 additions & 3 deletions
@@ -264,6 +264,90 @@ describe('Deployment Routes', () => {
       const data = await res.json();
       expect(data.resources[0].manifest.spec.gateway).toEqual({ enabled: false });
     });
+
+    test('preserves env in preview manifests', async () => {
+      restores.push(
+        mockServiceMethod(configService, 'getDefaultNamespace', async () => 'default'),
+      );
+
+      const env = {
+        VLLM_USE_V1: '1',
+        NCCL_DEBUG: 'INFO',
+      };
+
+      const res = await app.request('/api/deployments/preview', {
+        method: 'POST',
+        headers: { 'Content-Type': 'application/json' },
+        body: JSON.stringify({
+          ...validDeploymentBody,
+          provider: 'vllm',
+          env,
+        }),
+      });
+
+      expect(res.status).toBe(200);
+
+      const data = await res.json();
+      expect(data.resources[0].manifest.spec.env).toEqual([
+        { name: 'VLLM_USE_V1', value: '1' },
+        { name: 'NCCL_DEBUG', value: 'INFO' },
+      ]);
+    });
+
+    test('preserves Direct vLLM recipe provenance as metadata annotations in preview manifests', async () => {
+      restores.push(
+        mockServiceMethod(configService, 'getDefaultNamespace', async () => 'default'),
+      );
+
+      const imageRef = 'vllm/vllm-openai@sha256:1111111111111111111111111111111111111111111111111111111111111111';
+      const recipeFeatures = ['prefixCaching', 'kvCacheDtype'];
+
+      const res = await app.request('/api/deployments/preview', {
+        method: 'POST',
+        headers: { 'Content-Type': 'application/json' },
+        body: JSON.stringify({
+          ...validDeploymentBody,
+          name: 'recipe-vllm',
+          provider: 'vllm',
+          imageRef,
+          engineExtraArgs: ['--enable-auto-tool-choice'],
+          recipeProvenance: {
+            source: 'vllm-recipes',
+            id: 'meta-llama/Llama-3.1-8B-Instruct',
+            strategy: 'single_node_tp',
+            hardware: 'h100',
+            variant: 'default',
+            precision: 'bf16',
+            features: recipeFeatures,
+            revision: '2026-05-04',
+          },
+        }),
+      });
+
+      expect(res.status).toBe(200);
+
+      const data = await res.json();
+      const manifest = data.resources[0].manifest;
+      expect(manifest.metadata.annotations).toEqual({
+        'airunway.ai/generated-by': 'vllm-recipe-resolver',
+        'airunway.ai/recipe.source': 'vllm-recipes',
+        'airunway.ai/recipe.id': 'meta-llama/Llama-3.1-8B-Instruct',
+        'airunway.ai/recipe.strategy': 'single_node_tp',
+        'airunway.ai/recipe.hardware': 'h100',
+        'airunway.ai/recipe.variant': 'default',
+        'airunway.ai/recipe.precision': 'bf16',
+        'airunway.ai/recipe.revision': '2026-05-04',
+        'airunway.ai/recipe.features': JSON.stringify(recipeFeatures),
+      });
+      expect(manifest.spec.provider.name).toBe('vllm');
+      expect(manifest.spec.engine.type).toBe('vllm');
+      expect(manifest.spec.engine.image).toBe(imageRef);
+      expect(manifest.spec.engine.extraArgs).toEqual(['--enable-auto-tool-choice']);
+      expect(manifest.spec.image).toBeUndefined();
+      expect(manifest.spec.recipe).toBeUndefined();
+      expect(manifest.spec.recipes).toBeUndefined();
+      expect(manifest.status?.recipe).toBeUndefined();
+    });
   });

   describe('POST /api/deployments - storage validation', () => {
@@ -933,8 +1017,9 @@ describe('Deployment Routes', () => {
       });

       expect(res.status).toBe(201);
-      expect(capturedConfig.imageRef).toBe('ghcr.io/kaito-project/aikit/runners/llama-cpp-cuda:latest');
-      expect(capturedConfig.engineArgs?.ggufUrl).toBe(
+      expect(capturedConfig).toBeDefined();
+      expect(capturedConfig!.imageRef).toBe('ghcr.io/kaito-project/aikit/runners/llama-cpp-cuda:latest');
+      expect(capturedConfig!.engineArgs?.ggufUrl).toBe(
         'https://huggingface.co/unsloth/NVIDIA-Nemotron-3-Nano-4B-GGUF/resolve/main/NVIDIA-Nemotron-3-Nano-4B-Q4_K_M.gguf'
       );
     });
@@ -976,7 +1061,51 @@ describe('Deployment Routes', () => {
       });

       expect(res.status).toBe(201);
-      expect(capturedConfig.imageRef).toBe('ghcr.io/kaito-project/aikit/llama3.2:3b');
+      expect(capturedConfig).toBeDefined();
+      expect(capturedConfig!.imageRef).toBe('ghcr.io/kaito-project/aikit/llama3.2:3b');
+    });
+
+    test('passes env through create schema to Kubernetes service', async () => {
+      let capturedConfig: DeploymentConfig | undefined;
+
+      restores.push(
+        mockServiceMethod(kubernetesService, 'createDeployment', async (config) => {
+          capturedConfig = config;
+          return undefined;
+        }),
+      );
+      restores.push(
+        mockServiceMethod(kubernetesService, 'getClusterGpuCapacity', async () => ({
+          totalGpus: 16,
+          allocatedGpus: 0,
+          availableGpus: 16,
+          maxContiguousAvailable: 8,
+          nodes: [],
+        })),
+      );
+      restores.push(
+        mockServiceMethod(configService, 'getDefaultNamespace', async () => 'default'),
+      );
+
+      const env = {
+        VLLM_USE_V1: '1',
+        NCCL_DEBUG: 'INFO',
+      };
+
+      const res = await app.request('/api/deployments', {
+        method: 'POST',
+        headers: { 'Content-Type': 'application/json' },
+        body: JSON.stringify({
+          ...validDeploymentBody,
+          name: 'env-test',
+          provider: 'vllm',
+          env,
+        }),
+      });
+
+      expect(res.status).toBe(201);
+      expect(capturedConfig).toBeDefined();
+      expect(capturedConfig!.env).toEqual(env);
     });

     test('accepts deployment with providerOverrides', async () => {
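
The recipe-provenance test pins down the expected output shape: each scalar provenance field becomes an `airunway.ai/recipe.<field>` annotation, `features` is stored as a JSON string, and nothing leaks into `spec` or `status`. The manifest builder itself is not shown in this excerpt, so the following is only a sketch of a mapping that would satisfy those assertions. The function name is hypothetical, and skipping undefined fields is an assumption based on every schema field being optional.

```typescript
// Hypothetical mapping from recipe provenance to metadata annotations,
// inferred from the test expectations; not the commit's implementation.
interface RecipeProvenance {
  source?: string;
  id?: string;
  strategy?: string;
  hardware?: string;
  variant?: string;
  precision?: string;
  features?: string[];
  revision?: string;
}

function recipeAnnotations(p: RecipeProvenance): Record<string, string> {
  const out: Record<string, string> = {
    'airunway.ai/generated-by': 'vllm-recipe-resolver',
  };
  const scalarKeys = [
    'source', 'id', 'strategy', 'hardware', 'variant', 'precision', 'revision',
  ] as const;
  for (const key of scalarKeys) {
    const value = p[key];
    // Assumption: absent fields produce no annotation.
    if (value !== undefined) out[`airunway.ai/recipe.${key}`] = value;
  }
  if (p.features !== undefined) {
    // Annotation values must be strings, so the list is JSON-encoded.
    out['airunway.ai/recipe.features'] = JSON.stringify(p.features);
  }
  return out;
}
```

Keeping provenance in annotations rather than in `spec` means the CRD schema does not need a recipe field, which is consistent with the test asserting that `spec.recipe`, `spec.recipes`, and `status.recipe` stay undefined.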

backend/src/routes/deployments.ts

Lines changed: 14 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -132,6 +132,17 @@ const storageSchema = z.object({
132132
volumes: z.array(storageVolumeSchema).max(8, 'Maximum 8 storage volumes allowed').optional(),
133133
}).optional();
134134

135+
const recipeProvenanceSchema = z.object({
136+
source: z.string().optional(),
137+
id: z.string().optional(),
138+
strategy: z.string().optional(),
139+
hardware: z.string().optional(),
140+
variant: z.string().optional(),
141+
precision: z.string().optional(),
142+
features: z.array(z.string()).optional(),
143+
revision: z.string().optional(),
144+
}).optional();
145+
135146
const createDeploymentSchema = z.object({
136147
name: resourceNameSchema,
137148
modelId: z.string().min(1, 'Model ID is required'),
@@ -152,6 +163,8 @@ const createDeploymentSchema = z.object({
152163
memory: z.string().optional(),
153164
}).optional(),
154165
engineArgs: z.record(z.string(), z.unknown()).optional(),
166+
engineExtraArgs: z.array(z.string()).optional(),
167+
env: z.record(z.string(), z.string()).optional(),
155168
providerOverrides: z.record(z.string(), z.unknown()).optional(),
156169
prefillReplicas: z.number().int().min(0).optional(),
157170
decodeReplicas: z.number().int().min(0).optional(),
@@ -165,6 +178,7 @@ const createDeploymentSchema = z.object({
165178
computeType: z.enum(['cpu', 'gpu']).optional(),
166179
maxModelLen: z.number().int().positive().optional(),
167180
gatewayEnabled: z.boolean().optional(),
181+
recipeProvenance: recipeProvenanceSchema,
168182
storage: storageSchema,
169183
}).superRefine((data, ctx) => {
170184
const volumes = data.storage?.volumes;
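
The new `env` field is accepted as a string-to-string record, and the preview test expects it to appear in the manifest as a list of `{ name, value }` pairs in the same order. A dependency-free sketch of both steps follows. It hand-rolls the check that `z.record(z.string(), z.string())` performs so the example runs without zod, and both helper names are hypothetical.

```typescript
// Hypothetical helpers; the commit validates with zod and builds the
// manifest elsewhere.
interface EnvVar {
  name: string;
  value: string;
}

// Rough stand-in for z.record(z.string(), z.string()): a plain object
// whose values are all strings.
function isStringRecord(input: unknown): input is Record<string, string> {
  if (typeof input !== 'object' || input === null || Array.isArray(input)) {
    return false;
  }
  return Object.values(input).every((v) => typeof v === 'string');
}

// Convert the request-level record into the list form seen in the
// preview manifest, preserving key insertion order.
function envToList(env: Record<string, string>): EnvVar[] {
  return Object.entries(env).map(([name, value]) => ({ name, value }));
}
```

Because the record only allows string values, a body such as `{ "VLLM_USE_V1": 1 }` would be rejected at validation time rather than silently coerced.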

backend/src/routes/index.ts

Lines changed: 1 addition & 0 deletions
@@ -11,3 +11,4 @@ export { default as aikit } from './aikit';
 export { default as aiconfigurator } from './aiconfigurator';
 export { costsRoutes as costs } from './costs';
 export { default as gateway } from './gateway';
+export { default as vllmRecipes } from './vllmRecipes';
