Skip to content

Commit 8160773

Browse files
committed
fix(vllm): address review findings for the Direct vLLM provider
- **`transformer.go`**: skip a derived flag (`--tensor-parallel-size`, `--model`, `--max-model-len`, etc.) when its key is already set in `spec.engine.args`, so an edited GPU count can no longer emit a conflicting duplicate `--tensor-parallel-size` - **`transformer.go`**: render `--enforce-eager` and `--enable-prefix-caching` (the `spec.engine` toggles were silently dropped) - **`transformer.go`**: mount `spec.model.storage` PVC volumes (`volumes` + `volumeMounts`) alongside `/dev/shm` - **`transformer.go`**: reject reserved `host`/`port` engine args in the `--key` and `--key=value` forms, and guard nil `Decode.GPU`/`Prefill.GPU` in `transformDisaggregated` - **`vllmRecipesClient.ts`**: add typed errors (validation/timeout/upstream), a TTL in-memory cache with stale-on-error fallback, and a 5 MiB response-size bound - **`vllmRecipes.ts`**: map recipe errors to `400`/`504`/`502` instead of a blanket `502` - **`controller.go`**: name the conflicting owner in `resourceConflictError` - Add controller and transformer tests for the above, and document the registry-coupling / nightly-digest behavior in `docs/providers/vllm.md` - Note the new `spec.engine.image`/`extraArgs` fields and `providers/vllm` in `agents.md` - Bump `providers/vllm` dependencies (`go.mod`/`go.sum`) Signed-off-by: Suraj Deshmukh <suraj.deshmukh@microsoft.com>
1 parent 188fbb1 commit 8160773

12 files changed

Lines changed: 827 additions & 68 deletions

File tree

agents.md

Lines changed: 5 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -2,7 +2,7 @@
22

33
## WHY: Project Purpose
44

5-
**AI Runway** is a platform for deploying and managing machine learning models on Kubernetes. It provides a unified CRD abstraction (`ModelDeployment`) that works across multiple inference providers (KAITO, Dynamo, KubeRay, llm-d, etc.).
5+
**AI Runway** is a platform for deploying and managing machine learning models on Kubernetes. It provides a unified CRD abstraction (`ModelDeployment`) that works across multiple inference providers (KAITO, Dynamo, KubeRay, llm-d, Direct vLLM, etc.).
66

77
## WHAT: Tech Stack & Structure
88

@@ -18,6 +18,7 @@
1818
- `controller/config/` - Kustomize manifests for CRDs/RBAC
1919
- `frontend/src/` - React components, hooks, pages
2020
- `backend/src/` - Hono app, providers, services
21+
- `providers/` - Standalone provider controllers/shims (`dynamo`, `kaito`, `kuberay`, `llmd`, `vllm`); each renders `ModelDeployment` into its upstream resource. `providers/vllm` is the in-repo Direct vLLM provider (renders native `Deployment`+`Service`, selected via `provider.name: vllm`).
2122
- `shared/types/` - Shared TypeScript definitions
2223
- `plugins/headlamp/` - Headlamp dashboard plugin
2324
- `docs/` - Detailed documentation (read as needed; also the source rendered on the website)
@@ -103,7 +104,9 @@ Unified API for deploying ML models. Key fields:
103104
- `spec.model.id` - HuggingFace model ID or custom identifier
104105
- `spec.model.source` - `huggingface` or `custom`
105106
- `spec.engine.type` - `vllm`, `sglang`, `trtllm`, or `llamacpp` (optional, auto-selected from provider capabilities)
106-
- `spec.provider.name` - Optional explicit provider selection
107+
- `spec.engine.image` - Optional engine-specific container image override (preferred over legacy top-level `spec.image`; used by Direct vLLM/custom images)
108+
- `spec.engine.extraArgs` - Optional list of raw engine flags appended verbatim
109+
- `spec.provider.name` - Optional explicit provider selection (`kaito`, `dynamo`, `kuberay`, `llmd`, `vllm`)
107110
- `spec.serving.mode` - `aggregated` (default) or `disaggregated`
108111
- `spec.resources.gpu.count` - GPU count for aggregated mode
109112
- `spec.scaling.prefill/decode` - Component scaling for disaggregated mode

backend/src/routes/vllmRecipes.ts

Lines changed: 27 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -2,11 +2,34 @@ import { Hono } from 'hono';
22
import { zValidator } from '@hono/zod-validator';
33
import { z } from 'zod';
44
import { HTTPException } from 'hono/http-exception';
5+
import type { ContentfulStatusCode } from 'hono/utils/http-status';
56
import type { VllmRecipeResolveRequest } from '@airunway/shared';
6-
import { vllmRecipesClient } from '../services/vllmRecipesClient';
7+
import {
8+
vllmRecipesClient,
9+
VllmRecipeValidationError,
10+
VllmRecipeTimeoutError,
11+
} from '../services/vllmRecipesClient';
712
import { vllmRecipeResolver } from '../services/vllmRecipeResolver';
813
import logger from '../lib/logger';
914

15+
// Map recipe errors to HTTP status so callers can distinguish bad input
16+
// (4xx, do not retry) from an upstream recipes.vllm.ai outage (5xx).
17+
function recipeErrorStatus(error: unknown): ContentfulStatusCode {
18+
if (error instanceof VllmRecipeValidationError) {
19+
return 400;
20+
}
21+
if (error instanceof VllmRecipeTimeoutError) {
22+
return 504;
23+
}
24+
return 502;
25+
}
26+
27+
function recipeHttpException(error: unknown, fallbackMessage: string): HTTPException {
28+
return new HTTPException(recipeErrorStatus(error), {
29+
message: error instanceof Error ? error.message : fallbackMessage,
30+
});
31+
}
32+
1033
const imageChoiceSchema = z.discriminatedUnion('type', [
1134
z.object({ type: z.literal('recipe') }),
1235
z.object({ type: z.literal('custom'), imageRef: z.string().min(1, 'imageRef is required') }),
@@ -34,9 +57,7 @@ const vllmRecipes = new Hono()
3457
return c.json(result);
3558
} catch (error) {
3659
logger.error({ error }, 'Failed to list vLLM recipes');
37-
throw new HTTPException(502, {
38-
message: error instanceof Error ? error.message : 'Failed to list vLLM recipes',
39-
});
60+
throw recipeHttpException(error, 'Failed to list vLLM recipes');
4061
}
4162
})
4263

@@ -52,9 +73,7 @@ const vllmRecipes = new Hono()
5273
return c.json(result);
5374
} catch (error) {
5475
logger.error({ error, modelId: request.modelId }, 'Failed to resolve vLLM recipe');
55-
throw new HTTPException(502, {
56-
message: error instanceof Error ? error.message : 'Failed to resolve vLLM recipe',
57-
});
76+
throw recipeHttpException(error, 'Failed to resolve vLLM recipe');
5877
}
5978
})
6079

@@ -76,9 +95,7 @@ const vllmRecipes = new Hono()
7695
return c.json(result);
7796
} catch (error) {
7897
logger.error({ error, org, model }, 'Failed to fetch vLLM recipe');
79-
throw new HTTPException(502, {
80-
message: error instanceof Error ? error.message : 'Failed to fetch vLLM recipe',
81-
});
98+
throw recipeHttpException(error, 'Failed to fetch vLLM recipe');
8299
}
83100
});
84101

backend/src/services/vllmRecipesClient.test.ts

Lines changed: 72 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,10 @@
11
import { describe, expect, test } from 'bun:test';
2-
import { VllmRecipesClient } from './vllmRecipesClient';
2+
import {
3+
VllmRecipesClient,
4+
VllmRecipeValidationError,
5+
VllmRecipeTimeoutError,
6+
VllmRecipeUpstreamError,
7+
} from './vllmRecipesClient';
38

49
describe('VllmRecipesClient', () => {
510
test('resolves relative recipe references under the configured recipe base URL', () => {
@@ -59,5 +64,71 @@ describe('VllmRecipesClient', () => {
5964
`Invalid Hugging Face model ID: ${modelId}`
6065
);
6166
});
67+
68+
test('throws a typed validation error for bad input (mapped to 4xx by the route)', async () => {
69+
await expect(client.getByModelId('acme/foo/bar')).rejects.toBeInstanceOf(VllmRecipeValidationError);
70+
});
71+
});
72+
73+
describe('upstream error classification + caching', () => {
74+
const originalFetch = globalThis.fetch;
75+
76+
function withFetch<T>(fetchImpl: typeof fetch, run: (client: VllmRecipesClient) => Promise<T>): Promise<T> {
77+
globalThis.fetch = fetchImpl;
78+
const client = new VllmRecipesClient('https://recipes.vllm.ai');
79+
return run(client).finally(() => {
80+
globalThis.fetch = originalFetch;
81+
});
82+
}
83+
84+
test('maps an aborted fetch to a timeout error', async () => {
85+
await withFetch(
86+
(async () => {
87+
const err = new Error('aborted');
88+
err.name = 'AbortError';
89+
throw err;
90+
}) as unknown as typeof fetch,
91+
async (client) => {
92+
await expect(client.getByModelId('acme/model')).rejects.toBeInstanceOf(VllmRecipeTimeoutError);
93+
}
94+
);
95+
});
96+
97+
test('maps a non-ok upstream response to an upstream error', async () => {
98+
await withFetch(
99+
(async () => new Response('nope', { status: 503, statusText: 'Service Unavailable' })) as unknown as typeof fetch,
100+
async (client) => {
101+
await expect(client.getByModelId('acme/model')).rejects.toBeInstanceOf(VllmRecipeUpstreamError);
102+
}
103+
);
104+
});
105+
106+
test('rejects an oversized recipe payload', async () => {
107+
const huge = JSON.stringify({ blob: 'x'.repeat(6 * 1024 * 1024) });
108+
await withFetch(
109+
(async () => new Response(huge, { status: 200, headers: { 'content-type': 'application/json' } })) as unknown as typeof fetch,
110+
async (client) => {
111+
await expect(client.getByModelId('acme/model')).rejects.toBeInstanceOf(VllmRecipeUpstreamError);
112+
}
113+
);
114+
});
115+
116+
test('caches a model recipe so a second call does not refetch', async () => {
117+
let calls = 0;
118+
await withFetch(
119+
(async () => {
120+
calls += 1;
121+
return new Response(JSON.stringify({ recipe: true }), {
122+
status: 200,
123+
headers: { 'content-type': 'application/json' },
124+
});
125+
}) as unknown as typeof fetch,
126+
async (client) => {
127+
await client.getByModelId('acme/model');
128+
await client.getByModelId('acme/model');
129+
expect(calls).toBe(1);
130+
}
131+
);
132+
});
62133
});
63134
});

0 commit comments

Comments
 (0)