# Live API Drift Detection

llmock produces responses shaped like real LLM APIs, but providers change their APIs over time. **Drift** means the mock no longer matches reality: your tests pass against llmock but break against the real API.

## Three-Layer Approach

Drift detection compares three independent sources (the provider SDK's type definitions, a live response from the real API, and llmock's response) to triangulate the cause of any mismatch:

| SDK types match real API? | Real API matches llmock? | Diagnosis                                                                         |
| ------------------------- | ------------------------ | --------------------------------------------------------------------------------- |
| Yes                       | Yes                      | **No drift**: all clear                                                           |
| Yes                       | No                       | **llmock drift**: response builders need updating                                 |
| No                        | Yes                      | **SDK drift**: the provider deprecated something the SDK still references         |
| No                        | No                       | **Provider changed before SDK update**: flag it and wait for the SDK to catch up  |

A two-way comparison (mock vs. real) can't distinguish "we need to fix llmock" from "the SDK hasn't caught up yet." A three-way comparison can.

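The table reduces to two yes/no questions per field. A minimal sketch of that decision, assuming each source's shape for a field is represented as a plain string (or `null` when the field is absent); the `Shape`, `Diagnosis`, and `diagnose` names are hypothetical and not llmock's actual code:

```typescript
// Hypothetical sketch: classify one field by comparing its shape in three sources.
// A shape is a simplified type description such as "string" or
// "object { reasoning_tokens: number }"; null means the field is absent.
type Shape = string | null;

type Diagnosis =
  | "no-drift"
  | "llmock-drift"
  | "sdk-drift"
  | "provider-changed-before-sdk";

function diagnose(sdk: Shape, real: Shape, mock: Shape): Diagnosis {
  const sdkMatchesReal = sdk === real;
  const realMatchesMock = real === mock;
  if (sdkMatchesReal && realMatchesMock) return "no-drift";
  if (sdkMatchesReal) return "llmock-drift"; // SDK and real agree, mock differs
  if (realMatchesMock) return "sdk-drift"; // real and mock agree, SDK differs
  return "provider-changed-before-sdk"; // real API differs from both
}

// The SDK and real API agree on a field that llmock omits:
const shape = "object { reasoning_tokens: number }";
console.log(diagnose(shape, shape, null)); // prints llmock-drift
```

In terms of the severity levels described under Reading Results, the `llmock-drift` outcome corresponds to `critical`, while both outcomes in which the SDK and the real API disagree correspond to `warning`.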
## Running Drift Tests

```bash
# All providers (requires all three API keys)
OPENAI_API_KEY=sk-... ANTHROPIC_API_KEY=sk-... GOOGLE_API_KEY=... pnpm test:drift

# Single provider (the others skip automatically)
OPENAI_API_KEY=sk-... pnpm test:drift

# Strict mode: warnings also fail
STRICT_DRIFT=1 OPENAI_API_KEY=sk-... pnpm test:drift
```

API keys, one per provider:

- `OPENAI_API_KEY`: OpenAI API key
- `ANTHROPIC_API_KEY`: Anthropic API key
- `GOOGLE_API_KEY`: Google AI API key

Each provider's tests skip independently when its key is not set, so you can run the drift tests against a single provider.

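Per-provider skipping amounts to checking each provider's own key. A minimal sketch, assuming a hypothetical `enabledProviders` helper that treats a missing or empty variable as "not set"; the real drift suites may wire this into their test runner differently:

```typescript
// Hypothetical sketch: a provider's drift tests run only when its own key is set.
const PROVIDER_KEYS = {
  openai: "OPENAI_API_KEY",
  anthropic: "ANTHROPIC_API_KEY",
  google: "GOOGLE_API_KEY",
} as const;

type Provider = keyof typeof PROVIDER_KEYS;

function enabledProviders(env: Record<string, string | undefined>): Provider[] {
  return (Object.keys(PROVIDER_KEYS) as Provider[]).filter(
    // An empty string counts as "not set".
    (provider) => Boolean(env[PROVIDER_KEYS[provider]]),
  );
}

console.log(enabledProviders({ OPENAI_API_KEY: "sk-test" })); // prints [ 'openai' ]
```

In Node, passing `process.env` as `env` gives the behavior of the second command above: only the OpenAI suite runs.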
## Reading Results

### Severity levels

- **critical**: The test fails. llmock produces a different shape than the real API for a field on which the SDK and the real API agree, so llmock needs an update.
- **warning**: The test passes unless `STRICT_DRIFT=1` is set. Either the real API returns a field that neither the SDK nor llmock knows about, or the SDK and the real API disagree. This usually means the provider added something new.
- **info**: Always passes. These are known, intentional differences, such as usage counts that llmock always reports as zero or optional fields that llmock omits.

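The pass/fail rule above fits in one predicate. A minimal sketch, assuming a hypothetical `DriftEntry` record that carries only a severity and a path (the real report entries also hold the SDK, real, and mock shapes):

```typescript
// Hypothetical sketch of the severity policy: critical always fails,
// warning fails only in strict mode, info never fails.
type Severity = "critical" | "warning" | "info";

interface DriftEntry {
  severity: Severity;
  path: string;
}

function shouldFail(entries: DriftEntry[], strict: boolean): boolean {
  return entries.some(
    (entry) => entry.severity === "critical" || (strict && entry.severity === "warning"),
  );
}

const entries: DriftEntry[] = [
  { severity: "warning", path: "system_fingerprint" },
  { severity: "info", path: "choices[0].logprobs" },
];
console.log(shouldFail(entries, false)); // prints false
console.log(shouldFail(entries, true)); // prints true
```

In a drift test, `strict` would typically come from the environment, for example `process.env.STRICT_DRIFT === "1"`; whether other values also enable strict mode is not specified here.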
### Example report output

```
API DRIFT DETECTED: OpenAI Chat Completions (non-streaming text)

  1. [critical] LLMOCK DRIFT — field in SDK + real API but missing from mock
     Path: usage.completion_tokens_details
     SDK: object { reasoning_tokens: number }
     Real: object { reasoning_tokens: number, accepted_prediction_tokens: number }
     Mock: <absent>

  2. [warning] PROVIDER ADDED FIELD — in real API but not in SDK or mock
     Path: system_fingerprint
     SDK: <absent>
     Real: string
     Mock: <absent>

  3. [info] MOCK EXTRA FIELD — in mock but not in real API
     Path: choices[0].logprobs
     SDK: null | object
     Real: <absent>
     Mock: null
```

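The `Path` values and shape strings in the report suggest that responses are flattened into field paths and compared path by path. A minimal sketch of that idea for JSON values such as the real and mock responses, assuming a shape notation modeled on the report and assuming arrays are inspected through their first element (as in `choices[0]`). `describeShape`, `flatten`, and `missingPaths` are hypothetical names, not the functions in `sdk-shapes.ts`, and how the SDK side is represented is not shown:

```typescript
// Hypothetical sketch: flatten a JSON response into "path -> shape" entries
// and list paths that one source has but another lacks.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Describe a value in a notation modeled on the drift report.
function describeShape(value: Json): string {
  if (value === null) return "null";
  if (Array.isArray(value)) return "array";
  if (typeof value === "object") {
    const fields = Object.entries(value).map(([key, child]) => `${key}: ${describeShape(child)}`);
    return fields.length > 0 ? `object { ${fields.join(", ")} }` : "object {}";
  }
  return typeof value; // "string", "number", or "boolean"
}

// Record every nested path. Arrays are sampled through their first element,
// which yields paths like "choices[0].logprobs".
function flatten(value: Json, path = "", out = new Map<string, string>()): Map<string, string> {
  if (path !== "") out.set(path, describeShape(value));
  if (Array.isArray(value)) {
    const [first] = value;
    if (first !== undefined) flatten(first, `${path}[0]`, out);
  } else if (value !== null && typeof value === "object") {
    for (const [key, child] of Object.entries(value)) {
      flatten(child, path === "" ? key : `${path}.${key}`, out);
    }
  }
  return out;
}

// Paths present in `expected` but absent from `actual` (e.g. real vs. mock).
function missingPaths(expected: Map<string, string>, actual: Map<string, string>): string[] {
  return Array.from(expected.keys()).filter((p) => !actual.has(p));
}

const real: Json = {
  usage: { completion_tokens_details: { reasoning_tokens: 0 } },
  system_fingerprint: "fp_example",
};
const mock: Json = { usage: {} };
console.log(missingPaths(flatten(real), flatten(mock)));
```

With these inputs, the missing paths include `usage.completion_tokens_details` and `system_fingerprint`, the same fields flagged in entries 1 and 2 of the example report.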
## Fixing Detected Drift

When a `critical` drift is detected:

1. **Identify the response builder**: the report header names the provider and endpoint (for example, `OpenAI Chat Completions (non-streaming text)`), and `Path` names the field. The builders live in:
   - OpenAI Chat Completions → `src/helpers.ts` (`buildTextCompletion`, `buildToolCallCompletion`, `buildTextChunks`, `buildToolCallChunks`)
   - OpenAI Responses API → `src/responses.ts` (`buildTextResponse`, `buildToolCallResponse`, `buildTextStreamEvents`, `buildToolCallStreamEvents`)
   - Anthropic Claude → `src/messages.ts` (`buildClaudeTextResponse`, `buildClaudeToolCallResponse`, `buildClaudeTextStreamEvents`, `buildClaudeToolCallStreamEvents`)
   - Google Gemini → `src/gemini.ts` (`buildGeminiTextResponse`, `buildGeminiToolCallResponse`, `buildGeminiTextStreamChunks`, `buildGeminiToolCallStreamChunks`)

2. **Update the builder**: add or modify the field so it matches the real API shape.

3. **Run the conformance tests**: run `pnpm test` to verify that the existing API conformance tests still pass.

4. **Run the drift tests**: run `pnpm test:drift` to verify that the drift is resolved.

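As an illustration of step 2, take the critical entry from the example report, where `usage.completion_tokens_details` is missing from the mock. The builder below is a hypothetical, simplified stand-in; it does not reproduce the signature or full output of llmock's `buildTextCompletion`. It only shows the kind of change involved: adding the field with the shape the real API reported, filled with zero counts in keeping with llmock's zero-valued usage.

```typescript
// Hypothetical, simplified Chat Completions builder (not llmock's implementation).
interface CompletionUsage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
  // Added to match the real API shape reported as critical drift.
  completion_tokens_details: {
    reasoning_tokens: number;
    accepted_prediction_tokens: number;
  };
}

interface ChatCompletion {
  id: string;
  object: "chat.completion";
  created: number;
  model: string;
  choices: Array<{
    index: number;
    message: { role: "assistant"; content: string };
    finish_reason: "stop";
  }>;
  usage: CompletionUsage;
}

function buildTextCompletionSketch(content: string, model: string): ChatCompletion {
  return {
    id: "chatcmpl-mock", // placeholder id
    object: "chat.completion",
    created: Math.floor(Date.now() / 1000), // Unix seconds
    model,
    choices: [{ index: 0, message: { role: "assistant", content }, finish_reason: "stop" }],
    usage: {
      prompt_tokens: 0,
      completion_tokens: 0,
      total_tokens: 0,
      completion_tokens_details: { reasoning_tokens: 0, accepted_prediction_tokens: 0 },
    },
  };
}

console.log(buildTextCompletionSketch("Hello!", "gpt-4o-mini").usage.completion_tokens_details);
```

After a change like this, steps 3 and 4 confirm that the conformance tests still pass and that the critical entry no longer appears.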
## Model Deprecation

The `models.drift.ts` test scrapes the model names referenced in llmock's test files, README, and fixtures, then queries each provider's model listing API to verify that those models still exist.

When a model is deprecated:

1. Update the model name in the affected test files, README, and fixtures
2. Update `src/__tests__/drift/providers.ts` if the low-cost model used by the drift tests changed
3. Run `pnpm test` and `pnpm test:drift`

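The scrape-then-check idea can be sketched in a few lines. This assumes a hypothetical regex that recognizes only the `gpt-`, `claude-`, and `gemini-` naming families mentioned on this page, and assumes the listing has already been fetched and normalized to bare model IDs (Gemini's listing, for instance, returns names prefixed with `models/`). `scrapeModelNames` and `findMissingModels` are illustrative names, not the code in `models.drift.ts`:

```typescript
// Hypothetical sketch: find model names in text, then report the ones a
// provider's (already fetched and normalized) model listing no longer contains.
// The pattern is an assumption covering three naming families only.
const MODEL_PATTERN = /\b(?:gpt-[\w.-]+|claude-[\w.-]+|gemini-[\w.-]+)\b/g;

function scrapeModelNames(text: string): Set<string> {
  return new Set(text.match(MODEL_PATTERN) ?? []);
}

function findMissingModels(referenced: Set<string>, available: Iterable<string>): string[] {
  const availableSet = new Set(available);
  return Array.from(referenced)
    .filter((name) => !availableSet.has(name))
    .sort();
}

const readme = "Uses `gpt-4o-mini`, `claude-haiku-4-5-20251001`, and `gemini-2.5-flash`.";
const referenced = scrapeModelNames(readme);
console.log(findMissingModels(referenced, ["gpt-4o-mini", "gemini-2.5-flash"]));
// prints [ 'claude-haiku-4-5-20251001' ]
```

The trailing `\b` keeps sentence punctuation such as a final period out of the captured name.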
## Adding a New Provider

1. Add the provider's SDK as a devDependency in `package.json`
2. Add shape extraction functions to `src/__tests__/drift/sdk-shapes.ts`
3. Add raw fetch client functions to `src/__tests__/drift/providers.ts`
4. Create `src/__tests__/drift/<provider>.drift.ts` with four test scenarios
5. Add a model listing function to `providers.ts` and a model check to `models.drift.ts`
6. Update the allowlist in `schema.ts` if needed

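For step 3, a raw client calls the provider's HTTP API directly rather than through the SDK, which keeps the real response independent of the SDK's types. A minimal sketch using OpenAI's Chat Completions endpoint (`POST https://api.openai.com/v1/chat/completions` with a bearer token) as the example; the function name, its parameters, and the injectable `fetchImpl` are assumptions for illustration and need not match the helpers in `providers.ts`:

```typescript
// Hypothetical raw client: POST a tiny request and return the parsed JSON body
// untouched, so its shape can be compared against the SDK types and llmock.
// `fetchImpl` is injectable so the function can be exercised without network access.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

async function fetchOpenAIChatRaw(
  apiKey: string,
  model: string,
  prompt: string,
  fetchImpl: FetchLike,
  maxTokens = 10, // keep responses tiny to limit cost
): Promise<unknown> {
  const res = await fetchImpl("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({
      model,
      messages: [{ role: "user", content: prompt }],
      max_tokens: maxTokens,
    }),
  });
  if (!res.ok) throw new Error(`OpenAI request failed with HTTP ${res.status}`);
  return res.json();
}
```

In Node 18 or later, the global `fetch` can be passed as `fetchImpl`; in unit tests, a stub that records the request and returns canned JSON keeps the client testable offline.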
## CI Schedule

Drift tests run only on a schedule or on demand:

- **Weekly**: Monday 6:00 AM UTC
- **Manual**: trigger via the GitHub Actions UI (`workflow_dispatch`)
- **NOT** on pull requests or pushes, because these tests hit real APIs and cost money

See `.github/workflows/test-drift.yml`.

## Cost

Each run makes about 20 API calls using low-cost models (`gpt-4o-mini`, `claude-haiku-4-5-20251001`, `gemini-2.5-flash`), with `max_tokens` limits of 10 to 100 per call. The total comes to under $0.01 per week.
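The estimate is simple arithmetic: calls × tokens per call × price per token. A minimal sketch that computes an upper bound from per-million-token prices you supply; it deliberately hardcodes no real prices (provider pricing changes), and the prompt size per call is likewise an assumption you pass in. The example figures below are placeholders, not actual rates:

```typescript
// Rough upper-bound cost of one drift run. Prices are caller-supplied,
// in USD per million tokens; nothing here reflects real provider pricing.
interface CallBudget {
  calls: number; // number of API calls in the run
  inputTokensPerCall: number; // assumed prompt size
  maxOutputTokensPerCall: number; // the max_tokens cap
}

interface Pricing {
  inputPerMillion: number; // USD per 1M input tokens
  outputPerMillion: number; // USD per 1M output tokens
}

function upperBoundCostUsd(budget: CallBudget, pricing: Pricing): number {
  const inputTokens = budget.calls * budget.inputTokensPerCall;
  const outputTokens = budget.calls * budget.maxOutputTokensPerCall; // assumes every call hits the cap
  return (inputTokens * pricing.inputPerMillion + outputTokens * pricing.outputPerMillion) / 1e6;
}

// Placeholder numbers: 20 calls, 50 prompt tokens each, capped at 100 output tokens,
// at hypothetical prices of $1 (input) and $4 (output) per million tokens.
const estimate = upperBoundCostUsd(
  { calls: 20, inputTokensPerCall: 50, maxOutputTokensPerCall: 100 },
  { inputPerMillion: 1, outputPerMillion: 4 },
);
console.log(estimate.toFixed(4)); // prints 0.0090
```

Plugging in each model's current published prices gives a ceiling to check against the under-$0.01 figure; manual runs add to the weekly total.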