docs: document WS drift coverage, bump to 1.3.3

jpr5 · jpr5 · commit e5870ed35c43 · 2026-03-15T00:59:06.000-07:00
DRIFT.md: WS coverage table with verified/unverified status, Gemini
Live explanation, cost estimate (25 API calls), "Adding a New Provider"
WS step.

README.md: fix Gemini Live response shape example, update model name,
add unverified warning, fix Responses WS example to use flat format.

docs/index.html: add unverified note to Gemini Live in feature list
and comparison table.

CHANGELOG.md: 1.3.3 patch notes.
vitest.config.drift.ts: increase testTimeout to 60s for WS protocols.
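
The DRIFT.md text in this commit says the drift client talks to real endpoints using `node:tls` with RFC 6455 framing. As a rough illustration of what that framing involves on the client side, here is a minimal sketch of encoding one masked text frame. The function name `encodeClientTextFrame` is hypothetical and is not taken from `ws-providers.ts`; the actual implementation may differ.

```typescript
import { randomBytes } from "node:crypto";

// Hypothetical helper (not from ws-providers.ts): encode a single, unfragmented,
// masked text frame as a client must send it under RFC 6455 section 5.2.
export function encodeClientTextFrame(
  text: string,
  maskKey: Uint8Array = randomBytes(4), // clients must mask with a fresh 4-byte key
): Uint8Array {
  const payload = new TextEncoder().encode(text);
  const len = payload.length;

  // Payload length uses 0, 2, or 8 extra bytes depending on its size.
  const extLen = len < 126 ? 0 : len < 65536 ? 2 : 8;
  const frame = new Uint8Array(2 + extLen + 4 + len);
  const view = new DataView(frame.buffer);

  frame[0] = 0x81; // FIN bit set, opcode 0x1 (text)
  if (extLen === 0) {
    frame[1] = 0x80 | len; // MASK bit set, 7-bit length
  } else if (extLen === 2) {
    frame[1] = 0x80 | 126; // MASK bit set, 16-bit big-endian length follows
    view.setUint16(2, len);
  } else {
    frame[1] = 0x80 | 127; // MASK bit set, 64-bit big-endian length follows
    view.setBigUint64(2, BigInt(len));
  }

  const maskOffset = 2 + extLen;
  frame.set(maskKey, maskOffset);

  // XOR each payload byte with the mask key, cycling through its 4 bytes.
  for (let i = 0; i < len; i++) {
    frame[maskOffset + 4 + i] = payload[i] ^ maskKey[i % 4];
  }
  return frame;
}
```

Server-to-client frames are unmasked, so a matching decoder would skip the mask key and the XOR step.
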
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,5 +1,15 @@
 # @copilotkit/llmock
 
+## 1.3.3
+
+### Patch Changes
+
+- Fix Responses WS handler to accept flat `response.create` format matching the real OpenAI API (previously required a non-standard nested `response: { ... }` envelope)
+- WebSocket drift detection tests: TLS client for real provider WS endpoints, 4 verified drift tests (Responses WS + Realtime), Gemini Live canary for text-capable model availability
+- Realtime model canary: detects when `gpt-4o-mini-realtime-preview` is deprecated and suggests GA replacement
+- Gemini Live documented as unverified (no text-capable `bidiGenerateContent` model exists yet)
+- Fix README Gemini Live response shape example (`modelTurn.parts`, not `modelTurnComplete`)
+
 ## 1.3.2
 
 ### Patch Changes
diff --git a/DRIFT.md b/DRIFT.md
@@ -101,7 +101,32 @@ When a model is deprecated:
 3. Add raw fetch client functions to `src/__tests__/drift/providers.ts`
 4. Create `src/__tests__/drift/<provider>.drift.ts` with 4 test scenarios
 5. Add model listing function to `providers.ts` and model check to `models.drift.ts`
-6. Update the allowlist in `schema.ts` if needed
+6. If the provider uses WebSocket, add protocol functions to `ws-providers.ts` and create `ws-<provider>.drift.ts`
+7. Update the allowlist in `schema.ts` if needed
+
+## WebSocket Drift Coverage
+
+In addition to the 19 existing drift tests (16 HTTP response-shape + 3 model deprecation), WebSocket drift tests cover llmock's WS protocols:
+
+| Protocol            | Text | Tool Call | Real Endpoint                                                       | Status     |
+| ------------------- | ---- | --------- | ------------------------------------------------------------------- | ---------- |
+| OpenAI Responses WS | ✓    | ✓         | `wss://api.openai.com/v1/responses`                                 | Verified   |
+| OpenAI Realtime     | ✓    | ✓         | `wss://api.openai.com/v1/realtime`                                  | Verified   |
+| Gemini Live         | —    | —         | `wss://generativelanguage.googleapis.com/ws/...BidiGenerateContent` | Unverified |
+
+**Models**: `gpt-4o-mini` for Responses WS, `gpt-4o-mini-realtime-preview` for Realtime.
+
+**Auth**: Uses the same `OPENAI_API_KEY` and `GOOGLE_API_KEY` environment variables as HTTP tests. No new secrets needed.
+
+**How it works**: A TLS WebSocket client (`ws-providers.ts`) connects to real provider endpoints using `node:tls` with RFC 6455 framing. Each protocol function handles the setup sequence (e.g., Realtime session negotiation, Gemini Live setup/setupComplete) and collects messages until a terminal event. The mock side uses the existing `ws-test-client.ts` plaintext client against the local llmock server.
+
+### Gemini Live: unverified
+
+llmock's Gemini Live handler implements the text-based `BidiGenerateContent` protocol as documented in Google's [Live API reference](https://ai.google.dev/api/live) — `setup`/`setupComplete` handshake, `clientContent` with turns, `serverContent` with `modelTurn.parts[].text`, and `toolCall` responses. The protocol format is correct per the docs.
+
+However, as of March 2026, the only models that support `bidiGenerateContent` are native-audio models (`gemini-2.5-flash-native-audio-*`), which reject text-only requests. No text-capable model exists for this endpoint yet, so we cannot triangulate llmock's output against a real API response.
+
+A canary test (`ws-gemini-live.drift.ts`) queries the Gemini model listing API on each drift run and checks for a non-audio model that supports `bidiGenerateContent`. When Google ships one, the canary will flag it and the full drift tests can be enabled.
 
 ## CI Schedule
 
@@ -115,4 +140,4 @@ See `.github/workflows/test-drift.yml`.
 
 ## Cost
 
-~20 API calls per run using the cheapest available models (`gpt-4o-mini`, `claude-haiku-4-5-20251001`, `gemini-2.5-flash`) with 10-100 max tokens each. Under $0.01/week.
+~25 API calls per run (16 HTTP response-shape + 3 model listing + 4 WS + 2 canaries) using the cheapest available models (`gpt-4o-mini`, `gpt-4o-mini-realtime-preview`, `claude-haiku-4-5-20251001`, `gemini-2.5-flash`) with 10-100 max tokens each. Under $0.02/week. When Gemini Live text-capable models become available, this will increase to 6 WS calls.
diff --git a/README.md b/README.md
@@ -500,7 +500,7 @@ WebSocket endpoints:
 
 - **WS `/v1/responses`** — OpenAI Responses API over WebSocket
 - **WS `/v1/realtime`** — OpenAI Realtime API (text + tool calls)
-- **WS `/ws/google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent`** — Gemini Live
+- **WS `/ws/google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent`** — Gemini Live ([unverified](#gemini-live-bidigeneratecontent))
 
 All endpoints share the same fixture pool — the same fixtures work across all providers. Requests are translated to a common format internally for fixture matching.
 
@@ -518,13 +518,11 @@ Connect to `ws://localhost:5555/v1/responses` and send a `response.create` event
 // → Client sends:
 {
   "type": "response.create",
-  "response": {
-    "modalities": ["text"],
-    "instructions": "You are a helpful assistant.",
-    "input": [
-      { "type": "message", "role": "user", "content": [{ "type": "input_text", "text": "Hello" }] },
-    ],
-  },
+  "model": "gpt-4o",
+  "instructions": "You are a helpful assistant.",
+  "input": [
+    { "type": "message", "role": "user", "content": [{ "type": "input_text", "text": "Hello" }] },
+  ],
 }
 
 // ← Server streams:
@@ -567,19 +565,21 @@ Connect to `ws://localhost:5555/v1/realtime`. The Realtime API uses a session-ba
 
 ### Gemini Live (BidiGenerateContent)
 
-Connect to `ws://localhost:5555/ws/google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent`. Gemini Live uses a setup/content/response flow:
+Connect to `ws://localhost:5555/ws/google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent`. Gemini Live uses a setup/content/response flow.
+
+> **⚠️ Unverified**: As of March 2026, Google's only `bidiGenerateContent`-capable models are audio-only — no text-capable model exists for this endpoint. llmock implements the text-based protocol as documented in Google's [Live API reference](https://ai.google.dev/api/live), but the response shapes have not been verified against real API output. Code you write against this mock may need adjustment when Google ships a text-capable Live model. See [DRIFT.md](DRIFT.md#gemini-live-unverified) for details and the automated canary that tracks model availability.
 
 ```jsonc
 // → Setup message (must be first):
-{ "setup": { "model": "models/gemini-2.0-flash-live", "generationConfig": { "responseModalities": ["TEXT"] } } }
+{ "setup": { "model": "models/gemini-2.5-flash", "generationConfig": { "responseModalities": ["TEXT"] } } }
 
 // → Send user content:
 { "clientContent": { "turns": [{ "role": "user", "parts": [{ "text": "Hello" }] }], "turnComplete": true } }
 
 // ← Server streams:
 // {"setupComplete": {}}
-// {"serverContent": {"modelTurnComplete": false, "parts": [{"text": "Hello"}]}}
-// {"serverContent": {"modelTurnComplete": true}}
+// {"serverContent": {"modelTurn": {"parts": [{"text": "Hello"}]}, "turnComplete": false}}
+// {"serverContent": {"modelTurn": {"parts": [{"text": "!"}]}, "turnComplete": true}}
 ```
 
 ## CLI
diff --git a/docs/index.html b/docs/index.html
@@ -1199,7 +1199,9 @@ <h3>WebSocket APIs</h3>
             <ul>
               <li>OpenAI Responses API over WebSocket</li>
               <li>OpenAI Realtime API — text + tool calls</li>
-              <li>Gemini Live BidiGenerateContent</li>
+              <li>
+                Gemini Live BidiGenerateContent (unverified — no text-capable model exists yet)
+              </li>
               <li>No audio/video — text and tool call paths only</li>
             </ul>
           </div>
@@ -1308,7 +1310,7 @@ <h2 class="section-title">llmock vs MSW</h2>
               <td class="manual">Manual — build data SSE yourself</td>
             </tr>
             <tr>
-              <td>WebSocket APIs (Realtime, Gemini Live)</td>
+              <td>WebSocket APIs (Realtime, Gemini Live*)</td>
               <td class="yes">Built-in ✓</td>
               <td class="no">No</td>
             </tr>
diff --git a/package.json b/package.json
@@ -1,6 +1,6 @@
 {
   "name": "@copilotkit/llmock",
-  "version": "1.3.2",
+  "version": "1.3.3",
   "description": "Deterministic mock LLM server for testing (OpenAI, Anthropic, Gemini)",
   "license": "MIT",
   "packageManager": "pnpm@10.28.2",
diff --git a/vitest.config.drift.ts b/vitest.config.drift.ts
@@ -4,6 +4,6 @@ export default defineConfig({
     environment: "node",
     globals: true,
     include: ["src/__tests__/drift/**/*.drift.ts"],
-    testTimeout: 30000,
+    testTimeout: 60000,
   },
 });
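
The Gemini Live canary described in the DRIFT.md changes checks the model listing for a non-audio model that supports `bidiGenerateContent`. The core check could be sketched as below. This is an assumption-laden illustration, not the code in `ws-gemini-live.drift.ts`: the function name `findTextCapableLiveModels` is hypothetical, the "name contains audio" heuristic is a guess at how audio-only models might be excluded, and the model names in the usage example are made up.

```typescript
// Subset of a Gemini model listing entry; only the fields this sketch reads.
interface ListedModel {
  name: string;
  supportedGenerationMethods?: string[];
}

// Hypothetical helper: return the names of models that advertise
// bidiGenerateContent and do not look like audio-only models.
export function findTextCapableLiveModels(models: ListedModel[]): string[] {
  return models
    .filter((m) => (m.supportedGenerationMethods ?? []).includes("bidiGenerateContent"))
    // Assumption: audio-only models can be recognised by "audio" in the name.
    .filter((m) => !/audio/i.test(m.name))
    .map((m) => m.name);
}

// Usage with made-up listing data: only the audio model supports the Live
// method here, so the canary would stay quiet.
const exampleListing: ListedModel[] = [
  { name: "models/example-native-audio", supportedGenerationMethods: ["bidiGenerateContent"] },
  { name: "models/example-flash", supportedGenerationMethods: ["generateContent"] },
];
const candidates = findTextCapableLiveModels(exampleListing);
if (candidates.length > 0) {
  console.warn(`Text-capable Live model(s) found: ${candidates.join(", ")}`);
}
```

A non-empty result is the signal to enable the full Gemini Live drift tests.
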
