
Commit a26b3db

Add WebSocket drift detection tests (#37)
## Summary

- **Fix Responses WS input format**: handler now accepts the flat `response.create` format matching the real OpenAI API (previously required a non-standard nested `response: { ... }` envelope)
- **4 verified WS drift tests**: OpenAI Responses WS (text + tool call) and OpenAI Realtime (text + tool call), triangulated against real APIs
- **Model canaries**: Realtime preview model availability check (detects deprecation, suggests GA replacement); Gemini Live text-capable model availability check (enables drift tests when Google ships one)
- **Gemini Live**: protocol implemented per docs, documented as unverified — no text-capable `bidiGenerateContent` model exists yet
- TLS WebSocket client (`ws-providers.ts`) with RFC 6455 framing, ping/pong, connection-scoped cursors
- SDK shapes for Realtime and Gemini Live event sequences
- Fix README Gemini Live response shape example and Responses WS example

## Test plan

- [x] `pnpm test` — 540 unit tests pass
- [x] `pnpm test:drift` without keys — all 27 tests skip gracefully
- [x] `pnpm test:drift` with keys — 25 pass, 2 skip (Gemini Live text/tool)
- [x] `pnpm run format:check` — clean
- [x] `pnpm run lint` — clean
- [x] 5 rounds of CR→fix loop with full pr-review-toolkit suite

🤖 Generated with [Claude Code](https://claude.com/claude-code)
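The headline fix in concrete terms: a sketch contrasting the two `response.create` shapes (the flat shape matches the real OpenAI Responses API; values here are illustrative):

```typescript
// Before the fix llmock required a non-standard nested envelope;
// after it, the flat event the real OpenAI API uses is accepted.

// Previously required (non-standard nested envelope):
const nested = {
  type: "response.create",
  response: {
    instructions: "You are a helpful assistant.",
    input: [
      { type: "message", role: "user", content: [{ type: "input_text", text: "Hello" }] },
    ],
  },
};

// Now accepted (flat, matching the real API):
const flat = {
  type: "response.create",
  model: "gpt-4o",
  instructions: "You are a helpful assistant.",
  input: [
    { type: "message", role: "user", content: [{ type: "input_text", text: "Hello" }] },
  ],
};

// The payload fields move from `response.*` to the top level.
console.log("input" in flat, "response" in flat); // true false
```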
2 parents 756127b + e5870ed commit a26b3db

16 files changed

Lines changed: 1470 additions & 54 deletions

CHANGELOG.md

Lines changed: 10 additions & 0 deletions
@@ -1,5 +1,15 @@
 # @copilotkit/llmock

+## 1.3.3
+
+### Patch Changes
+
+- Fix Responses WS handler to accept flat `response.create` format matching the real OpenAI API (previously required a non-standard nested `response: { ... }` envelope)
+- WebSocket drift detection tests: TLS client for real provider WS endpoints, 4 verified drift tests (Responses WS + Realtime), Gemini Live canary for text-capable model availability
+- Realtime model canary: detects when `gpt-4o-mini-realtime-preview` is deprecated and suggests GA replacement
+- Gemini Live documented as unverified (no text-capable `bidiGenerateContent` model exists yet)
+- Fix README Gemini Live response shape example (`modelTurn.parts`, not `modelTurnComplete`)
+
 ## 1.3.2

 ### Patch Changes

DRIFT.md

Lines changed: 27 additions & 2 deletions
@@ -101,7 +101,32 @@ When a model is deprecated:
 3. Add raw fetch client functions to `src/__tests__/drift/providers.ts`
 4. Create `src/__tests__/drift/<provider>.drift.ts` with 4 test scenarios
 5. Add model listing function to `providers.ts` and model check to `models.drift.ts`
-6. Update the allowlist in `schema.ts` if needed
+6. If the provider uses WebSocket, add protocol functions to `ws-providers.ts` and create `ws-<provider>.drift.ts`
+7. Update the allowlist in `schema.ts` if needed
+
+## WebSocket Drift Coverage
+
+In addition to the 19 existing drift tests (16 HTTP response-shape + 3 model deprecation), WebSocket drift tests cover llmock's WS protocols:
+
+| Protocol            | Text | Tool Call | Real Endpoint                                                       | Status     |
+| ------------------- | ---- | --------- | ------------------------------------------------------------------- | ---------- |
+| OpenAI Responses WS | ✓    | ✓         | `wss://api.openai.com/v1/responses`                                 | Verified   |
+| OpenAI Realtime     | ✓    | ✓         | `wss://api.openai.com/v1/realtime`                                  | Verified   |
+| Gemini Live         | ✓    | ✓         | `wss://generativelanguage.googleapis.com/ws/...BidiGenerateContent` | Unverified |
+
+**Models**: `gpt-4o-mini` for Responses WS, `gpt-4o-mini-realtime-preview` for Realtime.
+
+**Auth**: Uses the same `OPENAI_API_KEY` and `GOOGLE_API_KEY` environment variables as HTTP tests. No new secrets needed.
+
+**How it works**: A TLS WebSocket client (`ws-providers.ts`) connects to real provider endpoints using `node:tls` with RFC 6455 framing. Each protocol function handles the setup sequence (e.g., Realtime session negotiation, Gemini Live setup/setupComplete) and collects messages until a terminal event. The mock side uses the existing `ws-test-client.ts` plaintext client against the local llmock server.
+
+### Gemini Live: unverified
+
+llmock's Gemini Live handler implements the text-based `BidiGenerateContent` protocol as documented in Google's [Live API reference](https://ai.google.dev/api/live): `setup`/`setupComplete` handshake, `clientContent` with turns, `serverContent` with `modelTurn.parts[].text`, and `toolCall` responses. The protocol format is correct per the docs.
+
+However, as of March 2026, the only models that support `bidiGenerateContent` are native-audio models (`gemini-2.5-flash-native-audio-*`), which reject text-only requests. No text-capable model exists for this endpoint yet, so we cannot triangulate llmock's output against a real API response.
+
+A canary test (`ws-gemini-live.drift.ts`) queries the Gemini model listing API on each drift run and checks for a non-audio model that supports `bidiGenerateContent`. When Google ships one, the canary will flag it and the full drift tests can be enabled.

 ## CI Schedule
@@ -115,4 +140,4 @@ See `.github/workflows/test-drift.yml`.

 ## Cost

-~20 API calls per run using the cheapest available models (`gpt-4o-mini`, `claude-haiku-4-5-20251001`, `gemini-2.5-flash`) with 10-100 max tokens each. Under $0.01/week.
+~25 API calls per run (16 HTTP response-shape + 3 model listing + 4 WS + 2 canaries) using the cheapest available models (`gpt-4o-mini`, `gpt-4o-mini-realtime-preview`, `claude-haiku-4-5-20251001`, `gemini-2.5-flash`) with 10-100 max tokens each. Under $0.02/week. When Gemini Live text-capable models become available, this will increase to 6 WS calls.
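The RFC 6455 framing used by the TLS drift client can be illustrated with a minimal client-side text-frame encoder. This is a sketch, not the actual `ws-providers.ts` code: real frames also need extended payload lengths, fragmentation, and ping/pong handling.

```typescript
// Encode a client-to-server WebSocket text frame per RFC 6455.
// Client frames MUST be masked with a 4-byte key.
function encodeTextFrame(text: string, mask: Uint8Array): Uint8Array {
  const payload = new TextEncoder().encode(text);
  if (payload.length > 125) throw new Error("extended lengths omitted in this sketch");
  const frame = new Uint8Array(2 + 4 + payload.length);
  frame[0] = 0x81; // FIN = 1, opcode = 0x1 (text)
  frame[1] = 0x80 | payload.length; // MASK = 1, 7-bit payload length
  frame.set(mask, 2); // 4-byte masking key
  for (let i = 0; i < payload.length; i++) {
    frame[6 + i] = payload[i] ^ mask[i % 4]; // XOR-mask the payload
  }
  return frame;
}

const frame = encodeTextFrame("Hi", new Uint8Array([1, 2, 3, 4]));
console.log(frame[0].toString(16)); // "81"
console.log(frame.length); // 8
```

The server unmasks by XORing with the same key, so round-tripping a byte through the mask twice recovers the original payload.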

README.md

Lines changed: 12 additions & 12 deletions
@@ -500,7 +500,7 @@ WebSocket endpoints:

 - **WS `/v1/responses`** — OpenAI Responses API over WebSocket
 - **WS `/v1/realtime`** — OpenAI Realtime API (text + tool calls)
-- **WS `/ws/google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent`** — Gemini Live
+- **WS `/ws/google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent`** — Gemini Live ([unverified](#gemini-live-bidigeneratecontent))

 All endpoints share the same fixture pool — the same fixtures work across all providers. Requests are translated to a common format internally for fixture matching.
@@ -518,13 +518,11 @@ Connect to `ws://localhost:5555/v1/responses` and send a `response.create` event

 // → Client sends:
 {
   "type": "response.create",
-  "response": {
-    "modalities": ["text"],
-    "instructions": "You are a helpful assistant.",
-    "input": [
-      { "type": "message", "role": "user", "content": [{ "type": "input_text", "text": "Hello" }] },
-    ],
-  },
+  "model": "gpt-4o",
+  "instructions": "You are a helpful assistant.",
+  "input": [
+    { "type": "message", "role": "user", "content": [{ "type": "input_text", "text": "Hello" }] },
+  ],
 }

 // ← Server streams:
@@ -567,19 +565,21 @@ Connect to `ws://localhost:5555/v1/realtime`. The Realtime API uses a session-ba

 ### Gemini Live (BidiGenerateContent)

-Connect to `ws://localhost:5555/ws/google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent`. Gemini Live uses a setup/content/response flow:
+Connect to `ws://localhost:5555/ws/google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent`. Gemini Live uses a setup/content/response flow.
+
+> **⚠️ Unverified**: As of March 2026, Google's only `bidiGenerateContent`-capable models are audio-only — no text-capable model exists for this endpoint. llmock implements the text-based protocol as documented in Google's [Live API reference](https://ai.google.dev/api/live), but the response shapes have not been verified against real API output. Code you write against this mock may need adjustment when Google ships a text-capable Live model. See [DRIFT.md](DRIFT.md#gemini-live-unverified) for details and the automated canary that tracks model availability.

 ```jsonc
 // → Setup message (must be first):
-{ "setup": { "model": "models/gemini-2.0-flash-live", "generationConfig": { "responseModalities": ["TEXT"] } } }
+{ "setup": { "model": "models/gemini-2.5-flash", "generationConfig": { "responseModalities": ["TEXT"] } } }

 // → Send user content:
 { "clientContent": { "turns": [{ "role": "user", "parts": [{ "text": "Hello" }] }], "turnComplete": true } }

 // ← Server streams:
 // {"setupComplete": {}}
-// {"serverContent": {"modelTurnComplete": false, "parts": [{"text": "Hello"}]}}
-// {"serverContent": {"modelTurnComplete": true}}
+// {"serverContent": {"modelTurn": {"parts": [{"text": "Hello"}]}, "turnComplete": false}}
+// {"serverContent": {"modelTurn": {"parts": [{"text": "!"}]}, "turnComplete": true}}
 ```

 ## CLI
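Unlike OpenAI events, Gemini Live messages carry no `type` field, so the drift helpers derive an event name from the top-level key. A hypothetical sketch of such a classifier — the real `classifyGeminiMessage` in `ws-providers.ts` may differ in names and coverage:

```typescript
// Hypothetical classifier: derive an event name from the single
// top-level key each Gemini Live server message carries.
function classifyGeminiMessage(msg: Record<string, unknown>): string {
  if ("setupComplete" in msg) return "setupComplete";
  if ("toolCall" in msg) return "toolCall";
  if ("serverContent" in msg) return "serverContent";
  return "unknown";
}

console.log(classifyGeminiMessage({ setupComplete: {} }));
// → "setupComplete"
console.log(
  classifyGeminiMessage({ serverContent: { modelTurn: { parts: [{ text: "Hi" }] } } }),
);
// → "serverContent"
```

This keeps the WS drift events shape-compatible with the SSE drift events, which already key on a `type` string.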

docs/index.html

Lines changed: 4 additions & 2 deletions
@@ -1199,7 +1199,9 @@ <h3>WebSocket APIs</h3>
 <ul>
   <li>OpenAI Responses API over WebSocket</li>
   <li>OpenAI Realtime API — text + tool calls</li>
-  <li>Gemini Live BidiGenerateContent</li>
+  <li>
+    Gemini Live BidiGenerateContent (unverified — no text-capable model exists yet)
+  </li>
   <li>No audio/video — text and tool call paths only</li>
 </ul>
 </div>
@@ -1308,7 +1310,7 @@ <h2 class="section-title">llmock vs MSW</h2>
   <td class="manual">Manual — build data SSE yourself</td>
 </tr>
 <tr>
-  <td>WebSocket APIs (Realtime, Gemini Live)</td>
+  <td>WebSocket APIs (Realtime, Gemini Live*)</td>
   <td class="yes">Built-in ✓</td>
   <td class="no">No</td>
 </tr>

package.json

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 {
   "name": "@copilotkit/llmock",
-  "version": "1.3.2",
+  "version": "1.3.3",
   "description": "Deterministic mock LLM server for testing (OpenAI, Anthropic, Gemini)",
   "license": "MIT",
   "packageManager": "pnpm@10.28.2",

src/__tests__/drift/helpers.ts

Lines changed: 80 additions & 0 deletions
@@ -10,6 +10,12 @@
 import http from "node:http";
 import { createServer, type ServerInstance } from "../../server.js";
 import type { Fixture } from "../../types.js";
+import type { WSTestClient } from "../ws-test-client.js";
+import { extractShape, type SSEEventShape } from "./schema.js";
+
+import { classifyGeminiMessage } from "./ws-providers.js";
+
+export { classifyGeminiMessage };

 // ---------------------------------------------------------------------------
 // HTTP helpers
@@ -101,3 +107,77 @@ export async function startDriftServer(): Promise<ServerInstance> {
 export async function stopDriftServer(instance: ServerInstance): Promise<void> {
   await new Promise<void>((r) => instance.server.close(() => r()));
 }
+
+// ---------------------------------------------------------------------------
+// WebSocket helpers
+// ---------------------------------------------------------------------------
+
+export const GEMINI_WS_PATH =
+  "/ws/google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent";
+
+/**
+ * Collect mock WS messages until a terminal predicate fires.
+ *
+ * Uses a polling loop on waitForMessages() since ws-test-client doesn't
+ * support predicate-based collection. The `skip` parameter tells us how
+ * many messages have already been consumed so we don't re-read them.
+ *
+ * Throws if the terminal predicate never fires before the timeout expires.
+ */
+export async function collectMockWSMessages(
+  client: WSTestClient,
+  terminal: (msg: unknown) => boolean,
+  timeoutMs = 15000,
+  skip = 0,
+): Promise<{ events: SSEEventShape[]; rawMessages: unknown[] }> {
+  const rawMessages: unknown[] = [];
+  const deadline = Date.now() + timeoutMs;
+  let count = skip;
+  let terminated = false;
+
+  while (Date.now() < deadline) {
+    const nextCount = count + 1;
+    let msgs: string[];
+    try {
+      msgs = await client.waitForMessages(nextCount, Math.min(2000, deadline - Date.now()));
+    } catch (e: unknown) {
+      // Only suppress waitForMessages timeout — rethrow anything else
+      if (e instanceof Error && e.message.includes("Timeout waiting for")) {
+        if (Date.now() >= deadline) break;
+        continue;
+      }
+      throw e;
+    }
+    // Only increment count after successful receipt
+    count = nextCount;
+    const latest = msgs[count - 1];
+    let parsed: unknown;
+    try {
+      parsed = typeof latest === "string" ? JSON.parse(latest) : latest;
+    } catch {
+      throw new Error(
+        `collectMockWSMessages: failed to parse message ${count}: ${String(latest).slice(0, 200)}`,
+      );
+    }
+    rawMessages.push(parsed);
+    if (terminal(parsed)) {
+      terminated = true;
+      break;
+    }
+  }
+
+  if (!terminated) {
+    throw new Error(
+      `collectMockWSMessages timed out after ${timeoutMs}ms without terminal message. ` +
+        `Collected ${rawMessages.length} messages.`,
+    );
+  }
+
+  const events: SSEEventShape[] = rawMessages.map((msg) => {
+    const m = msg as Record<string, any>;
+    const type = m.type ?? classifyGeminiMessage(m as Record<string, unknown>);
+    return { type, dataShape: extractShape(msg) };
+  });
+
+  return { events, rawMessages };
+}
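A usage sketch of the polling pattern above, using a stand-in client. The real `WSTestClient` lives in `ws-test-client.ts`; this fake only mimics the `waitForMessages` contract the helper assumes, and the loop is simplified (no deadline or shape extraction).

```typescript
// Stand-in for ws-test-client's WSTestClient: resolves with the first n
// buffered messages, or throws the same "Timeout waiting for" error shape.
class FakeWSClient {
  constructor(private messages: string[]) {}
  async waitForMessages(n: number, _timeoutMs: number): Promise<string[]> {
    if (this.messages.length >= n) return this.messages.slice(0, n);
    throw new Error(`Timeout waiting for ${n} messages`);
  }
}

// Simplified collect loop: read one message at a time until the
// terminal predicate fires.
async function collect(
  client: FakeWSClient,
  terminal: (msg: unknown) => boolean,
): Promise<unknown[]> {
  const out: unknown[] = [];
  let count = 0;
  for (;;) {
    const msgs = await client.waitForMessages(count + 1, 2000);
    count += 1;
    const parsed: unknown = JSON.parse(msgs[count - 1]);
    out.push(parsed);
    if (terminal(parsed)) return out;
  }
}

async function main() {
  const client = new FakeWSClient([
    JSON.stringify({ type: "response.created" }),
    JSON.stringify({ type: "response.output_text.delta", delta: "Hi" }),
    JSON.stringify({ type: "response.done" }),
  ]);
  const events = await collect(
    client,
    (m) => (m as { type?: string }).type === "response.done",
  );
  console.log(events.length); // 3
}
main();
```

The terminal predicate is what varies per protocol: `response.done` for Responses WS, `response.done`-equivalent session events for Realtime, and `turnComplete` inside `serverContent` for Gemini Live.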

src/__tests__/drift/models.drift.ts

Lines changed: 7 additions & 4 deletions
@@ -72,7 +72,7 @@ describe.skipIf(!process.env.ANTHROPIC_API_KEY)("Anthropic model availability",
   if (referenced.length === 0) return;

   for (const m of referenced) {
-    const found = models.some((available) => available === m || available.startsWith(`${m}`));
+    const found = models.some((available) => available === m || available.startsWith(m));
     expect(found, `Model ${m} no longer available at Anthropic`).toBe(true);
   }
 });
@@ -89,11 +89,14 @@ describe.skipIf(!process.env.GOOGLE_API_KEY)("Gemini model availability", () =>

   if (referenced.length === 0) return;

-  // Skip experimental and live-only models — they're ephemeral
-  const stable = referenced.filter((m) => !m.includes("-exp") && !m.endsWith("-live"));
+  // Skip experimental models, live-only models, and anchor-link fragments
+  // scraped from markdown (e.g., "gemini-live-bidigeneratecontent")
+  const stable = referenced.filter(
+    (m) => !m.includes("-exp") && !m.includes("-live") && !m.includes("bidigeneratecontent"),
+  );

   for (const m of stable) {
-    const found = models.some((available) => available === m || available.startsWith(`${m}`));
+    const found = models.some((available) => available === m || available.startsWith(m));
     expect(found, `Model ${m} no longer available at Gemini`).toBe(true);
   }
 });
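The Gemini filter and prefix check above can be demonstrated on sample identifiers (the model names here are illustrative, not a claim about what the scraper actually collects):

```typescript
// Illustrative referenced names: the filter drops experimental models,
// live-only models, and markdown anchor fragments before checking
// availability against the provider's model listing.
const referenced = [
  "gemini-2.5-flash",
  "gemini-2.0-flash-exp",
  "gemini-2.0-flash-live",
  "gemini-live-bidigeneratecontent",
];

const stable = referenced.filter(
  (m) => !m.includes("-exp") && !m.includes("-live") && !m.includes("bidigeneratecontent"),
);
console.log(stable); // [ 'gemini-2.5-flash' ]

// Prefix matching lets a dated listing entry satisfy an undated reference:
const available = ["gemini-2.5-flash-001"];
const found = available.some((a) => a === "gemini-2.5-flash" || a.startsWith("gemini-2.5-flash"));
console.log(found); // true
```

Switching from `endsWith("-live")` to `includes("-live")` is what excludes the `gemini-live-bidigeneratecontent` anchor fragment, which contains `-live` in the middle rather than at the end.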
