Commit 9cdb64b

Migrate Realtime API to GA protocol with Beta compatibility shim (#189)
## Summary

- Migrate aimock's Realtime API handler from deprecated Beta protocol to GA (event renames, nested session config, content type renames)
- Add Beta compatibility shim via `OpenAI-Beta: realtime=v1` header detection — thin translation layer in `sendEvent()` wrapper
- Support 5 new GA models: gpt-realtime-2, gpt-realtime-1.5, gpt-realtime-mini, gpt-realtime-translate, gpt-realtime-whisper
- Add translate/whisper session types with audio buffer handling and model+type validation
- Add image input support (`input_image` content parts mapped to ChatMessage `image_url` format)
- Add async function calling commentary phase (`phase` field on output_item events)
- Add `conversation.item.done` event, `response.cancel` handling, endpoint type routing
- Extend drift detection with GA model canary (5 models) and protocol probe (GA vs Beta normalization)
- Update docs, DRIFT.md, competitive matrix

## Test plan

- [x] 2965 tests pass (73 realtime integration tests, 66 conformance tests, drift tests gated behind API key)
- [x] TypeScript clean (`tsc --noEmit`)
- [x] Lint + format clean
- [x] CR converged in 2 rounds (7 agents × 2 = 14 review agents, 4 findings fixed)
- [ ] Drift tests against live OpenAI API (requires API key)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
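The Beta compatibility shim described in the summary can be pictured roughly as follows. This is a minimal sketch, not aimock's actual code: `GA_TO_BETA`, `isBetaClient`, and `makeSendEvent` are hypothetical names. The GA event names are the ones listed in this commit, while the Beta names they map to and the rule of dropping `conversation.item.done` for Beta clients are assumptions made for illustration.

```typescript
// Sketch only: GA names come from this commit's docs; the Beta names and the
// drop rule are assumptions, not aimock's actual translation table.
const GA_TO_BETA = new Map<string, string | null>([
  ["response.output_text.delta", "response.text.delta"],
  ["conversation.item.added", "conversation.item.created"],
  ["conversation.item.done", null], // assumed GA-only, so Beta clients never see it
]);

interface RealtimeEvent {
  type: string;
  [key: string]: unknown;
}

// Header keys are assumed to be lowercased, as Node's http module delivers them.
function isBetaClient(headers: Record<string, string | undefined>): boolean {
  const value = headers["openai-beta"] ?? "";
  return value.split(",").some((part) => part.trim() === "realtime=v1");
}

// Wraps a raw socket send so the handler can always emit GA-shaped events.
function makeSendEvent(beta: boolean, send: (raw: string) => void) {
  return (event: RealtimeEvent): void => {
    if (beta && GA_TO_BETA.has(event.type)) {
      const betaType = GA_TO_BETA.get(event.type);
      if (betaType == null) return; // no Beta equivalent: drop the event
      send(JSON.stringify({ ...event, type: betaType }));
      return;
    }
    send(JSON.stringify(event)); // GA clients and unmapped events pass through
  };
}
```

Keeping the translation in one wrapper means the handler itself only ever speaks GA, which matches the "thin translation layer" wording above.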
2 parents 8bb6bae + 38157e6 commit 9cdb64b

17 files changed

Lines changed: 2669 additions & 401 deletions

CHANGELOG.md

Lines changed: 14 additions & 0 deletions
@@ -7,6 +7,20 @@
 - **Model-aware fixture recording** — recorded fixtures now include the model name in match criteria, preventing collisions when an app makes multiple LLM calls with the same user message but different models. Model names are normalized by stripping date/version suffixes (e.g., `claude-opus-4-20250514` → `claude-opus-4`) so fixtures survive version bumps. Disable with `recordFullModelVersion: true`. ([#185](https://github.com/CopilotKit/aimock/issues/185))
 - **Drift detection metadata** — recorded fixtures include `systemHash` and `toolsHash` in a `metadata` block for detecting system prompt or tool definition changes since recording.
 - **Prefix model matching** — fixture router uses `startsWith` for string model matching, so `model: "claude-opus-4"` matches any `claude-opus-4-*` version.
+- **GA Realtime protocol migration with Beta compatibility shim** — handler emits GA event names natively; `sendEvent()` wrapper translates back for Beta clients detected via `OpenAI-Beta` header. Default model changed to `gpt-realtime-2`.
+- **5 new GA Realtime models** — `gpt-realtime-2`, `gpt-realtime-1.5`, `gpt-realtime-mini`, `gpt-realtime-translate`, `gpt-realtime-whisper`.
+- **Translate and whisper session types** — dedicated session configurations for translation and transcription workloads on the Realtime API.
+- **Image input support** — Realtime sessions accept image content parts alongside text and audio.
+- **Commentary phase** — Realtime handler supports the GA commentary phase for model-generated annotations.
+- **`conversation.item.done` and `response.cancel` events** — new GA Realtime event types for item completion tracking and response cancellation.
+- **Endpoint type routing for Realtime** — router distinguishes GA vs Beta Realtime endpoints for fixture matching.
+- **Drift detection for GA Realtime** — drift test suite extended with GA protocol shapes, Beta conformance shapes, and three-way triangulation.
+
+### Tests
+
+- **73 GA Realtime integration tests** — comprehensive test coverage for all GA event types, Beta compatibility, session management, model routing, image input, translate/whisper, commentary, and cancellation.
+- **GA and Beta Realtime conformance suites** — API conformance tests validating event shapes against both GA and Beta protocol specs.
+- **GA Realtime drift detection** — SDK shape tests and provider triangulation for the GA Realtime protocol.

 ## [1.22.1] - 2026-05-12

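The model-name normalization and prefix matching mentioned in the changelog entries above could look something like this. It is a sketch under stated assumptions: `normalizeModel` and `modelMatches` are hypothetical helper names, and "date/version suffix" is taken to mean a trailing `-YYYYMMDD` segment, which is only one possible reading of the changelog.

```typescript
// Assumption: a "date/version suffix" is a trailing -YYYYMMDD segment.
function normalizeModel(model: string): string {
  return model.replace(/-\d{8}$/, "");
}

// Prefix matching as the changelog describes it: the fixture's model string
// matches any request model that starts with it.
function modelMatches(fixtureModel: string, requestModel: string): boolean {
  return requestModel.startsWith(fixtureModel);
}

// Recording side: store the normalized name in the fixture.
const recorded = normalizeModel("claude-opus-4-20250514"); // "claude-opus-4"

// Replay side: a request for a dated version still matches.
const hit = modelMatches(recorded, "claude-opus-4-20250514"); // true
```

One consequence of plain `startsWith` is that a short fixture name also matches unrelated models sharing the prefix, so a fixture name should be as specific as the test needs.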
DRIFT.md

Lines changed: 15 additions & 8 deletions
@@ -107,7 +107,7 @@ When a model is deprecated:

 ## WebSocket Drift Coverage

-In addition to the 23 existing drift tests (20 HTTP response-shape + 3 model deprecation), WebSocket drift tests cover aimock's WS protocols (4 verified + 2 canary = 6 WS tests):
+In addition to the 23 existing drift tests (20 HTTP response-shape + 3 model deprecation), WebSocket drift tests cover aimock's WS protocols (6 verified + 2 canary = 8 WS tests):

 ### Gemini Interactions API (Beta)

@@ -120,13 +120,20 @@ The Gemini Interactions API (`/v1beta/interactions`) is covered by 4 drift tests

 Uses `describe.skipIf(!GOOGLE_API_KEY)` like other Gemini tests. The Interactions API is in Beta — shapes may shift as Google iterates on the endpoint.

-| Protocol | Text | Tool Call | Real Endpoint | Status |
-| ------------------- | ---- | --------- | ------------------------------------------------------------------- | ---------- |
-| OpenAI Responses WS ||| `wss://api.openai.com/v1/responses` | Verified |
-| OpenAI Realtime ||| `wss://api.openai.com/v1/realtime` | Verified |
-| Gemini Live ||| `wss://generativelanguage.googleapis.com/ws/...BidiGenerateContent` | Unverified |
+| Protocol | Text | Tool Call | Real Endpoint | Status |
+| ---------------------- | ---- | --------- | ------------------------------------------------------------------- | ---------- |
+| OpenAI Responses WS ||| `wss://api.openai.com/v1/responses` | Verified |
+| OpenAI Realtime (GA) ||| `wss://api.openai.com/v1/realtime` | Verified |
+| OpenAI Realtime (Beta) ||| `wss://api.openai.com/v1/realtime` + `OpenAI-Beta: realtime=v1` | Verified |
+| Gemini Live ||| `wss://generativelanguage.googleapis.com/ws/...BidiGenerateContent` | Unverified |

-**Models**: `gpt-4o-mini` for Responses WS, `gpt-4o-mini-realtime-preview` for Realtime.
+**Models**: `gpt-4o-mini` for Responses WS, `gpt-realtime-2` for Realtime GA (was `gpt-4o-mini-realtime-preview`).
+
+**GA Realtime Drift Tests**:
+
+- **Model canary** — Verifies all 5 GA models exist (`gpt-realtime-2`, `gpt-realtime-1.5`, `gpt-realtime-mini`, `gpt-realtime-translate`, `gpt-realtime-whisper`) and flags unknown realtime models
+- **Protocol probe** — Connects with both GA and Beta protocol, normalizes event sequences, and verifies consistency
+- **Event shape validation** — GA event names (`response.output_text.delta`, `conversation.item.added`, `conversation.item.done`) and nested session config (`session.audio.*`, `session.type`, `session.reasoning`)

 **Auth**: Uses the same `OPENAI_API_KEY` and `GOOGLE_API_KEY` environment variables as HTTP tests. No new secrets needed.

@@ -175,4 +182,4 @@ The fix workflow also supports `workflow_dispatch` for manual runs.

 ## Cost

-~29 API calls per run (20 HTTP response-shape + 3 model listing + 6 WS including canaries) using the cheapest available models (`gpt-4o-mini`, `gpt-4o-mini-realtime-preview`, `claude-haiku-4-5-20251001`, `gemini-2.5-flash`) with 10-100 max tokens each. Under $0.20/week at daily cadence. When Gemini Live text-capable models become available, the 2 canary tests will become full drift tests, increasing real WS connections from 4 to 6.
+~31 API calls per run (20 HTTP response-shape + 3 model listing + 8 WS including canaries) using the cheapest available models (`gpt-4o-mini`, `gpt-realtime-2`, `claude-haiku-4-5-20251001`, `gemini-2.5-flash`) with 10-100 max tokens each. Under $0.25/week at daily cadence. The GA protocol probe adds a second Realtime WS connection (one GA, one Beta) per run. When Gemini Live text-capable models become available, the 2 canary tests will become full drift tests, increasing real WS connections from 6 to 8.
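The model canary described in the DRIFT.md changes above might reduce to a check like the one below. This is a hedged sketch rather than the test suite's real code: `checkRealtimeModels` and `CanaryReport` are hypothetical names, the input is assumed to be the list of model ids returned by a model-listing call, and "realtime model" is assumed to mean any id starting with `gpt-realtime`.

```typescript
// The five GA model ids named in this commit.
const EXPECTED_GA_MODELS = [
  "gpt-realtime-2",
  "gpt-realtime-1.5",
  "gpt-realtime-mini",
  "gpt-realtime-translate",
  "gpt-realtime-whisper",
] as const;

interface CanaryReport {
  missing: string[]; // expected models absent from the listing
  unknown: string[]; // realtime-looking models the mock does not know about
}

// Assumption: a "realtime model" is any id that starts with "gpt-realtime".
function checkRealtimeModels(listedIds: readonly string[]): CanaryReport {
  const listed = new Set(listedIds);
  const expected = new Set<string>(EXPECTED_GA_MODELS);
  return {
    missing: EXPECTED_GA_MODELS.filter((id) => !listed.has(id)),
    unknown: listedIds.filter(
      (id) => id.startsWith("gpt-realtime") && !expected.has(id),
    ),
  };
}
```

A non-empty `missing` list would suggest a deprecation, while a non-empty `unknown` list would suggest a new model the mock should learn about.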

README.md

Lines changed: 10 additions & 10 deletions
@@ -35,29 +35,29 @@ await mock.stop();

 aimock mocks everything your AI app talks to:

-| Tool | What it mocks | Docs |
-| -------------- | -------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------- |
-| **LLMock** | OpenAI (Chat/Responses/Realtime), Claude, Gemini (REST/Live/Interactions), Bedrock, Azure, Vertex AI, Ollama, Cohere | [Providers](https://aimock.copilotkit.dev/docs) |
-| **MCPMock** | MCP tools, resources, prompts with session management | [MCP](https://aimock.copilotkit.dev/mcp-mock) |
-| **A2AMock** | Agent-to-agent protocol with SSE streaming | [A2A](https://aimock.copilotkit.dev/a2a-mock) |
-| **AGUIMock** | AG-UI agent-to-UI event streams for frontend testing | [AG-UI](https://aimock.copilotkit.dev/agui-mock) |
-| **VectorMock** | Pinecone, Qdrant, ChromaDB compatible endpoints | [Vector](https://aimock.copilotkit.dev/vector-mock) |
-| **Services** | Tavily search, Cohere rerank, OpenAI moderation | [Services](https://aimock.copilotkit.dev/services) |
+| Tool | What it mocks | Docs |
+| -------------- | ---------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------- |
+| **LLMock** | OpenAI (Chat/Responses/Realtime GA+Beta), Claude, Gemini (REST/Live/Interactions), Bedrock, Azure, Vertex AI, Ollama, Cohere | [Providers](https://aimock.copilotkit.dev/docs) |
+| **MCPMock** | MCP tools, resources, prompts with session management | [MCP](https://aimock.copilotkit.dev/mcp-mock) |
+| **A2AMock** | Agent-to-agent protocol with SSE streaming | [A2A](https://aimock.copilotkit.dev/a2a-mock) |
+| **AGUIMock** | AG-UI agent-to-UI event streams for frontend testing | [AG-UI](https://aimock.copilotkit.dev/agui-mock) |
+| **VectorMock** | Pinecone, Qdrant, ChromaDB compatible endpoints | [Vector](https://aimock.copilotkit.dev/vector-mock) |
+| **Services** | Tavily search, Cohere rerank, OpenAI moderation | [Services](https://aimock.copilotkit.dev/services) |

 Run them all on one port with `npx @copilotkit/aimock --config aimock.json`, or use the programmatic API to compose exactly what you need.

 ## Features

 - **[Record & Replay](https://aimock.copilotkit.dev/record-replay)** — Proxy real APIs, save as fixtures, replay deterministically forever
 - **[Multi-turn Conversations](https://aimock.copilotkit.dev/multi-turn)** — Record and replay multi-turn traces with tool rounds; match distinct turns via `turnIndex`, `hasToolResult`, `toolCallId`, `sequenceIndex`, `systemMessage` (gate on host-supplied agent context), or custom predicates
-- **[12 LLM Providers](https://aimock.copilotkit.dev/docs)** — OpenAI Chat, OpenAI Responses, OpenAI Realtime, Claude, Gemini, Gemini Live, Gemini Interactions, Azure, Bedrock, Vertex AI, Ollama, Cohere — full streaming support
+- **[12 LLM Providers](https://aimock.copilotkit.dev/docs)** — OpenAI Chat, OpenAI Responses, OpenAI Realtime (GA + Beta shim), Claude, Gemini, Gemini Live, Gemini Interactions, Azure, Bedrock, Vertex AI, Ollama, Cohere — full streaming support
 - **Multimedia APIs** — [image generation](https://aimock.copilotkit.dev/images) (DALL-E, Imagen), [text-to-speech](https://aimock.copilotkit.dev/speech), [audio transcription](https://aimock.copilotkit.dev/transcription), [video generation](https://aimock.copilotkit.dev/video)
 - **[MCP](https://aimock.copilotkit.dev/mcp-mock) / [A2A](https://aimock.copilotkit.dev/a2a-mock) / [AG-UI](https://aimock.copilotkit.dev/agui-mock) / [Vector](https://aimock.copilotkit.dev/vector-mock)** — Mock every protocol your AI agents use
 - **[Chaos Testing](https://aimock.copilotkit.dev/chaos-testing)** — 500 errors, malformed JSON, mid-stream disconnects at any probability
 - **Per-Request Strict Mode** — `X-AIMock-Strict` header overrides the server-level `--strict` flag per request (`true`/`1` = strict, `false`/`0` = lenient)
 - **[Drift Detection](https://aimock.copilotkit.dev/drift-detection)** — Daily CI validation against real APIs
 - **[Streaming Physics](https://aimock.copilotkit.dev/streaming-physics)** — Configurable `ttft`, `tps`, and `jitter`
-- **[WebSocket APIs](https://aimock.copilotkit.dev/websocket)** — OpenAI Realtime, Responses WS, Gemini Live
+- **[WebSocket APIs](https://aimock.copilotkit.dev/websocket)** — OpenAI Realtime (GA protocol with 5 models: gpt-realtime-2, gpt-realtime-1.5, gpt-realtime-mini, gpt-realtime-translate, gpt-realtime-whisper; transcription/translation session types; image input; commentary phase), Responses WS, Gemini Live
 - **[Prometheus Metrics](https://aimock.copilotkit.dev/metrics)** — Request counts, latencies, fixture match rates
 - **[Docker + Helm](https://aimock.copilotkit.dev/docker)** — Container image and Helm chart for CI/CD
 - **[Vitest & Jest Plugins](https://aimock.copilotkit.dev/test-plugins)** — Zero-config `useAimock()` with auto lifecycle and env patching
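The per-request strict mode rule in the feature list above (`true`/`1` = strict, `false`/`0` = lenient, otherwise the server-level flag) can be expressed as a small resolver. This is an illustrative sketch: `resolveStrict` is a hypothetical name, and falling back to the server setting for unrecognized header values is an assumption the README does not state.

```typescript
// Decide strictness for one request from the X-AIMock-Strict header value
// and the server-level --strict setting.
function resolveStrict(
  headerValue: string | undefined,
  serverStrict: boolean,
): boolean {
  if (headerValue === undefined) return serverStrict; // no override sent
  switch (headerValue.trim().toLowerCase()) {
    case "true":
    case "1":
      return true;
    case "false":
    case "0":
      return false;
    default:
      // Assumption: unrecognized values do not override the server flag.
      return serverStrict;
  }
}
```

For example, a test that expects unmatched requests to fail could send `X-AIMock-Strict: 1` even when the shared mock server runs in lenient mode.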
