Commit 206bc45

merge: Provider Endpoints, Chaos, Metrics, Record-and-Replay (#53)
## Summary

Major feature release adding 8 capabilities to llmock, plus 29 bugs found and fixed in code review.

### Provider Endpoints

- **Bedrock Streaming** — invoke-with-response-stream (AWS Event Stream binary) + Converse API
- **Vertex AI** — Routes to existing Gemini handler
- **Ollama** — /api/chat, /api/generate, /api/tags (NDJSON streaming)
- **Cohere** — /v2/chat (typed SSE events)

### Infrastructure

- **Chaos Testing** — Probabilistic drop/malformed/disconnect, three precedence levels (header > fixture > server), rate clamping to [0,1]
- **Prometheus Metrics** — Opt-in /metrics, counters, cumulative histograms, gauges

### Record-and-Replay

- **Proxy-on-miss** — Real API responses saved as fixtures with 30s upstream timeout
- **Stream collapsing** — 6 functions (SSE, NDJSON, EventStream) supporting both Converse and Messages formats
- **Strict mode (503)** — Catch missing fixtures in CI
- **Auth safety** — Forwarded but redacted in journal, never in fixtures

### Quality

- **1250 tests** across 37 files
- 7 rounds of 7-agent code review, 29 bugs found and fixed
- Build/format/lint clean, zero external dependencies, zero as-any in source

## Review Fixes (29 total across 7 rounds)

### Round 1: Original review (20 findings)

- HandlerDefaults type extracted, fixing silent undefined access in 5 handlers
- Provider-specific error formats (Anthropic, Gemini, Bedrock)
- Recorder binary relay corruption (UTF-8 round-trip on EventStream)
- collapseOllamaNDJSON tool_calls + buildFixtureResponse priority
- ChaosAction dedup, RecordProviderKey union, OllamaMessage.role union
- collapseCohereSSE naming, chaos rate clamping, recorder auth comment
- SKILL.md 503 status, warn log level, README provider list, types.ts header

### Round 2 (2 findings)

- applyChaos registry argument missing in 5 handlers (chaos metrics incomplete)
- Bedrock Converse response format missing in buildFixtureResponse

### Round 5 — fresh context (2 findings)

- Global recordCounter → crypto.randomUUID() (concurrent test determinism)
- rawBody pass-through in OpenAI completions proxy path

### Round 6 — fresh context (2 findings)

- 30s upstream timeout in makeUpstreamRequest (prevents indefinite hangs)
- collapseBedrockEventStream: handle both Converse (camelCase) and Messages (flat type) formats

### Round 7 — fresh context (3 findings)

- new URL() validation with specific 502 error for malformed provider URLs
- writtenToDisk flag to prevent misleading "Response recorded" log on write failure
- res.on("error") handler for upstream response stream mid-transfer drops

All fixes have corresponding regression tests.
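The chaos precedence and clamping described in the summary can be sketched as follows. This is a hedged illustration, not llmock's actual code: the function names `resolveChaosRate` and `clamp01` are assumptions; only the precedence order (header > fixture > server) and the clamping of rates to [0,1] come from the commit message.

```typescript
// Illustrative sketch (names are hypothetical): a per-request header rate
// overrides the fixture's chaos config, which overrides the server-wide CLI
// rate, and the winning rate is clamped to [0, 1].
function clamp01(rate: number): number {
  if (!Number.isFinite(rate)) return 0;
  return Math.min(1, Math.max(0, rate));
}

function resolveChaosRate(
  headerRate: number | undefined,
  fixtureRate: number | undefined,
  serverRate: number | undefined,
): number {
  // Nullish coalescing implements the three precedence levels.
  const winner = headerRate ?? fixtureRate ?? serverRate ?? 0;
  return clamp01(winner);
}

// Header beats fixture beats server:
console.log(resolveChaosRate(0.5, 0.9, 0.1)); // 0.5
// Out-of-range rates are clamped rather than rejected:
console.log(resolveChaosRate(undefined, 1.7, undefined)); // 1
```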
2 parents d3eebc2 + c694c9b commit 206bc45

71 files changed

Lines changed: 17988 additions & 322 deletions
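The strict-mode (503) behavior called out in the summary above can be sketched as a small dispatch. This is an assumption-laden illustration: the `respondForMatch` helper and the non-strict fallback shape are hypothetical; the commit only establishes that strict mode answers unmatched requests with a 503 so CI fails loudly.

```typescript
// Hypothetical handler shape illustrating strict mode. Only the 503-on-miss
// behavior is documented by the commit; everything else is illustrative.
type MatchResult = { matched: true; body: string } | { matched: false };

function respondForMatch(
  result: MatchResult,
  strict: boolean,
): { status: number; body: string } {
  if (result.matched) return { status: 200, body: result.body };
  if (strict) {
    // Strict mode: surface the missing fixture as an explicit 503.
    return { status: 503, body: JSON.stringify({ error: "no fixture matched request" }) };
  }
  // Non-strict (assumed): fall through to record/proxy or a default response.
  return { status: 200, body: JSON.stringify({ error: "unmatched", defaulted: true }) };
}
```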


CHANGELOG.md

Lines changed: 9 additions & 0 deletions
@@ -1,5 +1,14 @@
 # @copilotkit/llmock

+## 1.6.0
+
+### Minor Changes
+
+- Provider-specific endpoints: dedicated routes for Bedrock (`/model/{modelId}/invoke`), Ollama (`/api/chat`, `/api/generate`), Cohere (`/v2/chat`), and Azure OpenAI deployment-based routing (`/openai/deployments/{id}/chat/completions`)
+- Chaos injection: `ChaosConfig` type with `drop`, `malformed`, and `disconnect` actions; supports per-fixture chaos via `chaos` config on each fixture and server-wide chaos via `--chaos-drop`, `--chaos-malformed`, and `--chaos-disconnect` CLI flags
+- Metrics: `GET /metrics` endpoint exposing Prometheus text format with request counters and latency histograms per provider and route
+- Record-and-replay: `--record` flag and `proxyAndRecord` helper that proxies requests to real LLM APIs, collapses streaming responses, and writes fixture JSON to disk for future playback
+
 ## 1.5.1

 ### Patch Changes
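The "collapses streaming responses" step in the record-and-replay changelog entry above can be illustrated for the simplest case, OpenAI-style SSE. This is a minimal sketch, not llmock's actual collapse function: the name `collapseOpenAISSE` and the parsing details are assumptions, and the real implementation also handles NDJSON and AWS EventStream framing.

```typescript
// Illustrative sketch: fold a sequence of OpenAI-style SSE delta chunks back
// into a single content string so the response can be saved as a
// non-streaming fixture. Parsing details are assumptions.
function collapseOpenAISSE(raw: string): string {
  let content = "";
  for (const line of raw.split("\n")) {
    if (!line.startsWith("data: ")) continue;
    const payload = line.slice("data: ".length).trim();
    if (payload === "[DONE]") break; // OpenAI's stream terminator
    const chunk = JSON.parse(payload);
    content += chunk.choices?.[0]?.delta?.content ?? "";
  }
  return content;
}
```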

README.md

Lines changed: 30 additions & 14 deletions
@@ -1,6 +1,6 @@
 # @copilotkit/llmock [![Unit Tests](https://github.com/CopilotKit/llmock/actions/workflows/test-unit.yml/badge.svg)](https://github.com/CopilotKit/llmock/actions/workflows/test-unit.yml) [![Drift Tests](https://github.com/CopilotKit/llmock/actions/workflows/test-drift.yml/badge.svg)](https://github.com/CopilotKit/llmock/actions/workflows/test-drift.yml) [![npm version](https://img.shields.io/npm/v/@copilotkit/llmock)](https://www.npmjs.com/package/@copilotkit/llmock)

-Deterministic mock LLM server for testing. A real HTTP server on a real port — not an in-process interceptor — so every process in your stack (Playwright, Next.js, agent workers, microservices) can point at it via `OPENAI_BASE_URL` / `ANTHROPIC_BASE_URL` and get reproducible, instant responses. Streams SSE in real OpenAI, Claude, Gemini, Bedrock, and Azure API formats, driven entirely by fixtures. Zero runtime dependencies.
+Deterministic mock LLM server for testing. A real HTTP server on a real port — not an in-process interceptor — so every process in your stack (Playwright, Next.js, agent workers, microservices) can point at it via `OPENAI_BASE_URL` / `ANTHROPIC_BASE_URL` and get reproducible, instant responses. Streams SSE in real OpenAI, Claude, Gemini, Bedrock, Azure, Vertex AI, Ollama, and Cohere API formats, driven entirely by fixtures. Zero runtime dependencies.

 ## Quick Start

@@ -45,7 +45,7 @@ MSW can't intercept any of those calls. llmock can — it's a real server on a r
 **Use llmock when:**

 - Multiple processes need to hit the same mock (E2E tests, agent frameworks, microservices)
-- You want multi-provider SSE format out of the box (OpenAI, Claude, Gemini)
+- You want multi-provider SSE format out of the box (OpenAI, Claude, Gemini, Bedrock, Azure, Vertex AI, Ollama, Cohere)
 - You prefer defining fixtures as JSON files rather than code
 - You need a standalone CLI server

@@ -72,17 +72,20 @@ MSW can't intercept any of those calls. llmock can — it's a real server on a r

 ## Features

-- **[Multi-provider support](https://llmock.copilotkit.dev/compatible-providers.html)** — [OpenAI Chat Completions](https://llmock.copilotkit.dev/chat-completions.html), [OpenAI Responses](https://llmock.copilotkit.dev/responses-api.html), [Anthropic Claude](https://llmock.copilotkit.dev/claude-messages.html), [Google Gemini](https://llmock.copilotkit.dev/gemini.html), [AWS Bedrock](https://llmock.copilotkit.dev/aws-bedrock.html), [Azure OpenAI](https://llmock.copilotkit.dev/azure-openai.html)
+- **[Multi-provider support](https://llmock.copilotkit.dev/compatible-providers.html)** — [OpenAI Chat Completions](https://llmock.copilotkit.dev/chat-completions.html), [OpenAI Responses](https://llmock.copilotkit.dev/responses-api.html), [Anthropic Claude](https://llmock.copilotkit.dev/claude-messages.html), [Google Gemini](https://llmock.copilotkit.dev/gemini.html), [AWS Bedrock](https://llmock.copilotkit.dev/aws-bedrock.html) (streaming + Converse), [Azure OpenAI](https://llmock.copilotkit.dev/azure-openai.html), [Vertex AI](https://llmock.copilotkit.dev/vertex-ai.html), [Ollama](https://llmock.copilotkit.dev/ollama.html), [Cohere](https://llmock.copilotkit.dev/cohere.html)
 - **[Embeddings API](https://llmock.copilotkit.dev/embeddings.html)** — OpenAI-compatible embedding responses with configurable dimensions
 - **[Structured output / JSON mode](https://llmock.copilotkit.dev/structured-output.html)** — `response_format`, `json_schema`, and function calling
 - **[Sequential responses](https://llmock.copilotkit.dev/sequential-responses.html)** — Stateful multi-turn fixtures that return different responses on each call
 - **[Streaming physics](https://llmock.copilotkit.dev/streaming-physics.html)** — Configurable `ttft`, `tps`, and `jitter` for realistic timing
 - **[WebSocket APIs](https://llmock.copilotkit.dev/websocket.html)** — OpenAI Responses WS, Realtime API, and Gemini Live
 - **[Error injection](https://llmock.copilotkit.dev/error-injection.html)** — One-shot errors, rate limiting, and provider-specific error formats
+- **[Chaos testing](https://llmock.copilotkit.dev/chaos-testing.html)** — Probabilistic failure injection: 500 errors, malformed JSON, mid-stream disconnects
+- **[Prometheus metrics](https://llmock.copilotkit.dev/metrics.html)** — Request counts, latencies, and fixture match rates at `/metrics`
 - **[Request journal](https://llmock.copilotkit.dev/docs.html)** — Record, inspect, and assert on every request
 - **[Fixture validation](https://llmock.copilotkit.dev/fixtures.html)** — Schema validation at load time with `--validate-on-load`
 - **CLI with hot-reload** — Standalone server with `--watch` for live fixture editing
 - **[Docker + Helm](https://llmock.copilotkit.dev/docker.html)** — Container image and Helm chart for CI/CD pipelines
+- **Record-and-replay** — VCR-style proxy-on-miss records real API responses as fixtures for deterministic replay
 - **[Drift detection](https://llmock.copilotkit.dev/drift-detection.html)** — Daily CI runs against real APIs to catch response format changes
 - **Claude Code integration** — `/write-fixtures` skill teaches your AI assistant how to write fixtures correctly

@@ -92,17 +95,24 @@ MSW can't intercept any of those calls. llmock can — it's a real server on a r
 llmock [options]
 ```

-| Option               | Short | Default      | Description                               |
-| -------------------- | ----- | ------------ | ----------------------------------------- |
-| `--port`             | `-p`  | `4010`       | Port to listen on                         |
-| `--host`             | `-h`  | `127.0.0.1`  | Host to bind to                           |
-| `--fixtures`         | `-f`  | `./fixtures` | Path to fixtures directory or file        |
-| `--latency`          | `-l`  | `0`          | Latency between SSE chunks (ms)           |
-| `--chunk-size`       | `-c`  | `20`         | Characters per SSE chunk                  |
-| `--watch`            | `-w`  |              | Watch fixture path for changes and reload |
-| `--log-level`        |       | `info`       | Log verbosity: `silent`, `info`, `debug`  |
-| `--validate-on-load` |       |              | Validate fixture schemas at startup       |
-| `--help`             |       |              | Show help                                 |
+| Option               | Short | Default      | Description                                 |
+| -------------------- | ----- | ------------ | ------------------------------------------- |
+| `--port`             | `-p`  | `4010`       | Port to listen on                           |
+| `--host`             | `-h`  | `127.0.0.1`  | Host to bind to                             |
+| `--fixtures`         | `-f`  | `./fixtures` | Path to fixtures directory or file          |
+| `--latency`          | `-l`  | `0`          | Latency between SSE chunks (ms)             |
+| `--chunk-size`       | `-c`  | `20`         | Characters per SSE chunk                    |
+| `--watch`            | `-w`  |              | Watch fixture path for changes and reload   |
+| `--log-level`        |       | `info`       | Log verbosity: `silent`, `info`, `debug`    |
+| `--validate-on-load` |       |              | Validate fixture schemas at startup         |
+| `--chaos-drop`       |       | `0`          | Chaos: probability of 500 errors (0-1)      |
+| `--chaos-malformed`  |       | `0`          | Chaos: probability of malformed JSON (0-1)  |
+| `--chaos-disconnect` |       | `0`          | Chaos: probability of disconnect (0-1)      |
+| `--metrics`          |       |              | Enable Prometheus metrics at /metrics       |
+| `--record`           |       |              | Record mode: proxy unmatched to real APIs   |
+| `--strict`           |       |              | Strict mode: fail on unmatched requests     |
+| `--provider-*`       |       |              | Upstream URL per provider (with `--record`) |
+| `--help`             |       |              | Show help                                   |

 ```bash
 # Start with bundled example fixtures
@@ -113,6 +123,12 @@ llmock -p 8080 -f ./my-fixtures

 # Simulate slow responses
 llmock --latency 100 --chunk-size 5
+
+# Record mode: proxy unmatched requests to real APIs and save as fixtures
+llmock --record --provider-openai https://api.openai.com --provider-anthropic https://api.anthropic.com
+
+# Strict mode in CI: fail if any request doesn't match a fixture
+llmock --strict -f ./fixtures
 ```

 ## Documentation
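The `--metrics` flag added in the option table above serves Prometheus text exposition format. As a hedged illustration of that format (the metric name `llmock_requests_total` and the `renderCounter` helper are hypothetical, not llmock's actual identifiers), a counter with labels renders like this:

```typescript
// Sketch of Prometheus text exposition format for a labeled counter:
// a "# HELP" line, a "# TYPE" line, then one sample line per label set.
function renderCounter(
  name: string,
  help: string,
  samples: Array<{ labels: Record<string, string>; value: number }>,
): string {
  const lines = [`# HELP ${name} ${help}`, `# TYPE ${name} counter`];
  for (const { labels, value } of samples) {
    const labelStr = Object.entries(labels)
      .map(([k, v]) => `${k}="${v}"`)
      .join(",");
    lines.push(`${name}{${labelStr}} ${value}`);
  }
  return lines.join("\n") + "\n";
}

// Example: per-provider request counts as one metric family.
console.log(
  renderCounter("llmock_requests_total", "Total requests served", [
    { labels: { provider: "openai" }, value: 42 },
    { labels: { provider: "anthropic" }, value: 7 },
  ]),
);
```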

charts/llmock/Chart.yaml

Lines changed: 1 addition & 1 deletion
@@ -3,4 +3,4 @@ name: llmock
 description: Deterministic mock LLM server for testing (OpenAI, Anthropic, Gemini)
 type: application
 version: 0.1.0
-appVersion: "1.4.0"
+appVersion: "1.6.0"
