AI-powered test generation from real user sessions
Gremlin automatically generates comprehensive test suites by recording real user sessions and using AI to extract application behavior patterns. It supports both web applications (Playwright) and React Native apps (Maestro).
- Record - Capture real user sessions from web or mobile apps
- Analyze - AI extracts state machines, flows, and properties from sessions
- Generate - Automatically create Playwright or Maestro tests
- Fuzz - Generate chaos tests to find edge cases and bugs
Gremlin doesn't just replay sessions — it understands your application's behavior and generates maintainable, comprehensive test suites.
The fastest way to use Gremlin. Add the MCP server to your AI editor, then let the agent handle setup, recording, and test generation.
Claude Desktop (add to claude_desktop_config.json):

{
  "mcpServers": {
    "gremlin": {
      "command": "bun",
      "args": ["run", "/path/to/gremlin/packages/mcp/src/index.ts"],
      "env": {
        "ANTHROPIC_API_KEY": "sk-ant-xxx"
      }
    }
  }
}

Cursor (add to .cursor/mcp.json):
{
  "mcpServers": {
    "gremlin": {
      "command": "bun",
      "args": ["run", "/path/to/gremlin/packages/mcp/src/index.ts"]
    }
  }
}

Claude Code:
claude mcp add gremlin -- bun run /path/to/gremlin/packages/mcp/src/index.ts

Then ask your agent:
"Set up Gremlin in this project, record a session, and generate tests."
The agent has access to 15 tools — it can initialize your project, check status, generate tests, run them, analyze performance, and more. See MCP Deep Dive for the full tool list.
# Install
bun add -g @gremlin/cli
# Initialize (auto-detects framework, installs SDK, instruments entry point)
gremlin init
# Verify setup
gremlin status
# Start recording sessions
gremlin dev # Terminal 1 - session receiver
bun run dev # Terminal 2 - your app

Use your app for a few minutes. Gremlin captures clicks, navigation, inputs, errors, and performance automatically.
# Generate tests from your sessions
gremlin generate
# Run the tests
gremlin run

What just happened?
- gremlin init detected your framework (Next.js, Vite, Remix, etc.), installed the recorder SDK, and instrumented your app's entry point
- gremlin status verified your config, session directory, and AI provider key
- gremlin dev started a local server that receives sessions from your app
- As you used your app, sessions were recorded to .gremlin/sessions/
- gremlin generate analyzed those sessions with AI and generated executable tests
- gremlin run executed the generated tests
Test generation and analysis require an API key from one of:
export ANTHROPIC_API_KEY=sk-ant-xxx # Claude (recommended)
# or
export OPENAI_API_KEY=sk-xxx # GPT-4o
# or
export GEMINI_API_KEY=xxx # Gemini 2.0 Flash

Gremlin auto-detects which key is set. Verify with gremlin status.
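
Key auto-detection can be pictured as a lookup over the three environment variables listed above. The following is a minimal sketch, not Gremlin's actual implementation: the `detectProvider` function, the `Provider` type, and the precedence order (Anthropic, then OpenAI, then Gemini) are all assumptions made for illustration.

```typescript
// Hypothetical sketch: Gremlin's real detection logic and precedence are not documented here.
type Provider = "anthropic" | "openai" | "gemini";

// Variable names come from the README; the order they are checked in is an assumption.
const PROVIDER_KEYS: ReadonlyArray<[Provider, string]> = [
  ["anthropic", "ANTHROPIC_API_KEY"],
  ["openai", "OPENAI_API_KEY"],
  ["gemini", "GEMINI_API_KEY"],
];

// Returns the first provider whose key is set to a non-blank value, or null if none is.
function detectProvider(env: Record<string, string | undefined>): Provider | null {
  for (const [provider, key] of PROVIDER_KEYS) {
    const value = env[key];
    if (value !== undefined && value.trim() !== "") {
      return provider;
    }
  }
  return null;
}
```

In a Node or Bun script this could be called as `detectProvider(process.env)`. Treating a blank value as unset guards against an exported but empty variable.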
- Multi-Platform Recording: Web apps via @gremlin/recorder-web (rrweb), React Native via @gremlin/recorder-react-native. Import existing rrweb recordings from PostHog, LogRocket, etc.
- AI-Powered Analysis: Extracts state machines, identifies common flows and edge cases, generates property-based assertions. Supports Claude, GPT, and Gemini.
- Test Generation: Playwright tests for web, Maestro flows for mobile. Grouped by user flows with descriptive comments, performance validation, and error checks.
- Fuzz Testing: Random walk exploration, boundary value abuse, sequence mutation, back button chaos, rapid-fire interactions, invalid state access.
- Performance Instrumentation: Web Vitals (LCP, CLS, INP, FCP, TTFB), FPS tracking, long task detection, memory monitoring. Per-event perf context. Regression testing with baselines and budgets.
- AI Agent Integration: MCP server with 15 tools, --json flag on every CLI command, llms.txt generation for agent context.
Gremlin auto-detects and instruments:
- Web: Next.js, Vite, Create React App, Remix
- Mobile: Expo, React Native
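
Framework detection of this kind is commonly done by inspecting package.json dependencies. The sketch below assumes that approach; how Gremlin actually detects frameworks is not described here, and `detectFramework`, `PackageJson`, and the check order are hypothetical. The dependency names (`next`, `@remix-run/react`, `vite`, `react-scripts`, `expo`, `react-native`) are the real npm packages for these frameworks.

```typescript
// Hypothetical sketch of dependency-based framework detection.
interface PackageJson {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
}

// Order matters: Expo apps also depend on react-native, and Remix apps may also use Vite,
// so the more specific package is checked first.
const FRAMEWORK_MARKERS: ReadonlyArray<[string, string]> = [
  ["next", "Next.js"],
  ["@remix-run/react", "Remix"],
  ["vite", "Vite"],
  ["react-scripts", "Create React App"],
  ["expo", "Expo"],
  ["react-native", "React Native"],
];

function detectFramework(pkg: PackageJson): string | null {
  const deps: Record<string, string> = { ...pkg.dependencies, ...pkg.devDependencies };
  for (const [dependency, framework] of FRAMEWORK_MARKERS) {
    if (dependency in deps) {
      return framework;
    }
  }
  return null;
}
```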
The @gremlin/mcp package gives AI agents direct access to Gremlin via the Model Context Protocol. Setup instructions are in Quick Start above.
| Tool | Description |
|---|---|
| gremlin_status | Project status: config, sessions, tests, AI provider |
| gremlin_init | Initialize Gremlin in the project |
| gremlin_instrument_info | Framework-specific instrumentation guidance |
| gremlin_sessions_list | List sessions with filters (app, platform, date) |
| gremlin_session_get | Get full session data by ID |
| gremlin_generate_tests | Generate Playwright/Maestro tests |
| gremlin_run_tests | Run generated tests |
| gremlin_analyze | AI insights: UX issues, errors, patterns, recommendations |
| gremlin_analytics_summary | Aggregate analytics across sessions |
| gremlin_analytics_performance | Web Vitals, FPS, memory with p50/p75/p95 percentiles |
| gremlin_error_patterns | Deduplicated error patterns with occurrence counts |
| gremlin_generate_error_tests | Generate error regression tests from session patterns |
| gremlin_perf_baseline | Snapshot performance metrics as regression baseline |
| gremlin_generate_perf_tests | Generate performance regression tests |
| gremlin_run_perf_tests | Run perf tests and compare against baseline |
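
To make the percentile and baseline rows above concrete, here is a minimal sketch of how p50/p75/p95 could be computed from raw samples and compared against a stored baseline. It uses the nearest-rank percentile definition and a 10% tolerance on p75; both choices, and the names `percentile`, `summarize`, and `regressed`, are assumptions rather than Gremlin's documented behavior.

```typescript
// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[rank - 1];
}

interface Summary {
  p50: number;
  p75: number;
  p95: number;
}

function summarize(samples: number[]): Summary {
  return {
    p50: percentile(samples, 50),
    p75: percentile(samples, 75),
    p95: percentile(samples, 95),
  };
}

// Hypothetical regression rule: flag when the current p75 exceeds the baseline p75
// by more than `tolerance`, expressed as a fraction of the baseline.
function regressed(baseline: Summary, current: Summary, tolerance = 0.1): boolean {
  return current.p75 > baseline.p75 * (1 + tolerance);
}
```

For example, `summarize([1200, 1800, 2500, 3100, 4000])` yields p50 = 2500, p75 = 3100, and p95 = 4000 under the nearest-rank definition.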
| URI | Description |
|---|---|
| gremlin://config | Project configuration |
| gremlin://sessions/{id} | Session data by ID |
| gremlin://spec | Generated state machine spec |
| gremlin://llms.txt | LLM instrumentation context |
New project setup:
gremlin_init → gremlin_status → gremlin_instrument_info
Generate tests:
gremlin_status → gremlin_sessions_list → gremlin_generate_tests → gremlin_run_tests
Performance audit:
gremlin_analytics_performance → gremlin_analyze(focus: "performance") → gremlin_perf_baseline → gremlin_generate_perf_tests
Error investigation:
gremlin_error_patterns → gremlin_analyze(focus: "errors") → gremlin_generate_error_tests → gremlin_run_tests
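
The workflows above are ordered tool calls, which an agent harness could drive with a small loop. The sketch below is an assumption-heavy illustration: only the tool names and the `focus` argument come from this README, while `CallTool`, `ToolResult`, `Step`, and `runWorkflow` are hypothetical and stand in for whatever MCP client is in use.

```typescript
// Hypothetical result and caller types; a real MCP client will differ.
type ToolResult = { ok: boolean; data?: unknown };
type CallTool = (name: string, args?: Record<string, unknown>) => Promise<ToolResult>;

interface Step {
  tool: string;
  args?: Record<string, unknown>;
}

// The "Error investigation" workflow from the list above.
const ERROR_INVESTIGATION: Step[] = [
  { tool: "gremlin_error_patterns" },
  { tool: "gremlin_analyze", args: { focus: "errors" } },
  { tool: "gremlin_generate_error_tests" },
  { tool: "gremlin_run_tests" },
];

// Runs steps in order and stops at the first failing step.
// Returns the names of the tools that completed successfully.
async function runWorkflow(steps: Step[], callTool: CallTool): Promise<string[]> {
  const completed: string[] = [];
  for (const step of steps) {
    const result = await callTool(step.tool, step.args);
    if (!result.ok) break;
    completed.push(step.tool);
  }
  return completed;
}
```

Stopping at the first failure keeps later steps, such as running tests, from acting on output that was never produced.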
See packages/mcp/README.md for Windsurf setup and more details.
Every CLI command supports --json for structured, machine-readable output. This is how agents that don't support MCP can still use Gremlin programmatically.
gremlin init --json
gremlin status --json
gremlin sessions --json
gremlin generate --json
gremlin analyze --json
gremlin errors --json

Output envelope:
{
  "ok": true,
  "command": "status",
  "data": { ... },
  "warnings": [],
  "meta": { "duration": 123 }
}

Generate project-specific instrumentation context that any AI agent can consume:
gremlin instrument --llms

This outputs your framework, SDK package, entry point location, and setup instructions: everything an agent needs to instrument your app without reading your codebase.
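
A script that consumes the --json output envelope shown earlier might validate it before use. The following sketch derives its field names from that example; the `GremlinEnvelope` type and `parseEnvelope` function are hypothetical, the element type of `warnings` is assumed to be string, and the shape of `data` (and of the envelope when `ok` is false) is not documented here, so `data` is left as `unknown`.

```typescript
// Field names mirror the example envelope; everything else is an assumption.
interface GremlinEnvelope {
  ok: boolean;
  command: string;
  data: unknown;
  warnings: string[]; // assumed to be strings; the example only shows an empty array
  meta: { duration: number };
}

// Parses CLI stdout and checks the fields shown in the example envelope.
function parseEnvelope(stdout: string): GremlinEnvelope {
  const value: unknown = JSON.parse(stdout);
  if (typeof value !== "object" || value === null) {
    throw new Error("envelope is not an object");
  }
  const v = value as Record<string, unknown>;
  if (typeof v.ok !== "boolean") throw new Error("missing boolean 'ok'");
  if (typeof v.command !== "string") throw new Error("missing string 'command'");
  if (!Array.isArray(v.warnings)) throw new Error("missing array 'warnings'");
  const meta = v.meta as Record<string, unknown> | null | undefined;
  if (typeof meta !== "object" || meta === null || typeof meta.duration !== "number") {
    throw new Error("missing numeric 'meta.duration'");
  }
  return v as unknown as GremlinEnvelope;
}
```

A caller would typically check `ok` first and surface `warnings` before reading `data`.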
gremlin generate

Analyzes sessions using AI and generates:
- Playwright tests for web apps
- Maestro flows for mobile apps
- A state machine model (spec.json) of your app
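
To give a feel for what a state machine model of an app can contain, here is a deliberately simplified sketch that counts screen-to-screen transitions across sessions. It is not Gremlin's AI analysis and does not reflect the real spec.json schema, which this README does not show; `SessionEvent`, `Transition`, `StateMachine`, and `buildStateMachine` are all hypothetical.

```typescript
// Hypothetical shapes for illustration only.
interface SessionEvent {
  screen: string; // where the user was
  action: string; // what they did there, leading to the next event's screen
}

interface Transition {
  from: string;
  action: string;
  to: string;
  count: number; // how many times this transition was observed
}

interface StateMachine {
  states: string[];
  transitions: Transition[];
}

function buildStateMachine(sessions: SessionEvent[][]): StateMachine {
  const states = new Set<string>();
  const byKey = new Map<string, Transition>();
  for (const events of sessions) {
    for (let i = 0; i < events.length; i++) {
      states.add(events[i].screen);
      if (i + 1 >= events.length) continue; // the last event has no successor
      const from = events[i].screen;
      const action = events[i].action;
      const to = events[i + 1].screen;
      const key = JSON.stringify([from, action, to]);
      const existing = byKey.get(key);
      if (existing) {
        existing.count += 1;
      } else {
        byKey.set(key, { from, action, to, count: 1 });
      }
    }
  }
  return { states: Array.from(states), transitions: Array.from(byKey.values()) };
}
```

Observation counts like these are one way to separate common flows from rare paths that may deserve edge-case tests.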
Find edge cases with chaos testing:
gremlin fuzz --strategy all --count 20

Strategies: random-walk, boundary, chaos, all (comma-separated).
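
As an illustration of the random-walk strategy, the sketch below walks a transition graph by picking a random outgoing action at each step. It is a generic version of the technique under stated assumptions, not Gremlin's fuzzer: the `Graph` shape, `randomWalk`, and the small seeded generator (a mulberry32-style PRNG) are hypothetical, and seeding is included so that a walk can be reproduced.

```typescript
// Hypothetical graph shape: each state maps to its outgoing actions.
type Graph = Record<string, { action: string; to: string }[]>;

interface WalkStep {
  from: string;
  action: string;
  to: string;
}

// Small seeded PRNG (mulberry32-style) returning values in [0, 1).
function seededRandom(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Takes up to maxSteps random transitions from `start`, stopping early at a dead end.
function randomWalk(graph: Graph, start: string, maxSteps: number, seed: number): WalkStep[] {
  const rand = seededRandom(seed);
  const steps: WalkStep[] = [];
  let state = start;
  for (let i = 0; i < maxSteps; i++) {
    const options = graph[state] ?? [];
    if (options.length === 0) break;
    const pick = options[Math.floor(rand() * options.length)];
    steps.push({ from: state, action: pick.action, to: pick.to });
    state = pick.to;
  }
  return steps;
}
```

Because the generator is seeded, the same seed always produces the same walk, so a walk that exposes a bug can be replayed as a regression test.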
Gremlin captures Web Vitals, FPS, long tasks, and memory on every session:
gremlin analytics performance # View aggregated metrics
gremlin sessions --slow # Find poor Core Web Vitals sessions
gremlin perf-baseline # Set performance baseline
gremlin generate --perf # Generate regression tests
gremlin run --perf # Run perf tests against baseline

Find recurring error patterns and generate regression tests:
gremlin errors # List error patterns
gremlin errors --min-occurrences 3 # Filter by frequency
gremlin generate --errors # Generate error regression tests

Get actionable insights without generating tests:
gremlin analyze # Full analysis
gremlin analyze --focus ux # UX issues only
gremlin analyze --focus errors # Error patterns
gremlin analyze --focus performance # Performance insights

Already using session recording? Import from PostHog or rrweb files:
gremlin import --posthog --api-key=phx_xxx --project-id=123 --limit=5
gremlin import --file ./recording.json
gremlin generate

If auto-instrumentation didn't work or you need fine-grained control:
bun add @gremlin/recorder-web

import { GremlinRecorder } from '@gremlin/recorder-web';

const recorder = new GremlinRecorder({
  appName: 'my-app',
  appVersion: '1.0.0',
  autoStart: true,
  capturePerformance: true, // Web Vitals, FPS, long tasks, memory
  maskInputs: true,
  persistSession: true,
});
recorder.start();

bun add @gremlin/recorder-react-native

import { GremlinRecorder } from '@gremlin/recorder-react-native';

const recorder = new GremlinRecorder({
  appName: 'my-app',
  appVersion: '1.0.0',
  capturePerformance: true, // FPS, memory, JS thread lag
});
recorder.start();

# .github/workflows/gremlin.yml
name: Gremlin Tests
on: [push]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: oven-sh/setup-bun@v2
      - name: Install dependencies
        run: bun install
      - name: Install Gremlin CLI
        run: bun add -g @gremlin/cli
      - name: Check Gremlin status
        run: gremlin status --json
      - name: Generate tests from recorded sessions
        run: gremlin generate --json
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
      - name: Install Playwright browsers
        run: npx playwright install --with-deps chromium
      - name: Run generated tests
        run: gremlin run --json
      - name: Run performance regression tests
        run: gremlin run --perf --json

Commit .gremlin/sessions/ to the repo so CI can regenerate tests.
The self-hosted server is for teams that want full data ownership. It is built on Bun + Hono and stores sessions as JSON files on disk.
# Docker
docker compose up --build
# Or generate an API key and run directly
bun run --filter '@gremlin/server-node' keygen
bun run --filter '@gremlin/server-node' dev

The server runs on http://localhost:8787. All sessions are stored as plain JSON in DATA_DIR/sessions.
- Recorder SDKs (@gremlin/recorder-web, @gremlin/recorder-react-native)
- CLI (@gremlin/cli): local ingest, test generation, analysis
- MCP Server (@gremlin/mcp): AI agent integration
- Self-hosted API (@gremlin/server-node): VPS/container deployments
- Cloudflare Worker (@gremlin/server): serverless deployments
gremlin/
├── packages/
│ ├── cli/ # Command-line interface
│ ├── mcp/ # MCP server for AI agents
│ ├── session/ # Session types and transport
│ ├── analysis/ # AI analysis and test generators
│ ├── ast/ # Code-based state discovery
│ ├── recorder-web/ # Web recorder (rrweb + performance)
│ ├── recorder-react-native/ # React Native recorder
│ ├── proto/ # Protocol buffers
│ ├── server/ # Session server (Cloudflare Worker)
│ ├── server-node/ # Self-hosted session server (Bun)
│ └── server-shared/ # Shared server routes and types
├── examples/
│ ├── web-app/ # Example web app (Vite)
│ └── expo-app/ # Example Expo/React Native app
├── package.json # Root workspace config
└── tsconfig.json # TypeScript config
See packages/cli/README.md for the full command reference with all options.
git clone https://github.com/yourusername/gremlin.git
cd gremlin
bun install
bun test # Run all tests
bun run --filter '@gremlin/cli' lint # Type check a package

v0.1.0 - MVP (Current)
- Web recorder with rrweb + performance instrumentation
- React Native recorder (gestures, performance, navigation)
- AI-powered flow analysis (Claude, GPT, Gemini)
- Playwright + Maestro test generation
- Fuzz test generation
- CLI with --json output
- MCP server for AI agents
- PostHog import
- Session replay viewer
- Performance + error regression testing
v0.2.0 — Chrome extension, real-time streaming, session diff/merge
v0.3.0 — Visual regression, property-based testing, test maintenance
v1.0.0 — CI/CD integrations, cloud hosting, performance optimizations
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Run tests (bun test)
- Open a Pull Request
MIT — see LICENSE.
- rrweb for web session recording
- Anthropic Claude, OpenAI GPT, Google Gemini
- Playwright and Maestro for test execution