Gremlin

AI-powered test generation from real user sessions

Gremlin automatically generates comprehensive test suites by recording real user sessions and using AI to extract application behavior patterns. It supports both web applications (Playwright) and React Native apps (Maestro).

How It Works

Record - Capture real user sessions from web or mobile apps
Analyze - AI extracts state machines, flows, and properties from sessions
Generate - Automatically create Playwright or Maestro tests
Fuzz - Generate chaos tests to find edge cases and bugs

Gremlin doesn't just replay sessions — it understands your application's behavior and generates maintainable, comprehensive test suites.

Quick Start

With an AI Agent (Recommended)

The fastest way to use Gremlin. Add the MCP server to your AI editor, then let the agent handle setup, recording, and test generation.

Claude Desktop — add to claude_desktop_config.json:

```json
{
  "mcpServers": {
    "gremlin": {
      "command": "bun",
      "args": ["run", "/path/to/gremlin/packages/mcp/src/index.ts"],
      "env": {
        "ANTHROPIC_API_KEY": "sk-ant-xxx"
      }
    }
  }
}
```

Cursor — add to .cursor/mcp.json:

```json
{
  "mcpServers": {
    "gremlin": {
      "command": "bun",
      "args": ["run", "/path/to/gremlin/packages/mcp/src/index.ts"]
    }
  }
}
```

Claude Code:

```bash
claude mcp add gremlin -- bun run /path/to/gremlin/packages/mcp/src/index.ts
```

Then ask your agent:

"Set up Gremlin in this project, record a session, and generate tests."

The agent has access to 15 tools: it can initialize your project, check status, generate and run tests, analyze performance, and more. See the MCP Server section below for the full tool list.

With the CLI

```bash
# Install
bun add -g @gremlin/cli

# Initialize (auto-detects framework, installs SDK, instruments entry point)
gremlin init

# Verify setup
gremlin status

# Start recording sessions
gremlin dev          # Terminal 1 - session receiver
bun run dev          # Terminal 2 - your app
```

Use your app for a few minutes. Gremlin captures clicks, navigation, inputs, errors, and performance automatically.

```bash
# Generate tests from your sessions
gremlin generate

# Run the tests
gremlin run
```

What just happened?

gremlin init detected your framework (Next.js, Vite, Remix, etc.), installed the recorder SDK, and instrumented your app's entry point
gremlin status verified your config, session directory, and AI provider key
gremlin dev started a local server that receives sessions from your app
As you used your app, sessions were recorded to .gremlin/sessions/
gremlin generate analyzed those sessions with AI and generated executable tests
gremlin run executed the generated tests

AI Provider Setup

Test generation and analysis require an API key from one of:

```bash
export ANTHROPIC_API_KEY=sk-ant-xxx   # Claude (recommended)
# or
export OPENAI_API_KEY=sk-xxx          # GPT-4o
# or
export GEMINI_API_KEY=xxx             # Gemini 2.0 Flash
```

Gremlin auto-detects which key is set. Verify with gremlin status.
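This README does not spell out how detection works. As a rough illustration only, env-var based provider detection could look like the sketch below; the `detectProvider` function and the precedence order (the order listed above, Claude first) are assumptions, not Gremlin's actual code.

```typescript
// Hypothetical sketch: Gremlin's real detection logic and precedence are not
// documented here. Assumes the first key found wins, in the order listed above.
type Provider = "anthropic" | "openai" | "gemini";

const PROVIDER_KEYS: ReadonlyArray<[Provider, string]> = [
  ["anthropic", "ANTHROPIC_API_KEY"],
  ["openai", "OPENAI_API_KEY"],
  ["gemini", "GEMINI_API_KEY"],
];

function detectProvider(env: Record<string, string | undefined>): Provider | null {
  for (const [provider, key] of PROVIDER_KEYS) {
    const value = env[key];
    // Treat empty or whitespace-only values as unset.
    if (value !== undefined && value.trim() !== "") {
      return provider;
    }
  }
  return null; // no key set: generation and analysis cannot run
}

// In Node or Bun you would pass process.env; a literal object keeps this self-contained.
console.log(detectProvider({ OPENAI_API_KEY: "sk-xxx" })); // openai
```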

Features

Multi-Platform Recording — Web apps via @gremlin/recorder-web (rrweb), React Native via @gremlin/recorder-react-native. Import existing rrweb recordings from PostHog, LogRocket, etc.
AI-Powered Analysis — Extracts state machines, identifies common flows and edge cases, generates property-based assertions. Supports Claude, GPT, and Gemini.
Test Generation — Playwright tests for web, Maestro flows for mobile. Grouped by user flows with descriptive comments, performance validation, and error checks.
Fuzz Testing — Random walk exploration, boundary value abuse, sequence mutation, back button chaos, rapid-fire interactions, invalid state access.
Performance Instrumentation — Web Vitals (LCP, CLS, INP, FCP, TTFB), FPS tracking, long task detection, memory monitoring. Per-event perf context. Regression testing with baselines and budgets.
AI Agent Integration — MCP server with 15 tools, --json flag on every CLI command, llms.txt generation for agent context.

Framework Support

Gremlin auto-detects and instruments:

Web: Next.js, Vite, Create React App, Remix
Mobile: Expo, React Native

MCP Server

The @gremlin/mcp package gives AI agents direct access to Gremlin via the Model Context Protocol. Setup instructions are in Quick Start above.

Tools

| Tool | Description |
| --- | --- |
| `gremlin_status` | Project status: config, sessions, tests, AI provider |
| `gremlin_init` | Initialize Gremlin in the project |
| `gremlin_instrument_info` | Framework-specific instrumentation guidance |
| `gremlin_sessions_list` | List sessions with filters (app, platform, date) |
| `gremlin_session_get` | Get full session data by ID |
| `gremlin_generate_tests` | Generate Playwright/Maestro tests |
| `gremlin_run_tests` | Run generated tests |
| `gremlin_analyze` | AI insights: UX issues, errors, patterns, recommendations |
| `gremlin_analytics_summary` | Aggregate analytics across sessions |
| `gremlin_analytics_performance` | Web Vitals, FPS, memory with p50/p75/p95 percentiles |
| `gremlin_error_patterns` | Deduplicated error patterns with occurrence counts |
| `gremlin_generate_error_tests` | Generate error regression tests from session patterns |
| `gremlin_perf_baseline` | Snapshot performance metrics as regression baseline |
| `gremlin_generate_perf_tests` | Generate performance regression tests |
| `gremlin_run_perf_tests` | Run perf tests and compare against baseline |

Resources

| URI | Description |
| --- | --- |
| `gremlin://config` | Project configuration |
| `gremlin://sessions/{id}` | Session data by ID |
| `gremlin://spec` | Generated state machine spec |
| `gremlin://llms.txt` | LLM instrumentation context |
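The `{id}` in `gremlin://sessions/{id}` is a URI template parameter. As an illustrative sketch only, a client could classify these URIs as below; `parseGremlinUri` is a hypothetical helper, and it assumes a session ID is a single non-empty path segment.

```typescript
// Hypothetical helper (not part of @gremlin/mcp): classify a gremlin:// resource URI.
type GremlinResource =
  | { kind: "config" }
  | { kind: "spec" }
  | { kind: "llms" }
  | { kind: "session"; id: string };

function parseGremlinUri(uri: string): GremlinResource | null {
  const prefix = "gremlin://";
  if (!uri.startsWith(prefix)) return null;
  const path = uri.slice(prefix.length);

  // Fixed resources from the table above.
  if (path === "config") return { kind: "config" };
  if (path === "spec") return { kind: "spec" };
  if (path === "llms.txt") return { kind: "llms" };

  // gremlin://sessions/{id}: assumes the ID is one non-empty path segment.
  const match = /^sessions\/([^/]+)$/.exec(path);
  return match ? { kind: "session", id: match[1] } : null;
}
```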

Example Agent Workflows

New project setup:

gremlin_init → gremlin_status → gremlin_instrument_info

Generate tests:

gremlin_status → gremlin_sessions_list → gremlin_generate_tests → gremlin_run_tests

Performance audit:

gremlin_analytics_performance → gremlin_analyze(focus: "performance") → gremlin_perf_baseline → gremlin_generate_perf_tests

Error investigation:

gremlin_error_patterns → gremlin_analyze(focus: "errors") → gremlin_generate_error_tests → gremlin_run_tests
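Each workflow above is an ordered sequence of tool calls. The sketch below, under stated assumptions, expresses one as data and runs it in order, stopping at the first failed call. `runWorkflow`, `Step`, and the injected `callTool` function are hypothetical; `callTool` stands in for whichever MCP client your agent uses, and the `isError` flag mirrors the field MCP tool results use to signal failure.

```typescript
// Hypothetical sketch: `callTool` is whatever MCP client call your agent uses;
// it is not a Gremlin API.
type ToolResult = { isError?: boolean; content: unknown };
type CallTool = (name: string, args: Record<string, unknown>) => Promise<ToolResult>;

interface Step {
  tool: string;
  args?: Record<string, unknown>;
}

// The "Error investigation" workflow from above, expressed as data.
const errorInvestigation: Step[] = [
  { tool: "gremlin_error_patterns" },
  { tool: "gremlin_analyze", args: { focus: "errors" } },
  { tool: "gremlin_generate_error_tests" },
  { tool: "gremlin_run_tests" },
];

// Runs steps in order and stops at the first failing tool call.
// Returns the names of the steps that completed successfully.
async function runWorkflow(steps: Step[], callTool: CallTool): Promise<string[]> {
  const completed: string[] = [];
  for (const step of steps) {
    const result = await callTool(step.tool, step.args ?? {});
    if (result.isError) break; // later steps depend on earlier ones, so stop here
    completed.push(step.tool);
  }
  return completed;
}
```

Stopping early is a deliberate choice: generating error tests without fresh error patterns, for example, would work from stale input.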

See packages/mcp/README.md for Windsurf setup and more details.

CLI with `--json` (Agent-Friendly)

Every CLI command supports --json for structured, machine-readable output. This is how agents that don't support MCP can still use Gremlin programmatically.

```bash
gremlin init --json
gremlin status --json
gremlin sessions --json
gremlin generate --json
gremlin analyze --json
gremlin errors --json
```

Output envelope:

```json
{
  "ok": true,
  "command": "status",
  "data": { ... },
  "warnings": [],
  "meta": { "duration": 123 }
}
```
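A script driving the CLI can parse this envelope from a command's stdout. The following is a minimal sketch, assuming `warnings` is an array and that failures are reported with `ok: false` (the envelope above only shows the success case); `parseEnvelope` is a hypothetical helper.

```typescript
// Field names mirror the envelope above. `data` varies per command, so it stays generic.
// Assumption: failures are reported as ok: false.
interface GremlinEnvelope<T = unknown> {
  ok: boolean;
  command: string;
  data: T;
  warnings: unknown[];
  meta: { duration: number };
}

// Parses stdout from a `gremlin <command> --json` run and throws if the
// output is not an envelope or the command reported failure.
function parseEnvelope<T = unknown>(stdout: string): GremlinEnvelope<T> {
  const parsed: unknown = JSON.parse(stdout);
  if (
    typeof parsed !== "object" ||
    parsed === null ||
    typeof (parsed as { ok?: unknown }).ok !== "boolean" ||
    typeof (parsed as { command?: unknown }).command !== "string"
  ) {
    throw new Error("Not a Gremlin --json envelope");
  }
  const envelope = parsed as GremlinEnvelope<T>;
  if (!envelope.ok) {
    throw new Error(`gremlin ${envelope.command} failed`);
  }
  return envelope;
}
```

In a CI step or agent tool, you would pass it the captured stdout of a command such as gremlin status --json.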

`llms.txt` Context

Generate project-specific instrumentation context that any AI agent can consume:

```bash
gremlin instrument --llms
```

This outputs your framework, SDK package, entry point location, and setup instructions — everything an agent needs to instrument your app without reading your codebase.

Generate Tests

```bash
gremlin generate
```

Analyzes sessions using AI and generates:

Playwright tests for web apps
Maestro flows for mobile apps
A state machine model (spec.json) of your app
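The spec.json schema isn't documented in this README. To illustrate what a state machine model of an app is, the sketch below builds one by counting observed transitions between screens across sessions. This is plain frequency counting, not the AI-based extraction Gremlin performs, and the `SessionStep`, `Transition`, and `StateMachineSpec` shapes are hypothetical.

```typescript
// Hypothetical shapes: Gremlin's actual spec.json schema may differ.
interface Transition { from: string; to: string; action: string; count: number }
interface StateMachineSpec { states: string[]; transitions: Transition[] }

// One recorded step: the screen/route the user was on and the action taken next.
interface SessionStep { state: string; action: string }

function buildSpec(sessions: SessionStep[][]): StateMachineSpec {
  const states = new Set<string>();
  const transitions = new Map<string, Transition>();

  for (const steps of sessions) {
    for (let i = 0; i < steps.length; i++) {
      states.add(steps[i].state);
      // The final step's action has no observed destination, so it adds no edge.
      if (i + 1 >= steps.length) continue;

      const from = steps[i].state;
      const to = steps[i + 1].state;
      const action = steps[i].action;
      const key = `${from}\u0000${action}\u0000${to}`;

      const existing = transitions.get(key);
      if (existing) existing.count++;
      else transitions.set(key, { from, to, action, count: 1 });
    }
  }

  return { states: Array.from(states).sort(), transitions: Array.from(transitions.values()) };
}
```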

Fuzz Tests

Find edge cases with chaos testing:

```bash
gremlin fuzz --strategy all --count 20
```

Strategies: random-walk, boundary, chaos, or all. To combine strategies, pass them as a comma-separated list.
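To illustrate the random-walk idea, here is a minimal sketch: start from one state and repeatedly follow a randomly chosen observed transition. The `Transition` shape and `randomWalk` function are hypothetical, not Gremlin's generator. A seeded PRNG (mulberry32) is used so that a walk which exposes a bug can be replayed from its seed.

```typescript
// Hypothetical sketch of a random-walk fuzz strategy; not Gremlin's implementation.
interface Transition { from: string; to: string; action: string }

// mulberry32: a small seeded PRNG returning floats in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Follows up to `steps` random transitions from `start`; the same seed gives the same walk.
function randomWalk(transitions: Transition[], start: string, steps: number, seed: number): Transition[] {
  const rand = mulberry32(seed);
  const path: Transition[] = [];
  let current = start;
  for (let i = 0; i < steps; i++) {
    const options = transitions.filter((t) => t.from === current);
    if (options.length === 0) break; // dead end: no observed outgoing action
    const next = options[Math.floor(rand() * options.length)];
    path.push(next);
    current = next.to;
  }
  return path;
}
```

Each returned transition would then be turned into a test step (click, navigate, type); that translation is outside this sketch.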

Performance Testing

Gremlin captures Web Vitals, FPS, long tasks, and memory on every session:

```bash
gremlin analytics performance         # View aggregated metrics
gremlin sessions --slow               # Find poor Core Web Vitals sessions
gremlin perf-baseline                 # Set performance baseline
gremlin generate --perf               # Generate regression tests
gremlin run --perf                    # Run perf tests against baseline
```
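The MCP tools above report p50/p75/p95 percentiles. The following is a minimal sketch of nearest-rank percentiles plus a baseline comparison; the 10% tolerance and the choice to compare at p75 (the percentile Google uses when assessing Core Web Vitals) are assumptions, not Gremlin's documented budget rules.

```typescript
// Nearest-rank percentile: the value at rank ceil(p/100 * n) in ascending order.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.min(Math.max(rank, 1), sorted.length) - 1];
}

interface Summary { p50: number; p75: number; p95: number }

function summarize(samples: number[]): Summary {
  return { p50: percentile(samples, 50), p75: percentile(samples, 75), p95: percentile(samples, 95) };
}

// Hypothetical budget rule: regression if current p75 exceeds baseline p75
// by more than `tolerance` (0.1 = 10%).
function isRegression(baseline: Summary, current: Summary, tolerance = 0.1): boolean {
  return current.p75 > baseline.p75 * (1 + tolerance);
}
```

Comparing against a tolerance rather than the exact baseline value avoids failing CI on ordinary run-to-run noise.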

Error Tracking

Find recurring error patterns and generate regression tests:

```bash
gremlin errors                        # List error patterns
gremlin errors --min-occurrences 3    # Filter by frequency
gremlin generate --errors             # Generate error regression tests
```
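Grouping errors into patterns usually means normalizing the variable parts of messages (IDs, numbers, URLs) before counting. The sketch below does that with a few regular expressions; the normalization rules, `errorPatterns`, and its input shape are hypothetical, and Gremlin's actual deduplication may differ.

```typescript
// Hypothetical normalization rules; Gremlin's real grouping logic is not documented here.
function normalizeMessage(message: string): string {
  return message
    .replace(/https?:\/\/\S+/g, "<url>") // URLs first, so their digits are not replaced separately
    .replace(/\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b/gi, "<uuid>")
    .replace(/\b\d+\b/g, "<n>")
    .trim();
}

interface ErrorPattern { pattern: string; occurrences: number; sessions: number }

// errors: one entry per captured error, tagged with the session it came from.
function errorPatterns(
  errors: { sessionId: string; message: string }[],
  minOccurrences = 1,
): ErrorPattern[] {
  const groups = new Map<string, { count: number; sessions: Set<string> }>();
  for (const { sessionId, message } of errors) {
    const key = normalizeMessage(message);
    const group = groups.get(key) ?? { count: 0, sessions: new Set<string>() };
    group.count++;
    group.sessions.add(sessionId);
    groups.set(key, group);
  }
  return Array.from(groups, ([pattern, g]) => ({ pattern, occurrences: g.count, sessions: g.sessions.size }))
    .filter((p) => p.occurrences >= minOccurrences)
    .sort((a, b) => b.occurrences - a.occurrences); // most frequent first
}
```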

AI-Powered Analysis

Get actionable insights without generating tests:

```bash
gremlin analyze                       # Full analysis
gremlin analyze --focus ux            # UX issues only
gremlin analyze --focus errors        # Error patterns
gremlin analyze --focus performance   # Performance insights
```

Import Existing Sessions

Already using session recording? Import from PostHog or rrweb files:

```bash
gremlin import --posthog --api-key=phx_xxx --project-id=123 --limit=5
gremlin import --file ./recording.json
gremlin generate
```
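Before importing a file, it can help to sanity-check that it looks like an rrweb recording. rrweb events carry a numeric `type`, a `data` payload, and a `timestamp`, and a recording captured with rrweb normally includes a Meta event (type 4) and a FullSnapshot (type 2) to replay from. The sketch assumes the file is a bare JSON array of events; whether `gremlin import --file` expects exactly that shape is an assumption, and `checkRecording` is hypothetical.

```typescript
// rrweb EventType values used below (Meta = 4, FullSnapshot = 2).
const RRWEB_FULL_SNAPSHOT = 2;
const RRWEB_META = 4;

interface RrwebEvent { type: number; data: unknown; timestamp: number }

// Hypothetical pre-import check: returns a list of problems (empty if none found).
function checkRecording(json: string): string[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(json);
  } catch {
    return ["not valid JSON"];
  }
  // Assumption: a bare array of events, as rrweb's record() emits them.
  if (!Array.isArray(parsed)) return ["expected a top-level array of rrweb events"];

  const events = parsed as Array<Partial<RrwebEvent> | null>;
  const problems: string[] = [];
  if (events.length === 0) problems.push("recording is empty");

  let last = -Infinity;
  events.forEach((e, i) => {
    if (!e || typeof e.type !== "number" || typeof e.timestamp !== "number") {
      problems.push(`event ${i} is missing a numeric type or timestamp`);
      return;
    }
    if (e.timestamp < last) problems.push(`event ${i} is out of timestamp order`);
    last = e.timestamp;
  });

  if (!events.some((e) => e?.type === RRWEB_META)) problems.push("no Meta event (type 4)");
  if (!events.some((e) => e?.type === RRWEB_FULL_SNAPSHOT)) problems.push("no FullSnapshot event (type 2)");
  return problems;
}
```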

Manual SDK Setup

If auto-instrumentation didn't work or you need fine-grained control:

Web Apps

```bash
bun add @gremlin/recorder-web
```

```typescript
import { GremlinRecorder } from '@gremlin/recorder-web';

const recorder = new GremlinRecorder({
  appName: 'my-app',
  appVersion: '1.0.0',
  autoStart: true,
  capturePerformance: true,      // Web Vitals, FPS, long tasks, memory
  maskInputs: true,
  persistSession: true,
});

recorder.start();
```

React Native

```bash
bun add @gremlin/recorder-react-native
```

```typescript
import { GremlinRecorder } from '@gremlin/recorder-react-native';

const recorder = new GremlinRecorder({
  appName: 'my-app',
  appVersion: '1.0.0',
  capturePerformance: true,      // FPS, memory, JS thread lag
});

recorder.start();
```

CI/CD Integration

```yaml
# .github/workflows/gremlin.yml
name: Gremlin Tests
on: [push]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: oven-sh/setup-bun@v2

      - name: Install dependencies
        run: bun install

      - name: Install Gremlin CLI
        run: bun add -g @gremlin/cli

      - name: Check Gremlin status
        run: gremlin status --json

      - name: Generate tests from recorded sessions
        run: gremlin generate --json
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}

      - name: Install Playwright browsers
        run: npx playwright install --with-deps chromium

      - name: Run generated tests
        run: gremlin run --json

      - name: Run performance regression tests
        run: gremlin run --perf --json
```

Commit .gremlin/sessions/ to the repo so CI can regenerate tests.

Self-Hosted Session API

For teams that want full data ownership. Built on Bun + Hono, stores sessions as JSON files on disk.

# Docker
docker compose up --build

# Or generate an API key and run directly
bun run --filter '@gremlin/server-node' keygen
bun run --filter '@gremlin/server-node' dev

Server runs on http://localhost:8787. All sessions stored as plain JSON in DATA_DIR/sessions.
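Because sessions are stored as files under DATA_DIR/sessions, a session ID taken from a request should never be used as a raw path. The following is a minimal sketch of a safe ID-to-path mapping; the `<id>.json` naming, the allowed ID characters, and the `sessionFilePath` helper are assumptions, not the server's documented layout.

```typescript
import { resolve, sep } from "node:path";

// Hypothetical helper: maps a session ID to DATA_DIR/sessions/<id>.json and
// rejects IDs that could escape that directory. The real naming scheme may differ.
function sessionFilePath(dataDir: string, sessionId: string): string {
  // Allow-list: letters, digits, "_" and "-" only, so "/" and ".." are impossible.
  if (!/^[A-Za-z0-9_-]+$/.test(sessionId)) {
    throw new Error(`invalid session id: ${sessionId}`);
  }
  const sessionsDir = resolve(dataDir, "sessions");
  const file = resolve(sessionsDir, `${sessionId}.json`);
  // Defense in depth: confirm the resolved path still sits inside sessionsDir.
  if (!file.startsWith(sessionsDir + sep)) {
    throw new Error("path escapes sessions directory");
  }
  return file;
}
```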

Architecture

Recorder SDKs (@gremlin/recorder-web, @gremlin/recorder-react-native)
CLI (@gremlin/cli) — local ingest, test generation, analysis
MCP Server (@gremlin/mcp) — AI agent integration
Self-hosted API (@gremlin/server-node) — VPS/container deployments
Cloudflare Worker (@gremlin/server) — serverless deployments

Project Structure

```text
gremlin/
├── packages/
│   ├── cli/                    # Command-line interface
│   ├── mcp/                    # MCP server for AI agents
│   ├── session/                # Session types and transport
│   ├── analysis/               # AI analysis and test generators
│   ├── ast/                    # Code-based state discovery
│   ├── recorder-web/           # Web recorder (rrweb + performance)
│   ├── recorder-react-native/  # React Native recorder
│   ├── proto/                  # Protocol buffers
│   ├── server/                 # Session server (Cloudflare Worker)
│   ├── server-node/            # Self-hosted session server (Bun)
│   └── server-shared/          # Shared server routes and types
├── examples/
│   ├── web-app/               # Example web app (Vite)
│   └── expo-app/              # Example Expo/React Native app
├── package.json               # Root workspace config
└── tsconfig.json              # TypeScript config
```

CLI Reference

See packages/cli/README.md for the full command reference with all options.

Development

```bash
git clone https://github.com/yourusername/gremlin.git
cd gremlin
bun install
bun test                                    # Run all tests
bun run --filter '@gremlin/cli' lint        # Type check a package
```

Roadmap

v0.1.0 - MVP (Current)

v0.2.0 — Chrome extension, real-time streaming, session diff/merge

v0.3.0 — Visual regression, property-based testing, test maintenance

v1.0.0 — CI/CD integrations, cloud hosting, performance optimizations

Contributing

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Run tests (bun test)
Open a Pull Request

License

MIT — see LICENSE.

Acknowledgments

rrweb for web session recording
Anthropic Claude, OpenAI GPT, Google Gemini
Playwright and Maestro for test execution
