mcp-loadtest

Load tester and bug detector for MCP (Model Context Protocol) servers. Catches lazy-init deadlocks, concurrency races, hangs, and perf regressions that unit tests miss.

Why mcp-loadtest

Lazy-init inside an async worker thread is one of the easiest ways to ship a broken MCP server. initialize works, tools/list works, the first tools/call hangs forever. Standard pytest never sees it because the bug only surfaces when a real client opens a session and drives the protocol end-to-end.

The flagship example is HKUDS/Vibe-Trading PR #85: _get_registry() blocked on a deferred import of src.tools.shell.* inside FastMCP's worker thread, so every concurrent caller wedged on the same lock. The fix was five lines. Finding the bug took hours of differential testing.

mcp-loadtest finds it in seconds:

```
$ mcp-loadtest deadlock-probe --server "python -m vibe_trading_mcp" \
    --tool analyze_options --concurrent 5 \
    --args '{"spot":450,"strike":460,"expiry_days":30}'

Run 01KR9JX7E4P638TKQM96YA0B4Z
Status: FAIL (1 deadlock)
Server: python -m vibe_trading_mcp
Scenario: deadlock_probe
Deadlocks: 1   Hangs: 0   Errors: 0

Error: DEADLOCK DETECTED — 1 deadlock(s), 0 error(s), 0 threshold violation(s)
$ echo $?
1
```

This is the bug class that breaks MCP servers in production. Unit tests don't catch it. mcp-loadtest does, and it exits non-zero so it can fail your CI gate. The regression test that reproduces the exact Vibe-Trading bug lives at crates/mcp-loadtest/tests/vibe_trading_regression.rs (pinned to commit 71220c7c, the parent of PR #85).

What it does

Bug-class detection
- deadlock_probe — fires N concurrent tools/calls through hang_detect; classifies each as success / slow / deadlock (see DESIGN.md §15.2).
- race_check — issues identical calls and diffs the responses to surface non-determinism (clocks, RNG, leaked state).
- A hang-detector watchdog wraps every call, so any scenario can surface a hung tool, not just the dedicated probe.
Load testing
- sustained — constant concurrency over a duration; baseline p50/p95/p99/p999 + throughput.
- ramp — linear ramp of concurrency to find the break-point.
- soak — long-running sustained load with periodic RSS sampling for leak hunting.
- Cold-start measurement, weighted pattern mixes (explore-then-act, multi-step).
Reporting
- Markdown report at runs/<ulid>/report.md, self-contained report.html (no external deps), machine-readable metrics.json, ANSI terminal summary.
- Schema-stable JSON (see docs/schema/metrics.v1.json); snapshot-tested so downstream LLM agents don't break on patch versions.
- mcp-loadtest compare baseline.json current.json for regression diffs in CI.
- mcp-loadtest cross --server "..." --server "..." for side-by-side runs across N targets.
AI-agent friendly
- Stable JSON output and structured error messages with Hint: lines.
- mcp-loadtest serve --mcp exposes the tool itself as an MCP server so Claude Code, Cursor, or any MCP-aware agent can call deadlock_probe, sustained_load, and compare_runs as MCP tools directly. See DESIGN.md §21.2.
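To picture the success / slow / deadlock split that deadlock_probe reports, here is a std-only Rust sketch. It assumes a call finishing within `hang_threshold` is a success, one finishing within `hang_threshold + grace` is slow, and anything still pending after that is a deadlock; DESIGN.md §15.2 is the authority on the real rules. `Outcome`, `classify`, and `probe` are hypothetical names, and plain closures stand in for `tools/call` round-trips over an MCP session.

```rust
use std::sync::{mpsc, Arc};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical per-call verdict mirroring the success / slow / deadlock buckets.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Outcome {
    Success,
    Slow,
    Deadlock,
}

/// Classify one call by its latency; `None` means it never reported back.
pub fn classify(elapsed: Option<Duration>, hang_threshold: Duration, grace: Duration) -> Outcome {
    match elapsed {
        Some(t) if t <= hang_threshold => Outcome::Success,
        Some(t) if t <= hang_threshold + grace => Outcome::Slow,
        _ => Outcome::Deadlock,
    }
}

/// Fire `concurrent` copies of `call` at once and classify each one.
/// Calls still running at `hang_threshold + grace` count as deadlocks;
/// their threads are left detached rather than killed.
pub fn probe<F>(concurrent: usize, hang_threshold: Duration, grace: Duration, call: F) -> Vec<Outcome>
where
    F: Fn() + Send + Sync + 'static,
{
    let call = Arc::new(call);
    let (tx, rx) = mpsc::channel();
    let start = Instant::now();
    for id in 0..concurrent {
        let (tx, call) = (tx.clone(), Arc::clone(&call));
        thread::spawn(move || {
            (*call)();
            // The receiver may already be gone if this call finished after the deadline.
            let _ = tx.send((id, start.elapsed()));
        });
    }
    drop(tx); // only worker threads hold senders now
    let deadline = start + hang_threshold + grace;
    let mut elapsed: Vec<Option<Duration>> = vec![None; concurrent];
    // Collect completions until the deadline passes or every worker has reported.
    while let Some(remaining) = deadline.checked_duration_since(Instant::now()) {
        match rx.recv_timeout(remaining) {
            Ok((id, t)) => elapsed[id] = Some(t),
            Err(_) => break, // timed out, or all workers finished
        }
    }
    // Pick up anything that landed in the channel right at the deadline.
    while let Ok((id, t)) = rx.try_recv() {
        elapsed[id] = Some(t);
    }
    elapsed.into_iter().map(|e| classify(e, hang_threshold, grace)).collect()
}
```

The sketch never tries to kill a hung worker; it only records that no reply arrived before the deadline, which is all a client-side watchdog can observe about a wedged server.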

CI gating & protocol-aware assertions

mcp-loadtest is built to be a CI regression gate, not just a profiler. Every run resolves to a pass/fail and a non-zero exit code, so it drops straight into a pipeline:

Threshold gating: the [thresholds] section sets limits on p50/p95/p99/p999 latency, error rate, memory growth, and per-tool SLOs. Any breach makes report.passed() return false, and the run exits non-zero. Deadlocks are zero-tolerance.
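A minimal sketch of how a latency limit and an error-rate limit can combine into a single pass/fail verdict. It assumes nearest-rank percentiles over raw samples (the tool may use a histogram instead) and treats any deadlock as an automatic failure, matching the zero-tolerance rule. `percentile`, `Thresholds`, and `passed` are illustrative names, not the crate's API.

```rust
use std::time::Duration;

/// Nearest-rank percentile of an ascending-sorted slice (`p` in 0..=100).
pub fn percentile(sorted: &[Duration], p: f64) -> Option<Duration> {
    if sorted.is_empty() {
        return None;
    }
    let rank = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
    // Rank is 1-based; clamp so p = 0 maps to the minimum and p = 100 to the maximum.
    Some(sorted[rank.clamp(1, sorted.len()) - 1])
}

/// Hypothetical subset of the [thresholds] block.
pub struct Thresholds {
    pub p99_latency: Duration,
    pub error_rate: f64,
}

/// `latencies` holds successful calls only; `errors` and `deadlocks` are counts.
/// Returns true only when every limit holds; any deadlock fails the run.
pub fn passed(latencies: &mut [Duration], errors: usize, deadlocks: usize, t: &Thresholds) -> bool {
    latencies.sort();
    let total = latencies.len() + errors;
    let error_rate = if total == 0 { 0.0 } else { errors as f64 / total as f64 };
    let p99_ok = percentile(latencies, 99.0).map_or(true, |p99| p99 <= t.p99_latency);
    deadlocks == 0 && error_rate <= t.error_rate && p99_ok
}
```

A CLI entry point would then map the verdict to the process status, for example `std::process::exit(if ok { 0 } else { 1 })`, which is what lets a CI step fail on a breach.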
Baseline regression diff: mcp-loadtest compare baseline.json current.json flags p99, error-rate, and deadlock regressions. The limits default to a 10% p99 increase, a 0.5 percentage-point (pp) error-rate increase, and zero tolerance for deadlocks, and they are configurable:
```
mcp-loadtest compare base.json cur.json \
    --max-p99-regression-pct 15 --max-error-rate-regression-pp 1.0
```
The same knobs are exposed as compare_runs MCP tool args for agent-driven gating (ADR 0009).
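The three checks behind the baseline diff can be sketched as follows. Assumptions: `RunSummary` is a stand-in for the few metrics.json fields involved (the real schema is docs/schema/metrics.v1.json), error rates are stored as fractions so a percentage-point delta is `(current - baseline) * 100`, and any deadlock in the current run counts as a regression. The `CompareLimits` defaults mirror the 10% / 0.5pp values quoted above; all names are hypothetical.

```rust
/// Hypothetical stand-in for the metrics.json fields the diff reads.
pub struct RunSummary {
    pub p99_ms: f64,
    pub error_rate: f64, // fraction, e.g. 0.004 means 0.4%
    pub deadlocks: u32,
}

pub struct CompareLimits {
    pub max_p99_regression_pct: f64,
    pub max_error_rate_regression_pp: f64,
}

impl Default for CompareLimits {
    fn default() -> Self {
        Self { max_p99_regression_pct: 10.0, max_error_rate_regression_pp: 0.5 }
    }
}

/// Returns one human-readable line per regression; an empty Vec means the gate passes.
pub fn regressions(base: &RunSummary, cur: &RunSummary, lim: &CompareLimits) -> Vec<String> {
    let mut out = Vec::new();
    if base.p99_ms > 0.0 {
        let pct = (cur.p99_ms - base.p99_ms) / base.p99_ms * 100.0;
        if pct > lim.max_p99_regression_pct {
            out.push(format!("p99 regressed {pct:.1}% (limit {}%)", lim.max_p99_regression_pct));
        }
    }
    let pp = (cur.error_rate - base.error_rate) * 100.0;
    if pp > lim.max_error_rate_regression_pp {
        out.push(format!("error rate +{pp:.2}pp (limit {}pp)", lim.max_error_rate_regression_pp));
    }
    // Zero tolerance: any deadlock in the current run is a regression.
    if cur.deadlocks > 0 {
        out.push(format!("{} deadlock(s) in current run", cur.deadlocks));
    }
    out
}
```

Under this reading, `--max-p99-regression-pct 15` would correspond to `CompareLimits { max_p99_regression_pct: 15.0, ..Default::default() }`.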
Protocol-aware assertions: an opt-in strict mode validates every tools/call's arguments against the server's advertised inputSchema before the call is sent. A contract mismatch is recorded as a ProtocolError and gates the run. It is off by default for forward compatibility (ADR 0005/0010); enable it per config:
```
[validation]
strict = true
```
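To make the strict-mode contract concrete, this sketch checks just two JSON Schema keywords: `required`, and the `type` of each declared property. It uses a tiny hand-rolled `Json` enum so it has no dependencies; a real client would parse with serde_json and use a full validator. `Json`, `InputSchema`, `type_matches`, and `validate` are hypothetical names, not the tool's API.

```rust
use std::collections::BTreeMap;

/// Minimal JSON value, just enough to type-check tool arguments.
#[derive(Debug, Clone, PartialEq)]
pub enum Json {
    Null,
    Bool(bool),
    Num(f64),
    Str(String),
    Arr(Vec<Json>),
    Obj(BTreeMap<String, Json>),
}

/// Subset of an advertised inputSchema: `properties.<name>.type` plus `required`.
pub struct InputSchema {
    pub types: BTreeMap<String, String>,
    pub required: Vec<String>,
}

/// JSON Schema `type` check for a single value.
pub fn type_matches(value: &Json, ty: &str) -> bool {
    match (ty, value) {
        ("null", Json::Null) | ("boolean", Json::Bool(_)) | ("number", Json::Num(_)) => true,
        ("integer", Json::Num(n)) => n.fract() == 0.0,
        ("string", Json::Str(_)) | ("array", Json::Arr(_)) | ("object", Json::Obj(_)) => true,
        _ => false,
    }
}

/// Returns every contract violation; a non-empty result is what strict mode would reject.
pub fn validate(args: &BTreeMap<String, Json>, schema: &InputSchema) -> Vec<String> {
    let mut errors = Vec::new();
    for name in &schema.required {
        if !args.contains_key(name) {
            errors.push(format!("missing required argument `{name}`"));
        }
    }
    for (name, value) in args {
        if let Some(ty) = schema.types.get(name) {
            if !type_matches(value, ty) {
                errors.push(format!("argument `{name}` should be {ty}"));
            }
        }
    }
    errors
}
```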

Full GitHub Actions example: docs/examples/ci-integration.md.

Quick start

```
# Install from the public repo (not on crates.io yet, see docs/adr/0015)
cargo install --git https://github.com/Teerapat-Vatpitak/mcp-loadtest mcp-loadtest-cli
# ...or download a prebuilt binary from the GitHub Release:
#   https://github.com/Teerapat-Vatpitak/mcp-loadtest/releases

# Quick deadlock smoke against a real MCP server
mcp-loadtest deadlock-probe --server "python -m my_mcp" --tool foo

# Sustained load from a config file
mcp-loadtest run --config bench.toml

# Print a starter config
mcp-loadtest example-config > bench.toml

# Compare two runs (e.g. main vs PR branch)
mcp-loadtest compare runs/baseline/metrics.json runs/current/metrics.json
```

A minimal bench.toml:

```
[server]
command = "python"
args = ["-m", "my_mcp"]
transport = "stdio"

[scenario]
type = "sustained"
duration = "60s"
concurrent = 50
tool = "get_market_data"
args = { ticker = "AAPL" }

[thresholds]
p99_latency = "500ms"
error_rate = 0.01
hang_timeout = "5s"

[output]
report_dir = "./runs"
formats = ["terminal", "markdown", "json"]  # "html" is also available
```
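Durations in bench.toml are written as strings such as "60s" and "500ms". One way to turn them into `std::time::Duration` is sketched below; the accepted suffixes (`ms`, `s`, `m`, `h`) and the integer-only format are assumptions, so treat `mcp-loadtest example-config` as the reference for what the real parser accepts. `parse_duration` is a hypothetical helper.

```rust
use std::time::Duration;

/// Parse "<integer><unit>", e.g. "500ms" or "60s". The unit set is an assumption.
pub fn parse_duration(input: &str) -> Result<Duration, String> {
    let s = input.trim();
    // Split at the first non-digit: "500ms" -> ("500", "ms").
    let split = s
        .find(|c: char| !c.is_ascii_digit())
        .ok_or_else(|| format!("missing unit in {s:?}"))?;
    let (num, unit) = s.split_at(split);
    let n: u64 = num.parse().map_err(|_| format!("bad number in {s:?}"))?;
    // Multiply into seconds without silently overflowing.
    let secs = |mult: u64| {
        n.checked_mul(mult)
            .map(Duration::from_secs)
            .ok_or_else(|| format!("{s:?} is too large"))
    };
    match unit {
        "ms" => Ok(Duration::from_millis(n)),
        "s" => Ok(Duration::from_secs(n)),
        "m" => secs(60),
        "h" => secs(3600),
        other => Err(format!("unknown unit {other:?} in {s:?}")),
    }
}
```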

From the CLI (the common path):

```
cargo run -p mcp-loadtest-cli -- deadlock-probe \
    --server "python -m my_mcp" \
    --tool get_market_data \
    --concurrent 20 \
    --args '{"ticker":"AAPL"}'
```

Library usage (pseudocode — see crates/mcp-loadtest/tests/vibe_trading_regression.rs for a runnable example):

```rust
// Sketch of the library API. RunContext requires run_start, cancel_token,
// metrics, hang_threshold, and grace_period — see the regression test linked
// above for the wiring.
use std::time::Duration;
use mcp_loadtest::scenario::deadlock_probe::DeadlockProbe;
use mcp_loadtest::scenario::Scenario;
use mcp_loadtest::Session;
use serde_json::json;

#[tokio::test]
async fn no_deadlock_under_concurrent_calls() {
    let mut session = Session::spawn("python", ["-m", "my_mcp"]).await.unwrap();
    let probe = DeadlockProbe {
        concurrent: 20,
        hang_threshold: Duration::from_secs(2),
        grace_period: Duration::from_secs(5),
        tool: "get_market_data".into(),
        args: json!({ "ticker": "AAPL" }),
    };
    // Build RunContext { run_start, cancel_token, metrics, hang_threshold, grace_period }.
    let outcome = probe.drive(&mut session, &ctx).await;
    assert_eq!(outcome.deadlock_count, 0);
}
```

vs `reaatech/mcp-load-test`

reaatech/mcp-load-test is the closest comparable MCP load tester we're aware of. It's a TypeScript monorepo and covers the load-testing basics well. mcp-loadtest is built on a different axis: Rust performance and a static binary, plus a bug-detector layer that targets the classes of MCP failures unit tests miss.

| Feature | reaatech | mcp-loadtest |
|---|---|---|
| Deadlock detection (`deadlock_probe`) | not available | yes |
| Race / non-determinism detector | not available | yes (`race_check`) |
| Real-time TUI dashboard | post-hoc only | yes |
| Cross-server compare (1 run, N targets) | partial (2-run baseline diff) | yes (`cross` subcommand) |
| Server resource sampling over time (RSS/CPU/fd) | latency only | yes |
| Protocol fuzzer | not available | yes |
| Coverage tracking (registered vs exercised tools) | not available | yes |
| Per-tool SLO assertions | global only | yes |
| Configurable regression thresholds (CLI + MCP args) | fixed | yes |
| Protocol-aware assertions (opt-in strict `inputSchema` gating) | not available | yes |
| Self-hosted as MCP server (LLM-agent control) | not available | yes |
| HTML report | not available | yes |
| WebSocket transport | not available | yes |
| Rust perf + static binary | Node runtime required | single ~5 MB binary via `cargo install` |
| stdio transport | yes | yes |
| HTTP / SSE transports | yes | yes |
| Latency histograms p50/p95/p99/p999 | yes | yes |
| Breaking-point detection | yes | yes |
| Performance grading A-F | yes | yes |
| Soak / leak detection | yes | yes |
| Spike scenario (sudden burst) | yes | yes |
| Compare baselines | yes | yes |
| Realistic patterns (explore-then-act, multi-step) | yes | yes |
| Console + markdown + JSON reporters | yes | yes |
| Programmatic library API | yes | yes |

We tracked 4 direct competitors (reaatech, haakco/mcp-testing-framework, spbiju/MCP-Benchmark, IBM mcp-context-forge internal) and 6 adjacent LLM-eval frameworks (MCP-Bench, MCPBench, MCP-Universe, MCPMark, MCP-Inspector, k6-MCP). See DESIGN.md §10.5 for the full matrix and ADR 0004 for the positioning decision.

Built-in scenarios

| Scenario | Detects |
|---|---|
| `cold_start` | startup time regressions, init-time deadlocks |
| `sustained` | baseline p99 latency, throughput, error rate |
| `ramp` | break-point: the concurrency where p99 explodes |
| `spike` | sudden-burst load: baseline → peak window → cooldown |
| `soak` | memory leaks under sustained load |
| `deadlock_probe` | lazy-init deadlocks (the canonical Vibe-Trading bug class) |
| `race_check` | non-determinism / order-sensitive bugs |
| `pattern` | weighted random mixes (explore-then-act, read-then-write, multi-step) |
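race_check issues identical calls and diffs the responses. A minimal sketch of the diffing step, assuming each response has already been serialized to a string: count the distinct payloads and flag non-determinism when more than one appears. `distinct_responses` and `is_deterministic` are hypothetical helpers; the real scenario compares MCP results.

```rust
use std::collections::BTreeMap;

/// Count how many times each distinct response payload was seen.
pub fn distinct_responses<'a>(responses: &[&'a str]) -> BTreeMap<&'a str, usize> {
    let mut counts = BTreeMap::new();
    for r in responses {
        *counts.entry(*r).or_insert(0) += 1;
    }
    counts
}

/// Identical inputs should yield identical outputs; more than one variant is a finding.
pub fn is_deterministic(responses: &[&str]) -> bool {
    distinct_responses(responses).len() <= 1
}
```

Reporting the per-variant counts, not just a boolean, helps tell a rare flake (one odd response out of many) from state that changes on every call, such as an embedded clock.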

Each scenario is one impl Scenario in crates/mcp-loadtest/src/scenario/, with a JSON Schema describing its config block. See DESIGN.md §8 for the full table.
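For the ramp scenario, "the concurrency where p99 explodes" needs a concrete rule. This sketch assumes one: the break-point is the first step whose p99 exceeds `factor` times the p99 measured at the lowest concurrency. The tool's actual criterion may differ, and `break_point` is a hypothetical name.

```rust
/// Each step is (concurrency, p99 in milliseconds), in ramp order.
/// Returns the first concurrency whose p99 exceeds `factor` x the first step's p99.
pub fn break_point(steps: &[(u32, f64)], factor: f64) -> Option<u32> {
    let &(_, baseline) = steps.first()?;
    steps
        .iter()
        .find(|&&(_, p99)| p99 > baseline * factor)
        .map(|&(concurrency, _)| concurrency)
}
```

Anchoring on the lowest step rather than the previous one avoids missing a slow, steady climb that never jumps sharply between adjacent steps.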

Cookbook

Three worked examples in docs/examples/:

- CI integration: a GitHub Actions workflow that runs mcp-loadtest on every PR and fails the build on threshold violations.
- Custom scenario: write impl Scenario for MyThing, register it, and drive it from a TOML config. Uses DeadlockProbe as a reference.
- Debugging deadlocks: a narrative walkthrough of what to do when deadlock-probe reports DEADLOCK DETECTED, covering stderr inspection, the lazy-init pattern that caused Vibe-Trading PR #85, and a worked-example test you can copy.

Install

```
# From the public repo (not on crates.io yet, see docs/adr/0015)
cargo install --git https://github.com/Teerapat-Vatpitak/mcp-loadtest mcp-loadtest-cli
```

Or download a prebuilt binary for Linux/macOS/Windows from the GitHub Release. The crates.io publish is deferred because crates.io is append-only (a published version can be yanked but never deleted), so the first release stays off it (ADR 0015).

Status

v0.1.0 is tagged (annotated) and validated: the CI checks (fmt, clippy, build, test, doc) are green on Windows with 368 tests passing, and cargo deny / cargo audit are clean. The killer demo (deadlock_probe catching the Vibe-Trading PR #85 bug on the unpatched commit) is in crates/mcp-loadtest/tests/vibe_trading_regression.rs. The repo is public and cargo install --git works today; prebuilt GitHub Release binaries are the next step. crates.io is deferred (ADR 0015).

Development

```
git clone https://github.com/Teerapat-Vatpitak/mcp-loadtest
cd mcp-loadtest
bash scripts/ci-checks.sh        # or: pwsh scripts/ci-checks.ps1 on Windows
cargo nextest run --workspace --all-features
```

See CLAUDE.md for project conventions and CONTRIBUTING.md before opening a PR.

Documents

- DESIGN.md: what this is and how it works (21 sections)
- PROJECT-STRUCTURE.md: how the repo is laid out for AI-assisted development
- CONTRIBUTING.md: how to propose changes and open a PR
- CODE_OF_CONDUCT.md: contributor expectations
- SECURITY.md: how to report vulnerabilities
- docs/adr/: architecture decision records
- docs/examples/: cookbook
- CHANGELOG.md: release history

License

Dual-licensed under MIT OR Apache-2.0, at your option.
