All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Distribution: v0.1.0 ships via `cargo install --git` + prebuilt GitHub Release binaries instead of crates.io, to keep the first public release off crates.io's append-only commitment (ADR 0015, which amends the distribution channel of ADR 0004; the head-on-competition strategy is unchanged). No code change; the `v0.1.0` tag is unaffected. README / DESIGN.md §10 / `docs/RELEASE.md` install instructions updated accordingly.
0.1.0 — 2026-05-17
First public release. See DESIGN.md §10 for milestone plan.
- Opt-in strict MCP schema validation (ADR 0010): `[validation] strict = true` validates each `tools/call`'s arguments against the server's advertised `inputSchema` before sending. A dependency-free subset validator (`protocol::schema`) covers `type` / `properties` / `required` / `enum` / `items` and skips unmodeled keywords (forward-compatible, ADR 0005). Arg violations are classified as `CallOutcome::ProtocolError` so they gate a run; default (off) behaviour is byte-for-byte unchanged. Result-side validation is deferred to v0.2.
- Configurable regression thresholds (ADR 0009 follow-up): `compare` gains `--max-p99-regression-pct` / `--max-error-rate-regression-pp` / `--allow-deadlock-increase`, and the `compare_runs` MCP tool gains the matching optional args. Backed by a shared `analysis::regression::RegressionThresholds` whose `Default` reproduces the historical 10% p99 / 0.5pp / deadlock-zero-tolerance policy, so existing CI gates are unaffected unless they opt in.
- Project scaffolding: workspace layout, Cargo config, CLAUDE.md hierarchy, slash commands, CI workflow.
- Design document covering motivation, types, algorithms, test matrix, milestones (DESIGN.md, 20 sections).
- Project-structure document for AI-assisted development workflow (PROJECT-STRUCTURE.md).
- M1 protocol stack:
  - `protocol::jsonrpc`: JSON-RPC 2.0 message types (`OutgoingRequest`, `OutgoingNotification`, `ResponseEnvelope`, `ResponsePayload`, `ErrorObject`).
  - `protocol::mcp`: MCP types (`InitializeParams/Result`, `Tool`, `ListToolsResult`, `CallToolParams`, `CallToolResult`, `Content`).
- M1 `Session`: stdio MCP session that spawns a child, runs `initialize` + `notifications/initialized`, exposes `list_tools` / `call_tool` / `shutdown`. Synchronous request/response only; concurrent in-flight requests deferred to M2.
- Python test fixtures: `_common.py` (framing helpers) + `mock-normal.py` (echoes args).
- End-to-end integration test `happy_path.rs` covering spawn → initialize → tools/list → tools/call → shutdown against `mock-normal.py`.
- M2 scenarios + metrics core (delivered via 4-agent parallel sprint):
  - `scenario::Scenario` trait + `RunContext` + `ScenarioOutcome` (interface contract pinned in AGENTS.md).
  - `scenario::sustained::Sustained`: concurrent-load workload (M2 sequential against single Session; multi-Session pool is M3).
  - `scenario::deadlock_probe::DeadlockProbe`: Vibe-Trading-bug-class detector. Wraps each `tools/call` with `hang_detect`; bails on first deadlock to avoid flooding a wedged session.
  - `scenario::cold_start::ColdStart`: placeholder (M3 will activate once `RunContext` gains a session-spawning factory).
  - `hang_detector::hang_detect`: two-phase `tokio::select!` watchdog (DESIGN.md §15.1) classifying each call as Ok / Slow / Deadlock / Err.
  - `metrics::Recorder`: Arc-shared, lock-free outcome counters + 16-shard `hdrhistogram` for per-call latencies (microsecond resolution, 1µs..=1h range).
  - `ScenarioMetrics`, `LatencyStats` (p50/p95/p99/p999/mean/min/max/count), `ThroughputStats`, `OutcomeCounts`.
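The two-phase watchdog behind `hang_detect` can be illustrated without an async runtime. The sketch below is not the crate's `tokio::select!` implementation: it swaps in std threads and a channel, and the `Verdict` enum, the `watch` function, and both threshold parameters are hypothetical names chosen for the illustration.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical classification mirroring the Ok / Slow / Deadlock / Err split.
#[derive(Debug, PartialEq)]
pub enum Verdict<T> {
    Ok(T),
    Slow(T),
    Deadlock,
    Err(String),
}

/// Runs `call` on a worker thread and watches it in two phases:
/// phase 1 waits up to `slow_after`, phase 2 waits the rest of `deadlock_after`.
pub fn watch<T, F>(call: F, slow_after: Duration, deadlock_after: Duration) -> Verdict<T>
where
    T: Send + 'static,
    F: FnOnce() -> Result<T, String> + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The watchdog may have given up already; a failed send is harmless.
        let _ = tx.send(call());
    });

    // Phase 1: a reply inside the slow threshold is an ordinary result.
    match rx.recv_timeout(slow_after) {
        Ok(Ok(value)) => return Verdict::Ok(value),
        Ok(Err(e)) => return Verdict::Err(e),
        Err(mpsc::RecvTimeoutError::Disconnected) => {
            return Verdict::Err("worker exited without a result".into())
        }
        Err(mpsc::RecvTimeoutError::Timeout) => {}
    }

    // Phase 2: late but alive is Slow; silence past the hard limit is Deadlock.
    let remaining = deadlock_after.saturating_sub(slow_after);
    match rx.recv_timeout(remaining) {
        Ok(Ok(value)) => Verdict::Slow(value),
        Ok(Err(e)) => Verdict::Err(e),
        Err(mpsc::RecvTimeoutError::Disconnected) => {
            Verdict::Err("worker exited without a result".into())
        }
        Err(mpsc::RecvTimeoutError::Timeout) => Verdict::Deadlock,
    }
}
```

Separating the two deadlines is what lets a merely slow server be reported differently from a wedged one; a single timeout would collapse both cases into one outcome.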
- M2 fixtures: `mock-broken.py` (canonical deadlock pattern), `mock-slow.py` (2s tool latency), `mock-crash.py` (~1% mid-call exit).
- M2 integration tests: `scenarios_basic.rs` (sustained + cold_start placeholder + cancellation), `deadlock.rs` (`mock_normal_no_deadlock` + `mock_broken_detects_deadlock`, the killer test that catches the Vibe-Trading bug class in <7s).
- M3 reports + first internal release (delivered via 4-agent parallel sprint):
  - `config::Config` + `ServerConfig` + `ScenarioConfig` + `ThresholdsConfig` + `OutputConfig`: TOML schema with humantime durations, semantic validation (`Config::from_toml_str` / `from_file`), and `example_config()` printer.
  - `report::Report` + `ProcessStats` + `ProcessSample` + `ServerInfo` + `ThresholdViolation` + `ReportError` + `Reporter` trait.
  - `report::markdown::MarkdownReporter`: DESIGN §17.3 template (status badge, summary, latency table, errors, process line, threshold violations, trace).
  - `report::json::JsonReporter`: DESIGN §17.2 schema via a `ReportView` wrapper (durations as `_ms`, ISO 8601 timestamps); `Report` stays unmodified.
  - `report::terminal::TerminalReporter`: ANSI-colored compact summary; respects `NO_COLOR` / `CLICOLOR` / non-tty automatically.
  - `metrics::process::ProcessSampler`: sysinfo 0.32 backed periodic RSS+CPU sampler, two-phase CPU baseline, cancellation-aware, best-effort on dead PIDs.
  - `run::Run` + `RunError` + `Run::execute()`: full orchestrator (ulid run-id, run dir creation, `Session` spawn, scenario drive, metrics snapshot, threshold evaluation, bounded shutdown).
  - 79 tests passing across lib + 6 integration test files.
- M3 CLI: `mcp-loadtest example-config | run --config <path> | deadlock-probe --server "..." | list-scenarios`. Run + DeadlockProbe write `report.md` and `metrics.json` under `runs/<ulid>/`; `deadlock-probe` exits non-zero when `deadlock_count > 0`.
- M3 Vibe-Trading regression test: clones `HKUDS/Vibe-Trading@71220c7` (parent of PR #85) into `target/vibe-trading-fixture/`, runs `DeadlockProbe`, asserts deadlock detected. `#[ignore]`d by default; run with `cargo test --test vibe_trading_regression -- --ignored --nocapture`.
- M4 transport parity (3-agent parallel sprint):
  - `protocol::transport::Transport` async trait + `TransportError` (Io/Http/Closed/Timeout/Other).
  - `StdioTransport` extracted from `Session` (legacy single-call path stays via `Session::spawn`).
  - `HttpTransport` (Streamable HTTP simple JSON variant; SSE-response detection routes to a clear M5 deferral).
  - `SseTransport` with background reader task + endpoint handshake + id-correlation buffer.
  - Python fixtures `mock-http-server.py` + `mock-sse-server.py` (stdlib http.server only).
  - `ServerConfig.url` + transport-aware `Run::execute` dispatch.
  - Toolchain bumped 1.85 → stable to satisfy icu transitive MSRV pulled in by `url`.
- M5 analysis parity (4-agent parallel sprint):
  - `analysis::breaking_point` (BreakingPointDetector w/ first-violator semantics on per-step deltas).
  - `analysis::grading` (Grade A-F per latency/concurrency/error, worst-of-three rollup).
  - `scenario::ramp` (linear-stepped concurrency; integrates with breaking_point).
  - `scenario::pattern` (multi-step + weighted-random + think-time + ErrorBehavior).
  - `Sustained` refactor to drive Patterns; `run_patterns` free function for multi-pattern callers.
  - `scenario::soak` (periodic snapshots + leak signal via mean-latency regression).
  - `mcp-loadtest compare baseline.json current.json` CLI subcommand (markdown/JSON diff).
- M6 differentiators v1 (4-agent parallel sprint):
  - `tui::dashboard` (ratatui + crossterm live polling; quits on q/Esc, propagates cancel).
  - `analysis::race_detector` + `scenario::race_check` (key-sorted JSON canonicalization, divergence reporting).
  - `mcp-loadtest cross --server "..." --server "..."` (side-by-side multi-server comparison with grading).
  - `ProcessStats` enriched with `peak_fd` / `final_fd` / `peak_threads` / `final_threads` (best-effort: Linux /proc/fd; macOS/Windows degrade to 0).
  - `Soak` linear-regression-based leak signal helper (`detect_leak`).
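As an illustration of a linear-regression leak signal over soak snapshots, here is a minimal sketch. It is not the crate's `detect_leak`: the `Snapshot` struct, the `latency_slope` and `leak_suspected` functions, and the strict `>` comparison are assumptions made for the example. Only the general idea (fit a line through mean latency over time and compare the slope, in ms per second, against a drift threshold) comes from the entries above.

```rust
/// Hypothetical soak snapshot: seconds since run start and mean latency in ms.
#[derive(Debug, Clone, Copy)]
pub struct Snapshot {
    pub elapsed_secs: f64,
    pub mean_latency_ms: f64,
}

/// Ordinary least-squares slope of mean latency over time, in ms per second.
/// Returns `None` with fewer than two samples or when all timestamps coincide.
pub fn latency_slope(samples: &[Snapshot]) -> Option<f64> {
    if samples.len() < 2 {
        return None;
    }
    let n = samples.len() as f64;
    let mean_t = samples.iter().map(|s| s.elapsed_secs).sum::<f64>() / n;
    let mean_y = samples.iter().map(|s| s.mean_latency_ms).sum::<f64>() / n;

    let mut covariance = 0.0;
    let mut variance = 0.0;
    for s in samples {
        let dt = s.elapsed_secs - mean_t;
        covariance += dt * (s.mean_latency_ms - mean_y);
        variance += dt * dt;
    }
    if variance == 0.0 {
        None
    } else {
        Some(covariance / variance)
    }
}

/// Flags a suspected leak when latency drifts upward faster than the threshold.
/// Assumption: a strictly-greater comparison; the real policy may differ.
pub fn leak_suspected(samples: &[Snapshot], drift_threshold_ms_per_sec: f64) -> bool {
    latency_slope(samples).map_or(false, |slope| slope > drift_threshold_ms_per_sec)
}
```

Fitting a slope over all snapshots, rather than comparing the first and last sample, keeps a single noisy snapshot from tripping the signal on its own.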
- M7 differentiators v2 + v0.1.0 polish (4-agent parallel sprint):
  - `scenario::fuzzer` (FuzzPayload enum: UnknownMethod / NumericMethod / GiantPayload / ControlChars / Nested / NullParams / StringParams; raw-byte variants documented + skipped pending Transport::raw_send hook).
  - `analysis::fuzz_report` (FuzzClass classification + has_critical signal).
  - `analysis::coverage` (CoverageReport: registered vs exercised tools + `coverage_pct`).
  - `ToolSlo` per-tool latency budget assertions in `ThresholdsConfig`.
  - `Recorder::record_tool` + `snapshot_per_tool` (per-tool counters; existing `record` / `snapshot` aggregate untouched for back-compat).
  - `mcp-loadtest serve --mcp` self-hosted MCP server (DESIGN §21.2 differentiator): exposes `deadlock_probe` / `sustained_load` / `compare_runs` as MCP tools so AI agents drive load tests directly via stdio JSON-RPC.
  - README rewritten to lead with the deadlock demo + competitive positioning vs. reaatech.
  - `docs/examples/{ci-integration,custom-scenario,debugging-deadlocks}.md` cookbook.
- Post-M7 competitive-gap close (3-agent parallel sprint, post-review):
  - `scenario::spike`: sudden-burst concurrency pattern (baseline → spike window → cooldown). Closes the reaatech parity gap.
  - `report::html::HtmlReporter`: self-contained `report.html` with inline SVG histogram, escaped HTML, no external CDN or JS. Closes the IBM/spbiju enterprise-report gap. Wired into the CLI as the `"html"` output format.
  - `protocol::transport::ws::WsTransport`: WebSocket transport via tokio-tungstenite (rustls + webpki-roots); 16 MB per-frame OOM cap mirroring stdio. Activates the `"ws"` scheme that was previously parser-accepted but rejected at runtime.
  - SECURITY.md: security disclosure policy at repo root (in-scope vs out-of-scope surfaces, reporting flow, recent hardening notes).
  - 12 new tests bring the suite to ~255 passing (was 243): spike happy path + 5 HTML reporter (escaping, chart, violation styling) + 2 WS (echo roundtrip + scheme rejection) + 1 config parse-spike-kind + scenario name/schema asserts.
- v0.1.0 pre-publish feature wave (4-agent parallel sprint, disjoint file ownership):
  - SSRF host-allowlist (ADR 0012): `[server].allowed_hosts` exact-match (ASCII case-insensitive, no wildcard) allowlist for `http` / `sse` / `ws`; empty/unset = allow any public host. Always-on block of private/loopback/link-local/ULA/unspecified/reserved IP literals (the SSE server-provided endpoint URL is checked too), with an operator escape hatch (list the literal, e.g. `"127.0.0.1"`, to permit local testing). `protocol::transport::HostGuard` is now public (needed to construct `Http` / `Sse` / `Ws` transports directly via their `connect(url, &guard)` constructors).
  - `--capture-stderr` / `--tee-stderr` on `run` (ADR 0013): redirect the spawned stdio server's stderr to `runs/<id>/server-stderr.log` (capture) or additionally mirror it live to the parent's stderr (tee). New public API `SpawnOptions` / `StderrMode` / `StderrCapture`, `Session::spawn_with`, `StdioTransport::spawn_with`, `Run::with_stderr_capture`; the stable 2-arg `Session::spawn` is unchanged (delegates). Tee is a cancellation-aware, JoinHandle-tracked task that flushes before every exit.
  - `mcp-loadtest doctor` (DESIGN §21.6, ADR 0014): 4 best-effort checks: Python on PATH, optional `--server` initialize smoke (captures the server's stderr on failure), stale `runs/` accumulation, Windows MSVC/GNU toolchain mismatch. ✅/❌ checklist + one-line fix per ❌; non-zero exit if any ❌.
  - `--explain` global flag (DESIGN §21.4, ADR 0014): static per-subcommand algorithm text, exit 0. Serviced by a pre-clap `std::env::args()` scan so it works without a subcommand's required args (e.g. `run --explain` needs no `--config`).
  - Actionable error hints (DESIGN §21.3, ADR 0014): `hints::ErrorHint` (CLI crate) maps `RunError` / `SessionError` / `TransportError` / `ConfigError` / `ReportError` to a one-line next step, printed after the error chain at the CLI boundary so the library error enums stay clean.
  - Python fixtures `mock-leak.py` (10 KB/call RSS growth), `mock-error.py` (cycles `-32601` / `-32602` / `-32603`), `mock-slow-init.py` (5 s `initialize` delay), `mock-malformed.py` (every 10th response is newline-terminated broken JSON): DESIGN §16.7-16.10, no longer "planned for v0.2".
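The `--explain` entry describes a scan of the raw arguments that runs before the regular parser, so required flags never get a chance to fail. A rough sketch of that idea follows; `explain_request` is a hypothetical helper, not the CLI's actual function, and it takes the first token that does not start with `-` as the subcommand, which is a simplification (a flag value placed before the subcommand would be misread).

```rust
/// Hypothetical pre-parse scan over the arguments after the program name,
/// e.g. `std::env::args().skip(1)`.
///
/// Returns `None` when `--explain` is absent, `Some(None)` when it is present
/// without a subcommand, and `Some(Some(name))` when a subcommand was found.
pub fn explain_request<I>(args: I) -> Option<Option<String>>
where
    I: IntoIterator<Item = String>,
{
    let mut saw_explain = false;
    let mut subcommand = None;

    for arg in args {
        if arg == "--explain" {
            saw_explain = true;
        } else if subcommand.is_none() && !arg.starts_with('-') {
            // Simplification: the first bare token is assumed to be the subcommand.
            subcommand = Some(arg);
        }
    }

    if saw_explain {
        Some(subcommand)
    } else {
        None
    }
}
```

A caller would check this result first, print the static text and exit 0 on `Some(..)`, and hand the arguments to the normal parser only on `None`.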
- `report::common` extracted (post-M4 `/simplify` pass): `fmt_duration`, `fmt_count`, `format_server_command`, `describe_failure`, `format_iso8601_utc` shared between markdown + terminal reporters (-69 LoC net).
- `ThresholdViolation.metric: String` → `kind: ThresholdKind` enum (post-M3 QF-5); JSON wire format preserved via `#[serde(rename = "metric")]` + per-variant snake_case rename.
- `Report::passed()` now treats `deadlock_count > 0` as a hard failure (post-M3 QF-1).
- `Session::pid()` accessor + `Run::execute` wires `ProcessSampler` (post-M3 QF-2).
- CLI `deadlock-probe` exits non-zero on any error / threshold violation / deadlock (post-M3 QF-3).
- AGENTS.md: multi-agent coordination contract (file ownership, locked interfaces, sprint exit criteria).
- DESIGN.md §10 revised: 8-week head-on competition plan vs. reaatech/mcp-load-test (was 3-week solo plan).
- DESIGN.md §10.5 — competitive parity matrix (12 reaatech features to match, 12 mcp-loadtest differentiators).
- DESIGN.md §21: AI-friendliness pillar (10 design principles incl. self-hosted MCP wrapper, `--explain` flag, `doctor` subcommand, structured errors with hints, JSON Schema for outputs).
- ADR 0004: strategic decision to compete head-on (Path A) over contributing to reaatech (Path B) or repositioning (Path C).
- `Soak::leak_threshold_mb_per_sec` renamed to `latency_drift_ms_per_sec` (the units were always ms/sec; the old name lied).
- Fuzzer: skipped raw-transport iterations no longer bump `total_calls` or pollute `CallOutcome::Cancelled` (they never hit the wire).
- Fuzzer: server-accepted malformed payloads now record `CallOutcome::Malformed` and bump `error_count` so threshold evaluators surface them.
- Run: `memory_growth_mb` now compares `peak − final` instead of bare peak, so a steady-state high-RSS process no longer false-positives.
- SSRF guard IPv6 hole found + fixed in review: `url::Url::host_str()` returns bracketed IPv6 (`[::1]`), so the initial `parse::<IpAddr>()` host extraction silently let every IPv6 literal bypass the private-IP block. The guard now classifies the host via the typed `url::Host` enum (the parser already split host kind at parse time), closing the bypass; covered by IPv6/IPv4-mapped unit tests.
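The IPv6 bypass is easy to reproduce with the standard library alone, because `IpAddr`'s parser rejects bracketed literals. The sketch below is illustrative only: the function names are hypothetical, the block list is a simplified subset (no "reserved" ranges), and the corrected variant strips brackets by hand to stay dependency-free, whereas the actual fix described above classifies the host through the typed `url::Host` enum.

```rust
use std::net::{IpAddr, Ipv4Addr, Ipv6Addr};

fn blocked_v4(ip: Ipv4Addr) -> bool {
    ip.is_private() || ip.is_loopback() || ip.is_link_local() || ip.is_unspecified()
}

fn blocked_v6(ip: Ipv6Addr) -> bool {
    // IPv4-mapped literals (::ffff:a.b.c.d) are judged by their embedded IPv4 address.
    if let Some(v4) = ip.to_ipv4_mapped() {
        return blocked_v4(v4);
    }
    let first = ip.segments()[0];
    ip.is_loopback()
        || ip.is_unspecified()
        || (first & 0xfe00) == 0xfc00 // unique local addresses, fc00::/7
        || (first & 0xffc0) == 0xfe80 // link-local, fe80::/10
}

fn blocked_ip(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => blocked_v4(v4),
        IpAddr::V6(v6) => blocked_v6(v6),
    }
}

/// Reproduces the bug: a bracketed host such as "[::1]" fails to parse as an
/// `IpAddr`, is treated as a hostname, and therefore is never blocked.
pub fn naive_guard_blocks(host: &str) -> bool {
    host.parse::<IpAddr>().map(blocked_ip).unwrap_or(false)
}

/// Stand-in for the fix: remove the URL brackets before classifying.
pub fn bracket_aware_guard_blocks(host: &str) -> bool {
    let bare = host
        .strip_prefix('[')
        .and_then(|h| h.strip_suffix(']'))
        .unwrap_or(host);
    bare.parse::<IpAddr>().map(blocked_ip).unwrap_or(false)
}
```

Working from a typed host value rather than a string avoids this class of mistake entirely, since the IPv4 / IPv6 / domain distinction is decided once by the URL parser.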
- SSRF hardening (ADR 0012, supersedes the ADR 0007 deferral): every outbound `http` / `sse` / `ws` connection, including the SSE server-provided endpoint URL, is validated against an optional exact-match host allowlist plus an always-on private/loopback/link-local/ULA/unspecified/reserved IP-literal block, before any network I/O. Complements the existing `Policy::none()` redirect policy. Residual documented gap: a hostname that resolves to a private IP (DNS rebinding) is not blocked in v0.1; the allowlist is the strong control, and a resolver-pinning connector is the recorded v0.2 follow-up (ADR 0012 Open).
- Sustained / ramp / soak scenarios: `tokio::task::yield_now()` instead of `sleep(ZERO)` to avoid registering no-op reactor timers.
- Metrics: per-tool `BTreeMap` moved behind `RwLock`; fast path is now a read lock.
- `cmd_cross`: cross-server runs use `futures::future::join_all` (was sequential, now N-way parallel).
- Fuzzer: `LazyLock` for `GIANT_PAYLOAD` + `NESTED_PAYLOAD` so multi-MB / 100-deep payloads build once per process.
- Release profile: `panic = "abort"` shaves ~600 KB off the stripped binary (5.7 MB → 5.1 MB on x86_64-pc-windows-gnu).
- Hot-path zero-copy refactor (Phase 1 pre-public audit):
  - `OutgoingRequest` / `OutgoingNotification` now borrow `method: &'a str` and `params: &'a P` (generic, `P: ?Sized + Serialize`). Eliminates the intermediate `serde_json::to_value()` round-trip on every tool call.
  - `CallToolParams { name: &'a str, arguments: &'a Value }`.
  - `Session::call_tool(&str, &Value)` (was `(&str, Value)`). Scenarios drop `.clone()` and pass `&self.args`; deep-clone of the JSON args tree per iteration is gone.
- Transports: `pending: VecDeque<String>` (id-mismatch buffer) is now capped at `MAX_PENDING_FRAMES = 256` in both `sse` and `ws`. Overflow surfaces `TransportError::Other` instead of growing without bound.
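The capped id-mismatch buffer can be pictured as a small wrapper around `VecDeque`. This is a sketch under assumptions rather than the transports' actual code: `PendingFrames`, `PendingOverflow`, and `take_matching` are hypothetical names, and the real overflow error is `TransportError::Other` per the entry above. Only the constant name and its value of 256 are taken from the changelog.

```rust
use std::collections::VecDeque;

/// Cap taken from the changelog entry; everything else here is illustrative.
pub const MAX_PENDING_FRAMES: usize = 256;

/// Hypothetical stand-in for the overflow error.
#[derive(Debug, PartialEq)]
pub struct PendingOverflow;

/// Frames that arrived while the caller was waiting for a different response id.
#[derive(Default)]
pub struct PendingFrames {
    frames: VecDeque<String>,
}

impl PendingFrames {
    /// Buffers a frame, refusing to grow past the cap.
    pub fn push(&mut self, frame: String) -> Result<(), PendingOverflow> {
        if self.frames.len() >= MAX_PENDING_FRAMES {
            return Err(PendingOverflow);
        }
        self.frames.push_back(frame);
        Ok(())
    }

    /// Removes and returns the oldest buffered frame accepted by `is_match`.
    pub fn take_matching(&mut self, is_match: impl Fn(&str) -> bool) -> Option<String> {
        let index = self.frames.iter().position(|frame| is_match(frame))?;
        self.frames.remove(index)
    }

    /// Number of frames currently buffered.
    pub fn len(&self) -> usize {
        self.frames.len()
    }
}
```

Failing loudly at the cap turns a misbehaving server that floods unmatched frames into a reported error instead of unbounded memory growth in the load tester.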
- Criterion microbenchmarks added under `crates/mcp-loadtest/benches/`: `record`, `histogram`, `session_loopback` (Transport-trait loopback, no I/O), `hang_detect`. Run with `cargo bench -p mcp-loadtest`. Closes the DESIGN.md §19 perf-claim gap.
- Phase 3 coverage gaps closed (pre-publish audit):
  - `tests/ramp.rs`: first integration test for the `ramp` scenario (was unit-only).
  - `tests/spike.rs`: added `spike_against_crashing_server_survives_without_hang` (failure-mode coverage; uses `mock-crash.py`; assertions are crash-stochastic-aware so the test isn't flaky).
  - `src/protocol/transport/ws.rs`: 3 new failure-mode tests: server-closes-mid-call, cancel-during-request, oversized-frame-rejected.
  - `tests/reporter_snapshots.rs`: 5 new tests: substring landmarks for html + terminal (insta-snapshot parity with markdown + json was skipped because both reporters have too much structural variance for stable snapshots); empty-metrics renders without panic for html, terminal, and json (catches divide-by-zero in throughput math).
- Test suite: 260 passing (was 250 pre-Phase-3); 0 flakes in a 3-run check; 1 `#[ignore]` left (Vibe-Trading regression, which requires an external checkout).
- `cold_start` scenario remains an intentional placeholder in v0.1.0; real handshake-time histogram measurement is queued for v0.2 (DESIGN §8). The existing `cold_start_is_an_inert_placeholder` test pins the placeholder contract so the v0.2 work is forced to update assertions.
- `ServerConfig::stdio(command, args)` constructor + `split_server_command()` free fn relocated to `config` module (kills 4 + 3 hand-rolled literals across the codebase).
- `classify_error` / `is_terminal_error` deduped into `scenario/mod.rs` (kills 3 byte-identical copies across pattern / ramp / soak).
- Regression thresholds `P99_REGRESSION_PCT` + `ERROR_RATE_REGRESSION_PP` lifted into `analysis::regression` and shared between `cmd_compare` and `serve/tools::compare_runs`.
- Unified scenario builder: a single `build_scenario` factory now drives every config. `sustained` accepts weighted `patterns` / legacy `tool_call` arrays (via `PatternScenario`) alongside the single-`tool` path, and all M5–M7 kinds (`ramp` / `soak` / `spike` / `race_check` / `fuzzer` / `pattern`) dispatch through it.
- `cmd_run` split into private `builder` / `params` / `patterns` submodules (each under the 300 prod-LoC convention); public surface unchanged (`run_from_config` + the re-exported `parse_dur_str` that `main.rs` shares with `deadlock-probe` / `cross`).
- Breaking (absorbed into v0.1.0, pre-publish): `HttpTransport` / `SseTransport` / `WsTransport::connect` take an added `&HostGuard` argument; `StdioTransport::spawn` is now `async`; CLI `cmd_run::run_from_config` takes added `capture_stderr` / `tee_stderr` parameters. The documented `Session::spawn(command, args)` signature is unchanged (delegates to `spawn_with`).
- DESIGN.md §16.7-16.10: dropped the (planned for v0.2) markers; the four fixtures now ship.
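To make the shared regression policy concrete, here is a small sketch of a thresholds type whose `Default` encodes the historical 10% p99 / 0.5pp / zero-deadlock-tolerance gate. The field names, the `RunSummary` struct, the `regressions` function, and the strict `>` comparisons are all assumptions for illustration; only the default values and the type name `RegressionThresholds` appear in this changelog.

```rust
/// Sketch of the shared thresholds; field names are hypothetical.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct RegressionThresholds {
    pub max_p99_regression_pct: f64,
    pub max_error_rate_regression_pp: f64,
    pub allow_deadlock_increase: bool,
}

impl Default for RegressionThresholds {
    /// Historical policy: 10% p99, 0.5 percentage points, no new deadlocks.
    fn default() -> Self {
        Self {
            max_p99_regression_pct: 10.0,
            max_error_rate_regression_pp: 0.5,
            allow_deadlock_increase: false,
        }
    }
}

/// Hypothetical per-run summary used only by this example.
#[derive(Debug, Clone, Copy)]
pub struct RunSummary {
    pub p99_ms: f64,
    pub error_rate_pct: f64,
    pub deadlock_count: u64,
}

/// Lists which gates `current` violates relative to `baseline`.
pub fn regressions(
    baseline: &RunSummary,
    current: &RunSummary,
    thresholds: &RegressionThresholds,
) -> Vec<&'static str> {
    let mut violated = Vec::new();

    // Relative p99 change in percent; skipped when the baseline is zero.
    if baseline.p99_ms > 0.0 {
        let pct = (current.p99_ms - baseline.p99_ms) / baseline.p99_ms * 100.0;
        if pct > thresholds.max_p99_regression_pct {
            violated.push("p99");
        }
    }
    // Error rate is compared in absolute percentage points.
    if current.error_rate_pct - baseline.error_rate_pct > thresholds.max_error_rate_regression_pp {
        violated.push("error_rate");
    }
    if !thresholds.allow_deadlock_increase && current.deadlock_count > baseline.deadlock_count {
        violated.push("deadlock");
    }
    violated
}
```

Keeping the policy in one type with a `Default` means a CLI path and an MCP-tool path can share the same gate, and callers that pass no overrides keep the old behaviour.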
- `DEFAULT_LEAK_THRESHOLD_MB_PER_SEC` → use `DEFAULT_LATENCY_DRIFT_MS_PER_SEC`. The old constant remains as an alias for one release and will be removed in v0.2.0.
- ✅ The M8 file-split pass completed in the pre-publish review. All source files have production code (excluding `#[cfg(test)] mod tests`) under the 300-line convention. See `POST_PUBLISH_ISSUES.md` for the per-wave summary of what split where.
- `serve` and `tui` modules will move behind cargo feature flags in a future release to keep the default build slim.
- ✅ HTTP / SSE / WS transport host-allowlist for SSRF defense landed (was deferred): exact-match `[server].allowed_hosts` + always-on private/loopback/link-local/ULA/reserved IP-literal block, on top of the existing `Policy::none()` redirect policy. Supersedes the ADR 0007 deferral; see ADR 0012. Residual documented gap: hostname→private-IP (DNS rebinding) is not yet blocked in v0.1; resolver-pinning is the recorded v0.2 follow-up.
- ✅ Pre-publish security pass on the B1/B2 surface: bounded `protocol::schema` recursion (`MAX_SCHEMA_DEPTH`, defends strict mode against a maliciously deep server `inputSchema`); non-positive regression thresholds rejected at the CLI/MCP boundary (would otherwise invert the gate); `compare_runs` no longer echoes the raw caller path; `enum`-violation messages are length-capped.
- ✅ Pre-publish stabilization: `crates/mcp-loadtest` now `exclude`s the four `CLAUDE.md` scaffolding files from the published package (108 → 104 files); added `run_strict` CLI integration test covering the real `run --config` entrypoint with `[validation] strict = true` end-to-end (TOML → `Run::execute` → report-on-disk → non-zero gate); full pipeline (fmt, clippy `-D warnings`, `--locked` build/test, doc) verified on the x86_64-pc-windows-msvc target, the toolchain crates.io ships to Windows users.
- ✅ Supply-chain gate (`cargo deny check` / `cargo audit`): allowlisted `CDLA-Permissive-2.0` (the `webpki-roots` CA-bundle license, via `tokio-tungstenite`) and added three individually-triaged, documented advisory `ignore`s: RUSTSEC-2025-0052 (`async-std`, dev-only via `httpmock`), RUSTSEC-2024-0436 (`paste`, transitive), RUSTSEC-2026-0002 (`lru` unsoundness, transitive via `ratatui`, optional `tui` feature only, no semver fix). Rationale + revisit conditions in ADR 0011 + `POST_PUBLISH_ISSUES.md`; no actual vulnerabilities, gate still hard-fails on anything new.
- ✅ Pre-publish API-durability & standards pass (so v0.1.0's public surface is safe to commit to):
  - `#[non_exhaustive]` on the `Config` family (`Config` / `ServerConfig` / `ScenarioConfig` / `ThresholdsConfig` / `OutputConfig` / `ValidationConfig` / `ConfigError`), the public error/outcome enums (`SessionError`, `RunError`, `CallOutcome`, `ThresholdKind`, `ReportError`, `TransportError`, `HangOutcome`, `SchemaPolicy`, `ValidationSite`) and `RunContext`, so adding a field/variant is no longer a breaking change. New constructors keep ergonomic cross-crate construction: `Config::new` + `with_thresholds` / `with_output` / `with_validation`, `ScenarioConfig::new`, `OutputConfig::new`, `RunContext::new`.
  - Crate-root docs rewritten: dropped the stale "v0.0.0 — scaffolding" status and replaced the fictional-API example with a real, compiling `no_run` one; `ValidationConfig` added to the public re-export facade.
  - docs.rs builds `--all-features` (so `serve` / `tui` render); `missing_docs` raised to `deny` (lib + CLI); README repo links rewritten to absolute URLs so they resolve on crates.io.
  - MSRV corrected `1.86` → `1.88` (the real floor, set by edition-2024 let-chains; verified by an actual `cargo build` on 1.88) with a new CI `msrv` job pinning it; modernized the clippy nits it surfaced (`collapsible_if`, `is_multiple_of`).
  - Re-verified end-to-end on the x86_64-pc-windows-msvc target: fmt, clippy `-D warnings`, `--locked` build/test (incl. doctests), `doc -D warnings`, `cargo deny`, `cargo audit`, `publish --dry-run` (104 files).