Merged
14 changes: 14 additions & 0 deletions .github/workflows/ci.yml
@@ -57,3 +57,17 @@ jobs:
        with:
          tool: cargo-audit
      - run: cargo audit

  msrv:
    name: MSRV (Rust 1.88)
    runs-on: ubuntu-latest
    # Verifies the declared `rust-version`. Lints are capped to `warn` here so
    # this stays a pure compile/edition/dependency gate — the `checks` job
    # enforces `-D warnings` on stable.
    env:
      RUSTFLAGS: --cap-lints=warn
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@1.88
      - uses: Swatinem/rust-cache@v2
      - run: cargo build --workspace --all-features --locked
96 changes: 49 additions & 47 deletions AGENTS.md

Large diffs are not rendered by default.

131 changes: 76 additions & 55 deletions CHANGELOG.md

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion Cargo.toml
@@ -12,7 +12,7 @@ license = "MIT OR Apache-2.0"
repository = "https://github.com/Teerapat-Vatpitak/mcp-loadtest"
homepage = "https://github.com/Teerapat-Vatpitak/mcp-loadtest"
authors = ["Teerapat Vatpitak"]
-rust-version = "1.86"
+rust-version = "1.88"

[workspace.dependencies]
tokio = { version = "1", features = ["full"] }
368 changes: 204 additions & 164 deletions DESIGN.md

Large diffs are not rendered by default.

38 changes: 35 additions & 3 deletions POST_PUBLISH_ISSUES.md
@@ -8,9 +8,23 @@ Each block below is in `gh issue create` shape — copy-paste-ready once the rep

## 🔒 Security

### `chore(deps): revisit triaged RUSTSEC ignores (ADR 0011)`

**Body:**

> `deny.toml` ignores three advisories, each justified in ADR 0011. Re-evaluate each release:
>
> - **RUSTSEC-2026-0002** (`lru` `IterMut` Stacked-Borrows unsoundness) — transitive via `ratatui 0.29` (`lru ^0.12`), only under the optional `tui` feature. Drop the ignore + bump once `ratatui` ships on patched `lru` (≥0.13).
> - **RUSTSEC-2025-0052** (`async-std` discontinued) — dev-only via `httpmock`. Drop when `httpmock` releases without `async-std` (or swap the mock).
> - **RUSTSEC-2024-0436** (`paste` unmaintained) — transitive proc-macro. Drop when the dep tree moves off `paste` (e.g. `pastey`).
> Action: `cargo deny check` after `cargo update`; if any becomes an actual vulnerability or `lru` reaches the default build, fix immediately rather than re-ignoring.

**Labels:** `security`, `dependencies`, `chore`

### `feat(transport): host-allowlist for HTTP/SSE/WS to defend against SSRF`

**Body:**

> Currently the redirect policy is hardened to `Policy::none()` per [ADR 0007](docs/adr/0007-transport-security-posture.md), but an operator pointing the load tester at a malicious URL can still hit any internal endpoint resolvable from the machine running the test. v0.2 should add an operator-facing allowlist (TOML config: `server.allowed_hosts = ["app.example.com"]`) and reject `connect_async`/`reqwest::send` calls when the resolved host isn't matched. See CHANGELOG `[Unreleased]` Notes.

**Labels:** `security`, `enhancement`, `v0.2`
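
The allowlist check proposed in this issue could look roughly like the sketch below. It is illustrative only: `HostAllowlist` and `permits` are hypothetical names, not part of the crate, and the sketch matches host names as strings. It does not inspect resolved IP addresses, which a real SSRF defence would also need to consider.

```rust
use std::collections::HashSet;

/// Hypothetical allowlist built from the proposed `server.allowed_hosts` config key.
struct HostAllowlist {
    hosts: HashSet<String>,
}

impl HostAllowlist {
    /// Stores every configured host in lowercase so lookups are case-insensitive.
    fn new<I, S>(hosts: I) -> Self
    where
        I: IntoIterator<Item = S>,
        S: AsRef<str>,
    {
        Self {
            hosts: hosts
                .into_iter()
                .map(|h| h.as_ref().to_ascii_lowercase())
                .collect(),
        }
    }

    /// Returns true only when `host` exactly matches a configured entry,
    /// ignoring ASCII case and a trailing root dot.
    fn permits(&self, host: &str) -> bool {
        let normalized = host.trim_end_matches('.').to_ascii_lowercase();
        self.hosts.contains(&normalized)
    }
}

fn main() {
    let allow = HostAllowlist::new(["app.example.com"]);
    assert!(allow.permits("APP.example.com"));
    // A link-local metadata address is rejected because it is not listed.
    assert!(!allow.permits("169.254.169.254"));
}
```

A transport would call `permits` on the URL's host before opening the connection and return an error instead of connecting when it yields `false`.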
@@ -19,23 +33,26 @@ Each block below is in `gh issue create` shape — copy-paste-ready once the rep

## ⚡ Performance

-### `perf(scenarios): switch `args: Value` to `Arc<Value>` for hot-loop sharing`
+### `perf(scenarios): switch `args: Value`to`Arc<Value>` for hot-loop sharing`

**Body:**

> Pre-publish Phase 1 audit found scenarios still deep-clone `Value` once per worker spawn (the `&Value` change in commit `c8dee52` cut per-call clones, not per-worker setup). Wrapping `Sustained.args` / `Pattern.args` etc. in `Arc<Value>` lets concurrent workers share the JSON tree by reference. Touches `Session::call_tool` signature → breaking v0.2 API change.

**Labels:** `performance`, `breaking`, `v0.2`
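
To make the sharing idea concrete, here is a minimal standard-library sketch. It makes several assumptions: `Args` is a stand-in for the JSON `Value` tree, `spawn_workers` is a hypothetical helper, and plain threads stand in for whatever task runtime the scenarios actually use.

```rust
use std::sync::Arc;
use std::thread;

// Stand-in for the JSON argument tree; the real scenarios hold a `Value`.
struct Args {
    fields: Vec<(String, String)>,
}

/// Spawns `workers` threads that all read the same `Args` through an `Arc`.
/// Returns the sum of the field counts each worker observed.
fn spawn_workers(args: Arc<Args>, workers: usize) -> usize {
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            // Cloning the Arc bumps a reference count; the tree is not copied.
            let shared = Arc::clone(&args);
            thread::spawn(move || shared.fields.len())
        })
        .collect();

    handles
        .into_iter()
        .map(|h| h.join().expect("worker panicked"))
        .sum()
}

fn main() {
    let args = Arc::new(Args {
        fields: vec![("query".to_string(), "ping".to_string())],
    });
    let total = spawn_workers(Arc::clone(&args), 4);
    assert_eq!(total, 4);
    // Every worker has finished and dropped its handle.
    assert_eq!(Arc::strong_count(&args), 1);
}
```

The per-worker cost becomes one atomic increment regardless of how large the argument tree is, which is the saving the issue is after.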

### `perf(transport): drop double-parse in SSE/WS id-extract`

**Body:**

> `extract_id` parses the entire JSON twice (once for id-probe, once for full body after match). Use `simd-json` or a streaming id-extractor (parse only the first `"id":N` key). Matters at >100K iter/s — not a v0.1 blocker.

**Labels:** `performance`, `v0.2`

-### `perf(session): drop `String` allocation in `stdio` line trim`
+### `perf(session): drop `String`allocation in`stdio` line trim`

**Body:**

> `stdio.rs::request` does `self.line_buf.trim_end().to_string()`. In-place truncate plus returning `&str` would save one alloc per call. Small win at high call rates.

**Labels:** `performance`, `v0.2`
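
A sketch of the in-place variant, assuming a reusable `String` buffer like `line_buf`. The function `read_trimmed_line` is a hypothetical name used for illustration, not the crate's actual `request` method.

```rust
use std::io::{self, BufRead, Cursor};

/// Reads one line into `line_buf`, trims trailing whitespace in place,
/// and returns a borrowed view instead of allocating a new `String`.
fn read_trimmed_line<'a, R: BufRead>(
    reader: &mut R,
    line_buf: &'a mut String,
) -> io::Result<&'a str> {
    line_buf.clear();
    reader.read_line(line_buf)?;
    // `trim_end` returns a prefix of the buffer, so its length is a valid
    // char boundary to truncate at.
    let trimmed_len = line_buf.trim_end().len();
    line_buf.truncate(trimmed_len);
    Ok(line_buf.as_str())
}

fn main() -> io::Result<()> {
    let mut reader = Cursor::new("{\"id\":1}\r\n");
    let mut buf = String::new();
    let line = read_trimmed_line(&mut reader, &mut buf)?;
    assert_eq!(line, "{\"id\":1}");
    Ok(())
}
```

The trade-off is that the returned `&str` borrows the buffer, so callers must finish with one line before reading the next.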
@@ -47,25 +64,29 @@ Each block below is in `gh issue create` shape — copy-paste-ready once the rep
### `test(scenario): land real cold_start handshake-latency test`

**Body:**

> `ColdStart` is intentionally a placeholder in v0.1.0; the integration test pins the inert-placeholder contract. v0.2: implement real cold-start sampling (`Session::reinitialize` loop + per-iteration histogram) and replace the placeholder assertion with a measured-latency assertion.

**Labels:** `test`, `feature`, `v0.2`

### `test(fixtures): add `mock-leak.py`, `mock-error.py`, `mock-slow-init.py`, `mock-malformed.py``

**Body:**

> DESIGN.md §16 lists 10 mock fixtures; v0.1 ships 6 (normal/slow/broken/crash/http/sse). The 4 missing fixtures gate richer scenario coverage:
>
> - `mock-leak.py` — RSS grows over time → exercises `soak::detect_leak`
> - `mock-error.py` — returns JSON-RPC errors deterministically → exercises error-classification scenarios
> - `mock-slow-init.py` — slow initialize handshake → exercises `cold_start` (post-real-impl)
> - `mock-malformed.py` — emits malformed JSON → exercises fuzzer's defensive parse paths
> Each is < 50 lines of stdlib-only Python following `_common.py`.

**Labels:** `test`, `v0.2`

### `test(bench): wire criterion benches into CI baseline comparison`

**Body:**

> Phase 1 added `benches/{record,histogram,session_loopback,hang_detect}.rs`. v0.2: capture baseline numbers in `bench-baseline.json` and add a `cargo bench-check` CI step that flags regressions > 10% vs baseline (same threshold as `compare` subcommand).

**Labels:** `test`, `ci`, `v0.2`
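
The 10% regression rule described in this issue reduces to a relative-change comparison. The sketch below is an assumption about how such a check might be written; `is_regression` is a hypothetical helper and the nanosecond units are illustrative.

```rust
/// Returns true when `current_ns` is slower than `baseline_ns` by more than
/// `threshold`, expressed as a fraction (0.10 means 10%).
/// A non-positive baseline is treated as "no data" and never flags.
fn is_regression(baseline_ns: f64, current_ns: f64, threshold: f64) -> bool {
    baseline_ns > 0.0 && (current_ns - baseline_ns) / baseline_ns > threshold
}

fn main() {
    // 11% slower than baseline: flagged.
    assert!(is_regression(100.0, 111.0, 0.10));
    // Faster than baseline: never a regression.
    assert!(!is_regression(100.0, 90.0, 0.10));
}
```

The comparison is strict, so a benchmark must exceed the threshold, not merely reach it, before the step fails.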
@@ -97,27 +118,31 @@ Public API paths preserved via `pub use` re-exports throughout. 264 tests pass,
### `feat(transport): add `Transport::raw_send(&[u8])` hook for fuzzer raw-byte payloads`

**Body:**

> `Fuzzer` currently skips raw-transport payloads (`GiantPayload` raw variant, etc.) because there's no API to bypass JSON-RPC framing. v0.2: add a `raw_send` method to the `Transport` trait that lets `Fuzzer` send arbitrary bytes. Enables full coverage of the malformed-input attack surface.

**Labels:** `feature`, `v0.2`

### `feat(cli): `--capture-stderr` flag for stdio transport`

**Body:**

> Currently the spawned MCP server's stderr inherits the parent's stderr. When `mcp-loadtest` runs as a child of an LLM agent, the target server's stderr blends into the agent's view. Add a flag to redirect to a per-run file (`runs/<ulid>/server-stderr.log`).

**Labels:** `feature`, `v0.2`
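
One way the stderr redirection might be wired up with the standard library is sketched below. `spawn_with_stderr_log` is a hypothetical helper and the caller supplies the log path; generating the per-run `<ulid>` directory is out of scope here. The demo in `main` assumes a Unix-like system with `sh` on the `PATH`.

```rust
use std::fs::{self, File};
use std::io;
use std::path::Path;
use std::process::{Child, Command, Stdio};

/// Spawns `program` with stdin/stdout piped for the protocol stream and
/// stderr redirected to `log_path` rather than inherited from the parent.
fn spawn_with_stderr_log(program: &str, args: &[&str], log_path: &Path) -> io::Result<Child> {
    if let Some(dir) = log_path.parent() {
        fs::create_dir_all(dir)?;
    }
    let log = File::create(log_path)?;
    Command::new(program)
        .args(args)
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .stderr(Stdio::from(log))
        .spawn()
}

fn main() -> io::Result<()> {
    // Illustrative location; a real run would use its own run directory.
    let log_path = std::env::temp_dir()
        .join("mcp-loadtest-demo")
        .join("server-stderr.log");
    let mut child = spawn_with_stderr_log("sh", &["-c", "echo oops >&2"], &log_path)?;
    child.wait()?;
    assert_eq!(fs::read_to_string(&log_path)?.trim_end(), "oops");
    Ok(())
}
```

Because the file handle is passed to the child directly, the parent never has to read or forward the server's stderr itself.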

### `feat(cli): docker-compose generator for the `cross` subcommand`

**Body:**

> IBM's mcp-context-forge perf testing wanted Docker Compose multi-server setup. We have `cross` which drives N servers but doesn't scaffold the compose file. Add `mcp-loadtest cross --emit-compose > docker-compose.yml`.

**Labels:** `feature`, `v0.2`

### `feat(report): HTML report charts using inline JS for interactivity`

**Body:**

> Current HTML reporter uses inline SVG (static, no JS). v0.2 could optionally embed Chart.js + interactive percentile sliders. Stays self-contained (`<script>` block, no CDN). Trade: report file grows from ~20 KB to ~200 KB.

**Labels:** `feature`, `enhancement`, `v0.2`
@@ -129,13 +154,15 @@ Public API paths preserved via `pub use` re-exports throughout. 264 tests pass,
### `docs(adr): add ADRs 0010+ for v0.2 decisions as they land`

**Body:**

> Pre-publish review added ADRs 0005–0009. As v0.2 features land (host-allowlist, raw-byte transport, etc.) each architectural decision should get an ADR. Template: copy `docs/adr/0001-language-rust.md`.

**Labels:** `docs`, `process`

### `docs(readme): add real benchmark numbers from criterion runs`

**Body:**

> Once v0.2 wires bench baselines into CI (above), the README can quote real numbers: `Recorder::record: 47 ns`, `Session::call_tool loopback: 8.2 µs`, etc. Replaces the current "Rust performance" handwave.

**Labels:** `docs`, `v0.2`
@@ -147,17 +174,22 @@ Public API paths preserved via `pub use` re-exports throughout. 264 tests pass,
### `release: confirm new GitHub repo URL is live before `cargo publish``

**Body:**

> The `repository` and `homepage` fields in `Cargo.toml` point at `https://github.com/Teerapat-Vatpitak/mcp-loadtest`. The previous repo at that URL was deleted during pre-publish review. Before running `cargo publish`, recreate the public repo at the same URL OR update both fields to the new URL.

**Labels:** `release`, `process`

### `release: cargo publish dry-run + smoke install`

**Body:**

> Pre-publish checklist:
>
> 1. `cargo publish --dry-run -p mcp-loadtest` (lib first)
> 2. `cargo publish --dry-run -p mcp-loadtest-cli` (then CLI)
> 3. Both should be clean (no warnings about missing fields, no API surface issues)
> 4. After actual publish: `cargo install mcp-loadtest-cli` on a fresh shell to verify the binary works end-to-end

**Status (pre-publish, 2026-05-16):** steps 1–3 done — `mcp-loadtest` dry-run packages clean; `mcp-loadtest-cli` dry-run fails with "no matching package `mcp-loadtest`" which is the expected workspace publish-ordering constraint (publish the lib first, then the CLI), not a defect. Local release-build smoke (`--version` / `list-scenarios` / a strict `run`) passed. Step 4 (`cargo install` from crates.io) remains for after the real publish.

**Labels:** `release`, `process`