Post-Publish Issue Plan

This file enumerates every item deferred to v0.2 (or later) during the v0.1.0 pre-public review. Open each as a GitHub issue immediately after the repo flips public so they aren't lost.

Each block below is in gh issue create shape — copy-paste-ready once the repo URL is live.


🔒 Security

chore(deps): revisit triaged RUSTSEC ignores (ADR 0011)

Body:

deny.toml ignores three advisories, each justified in ADR 0011. Re-evaluate each release:

  • RUSTSEC-2026-0002 (lru IterMut Stacked-Borrows unsoundness): transitive via ratatui 0.29 (lru ^0.12), only under the optional tui feature. Drop the ignore and bump ratatui once it ships on a patched lru (≥0.13).
  • RUSTSEC-2025-0052 (async-std discontinued): dev-only via httpmock. Drop when httpmock releases without async-std (or swap the mock).
  • RUSTSEC-2024-0436 (paste unmaintained): transitive proc-macro. Drop when the dep tree moves off paste (e.g. pastey).

Action: run cargo deny check after cargo update; if any advisory becomes an actual vulnerability, or lru reaches the default build, fix immediately rather than re-ignoring.

Labels: security, dependencies, chore

feat(transport): host-allowlist for HTTP/SSE/WS to defend against SSRF

Body:

Currently the redirect policy is hardened to Policy::none() per ADR 0007, but an operator pointing the load tester at a malicious URL can still hit any internal endpoint resolvable from the machine running the test. v0.2 should add an operator-facing allowlist (TOML config: server.allowed_hosts = ["app.example.com"]) and reject connect_async/reqwest::send calls when the resolved host isn't matched. See CHANGELOG [Unreleased] Notes.
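A minimal sketch of the matching step, assuming the allowlist is loaded from server.allowed_hosts into a slice of strings. The helper name host_allowed is hypothetical, and the check covers the hostname string only; validating the resolved address would be a separate step.

```rust
/// Hypothetical helper (not part of the crate today): returns true when
/// `host` matches an entry in `allowed_hosts`. Matching is exact,
/// ASCII case-insensitive, and ignores a trailing root dot.
fn host_allowed(host: &str, allowed_hosts: &[&str]) -> bool {
    let host = host.trim_end_matches('.');
    allowed_hosts
        .iter()
        .any(|allowed| allowed.trim_end_matches('.').eq_ignore_ascii_case(host))
}
```

With this shape an empty allowlist rejects every host. Whether an empty list should instead mean "no restriction" is a design decision for the v0.2 ADR.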

Labels: security, enhancement, v0.2


⚡ Performance

perf(scenarios): switch args: Value to Arc<Value> for hot-loop sharing

Body:

Pre-publish Phase 1 audit found scenarios still deep-clone Value once per worker spawn (the &Value change in commit c8dee52 cut per-call clones, not per-worker setup). Wrapping Sustained.args / Pattern.args etc. in Arc<Value> lets concurrent workers share the JSON tree by reference. Touches Session::call_tool signature → breaking v0.2 API change.
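The sharing pattern can be sketched with the standard library alone. Here Args is a stand-in for serde_json::Value and run_workers is a hypothetical function, so this only illustrates the reference-counting behaviour and is not the crate's actual scenario code.

```rust
use std::sync::Arc;
use std::thread;

/// Stand-in for `serde_json::Value`; any `Send + Sync` type behaves the same.
struct Args {
    payload: Vec<String>,
}

/// Spawns `workers` threads that all read the same `Args` allocation and
/// returns the payload length each worker observed.
fn run_workers(args: Arc<Args>, workers: usize) -> Vec<usize> {
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            // Arc::clone bumps a reference count; the payload is not copied.
            let shared = Arc::clone(&args);
            thread::spawn(move || shared.payload.len())
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

Each worker setup then costs one atomic increment instead of a deep clone of the JSON tree.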

Labels: performance, breaking, v0.2

perf(transport): drop double-parse in SSE/WS id-extract

Body:

extract_id parses the entire JSON twice (once for id-probe, once for full body after match). Use simd-json or a streaming id-extractor (parse only the first "id":N key). Matters at >100K iter/s — not a v0.1 blocker.
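As a rough illustration of the streaming option, the sketch below scans for the first "id" key and parses the integer after it without building a JSON tree. The function name probe_id is hypothetical, and the sketch assumes numeric ids and that "id" does not first appear inside a string value or nested object; a production version would need to handle both cases.

```rust
/// Naive sketch: finds the first `"id":` key in a JSON text and parses an
/// unsigned integer value after it. Returns None when the key is missing
/// or the value is not a plain non-negative integer.
fn probe_id(json: &str) -> Option<u64> {
    let key = "\"id\"";
    // Slice just past the first occurrence of the key.
    let after_key = &json[json.find(key)? + key.len()..];
    // Allow optional whitespace around the colon.
    let after_colon = after_key.trim_start().strip_prefix(':')?.trim_start();
    // Take the leading run of ASCII digits.
    let digits_end = after_colon
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(after_colon.len());
    after_colon[..digits_end].parse().ok()
}
```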

Labels: performance, v0.2

perf(session): drop String allocation in stdio line trim

Body:

stdio.rs::request does self.line_buf.trim_end().to_string(). In-place truncate plus returning &str would save one alloc per call. Small win at high call rates.
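The in-place variant might look like the following. The free function trimmed_line is a hypothetical stand-in for the relevant part of stdio.rs::request, which would operate on self.line_buf instead.

```rust
/// Truncates trailing whitespace from `line_buf` in place and returns a
/// borrowed view, so no new `String` is allocated.
fn trimmed_line(line_buf: &mut String) -> &str {
    // trim_end() only computes a sub-slice; its length is the new end.
    let trimmed_len = line_buf.trim_end().len();
    // truncate() keeps the existing allocation and capacity.
    line_buf.truncate(trimmed_len);
    line_buf.as_str()
}
```

The returned &str borrows from the buffer, so callers must finish with it before the next read reuses line_buf.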

Labels: performance, v0.2


🧪 Tests / coverage

test(scenario): land real cold_start handshake-latency test

Body:

ColdStart is intentionally a placeholder in v0.1.0; the integration test pins the inert-placeholder contract. v0.2: implement real cold-start sampling (Session::reinitialize loop + per-iteration histogram) and replace the placeholder assertion with a measured-latency assertion.

Labels: test, feature, v0.2

test(fixtures): add mock-leak.py, mock-error.py, mock-slow-init.py, mock-malformed.py

Body:

DESIGN.md §16 lists 10 mock fixtures; v0.1 ships 6 (normal/slow/broken/crash/http/sse). The 4 missing fixtures gate richer scenario coverage:

  • mock-leak.py: RSS grows over time → exercises soak::detect_leak
  • mock-error.py: returns JSON-RPC errors deterministically → exercises error-classification scenarios
  • mock-slow-init.py: slow initialize handshake → exercises cold_start (post-real-impl)
  • mock-malformed.py: emits malformed JSON → exercises fuzzer's defensive parse paths

Each is < 50 lines of stdlib-only Python following _common.py.

Labels: test, v0.2

test(bench): wire criterion benches into CI baseline comparison

Body:

Phase 1 added benches/{record,histogram,session_loopback,hang_detect}.rs. v0.2: capture baseline numbers in bench-baseline.json and add a cargo bench-check CI step that flags regressions > 10% vs baseline (same threshold as compare subcommand).
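The regression rule itself is a one-line comparison. The sketch below assumes timings in nanoseconds and a fractional threshold; is_regression is a hypothetical name, and the layout of bench-baseline.json is not defined here.

```rust
/// Returns true when `current_ns` is slower than `baseline_ns` by more than
/// `threshold` (0.10 means 10%). A non-positive baseline never flags.
fn is_regression(baseline_ns: f64, current_ns: f64, threshold: f64) -> bool {
    baseline_ns > 0.0 && (current_ns - baseline_ns) / baseline_ns > threshold
}
```

For example, is_regression(100.0, 111.0, 0.10) flags an 11% slowdown, while a 5% slowdown or any speed-up passes.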

Labels: test, ci, v0.2


🛠️ Tech debt — file splits (M8)

Files where production code (excluding #[cfg(test)] mod tests) exceeds the 300-line convention. The bracketed numbers are production LoC / total LoC.

chore(refactor): split files > 300 production LoC (M8) COMPLETED in v0.1.0 pre-publish

All 11 originally-flagged files now have production code under 300 lines. Splits landed across four commits in the pre-publish review:

  • Wave 1: scenario/soak.rs (358→196 via soak/leak_detect.rs), report/html.rs (336→212 via html/{css,chart}.rs), scenario/spike.rs (331→217 via spike/phase.rs).
  • Wave 2: cmd_compare.rs (503→100 via cmd_compare/{types,diff,render}.rs), scenario/fuzzer.rs (488→277 via fuzzer/{payloads,classify}.rs), serve/tools.rs (509→130 via serve/tools/{deadlock_probe,sustained_load,compare_runs}.rs).
  • Wave 3: run.rs (447→293 via run/thresholds.rs), metrics/mod.rs (426→<300 via metrics/{types,per_tool}.rs + module-doc trim), config.rs (376→217 via config/{validate,example}.rs), main.rs (434→208 via cmd_run.rs + cmd_deadlock.rs + emit.rs).
  • Wave 4 (trim): metrics/mod.rs, scenario/soak.rs, serve/mod.rs — module-doc trim brought them under the 300 cap.
  • sse.rs (311→263 via sse/reader.rs), cmd_cross.rs (322→189 via cmd_cross/render.rs).

Public API paths preserved via pub use re-exports throughout. 264 tests pass, cargo fmt --check + cargo clippy -D warnings clean.

Labels: tech-debt — done.


🎁 Features

feat(transport): add Transport::raw_send(&[u8]) hook for fuzzer raw-byte payloads

Body:

Fuzzer currently skips raw-transport payloads (GiantPayload raw variant, etc.) because there's no API to bypass JSON-RPC framing. v0.2: add a raw_send method to the Transport trait that lets Fuzzer send arbitrary bytes. Enables full coverage of the malformed-input attack surface.
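One possible shape for the hook, shown as a synchronous trait for brevity. The real Transport trait may be async and carries other methods; RecordingTransport is a hypothetical test double, not an existing type in the crate.

```rust
use std::io;

/// Hypothetical, simplified view of the transport trait with the new hook.
trait Transport {
    /// Sends `bytes` verbatim, bypassing JSON-RPC framing.
    fn raw_send(&mut self, bytes: &[u8]) -> io::Result<()>;
}

/// Test double that records every raw payload it is asked to send.
#[derive(Default)]
struct RecordingTransport {
    sent: Vec<Vec<u8>>,
}

impl Transport for RecordingTransport {
    fn raw_send(&mut self, bytes: &[u8]) -> io::Result<()> {
        self.sent.push(bytes.to_vec());
        Ok(())
    }
}
```

Taking &[u8] instead of &str lets the fuzzer send payloads that are not valid UTF-8.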

Labels: feature, v0.2

feat(cli): --capture-stderr flag for stdio transport

Body:

Currently the spawned MCP server's stderr inherits the parent's stderr. When mcp-loadtest runs as a child of an LLM agent, the target server's stderr blends into the agent's view. Add a flag to redirect to a per-run file (runs/<ulid>/server-stderr.log).
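With std::process this is a small change at spawn time. The sketch below assumes the stdio transport spawns the server through std::process::Command; the function name spawn_with_stderr_log is hypothetical, and a tokio-based spawn would need the equivalent tokio::process call.

```rust
use std::fs::File;
use std::io;
use std::path::Path;
use std::process::{Child, Command, Stdio};

/// Spawns `program` with stdin/stdout piped for the JSON-RPC exchange and
/// stderr redirected to the file at `log_path` instead of being inherited.
fn spawn_with_stderr_log(program: &str, args: &[&str], log_path: &Path) -> io::Result<Child> {
    // Creates (or truncates) the per-run log file.
    let log = File::create(log_path)?;
    Command::new(program)
        .args(args)
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .stderr(Stdio::from(log))
        .spawn()
}
```

The caller is responsible for creating the runs/<ulid>/ directory before passing the log path.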

Labels: feature, v0.2

feat(cli): docker-compose generator for the cross subcommand

Body:

IBM's mcp-context-forge perf testing wanted a Docker Compose multi-server setup. The cross subcommand already drives N servers but doesn't scaffold the compose file. Add mcp-loadtest cross --emit-compose > docker-compose.yml.

Labels: feature, v0.2

feat(report): HTML report charts using inline JS for interactivity

Body:

The current HTML reporter uses inline SVG (static, no JS). v0.2 could optionally embed Chart.js plus interactive percentile sliders. The report stays self-contained (<script> block, no CDN). Trade-off: report file grows from ~20 KB to ~200 KB.

Labels: feature, enhancement, v0.2


📚 Docs

docs(adr): add ADRs 0010+ for v0.2 decisions as they land

Body:

Pre-publish review added ADRs 0005–0009. As v0.2 features land (host-allowlist, raw-byte transport, etc.) each architectural decision should get an ADR. Template: copy docs/adr/0001-language-rust.md.

Labels: docs, process

docs(readme): add real benchmark numbers from criterion runs

Body:

Once v0.2 wires bench baselines into CI (above), the README can quote real numbers: Recorder::record: 47 ns, Session::call_tool loopback: 8.2 µs, etc. Replaces the current "Rust performance" handwave.

Labels: docs, v0.2


🚢 Release process

release: confirm new GitHub repo URL is live before cargo publish

Body:

The repository and homepage fields in Cargo.toml point at https://github.com/Teerapat-Vatpitak/mcp-loadtest. The previous repo at that URL was deleted during pre-publish review. Before running cargo publish, recreate the public repo at the same URL OR update both fields to the new URL.

Labels: release, process

release: cargo publish dry-run + smoke install

Body:

Pre-publish checklist:

  1. cargo publish --dry-run -p mcp-loadtest (lib first)
  2. cargo publish --dry-run -p mcp-loadtest-cli (then CLI)
  3. Both should be clean (no warnings about missing fields, no API surface issues)
  4. After actual publish: cargo install mcp-loadtest-cli on a fresh shell to verify the binary works end-to-end

Status (pre-publish, 2026-05-16): steps 1–3 done — mcp-loadtest dry-run packages clean; mcp-loadtest-cli dry-run fails with "no matching package mcp-loadtest" which is the expected workspace publish-ordering constraint (publish the lib first, then the CLI), not a defect. Local release-build smoke (--version / list-scenarios / a strict run) passed. Step 4 (cargo install from crates.io) remains for after the real publish.

Labels: release, process