v0.1.0 readiness: builder refactor + B1/B2 + full pre-publish hardening by Teerapat-Vatpitak · Pull Request #1 · Teerapat-Vatpitak/mcp-loadtest

Teerapat-Vatpitak · 2026-05-17T05:17:53Z

What

v0.1.0 readiness: completes the in-flight scenario-builder refactor, closes the two differentiator gaps, then a full pre-publish security + stabilization + API-durability hardening pass. 8 commits (b6d7165…b7e75a2).

Commit
`b6d7165` refactor(cli)	finish unified scenario builder; split `cmd_run` into `builder`/`params`/`patterns` (<300 prod LoC each)
`ba0faae` feat(compare)	configurable regression thresholds — `compare` flags + `compare_runs` MCP args (ADR 0009)
`10443aa` feat(validation)	opt-in strict MCP schema validation, dependency-free subset validator (ADR 0010)
`c9089a5` docs	advertise CI gating + protocol-aware assertions (README/DESIGN)
`ad2e5c6` fix(security)	bound schema recursion, reject non-positive thresholds, path/enum message hardening
`95783e6` chore(release)	exclude CLAUDE.md from package; `run_strict` e2e test (real `run --config` path)
`fec989e` chore(deny)	allowlist `CDLA-Permissive-2.0`; triaged RUSTSEC ignores (ADR 0011)
`b7e75a2` refactor(api)	`#[non_exhaustive]` + constructors on the public surface; crate-root docs; docs.rs all-features; `missing_docs=deny`; MSRV 1.86→1.88 (verified)

Net public-API change is pre-release hardening only; Cargo.lock is identical to main (zero dependency changes across all 8 commits).

Why a PR (not a merge)

This PR exists to run the authoritative CI backstop — 3-OS (ubuntu/macOS/windows-latest) checks + cargo-audit + the new msrv (Rust 1.88) job — including the toolchains that couldn't be exercised locally. Not publishing; no tag; no crates.io.

Local verification (x86_64-pc-windows-msvc)

fmt · clippy -D warnings · build --locked · full test incl. doctests · doc -D warnings · cargo deny check · cargo audit (rc 0) · publish --dry-run -p mcp-loadtest (104 files) — all green. See CHANGELOG.md [Unreleased] for the full detail and ADRs 0009–0011 for the decisions.

Out of scope (v0.2)

Result-side schema validation, host-allowlist SSRF, distributed mode, revisiting the triaged RUSTSEC ignores — tracked in POST_PUBLISH_ISSUES.md.

cmd_run.rs grew past the 300 prod-LoC convention while the in-flight refactor unified single-tool, weighted-pattern and legacy tool_call configs through one build_scenario factory and added M5-M7 kind dispatch. Split into private builder/params/patterns submodules. The public surface (run_from_config + the re-exported parse_dur_str shared with deadlock-probe and cross) is unchanged and now guarded by a cmd_run integration test. CHANGELOG and the custom-scenario doc are synced to the new layout. ci-checks green: fmt, clippy -D warnings, build --locked, 260+ tests, doc -D warnings.

ADR 0009 deferred operator-configurable thresholds until a user asked; this lands them. p99 / error-rate / deadlock-zero-tolerance were hard-coded constants shared by the compare CLI and the compare_runs MCP tool, so a user wanting a different CI budget had to fork. Add analysis::regression::RegressionThresholds whose Default reproduces the historical 10% / 0.5pp / zero-tolerance policy exactly, so existing gates are unaffected unless they opt in. Surface overrides where operators work: compare gains --max-p99-regression-pct / --max-error-rate-regression-pp / --allow-deadlock-increase; the compare_runs tool gains the matching optional args. CLI and MCP stay in lockstep by construction (same struct). ADR 0009 updated (open question resolved); CHANGELOG noted. ci-checks green: fmt, clippy -D warnings, --locked tests, doc.

The protocol-aware-assertions differentiator wanted the tester to catch when a tools/call's args don't match the server's advertised inputSchema, but ADR 0005 deliberately chose lenient/forward-compat parsing and the repo's supply-chain discipline rules out an unauditable JSON Schema crate. Add `[validation] strict = true` (default false → behaviour unchanged): - protocol::schema, a dependency-free subset validator (type/properties/required/enum/items) that skips unmodeled keywords, so it never rejects on forward-compat grounds. - classify_schema_violation: args violations Fail (gate as ProtocolError), result violations Warn (result-side deferred to v0.2), none Ignore. - Session caches the tools/list registry the run already fetches (no extra round-trip, ADR 0006 hot path untouched when off) and rejects bad args before send via SessionError::SchemaViolation, mapped to CallOutcome::ProtocolError. ADR 0010 added; CHANGELOG noted. New mock-schema.py fixture + strict_validation.rs prove bad args gate and compliant args don't. ci-checks green: fmt, clippy -D warnings, --locked tests, doc.

B1/B2 shipped real differentiators (configurable regression thresholds, opt-in protocol-aware schema validation) but the README and DESIGN matrix didn't surface them, so a reader couldn't tell the tool is a CI regression gate, not just a profiler. Add a "CI gating & protocol-aware assertions" section to the README (threshold gating, configurable compare, strict mode, link to the ci-integration cookbook) and two rows each to the README and DESIGN.md §10.5 competitive matrices. Docs only — no code change; doc -D warnings green.

Pre-publish security review of the configurable-thresholds and strict-validation work surfaced one HIGH and three lower findings, all in code added this branch: - HIGH: protocol::schema::validate_at recursed with no depth bound. A malicious server-advertised inputSchema could stack-overflow the load test (an uncatchable abort) once strict mode is on. Cap at MAX_SCHEMA_DEPTH = 64; over-deep tails are skipped, not failed. - MED: negative regression thresholds silently inverted the gate. Reject non-positive / non-finite values at both boundaries — a clap value_parser on the CLI flags and a finite-and-positive filter in compare_runs::regression_overrides. - LOW: compare_runs canonicalize error echoed the raw caller path back to the MCP client; drop it. - LOW: enum-violation messages rendered the full server enum array; cap the preview at 5 entries + a count. Regression tests added for the depth cap and the threshold guard. ci-checks green: fmt, clippy -D warnings, --locked tests, doc.

Pre-publish stabilization from a read-only audit: - packaging: exclude the four CLAUDE.md scaffolding files from the published `mcp-loadtest` crate (108 -> 104 files; verified via `cargo package --list`). runs/ was already gitignored. - test gap: add `crates/mcp-loadtest-cli/tests/run_strict.rs` — the existing strict_validation.rs drives scenario.drive() directly; this covers the real production entrypoint end-to-end: `cmd_run::run_from_config` -> Config::from_file (incl. [validation] TOML deserialize) -> build_scenario -> Run::execute (run.rs strict wiring) -> emit_reports (report.md/metrics.json on disk) -> non-zero gate. Asserts both the fail (bad args -> ProtocolError -> not passed) and pass (compliant args) paths. - docs: tick AGENTS.md M7 `/security-review + /release-checks`; annotate POST_PUBLISH_ISSUES dry-run/smoke status. CHANGELOG noted. Verified on **x86_64-pc-windows-msvc** (the target crates.io ships to Windows users — previously only gnu was built locally): fmt, clippy -D warnings, build --locked, full test suite (incl. run_strict / strict_validation), doc -D warnings, publish --dry-run (lib, 104 files), release-binary smoke — all green.

`cargo deny check` (a release-checks must-pass gate) failed on the pre-publish audit: webpki-roots' CDLA-Permissive-2.0 license wasn't allowlisted, and three advisories tripped the default error. `cargo audit` exits 0 — all three are unmaintained/unsound notices, not scored vulnerabilities. Advisory checks hit the live RustSec DB, so this surfaces regardless of the (main-identical) lockfile. - licenses: allow CDLA-Permissive-2.0 (webpki-roots' Mozilla CA bundle, via tokio-tungstenite — permissive, publish-safe). - advisories: documented per-ID ignore of RUSTSEC-2025-0052 (async-std, dev-only via httpmock — never shipped), -2024-0436 (paste, transitive proc-macro), -2026-0002 (lru IterMut unsoundness — transitive via ratatui 0.29, optional `tui` feature only, no semver-compatible fix). Nothing else ignored; the gate still hard-fails on any new or actual-vulnerability advisory. Decision recorded in ADR 0011; revisit conditions tracked in POST_PUBLISH_ISSUES.md; CHANGELOG noted. `cargo deny check` now passes (advisories/bans/licenses/sources ok); `cargo audit` rc 0.

Pre-publish FULL hardening so v0.1.0's public API and crate page are safe to commit to permanently (from a market + quality/standards audit before `cargo publish`). - API durability: `#[non_exhaustive]` on the Config family + public error/outcome enums + RunContext (adding a field/variant is no longer breaking — the exact pain hit internally when `validation` was added). Constructors keep cross-crate use ergonomic: Config::new + with_*, ScenarioConfig::new, OutputConfig::new, RunContext::new. cli + 12 integration-test helpers refactored to the constructors. - Crate-root docs: dropped the stale "v0.0.0 scaffolding" status and the fictional-API example; replaced with a real compiling `no_run` example. ValidationConfig added to the re-export facade. - Standards: docs.rs all-features metadata; missing_docs = deny (lib + cli; zero gaps); README repo links → absolute so they render on crates.io. - MSRV corrected 1.86 -> 1.88 (the real floor: edition-2024 let-chains; verified by an actual build on 1.88) + new CI `msrv` job; fixed the clippy nits it surfaced (collapsible_if, is_multiple_of). Edition stays 2024 (an audit "downgrade to 2021" suggestion was wrong — let-chains require it and dry-run already accepted it). Re-verified on x86_64-pc-windows-msvc: fmt, clippy -D warnings, --locked build/test (incl. doctests), doc -D warnings, cargo deny, cargo audit, publish --dry-run (104 files). CHANGELOG noted.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b7e75a2a73

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-17T05:23:55Z

+    /// Strict validation rejected this call before it was sent: the
+    /// arguments didn't match the tool's advertised `inputSchema` and the
+    /// policy classified it as [`SchemaPolicy::Fail`]. Only produced when
+    /// `[validation] strict = true`. Maps to `CallOutcome::ProtocolError`.
+    #[error("schema violation for tool `{tool}`: {summary}")]
+    SchemaViolation {


Route schema violations through shared classifier

When [validation] strict = true is used with scenarios that still bucket HangOutcome::Err/call errors directly as CallOutcome::ServerError (I checked deadlock_probe.rs and race_check.rs), this new SchemaViolation will be reported as a server error instead of the documented ProtocolError. The run may still fail on error rate, but the metrics/report category is wrong for strict-mode schema mismatches; route those handlers through scenario::classify_error or handle this variant explicitly.

Useful? React with 👍 / 👎.

Teerapat-Vatpitak added 8 commits May 16, 2026 18:00

chatgpt-codex-connector Bot reviewed May 17, 2026

View reviewed changes

Teerapat-Vatpitak merged commit aaf0172 into main May 17, 2026
5 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

v0.1.0 readiness: builder refactor + B1/B2 + full pre-publish hardening#1

v0.1.0 readiness: builder refactor + B1/B2 + full pre-publish hardening#1
Teerapat-Vatpitak merged 8 commits into
mainfrom
refactor/cmd-run-scenario-builder

Teerapat-Vatpitak commented May 17, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot May 17, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Teerapat-Vatpitak commented May 17, 2026

What

Why a PR (not a merge)

Local verification (x86_64-pc-windows-msvc)

Out of scope (v0.2)

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 17, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant