v0.1.0 readiness: builder refactor + B1/B2 + full pre-publish hardening#1
Conversation
cmd_run.rs grew past the 300 prod-LoC convention while the in-flight refactor unified single-tool, weighted-pattern and legacy tool_call configs through one build_scenario factory and added M5-M7 kind dispatch. Split into private builder/params/patterns submodules. The public surface (run_from_config + the re-exported parse_dur_str shared with deadlock-probe and cross) is unchanged and now guarded by a cmd_run integration test. CHANGELOG and the custom-scenario doc are synced to the new layout. ci-checks green: fmt, clippy -D warnings, build --locked, 260+ tests, doc -D warnings.
ADR 0009 deferred operator-configurable thresholds until a user asked; this lands them. p99 / error-rate / deadlock-zero-tolerance were hard-coded constants shared by the compare CLI and the compare_runs MCP tool, so a user wanting a different CI budget had to fork. Add analysis::regression::RegressionThresholds whose Default reproduces the historical 10% / 0.5pp / zero-tolerance policy exactly, so existing gates are unaffected unless they opt in. Surface overrides where operators work: compare gains --max-p99-regression-pct / --max-error-rate-regression-pp / --allow-deadlock-increase; the compare_runs tool gains the matching optional args. CLI and MCP stay in lockstep by construction (same struct). ADR 0009 updated (open question resolved); CHANGELOG noted. ci-checks green: fmt, clippy -D warnings, --locked tests, doc.
The protocol-aware-assertions differentiator wanted the tester to catch when a tools/call's args don't match the server's advertised inputSchema, but ADR 0005 deliberately chose lenient/forward-compat parsing and the repo's supply-chain discipline rules out an unauditable JSON Schema crate. Add `[validation] strict = true` (default false → behaviour unchanged): - protocol::schema, a dependency-free subset validator (type/properties/required/enum/items) that skips unmodeled keywords, so it never rejects on forward-compat grounds. - classify_schema_violation: args violations Fail (gate as ProtocolError), result violations Warn (result-side deferred to v0.2), none Ignore. - Session caches the tools/list registry the run already fetches (no extra round-trip, ADR 0006 hot path untouched when off) and rejects bad args before send via SessionError::SchemaViolation, mapped to CallOutcome::ProtocolError. ADR 0010 added; CHANGELOG noted. New mock-schema.py fixture + strict_validation.rs prove bad args gate and compliant args don't. ci-checks green: fmt, clippy -D warnings, --locked tests, doc.
B1/B2 shipped real differentiators (configurable regression thresholds, opt-in protocol-aware schema validation) but the README and DESIGN matrix didn't surface them, so a reader couldn't tell the tool is a CI regression gate, not just a profiler. Add a "CI gating & protocol-aware assertions" section to the README (threshold gating, configurable compare, strict mode, link to the ci-integration cookbook) and two rows each to the README and DESIGN.md §10.5 competitive matrices. Docs only — no code change; doc -D warnings green.
Pre-publish security review of the configurable-thresholds and strict-validation work surfaced one HIGH and three lower findings, all in code added this branch: - HIGH: protocol::schema::validate_at recursed with no depth bound. A malicious server-advertised inputSchema could stack-overflow the load test (an uncatchable abort) once strict mode is on. Cap at MAX_SCHEMA_DEPTH = 64; over-deep tails are skipped, not failed. - MED: negative regression thresholds silently inverted the gate. Reject non-positive / non-finite values at both boundaries — a clap value_parser on the CLI flags and a finite-and-positive filter in compare_runs::regression_overrides. - LOW: compare_runs canonicalize error echoed the raw caller path back to the MCP client; drop it. - LOW: enum-violation messages rendered the full server enum array; cap the preview at 5 entries + a count. Regression tests added for the depth cap and the threshold guard. ci-checks green: fmt, clippy -D warnings, --locked tests, doc.
Pre-publish stabilization from a read-only audit: - packaging: exclude the four CLAUDE.md scaffolding files from the published `mcp-loadtest` crate (108 -> 104 files; verified via `cargo package --list`). runs/ was already gitignored. - test gap: add `crates/mcp-loadtest-cli/tests/run_strict.rs` — the existing strict_validation.rs drives scenario.drive() directly; this covers the real production entrypoint end-to-end: `cmd_run::run_from_config` -> Config::from_file (incl. [validation] TOML deserialize) -> build_scenario -> Run::execute (run.rs strict wiring) -> emit_reports (report.md/metrics.json on disk) -> non-zero gate. Asserts both the fail (bad args -> ProtocolError -> not passed) and pass (compliant args) paths. - docs: tick AGENTS.md M7 `/security-review + /release-checks`; annotate POST_PUBLISH_ISSUES dry-run/smoke status. CHANGELOG noted. Verified on **x86_64-pc-windows-msvc** (the target crates.io ships to Windows users — previously only gnu was built locally): fmt, clippy -D warnings, build --locked, full test suite (incl. run_strict / strict_validation), doc -D warnings, publish --dry-run (lib, 104 files), release-binary smoke — all green.
`cargo deny check` (a release-checks must-pass gate) failed on the pre-publish audit: webpki-roots' CDLA-Permissive-2.0 license wasn't allowlisted, and three advisories tripped the default error. `cargo audit` exits 0 — all three are unmaintained/unsound notices, not scored vulnerabilities. Advisory checks hit the live RustSec DB, so this surfaces regardless of the (main-identical) lockfile. - licenses: allow CDLA-Permissive-2.0 (webpki-roots' Mozilla CA bundle, via tokio-tungstenite — permissive, publish-safe). - advisories: documented per-ID ignore of RUSTSEC-2025-0052 (async-std, dev-only via httpmock — never shipped), -2024-0436 (paste, transitive proc-macro), -2026-0002 (lru IterMut unsoundness — transitive via ratatui 0.29, optional `tui` feature only, no semver-compatible fix). Nothing else ignored; the gate still hard-fails on any new or actual-vulnerability advisory. Decision recorded in ADR 0011; revisit conditions tracked in POST_PUBLISH_ISSUES.md; CHANGELOG noted. `cargo deny check` now passes (advisories/bans/licenses/sources ok); `cargo audit` rc 0.
Pre-publish FULL hardening so v0.1.0's public API and crate page are safe to commit to permanently (from a market + quality/standards audit before `cargo publish`). - API durability: `#[non_exhaustive]` on the Config family + public error/outcome enums + RunContext (adding a field/variant is no longer breaking — the exact pain hit internally when `validation` was added). Constructors keep cross-crate use ergonomic: Config::new + with_*, ScenarioConfig::new, OutputConfig::new, RunContext::new. cli + 12 integration-test helpers refactored to the constructors. - Crate-root docs: dropped the stale "v0.0.0 scaffolding" status and the fictional-API example; replaced with a real compiling `no_run` example. ValidationConfig added to the re-export facade. - Standards: docs.rs all-features metadata; missing_docs = deny (lib + cli; zero gaps); README repo links → absolute so they render on crates.io. - MSRV corrected 1.86 -> 1.88 (the real floor: edition-2024 let-chains; verified by an actual build on 1.88) + new CI `msrv` job; fixed the clippy nits it surfaced (collapsible_if, is_multiple_of). Edition stays 2024 (an audit "downgrade to 2021" suggestion was wrong — let-chains require it and dry-run already accepted it). Re-verified on x86_64-pc-windows-msvc: fmt, clippy -D warnings, --locked build/test (incl. doctests), doc -D warnings, cargo deny, cargo audit, publish --dry-run (104 files). CHANGELOG noted.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b7e75a2a73
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| /// Strict validation rejected this call before it was sent: the | ||
| /// arguments didn't match the tool's advertised `inputSchema` and the | ||
| /// policy classified it as [`SchemaPolicy::Fail`]. Only produced when | ||
| /// `[validation] strict = true`. Maps to `CallOutcome::ProtocolError`. | ||
| #[error("schema violation for tool `{tool}`: {summary}")] | ||
| SchemaViolation { |
There was a problem hiding this comment.
Route schema violations through shared classifier
When [validation] strict = true is used with scenarios that still bucket HangOutcome::Err/call errors directly as CallOutcome::ServerError (I checked deadlock_probe.rs and race_check.rs), this new SchemaViolation will be reported as a server error instead of the documented ProtocolError. The run may still fail on error rate, but the metrics/report category is wrong for strict-mode schema mismatches; route those handlers through scenario::classify_error or handle this variant explicitly.
Useful? React with 👍 / 👎.
What
v0.1.0 readiness: completes the in-flight scenario-builder refactor, closes the two differentiator gaps, then a full pre-publish security + stabilization + API-durability hardening pass. 8 commits (
b6d7165…b7e75a2).b6d7165refactor(cli)cmd_runintobuilder/params/patterns(<300 prod LoC each)ba0faaefeat(compare)compareflags +compare_runsMCP args (ADR 0009)10443aafeat(validation)c9089a5docsad2e5c6fix(security)95783e6chore(release)run_stricte2e test (realrun --configpath)fec989echore(deny)CDLA-Permissive-2.0; triaged RUSTSEC ignores (ADR 0011)b7e75a2refactor(api)#[non_exhaustive]+ constructors on the public surface; crate-root docs; docs.rs all-features;missing_docs=deny; MSRV 1.86→1.88 (verified)Net public-API change is pre-release hardening only;
Cargo.lockis identical tomain(zero dependency changes across all 8 commits).Why a PR (not a merge)
This PR exists to run the authoritative CI backstop — 3-OS (
ubuntu/macOS/windows-latest)checks+cargo-audit+ the newmsrv(Rust 1.88) job — including the toolchains that couldn't be exercised locally. Not publishing; no tag; no crates.io.Local verification (x86_64-pc-windows-msvc)
fmt·clippy -D warnings·build --locked· fulltestincl. doctests ·doc -D warnings·cargo deny check·cargo audit(rc 0) ·publish --dry-run -p mcp-loadtest(104 files) — all green. SeeCHANGELOG.md[Unreleased]for the full detail and ADRs 0009–0011 for the decisions.Out of scope (v0.2)
Result-side schema validation, host-allowlist SSRF, distributed mode, revisiting the triaged RUSTSEC ignores — tracked in
POST_PUBLISH_ISSUES.md.