chore(testing): make ContextForge benchmarks reproducible and CI-friendly#3680
Open
dima-zakharov (Collaborator) previously approved these changes on Apr 13, 2026 and left a comment:
- The code is well-structured and modular
- Documentation is comprehensive
- Test coverage is extensive
Signed-off-by: lucarlig <luca.carlig@ibm.com>
Summary
This PR adds a Rust-native, scenario-driven benchmark suite for ContextForge and wires it into the repo as a committed testing artifact instead of an ad hoc collection of scripts and flags.
The new benchmark flow centers on:
- crates/contextforge_benchmark_runner/assets/scenarios/
- make benchmark
The result is a benchmark setup that is easier to rerun, compare, extend, and eventually automate in CI.
Type of Change
What Changed
1. Added Rust benchmark crates in the workspace layout
This PR introduces three new direct workspace crates under crates/:
- crates/contextforge_benchmark_runner for scenario loading, stack orchestration, run execution, report regeneration, and comparison output
- crates/contextforge_benchmark_console for the interactive TUI launcher and scenario template generator
- crates/contextforge_goose for the Goose load driver used by benchmark scenarios
2. Added committed TOML benchmark scenarios
This PR checks in a suite of benchmark scenarios under crates/contextforge_benchmark_runner/assets/scenarios/, including coverage for:
These scenario files now act as the benchmark contract for runtime, load, execution, profiling, and comparison settings.
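To illustrate what "scenario as contract" can mean in practice, here is a minimal Rust sketch of a typed scenario with a validation step. It is not the runner's actual schema: the type names, the field names (users, duration, profiling, baseline) and the validation rules are all hypothetical, and using the server as the runtime setting is an assumption. Only the server names come from this PR's text.

```rust
use std::time::Duration;

/// Server names as listed in this PR; treating them as the "runtime"
/// setting of a scenario is an assumption made for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Server {
    Gunicorn,
    Granian,
    Uvicorn,
}

/// Hypothetical load settings (field names are illustrative only).
#[derive(Debug, Clone, PartialEq)]
pub struct LoadSettings {
    pub users: u32,
    pub duration: Duration,
}

/// Hypothetical in-memory form of one committed scenario file.
#[derive(Debug, Clone, PartialEq)]
pub struct Scenario {
    pub name: String,
    pub server: Server,
    pub load: LoadSettings,
    pub profiling: bool,
    /// Name of another scenario to compare against, if any.
    pub baseline: Option<String>,
}

impl Scenario {
    /// Reject scenarios that could not produce a meaningful run.
    pub fn validate(&self) -> Result<(), String> {
        if self.name.trim().is_empty() {
            return Err("scenario name must not be empty".to_string());
        }
        if self.load.users == 0 {
            return Err("load.users must be at least 1".to_string());
        }
        if self.load.duration.is_zero() {
            return Err("load.duration must be greater than zero".to_string());
        }
        if self.baseline.as_deref() == Some(self.name.as_str()) {
            return Err("a scenario cannot use itself as baseline".to_string());
        }
        Ok(())
    }
}
```

Validating every scenario before the stack starts is one way to surface a malformed file as a fast error instead of a failed or misleading benchmark run.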
3. Added real MCP payload fixtures
This PR adds payload fixtures under crates/contextforge_benchmark_runner/assets/payloads/ for benchmark traffic that exercises:
- tools/list
- tools/call
- resources/list
- resources/read
- prompts/list
- prompts/get
That lets the load driver benchmark real MCP JSON-RPC prompt/resource/tool paths instead of only health or admin endpoints.
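For readers unfamiliar with the wire format, the sketch below builds JSON-RPC 2.0 request bodies of the kind these methods use. It is illustrative only and does not reproduce the committed fixtures: the helper names and the tool name used in the usage note are hypothetical, and the code uses only the standard library rather than whatever JSON crate the runner depends on.

```rust
/// Minimal JSON string escaping, sufficient for method and tool names.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Body for a parameterless request such as tools/list,
/// resources/list, or prompts/list.
pub fn list_request(id: u64, method: &str) -> String {
    format!(
        r#"{{"jsonrpc":"2.0","id":{},"method":"{}"}}"#,
        id,
        json_escape(method)
    )
}

/// Body for tools/call. `arguments_json` must already be a valid
/// JSON object; it is inserted verbatim without validation.
pub fn tools_call_request(id: u64, tool: &str, arguments_json: &str) -> String {
    format!(
        r#"{{"jsonrpc":"2.0","id":{},"method":"tools/call","params":{{"name":"{}","arguments":{}}}}}"#,
        id,
        json_escape(tool),
        arguments_json
    )
}
```

Calling list_request(1, "tools/list") yields {"jsonrpc":"2.0","id":1,"method":"tools/list"}, and tools_call_request(2, "echo", ...) wraps the given arguments object under params for a hypothetical tool named echo.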
4. Added a benchmark-specific container image and entrypoints
This PR adds crates/contextforge_benchmark_runner/assets/Containerfile plus supporting entrypoint scripts so benchmark runs can build and launch a dedicated image that supports:
- maturin
- gunicorn, granian, uvicorn
5. Added docs, Make target, and workspace-alignment follow-up
This PR adds benchmark documentation in docs/docs/testing/benchmark-suite.md and exposes the TUI launcher through:
make benchmark
It also rebases the work onto current main and aligns the benchmark code with the repository’s current crates/* Rust workspace layout instead of the older tools_rust/ placement.
Why This Matters
Before this change, benchmark configuration was spread across scripts, flags, and implicit defaults. This PR moves the suite toward committed, repeatable scenarios with explicit runtime/load settings and reusable tooling around them.
That makes it easier to:
- main
Usage / Verification Commands
Relevant commands introduced or documented by this PR:
Notes
- reports/benchmarks/<profile>_<timestamp>/.
Refs #2473