chore(testing): make ContextForge benchmarks reproducible and CI-friendly by lucarlig · Pull Request #3680 · IBM/mcp-context-forge

lucarlig · 2026-03-14T12:01:42Z

Summary

This PR adds a Rust-native, scenario-driven benchmark suite for ContextForge and wires it into the repo as a committed testing artifact instead of an ad hoc collection of scripts and flags.

The new benchmark flow centers on:

committed TOML scenarios under crates/contextforge_benchmark_runner/assets/scenarios/
a Rust benchmark runner for validate/run/report/compare workflows
an interactive TUI launcher exposed via make benchmark
a Goose-based load driver that can hit real REST and MCP JSON-RPC paths
a benchmark container image with optional Rust plugin wheels and profiling tools
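To illustrate the compare step, the sketch below checks a candidate run against a baseline using a relative tolerance. The `RunSummary` fields, the `compare_runs` function, and the tolerance semantics are hypothetical. The PR does not say which metrics the runner's compare output actually uses.

```rust
/// Hypothetical per-run summary; the real report fields are not described in the PR.
#[derive(Debug, Clone, Copy)]
pub struct RunSummary {
    pub requests_per_sec: f64,
    pub p95_ms: f64,
    pub error_rate: f64, // fraction of failed requests, 0.0..=1.0
}

/// Result of comparing a candidate run with a baseline run.
#[derive(Debug, PartialEq)]
pub enum Verdict {
    Pass,
    Regressed(Vec<String>),
}

/// Flags a regression if throughput drops, or p95 latency rises, by more than
/// `tolerance` (0.05 = 5%), or if the error rate increases at all.
pub fn compare_runs(baseline: &RunSummary, candidate: &RunSummary, tolerance: f64) -> Verdict {
    let mut reasons = Vec::new();
    if candidate.requests_per_sec < baseline.requests_per_sec * (1.0 - tolerance) {
        reasons.push(format!(
            "throughput {:.1} rps below baseline {:.1} rps",
            candidate.requests_per_sec, baseline.requests_per_sec
        ));
    }
    if candidate.p95_ms > baseline.p95_ms * (1.0 + tolerance) {
        reasons.push(format!(
            "p95 {:.1} ms above baseline {:.1} ms",
            candidate.p95_ms, baseline.p95_ms
        ));
    }
    if candidate.error_rate > baseline.error_rate {
        reasons.push(format!(
            "error rate {:.4} above baseline {:.4}",
            candidate.error_rate, baseline.error_rate
        ));
    }
    if reasons.is_empty() {
        Verdict::Pass
    } else {
        Verdict::Regressed(reasons)
    }
}
```

The function collects every failing reason instead of returning on the first one. That makes a single CI log line more useful.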

The result is a benchmark setup that is easier to rerun, compare, extend, and eventually automate in CI.

Type of Change

What Changed

1. Added Rust benchmark crates in the workspace layout

This PR introduces three new crates directly under crates/ in the Rust workspace:

crates/contextforge_benchmark_runner for scenario loading, stack orchestration, run execution, report regeneration, and comparison output
crates/contextforge_benchmark_console for the interactive TUI launcher and scenario template generator
crates/contextforge_goose for the Goose load driver used by benchmark scenarios
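The scenario template generator in contextforge_benchmark_console might produce output along these lines. This is a sketch only. The section names follow the settings groups this PR lists for scenarios (runtime, load, execution, profiling, comparison), but `scenario_template` and every key and default inside it are hypothetical placeholders, not the committed schema.

```rust
/// Renders a skeleton scenario file. Section names mirror the settings groups
/// named in the PR; all keys and default values are hypothetical placeholders.
pub fn scenario_template(name: &str, users: u32, duration_secs: u64) -> String {
    format!(
        r#"# Scenario: {name}
name = "{name}"

[runtime]
server = "gunicorn"   # hypothetical key

[load]
users = {users}
duration_secs = {duration_secs}

[execution]
smoke = false

[profiling]
enabled = false

[comparison]
baseline = ""
"#
    )
}
```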

2. Added committed TOML benchmark scenarios

This PR checks in a suite of benchmark scenarios under crates/contextforge_benchmark_runner/assets/scenarios/, including coverage for:

A2A invoke flows
admin plugin inventory
REST discovery
MCP protocol and MCP runtime paths
MCP prompts, resources, and tools
rate limiter behavior and scaling
secret detection
spin detector and delay-oriented cases
other runtime comparison and smoke-oriented suites

These scenario files now act as the benchmark contract for runtime, load, execution, profiling, and comparison settings.
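To make the "scenario as contract" idea concrete, here is a minimal validation sketch. It assumes a scenario has already been deserialized into a struct. The `Scenario` fields and the rules in `validate` are hypothetical, and the real checks performed by `validate --scenario` may differ.

```rust
/// Hypothetical in-memory form of a scenario file; the real schema lives in
/// crates/contextforge_benchmark_runner and is not reproduced here.
#[derive(Debug)]
pub struct Scenario {
    pub name: String,
    pub users: u32,          // [load] concurrent simulated users
    pub spawn_rate: u32,     // [load] users started per second
    pub duration_secs: u64,  // [execution] steady-state run length
    pub server_mode: String, // [runtime] HTTP server inside the image
}

/// Collects every problem instead of stopping at the first one, so a single
/// validate pass reports all mistakes in a scenario file.
pub fn validate(s: &Scenario) -> Result<(), Vec<String>> {
    let mut errors = Vec::new();
    if s.name.trim().is_empty() {
        errors.push("name must not be empty".to_string());
    }
    if s.users == 0 {
        errors.push("load.users must be > 0".to_string());
    }
    if s.spawn_rate == 0 || s.spawn_rate > s.users {
        errors.push("load.spawn_rate must be in 1..=users".to_string());
    }
    if s.duration_secs == 0 {
        errors.push("execution.duration_secs must be > 0".to_string());
    }
    if !["gunicorn", "granian", "uvicorn"].contains(&s.server_mode.as_str()) {
        errors.push(format!("runtime.server_mode '{}' is not supported", s.server_mode));
    }
    if errors.is_empty() {
        Ok(())
    } else {
        Err(errors)
    }
}
```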

3. Added real MCP payload fixtures

This PR adds payload fixtures under crates/contextforge_benchmark_runner/assets/payloads/ for benchmark traffic that exercises:

tools/list
tools/call
resources/list
resources/read
prompts/list
prompts/get

That lets the load driver benchmark real MCP JSON-RPC prompt/resource/tool paths instead of only health or admin endpoints.
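MCP messages are JSON-RPC 2.0 requests, so each fixture is essentially a method name plus params. For example, tools/call and prompts/get take `{"name": ..., "arguments": {...}}`, and resources/read takes `{"uri": ...}`. The std-only `mcp_request` helper below is a hypothetical sketch of that shape. The committed fixtures are static files, and a real loader would normally use a JSON library rather than string formatting.

```rust
/// Builds a JSON-RPC 2.0 request body for an MCP method.
/// `params_json` must already be valid JSON; `method` is inserted unescaped,
/// which is fine for fixed names such as "tools/list" but not for user input.
pub fn mcp_request(id: u64, method: &str, params_json: Option<&str>) -> String {
    match params_json {
        Some(p) => format!(r#"{{"jsonrpc":"2.0","id":{id},"method":"{method}","params":{p}}}"#),
        None => format!(r#"{{"jsonrpc":"2.0","id":{id},"method":"{method}"}}"#),
    }
}
```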

4. Added a benchmark-specific container image and entrypoints

This PR adds crates/contextforge_benchmark_runner/assets/Containerfile plus supporting entrypoint scripts so benchmark runs can build and launch a dedicated image that supports:

optional Rust plugin wheel builds via maturin
optional profiling tools
multiple HTTP server modes (gunicorn, granian, uvicorn)
the benchmark assets needed by the runner and load driver
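The server-mode switch can be thought of as a function that maps a mode to a launch argv. The flags shown are standard options of each server. However, `server_argv`, the example module path `mcpgateway.main:app`, and the specific flag choices are assumptions, and the PR's entrypoint scripts may configure each server differently.

```rust
/// HTTP server modes named in the PR.
#[derive(Debug, Clone, Copy)]
pub enum ServerMode {
    Gunicorn,
    Granian,
    Uvicorn,
}

/// Builds an argv that serves the ASGI `app` on 0.0.0.0:`port` with `workers`
/// processes. Hypothetical helper; flag choices are illustrative.
pub fn server_argv(mode: ServerMode, app: &str, port: u16, workers: u32) -> Vec<String> {
    let port = port.to_string();
    let workers = workers.to_string();
    let bind = format!("0.0.0.0:{port}");
    let args: Vec<&str> = match mode {
        // gunicorn as process manager with uvicorn's ASGI worker class
        ServerMode::Gunicorn => vec![
            "gunicorn", "-k", "uvicorn.workers.UvicornWorker",
            "-w", workers.as_str(), "-b", bind.as_str(), app,
        ],
        ServerMode::Granian => vec![
            "granian", "--interface", "asgi", "--host", "0.0.0.0",
            "--port", port.as_str(), "--workers", workers.as_str(), app,
        ],
        ServerMode::Uvicorn => vec![
            "uvicorn", app, "--host", "0.0.0.0",
            "--port", port.as_str(), "--workers", workers.as_str(),
        ],
    };
    args.into_iter().map(String::from).collect()
}
```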

5. Added docs, Make target, and workspace-alignment follow-up

This PR adds benchmark documentation in docs/docs/testing/benchmark-suite.md and exposes the TUI launcher through:

make benchmark

It also rebases the work onto current main and aligns the benchmark code with the repository’s current crates/* Rust workspace layout instead of the older tools_rust/ placement.

Why This Matters

Before this change, benchmark configuration was spread across scripts, flags, and implicit defaults. This PR moves the suite toward committed, repeatable scenarios with explicit runtime/load settings and reusable tooling around them.

That makes it easier to:

rerun a benchmark consistently
compare runs across branches or images
extend the suite by authoring scenario files instead of bespoke scripts
keep new Rust benchmark code in the same workspace model used by current main
capture richer benchmark/report artifacts for future regression tracking

Usage / Verification Commands

Relevant commands introduced or documented by this PR:

make benchmark
cargo run --manifest-path crates/contextforge_benchmark_runner/Cargo.toml -- validate --scenario rust-mcp-runtime-300
cargo run --manifest-path crates/contextforge_benchmark_runner/Cargo.toml -- run --scenario a2a-invoke-300 --smoke
cargo run --manifest-path crates/contextforge_benchmark_runner/Cargo.toml -- run --scenario rust-mcp-runtime-300
cargo run --manifest-path crates/contextforge_benchmark_runner/Cargo.toml -- regenerate-report --run-dir reports/benchmarks/<run-dir>
cargo run --manifest-path crates/contextforge_benchmark_runner/Cargo.toml -- compare-run --run-dir reports/benchmarks/<run-dir>
cargo test -p contextforge_benchmark_runner -p contextforge_benchmark_console -p contextforge_goose
uv run pytest tests/unit/test_rust_workspace_layout.py
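The four runner subcommands above map naturally onto an enum. The real runner likely relies on an argument-parsing crate. The std-only `parse` function below is a hypothetical sketch of the interface's shape (a subcommand, `--scenario` or `--run-dir`, and the `--smoke` switch), not its implementation.

```rust
/// Runner subcommands as used in the commands above (hypothetical model).
#[derive(Debug, PartialEq)]
pub enum Command {
    Validate { scenario: String },
    Run { scenario: String, smoke: bool },
    RegenerateReport { run_dir: String },
    CompareRun { run_dir: String },
}

/// Returns the value that follows `flag`, e.g. `--scenario a2a-invoke-300`.
fn flag_value(args: &[String], flag: &str) -> Result<String, String> {
    args.iter()
        .position(|a| a == flag)
        .and_then(|i| args.get(i + 1))
        .cloned()
        .ok_or_else(|| format!("missing {flag} <value>"))
}

/// Parses the arguments that follow `--` in `cargo run ... -- <subcommand>`.
pub fn parse(args: &[String]) -> Result<Command, String> {
    let (sub, rest) = args.split_first().ok_or("missing subcommand")?;
    match sub.as_str() {
        "validate" => Ok(Command::Validate { scenario: flag_value(rest, "--scenario")? }),
        "run" => Ok(Command::Run {
            scenario: flag_value(rest, "--scenario")?,
            smoke: rest.iter().any(|a| a == "--smoke"),
        }),
        "regenerate-report" => Ok(Command::RegenerateReport { run_dir: flag_value(rest, "--run-dir")? }),
        "compare-run" => Ok(Command::CompareRun { run_dir: flag_value(rest, "--run-dir")? }),
        other => Err(format!("unknown subcommand: {other}")),
    }
}
```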

Notes

This PR is broader than a TOML-config cleanup. It adds the runner, launcher, load driver, benchmark image, scenarios, payload fixtures, docs, and the repo entrypoint for using them.
Several scenarios compare benchmark images with and without optional Rust plugin artifacts. Those suites are not claiming that the underlying product runtime has moved wholesale to Rust; they are measuring the benchmarked paths described in each scenario.
Benchmark reports are written under reports/benchmarks/<profile>_<timestamp>/.
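Because reports land in `reports/benchmarks/<profile>_<timestamp>/`, choosing a baseline for `compare-run` can come down to a path join and a sort. This sketch assumes a fixed-width, sortable timestamp such as `20260415T141800Z`. The PR does not specify the timestamp format, and `report_dir` and `latest_run` are hypothetical helpers.

```rust
use std::path::PathBuf;

/// Builds reports/benchmarks/<profile>_<timestamp>. The timestamp is passed in
/// by the caller; its format is an assumption.
pub fn report_dir(profile: &str, timestamp: &str) -> PathBuf {
    PathBuf::from("reports")
        .join("benchmarks")
        .join(format!("{profile}_{timestamp}"))
}

/// With a fixed-width timestamp, lexicographic order matches chronological
/// order, so the newest run for a profile is the maximum name with its prefix.
/// Assumes no profile name is itself a prefix of another followed by '_'.
pub fn latest_run<'a>(names: &[&'a str], profile: &str) -> Option<&'a str> {
    let prefix = format!("{profile}_");
    names.iter().copied().filter(|n| n.starts_with(&prefix)).max()
}
```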

Refs #2473

dima-zakharov

The code is well-structured and modular
Documentation is comprehensive
Test coverage is extensive

Signed-off-by: lucarlig <luca.carlig@ibm.com>

lucarlig force-pushed the chore/build-prod-image-with-profilers branch 3 times, most recently from 40ff7d6 to 4849f2c on March 16, 2026 09:14

lucarlig added the triage (Issues / Features awaiting triage), experimental (Experimental features, test proposed MCP Specification changes), and chore (Linting, formatting, dependency hygiene, or project maintenance chores) labels Mar 16, 2026

lucarlig force-pushed the chore/build-prod-image-with-profilers branch 4 times, most recently from ba07cf3 to 21ac88a on March 16, 2026 09:31

lucarlig changed the title ~~Chore/build prod image with profilers~~ Feature/deterministic benchamrks Mar 16, 2026

lucarlig changed the title ~~Feature/deterministic benchamrks~~ Feature: Make ContextForge benchmarks reproducible and CI-friendly Mar 16, 2026

lucarlig force-pushed the chore/build-prod-image-with-profilers branch from 21ac88a to 8dbd4b8 on March 16, 2026 11:08

crivetimihai added this to the Release 1.2.0 milestone Mar 20, 2026

crivetimihai added the COULD label (P3: Nice-to-have features with minimal impact if left out; included if time permits) Mar 20, 2026

crivetimihai changed the title ~~Feature: Make ContextForge benchmarks reproducible and CI-friendly~~ chore(testing): make ContextForge benchmarks reproducible and CI-friendly Mar 20, 2026

lucarlig force-pushed the chore/build-prod-image-with-profilers branch 2 times, most recently from de19605 to f0527ef on March 26, 2026 10:45

lucarlig force-pushed the chore/build-prod-image-with-profilers branch 2 times, most recently from d82289c to 61bb02f on April 8, 2026 11:01

lucarlig marked this pull request as ready for review April 8, 2026 14:30

lucarlig requested review from crivetimihai and dima-zakharov as code owners April 8, 2026 14:30

dima-zakharov previously approved these changes Apr 13, 2026

lucarlig added 6 commits April 15, 2026 15:09

feat: add benchmark suite and ui

d62f8b2

Signed-off-by: lucarlig <luca.carlig@ibm.com>

refactor: move contextforge benchmarks fully to rust

5fb45a5

Signed-off-by: lucarlig <luca.carlig@ibm.com>

refactor: consolidate rust benchmark tooling

7b5a354

Signed-off-by: lucarlig <luca.carlig@ibm.com>

docs: add benchmark console multiview design

e11b2e1

Signed-off-by: lucarlig <luca.carlig@ibm.com>

refactor: improve rust benchmark console UX

9f4bd7d

Signed-off-by: lucarlig <luca.carlig@ibm.com>

chore: remove superpowers docs

1ba181b

Signed-off-by: lucarlig <luca.carlig@ibm.com>

lucarlig added 6 commits April 15, 2026 15:09

docs: align benchmark scenario descriptions

3766e49

Signed-off-by: lucarlig <luca.carlig@ibm.com>

docs: align benchmark scenario descriptions

271ce76

Signed-off-by: lucarlig <luca.carlig@ibm.com>

docs: fix benchmark suite contract

48b436c

Signed-off-by: lucarlig <luca.carlig@ibm.com>

refactor: modularize benchmark tooling and add suite parity

1dab626

Signed-off-by: lucarlig <luca.carlig@ibm.com>

fix: address benchmark review findings

ecd7f90

Signed-off-by: lucarlig <luca.carlig@ibm.com>

refactor: move benchmark crates into workspace layout

f11753e

Signed-off-by: lucarlig <luca.carlig@ibm.com>

lucarlig dismissed dima-zakharov’s stale review via f11753e April 15, 2026 14:18

lucarlig force-pushed the chore/build-prod-image-with-profilers branch from 93a69c2 to f11753e on April 15, 2026 14:18

lucarlig requested review from brian-hussey, dawid-nowak, kevalmahajan and madhav165 as code owners April 15, 2026 14:18

lucarlig added 2 commits April 15, 2026 15:40

fix: address benchmark CI failures

0f4a78c

Signed-off-by: lucarlig <luca.carlig@ibm.com>

fix: satisfy workspace clippy

391a798

Signed-off-by: lucarlig <luca.carlig@ibm.com>

lucarlig marked this pull request as draft April 15, 2026 15:23

lucarlig marked this pull request as ready for review April 15, 2026 15:23

chore: retrigger ci

7013efa

Signed-off-by: lucarlig <luca.carlig@ibm.com>

lucarlig wants to merge 15 commits into main from chore/build-prod-image-with-profilers