Goal
Make SOTA bot experiments comparable through a commanded-vs-baseline benchmark harness.
Tasks
- Define match matrix: bot, opponent, race, map, command script, baseline script, telemetry output, replay output.
- Generate a run plan for each benchmark case.
- Support command scripts for aggressive, defensive, greedy, harass, contain, and no-all-in styles.
- Normalize replay-derived metrics into report inputs.
- Produce comparison reports with fulfillment delta, adherence delta, win/loss, crash/desync markers, and command override counts.
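
As an illustration of the last two tasks, here is a minimal sketch of how one commanded run and its baseline run could be reduced to a comparison row. The `RunMetrics` fields, the 0.0 to 1.0 score ranges, and the `compare` helper are assumptions made for this example, not an existing schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunMetrics:
    """Normalized metrics for a single run (hypothetical field names)."""
    fulfillment: float          # assumed: share of commands fulfilled, 0.0-1.0
    adherence: float            # assumed: share of play matching the style, 0.0-1.0
    won: bool
    crashed: bool = False
    desynced: bool = False
    command_overrides: int = 0  # times the bot overrode a queued command


def compare(commanded: RunMetrics, baseline: RunMetrics) -> dict:
    """Build one report row from a commanded run and its baseline run."""
    return {
        "fulfillment_delta": round(commanded.fulfillment - baseline.fulfillment, 4),
        "adherence_delta": round(commanded.adherence - baseline.adherence, 4),
        "commanded_result": "win" if commanded.won else "loss",
        "baseline_result": "win" if baseline.won else "loss",
        # A crash or desync in either run marks the whole comparison.
        "crash": commanded.crashed or baseline.crashed,
        "desync": commanded.desynced or baseline.desynced,
        "command_overrides": commanded.command_overrides,
    }


row = compare(
    RunMetrics(fulfillment=0.75, adherence=0.5, won=True, command_overrides=2),
    RunMetrics(fulfillment=0.25, adherence=0.25, won=False),
)
```

Keeping the crash and desync markers in the same row as the deltas lets a reader discount a delta that came from a broken run.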
Acceptance Criteria
- Benchmark harness can emit plans for each candidate bot without live execution.
- Harness separates commandable bots from benchmark-only opponents.
- Every generated case specifies queue path, telemetry path, replay/log path, and expected verifier metrics.
- Docs explain how to run the same case under BWAPI/BASIL/SSCAIT-style local runners.
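
The first three criteria can be sketched as a pure plan generator that never launches a game. Everything named here is hypothetical: the `Bot` record, the `build_plans` function, the `runs/` and `scripts/` layouts, the file names, and the metric names. Treating every other bot as a valid opponent is also an assumption.

```python
import itertools
from dataclasses import dataclass
from pathlib import PurePosixPath

# Command styles listed under Tasks.
STYLES = ("aggressive", "defensive", "greedy", "harass", "contain", "no-all-in")

# Hypothetical metric names the verifier would be expected to report.
EXPECTED_METRICS = ("fulfillment", "adherence", "result", "command_overrides")


@dataclass(frozen=True)
class Bot:
    name: str
    race: str
    commandable: bool  # False means benchmark-only opponent


def build_plans(bots, maps, out_root="runs"):
    """Expand the match matrix into plan dicts without executing anything."""
    subjects = [b for b in bots if b.commandable]
    plans = []
    for bot, opp, map_name, style in itertools.product(subjects, bots, maps, STYLES):
        if opp.name == bot.name:
            continue  # no mirror matches against itself
        case_id = f"{bot.name}-vs-{opp.name}-{map_name}-{style}"
        base = PurePosixPath(out_root) / case_id
        plans.append({
            "case_id": case_id,
            "bot": bot.name,
            "opponent": opp.name,
            "race": bot.race,
            "map": map_name,
            "command_script": f"scripts/{style}.json",   # assumed layout
            "baseline_script": "scripts/baseline.json",  # assumed layout
            "queue_path": str(base / "queue.jsonl"),
            "telemetry_path": str(base / "telemetry.jsonl"),
            "replay_path": str(base / "replay.rep"),
            "expected_metrics": list(EXPECTED_METRICS),
        })
    return plans


plans = build_plans(
    [Bot("BotA", "Zerg", True), Bot("BotB", "Terran", False)],
    ["MapOne"],
)
```

Because the output is plain data, the same plan could be serialized and handed to whichever local runner executes the case.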