Infrastructure: commanded-vs-baseline benchmark harness for SOTA bots #39

## Goal
Make SOTA bot experiments comparable through a commanded-vs-baseline benchmark harness.

## Tasks
- Define match matrix: bot, opponent, race, map, command script, baseline script, telemetry output, replay output.
- Add generated run plans for each benchmark case.
- Support command scripts for aggressive, defensive, greedy, harass, contain, and no-all-in styles.
- Normalize replay-derived metrics into report inputs.
- Produce comparison reports with fulfillment delta, adherence delta, win/loss, crash/desync markers, and command override counts.
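As a rough illustration of the first two tasks, the match matrix can be expanded into run plans with a plain Cartesian product, without launching any game. This is only a sketch under stated assumptions: the `BenchmarkCase` fields mirror the matrix columns listed above, but the class name, the `generate_plans` function, the `scripts/` and `runs/` directory layout, and the file names are all hypothetical and not part of the existing harness.

```python
from dataclasses import dataclass
from itertools import product
from pathlib import Path

# Command styles named in the task list.
STYLES = ("aggressive", "defensive", "greedy", "harass", "contain", "no-all-in")


@dataclass(frozen=True)
class BenchmarkCase:
    """One row of the match matrix (hypothetical schema)."""
    bot: str
    opponent: str
    race: str
    map_name: str
    style: str
    command_script: str
    baseline_script: str
    telemetry_path: str
    replay_path: str


def generate_plans(bots, opponents, races, maps, out_root="runs"):
    """Expand the matrix into cases. Nothing is executed here."""
    cases = []
    for bot, opp, race, map_name, style in product(bots, opponents, races, maps, STYLES):
        case_id = f"{bot}_vs_{opp}_{race}_{map_name}_{style}"
        case_dir = Path(out_root) / case_id
        cases.append(
            BenchmarkCase(
                bot=bot,
                opponent=opp,
                race=race,
                map_name=map_name,
                style=style,
                # Assumed layout: one command script per style, one shared baseline.
                command_script=f"scripts/{style}.json",
                baseline_script="scripts/baseline.json",
                telemetry_path=(case_dir / "telemetry.jsonl").as_posix(),
                replay_path=(case_dir / "match.rep").as_posix(),
            )
        )
    return cases


plans = generate_plans(["BotA"], ["BotB"], ["Zerg"], ["MapX"])
```

With one bot, one opponent, one race, and one map, this yields six cases, one per command style. Keeping the cases as frozen dataclasses makes them easy to serialize and diff between runs.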

## Acceptance Criteria
- Benchmark harness can emit plans for each candidate bot without live execution.
- Harness separates commandable bots from benchmark-only opponents.
- Every generated case specifies queue path, telemetry path, replay/log path, and expected verifier metrics.
- Docs explain how to run the same case under BWAPI/BASIL/SSCAIT-style local runners.
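The second and third criteria could be checked mechanically before any match is run. The sketch below assumes each generated case is a dictionary; the key names (`queue_path`, `telemetry_path`, `replay_path`, `expected_metrics`, `bot`) and the `validate_case` helper are hypothetical names chosen for illustration, not an existing interface.

```python
# Fields every generated case must specify, per the acceptance criteria.
# The key names are assumptions for this sketch.
REQUIRED_FIELDS = ("queue_path", "telemetry_path", "replay_path", "expected_metrics")


def validate_case(case, commandable_bots):
    """Return a list of problems; an empty list means the case passes."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not case.get(field):
            problems.append(f"missing {field}")
    # Only commandable bots may receive a command queue;
    # benchmark-only opponents must not appear in the "bot" slot.
    bot = case.get("bot")
    if bot not in commandable_bots:
        problems.append(f"bot {bot!r} is not commandable")
    return problems


good_case = {
    "bot": "BotA",
    "queue_path": "runs/case1/queue.json",
    "telemetry_path": "runs/case1/telemetry.jsonl",
    "replay_path": "runs/case1/match.rep",
    "expected_metrics": ["fulfillment", "adherence"],
}
bad_case = {
    "bot": "BotB",
    "telemetry_path": "runs/case2/telemetry.jsonl",
    "replay_path": "runs/case2/match.rep",
    "expected_metrics": ["fulfillment"],
}

good_result = validate_case(good_case, {"BotA"})
bad_result = validate_case(bad_case, {"BotA"})
```

Returning a list of problems rather than raising on the first failure lets a dry run report every malformed case in one pass.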
