Infrastructure: commanded-vs-baseline benchmark harness for SOTA bots #39

Description

@minsing-jin

Goal

Make SOTA bot experiments comparable through a commanded-vs-baseline benchmark harness.

Tasks

  • Define match matrix: bot, opponent, race, map, command script, baseline script, telemetry output, replay output.
  • Add generated run plans for each benchmark case.
  • Support command scripts for aggressive, defensive, greedy, harass, contain, and no-all-in styles.
  • Normalize replay-derived metrics into a common format that the comparison reports can consume.
  • Produce comparison reports with fulfillment delta, adherence delta, win/loss, crash/desync markers, and command override counts.
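As a rough illustration of the last two tasks, the sketch below computes the comparison fields from a pair of normalized runs. Everything in it is an assumption made for illustration: the `RunMetrics` record, its field names, the 0.0 to 1.0 scale for fulfillment and adherence, and the `compare` helper are hypothetical and not part of any existing harness code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunMetrics:
    """Normalized, replay-derived metrics for one run (hypothetical schema)."""
    fulfillment: float          # assumed: share of issued commands fulfilled, 0.0-1.0
    adherence: float            # assumed: share of play consistent with the commanded style
    won: bool
    crashed: bool = False
    desynced: bool = False
    command_overrides: int = 0  # times the bot ignored or replaced a command


def compare(commanded: RunMetrics, baseline: RunMetrics) -> dict:
    """Build one comparison-report row for a commanded run and its baseline run."""
    return {
        "fulfillment_delta": round(commanded.fulfillment - baseline.fulfillment, 4),
        "adherence_delta": round(commanded.adherence - baseline.adherence, 4),
        "commanded_result": "win" if commanded.won else "loss",
        "baseline_result": "win" if baseline.won else "loss",
        # A crash or desync in either run marks the whole pair as suspect.
        "crash": commanded.crashed or baseline.crashed,
        "desync": commanded.desynced or baseline.desynced,
        "command_overrides": commanded.command_overrides,
    }


if __name__ == "__main__":
    row = compare(
        RunMetrics(fulfillment=0.8, adherence=0.75, won=True, command_overrides=2),
        RunMetrics(fulfillment=0.5, adherence=0.5, won=False),
    )
    print(row)
```

Keeping the crash and desync markers on the pair rather than on a single run is one possible design choice: it lets a report filter out pairs whose deltas are not trustworthy.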

Acceptance Criteria

  • Benchmark harness can emit plans for each candidate bot without live execution.
  • Harness separates commandable bots from benchmark-only opponents.
  • Every generated case specifies queue path, telemetry path, replay/log path, and expected verifier metrics.
  • Docs explain how to run the same case under BWAPI/BASIL/SSCAIT-style local runners.
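The first three criteria could be met by a pure plan generator along the following lines. This is a minimal sketch under stated assumptions: the `Bot` and `Case` records, the `generate_plans` function, the file names (`queue.json`, `telemetry.jsonl`, `replay.rep`), the `runs` root directory, and the metric names are all hypothetical. It also assumes each commanded run is paired with one baseline run per style. Nothing here launches a game; it only emits plan data.

```python
import itertools
import json
from dataclasses import asdict, dataclass
from pathlib import PurePosixPath

# Command styles listed in the Tasks section.
STYLES = ("aggressive", "defensive", "greedy", "harass", "contain", "no-all-in")


@dataclass(frozen=True)
class Bot:
    name: str
    race: str
    commandable: bool  # False means the bot is a benchmark-only opponent


@dataclass(frozen=True)
class Case:
    case_id: str
    bot: str
    opponent: str
    race: str
    map_name: str
    style: str
    mode: str  # "commanded" or "baseline"
    queue_path: str
    telemetry_path: str
    replay_path: str
    expected_metrics: tuple


def generate_plans(bots, maps, root="runs"):
    """Emit one commanded and one baseline case per (bot, opponent, map, style)."""
    commandable = [b for b in bots if b.commandable]
    cases = []
    # Only commandable bots are benchmarked; every bot can serve as an opponent.
    for bot, opp, map_name, style in itertools.product(commandable, bots, maps, STYLES):
        if bot.name == opp.name:
            continue  # skip mirror matches against itself
        for mode in ("commanded", "baseline"):
            case_id = f"{bot.name}-vs-{opp.name}-{map_name}-{style}-{mode}"
            base = PurePosixPath(root) / case_id
            cases.append(Case(
                case_id=case_id, bot=bot.name, opponent=opp.name, race=bot.race,
                map_name=map_name, style=style, mode=mode,
                queue_path=str(base / "queue.json"),
                telemetry_path=str(base / "telemetry.jsonl"),
                replay_path=str(base / "replay.rep"),
                expected_metrics=("fulfillment", "adherence", "result"),
            ))
    return cases


if __name__ == "__main__":
    # Placeholder bot and map names, not real candidates.
    bots = [Bot("bot_a", "Zerg", True), Bot("bot_b", "Terran", False)]
    for case in generate_plans(bots, ["map_1"]):
        print(json.dumps(asdict(case)))
```

Because the output is plain JSON, the same case file could in principle be handed to different local runners, with only the launch step differing per runner.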
