This document guides coding agents working on this repository.
AstroReason is a monorepo for space mission design benchmarks, reproducible experiments, and first-party method layers.
| Term | Meaning |
|---|---|
| Coding agents | Agents developing this repository |
| Space agents | Agents being evaluated on benchmark tasks |
Do not mix these two meanings of "agent" in code or documentation.
```
astro-reason/
├── benchmarks/  # canonical benchmark definitions and benchmark-side tooling
├── experiments/ # reproducible evaluated runs of methods against benchmarks
├── solvers/     # reusable traditional solver implementations
├── runtimes/    # reusable execution substrates for agentic systems
├── scripts/     # repo-owned orchestration and validation entrypoints
├── docs/        # public repository and contract documentation
└── tests/       # focused tests for benchmarks and repository tooling
```
Directory roles:
- `benchmarks/` owns public benchmark definitions, datasets, verifiers, generators, and optional visualizers.
- `experiments/` owns flat runnable experiment families, runner-owned configs, and shared prompt/config fragments under `experiments/_fragments/`.
- `solvers/` owns reusable non-agentic methods and solver-local tooling.
- `runtimes/` owns reusable agent runtime environments, build logic, installation steps, and shared runtime assets.
- Keep the benchmark core algorithm-agnostic.
- Keep every benchmark standalone.
- Keep top-level modules standalone; do not create runtime dependencies across `benchmarks/`, `solvers/`, or `runtimes/`.
- Preserve reproducibility for both benchmarks and method layers.
- Keep public repository content understandable to external readers.
When implementing or refactoring a benchmark:
- Start with the benchmark `README.md`.
- Treat the verifier as the source of truth for validity and scoring (a verifier sketch follows below).
- Keep verifier and generator code standalone, with no imports from other benchmarks or method layers.
- Update benchmark documentation, verifier behavior, tests, and generator-facing interfaces together when the benchmark contract changes.
- Add or update focused tests under `tests/benchmarks/`.
Benchmark contract details belong in `docs/benchmark_contract.md` and related public contract docs, not in this guide.
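As a purely illustrative sketch of a standalone verifier entrypoint, assuming a file-based CLI (the flags, JSON shapes, and scoring field are invented here, not part of any actual benchmark contract):

```python
#!/usr/bin/env python3
"""Hypothetical standalone verifier sketch: no imports from other
benchmarks or method layers, invoked purely over files and stdout."""
import argparse
import json
import sys
from pathlib import Path


def verify(case: dict, solution: dict) -> dict:
    # Placeholder check: the real validity and scoring rules live here,
    # making this function the single source of truth for both.
    valid = solution.get("trajectory") is not None
    score = float(solution.get("delta_v", 0.0)) if valid else None
    return {"valid": valid, "score": score}


def main() -> int:
    parser = argparse.ArgumentParser(description="Verify one benchmark case.")
    parser.add_argument("--case", type=Path, required=True)
    parser.add_argument("--solution", type=Path, required=True)
    args = parser.parse_args()
    case = json.loads(args.case.read_text())
    solution = json.loads(args.solution.read_text())
    print(json.dumps(verify(case, solution)))
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Because the sketch exchanges only files and stdout, experiments can consume a verifier like this without any source imports.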
Dataset rules:
- Do not casually modify committed benchmark datasets.
- If a dataset needs redesign or regeneration, prefer benchmark-local generator tooling (see the sketch after this list).
- If generator inputs depend on unstable or external sources, document that clearly in the benchmark README.
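A minimal sketch of what benchmark-local generator tooling could look like, assuming a seeded CLI (the flags, case shape, and output layout are hypothetical):

```python
#!/usr/bin/env python3
"""Hypothetical benchmark-local dataset generator sketch: regeneration is
seeded and scripted so committed datasets never need hand edits."""
import argparse
import json
import random
from pathlib import Path


def generate_case(rng: random.Random, index: int) -> dict:
    # Placeholder case synthesis; a real generator encodes the
    # benchmark's actual case distribution here.
    return {"id": f"case_{index:04d}", "seed_value": rng.random()}


def main() -> None:
    parser = argparse.ArgumentParser(description="Regenerate the dataset.")
    parser.add_argument("--seed", type=int, default=0,
                        help="Fixed seed so regeneration is reproducible.")
    parser.add_argument("--count", type=int, default=100)
    parser.add_argument("--out", type=Path, default=Path("dataset"))
    args = parser.parse_args()
    rng = random.Random(args.seed)
    args.out.mkdir(parents=True, exist_ok=True)
    for i in range(args.count):
        case = generate_case(rng, i)
        (args.out / f"{case['id']}.json").write_text(json.dumps(case, indent=2))


if __name__ == "__main__":
    main()
```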
`experiments/` is the home of runnable evaluated configurations, not reusable method implementations.
Use these ownership boundaries:
- experiments decide what benchmark-facing run is performed
- solvers own reusable traditional method implementations
- runtimes own reusable execution environments for agentic systems
- benchmarks, solvers, and runtimes must each be standalone and must not import or execute one another
- experiments consume benchmarks, solvers, and runtimes through CLI/file contracts rather than source imports (sketched after this list)
- solvers should implement any needed self-checks inside solver-local code instead of calling benchmark verifiers
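One hypothetical shape for such a CLI/file contract, with the solver and verifier paths invented purely for illustration:

```python
"""Hypothetical experiment-runner sketch: the experiment invokes a solver
and a verifier as subprocesses over files, never via source imports."""
import json
import subprocess
import sys
from pathlib import Path


def run_case(case_path: Path, workdir: Path) -> dict:
    workdir.mkdir(parents=True, exist_ok=True)
    solution_path = workdir / "solution.json"
    # Call the solver through its CLI contract (entrypoint name assumed).
    subprocess.run(
        [sys.executable, "solvers/lambert/solve.py",
         "--case", str(case_path), "--out", str(solution_path)],
        check=True,
    )
    # Score through the benchmark verifier's CLI contract (also assumed).
    result = subprocess.run(
        [sys.executable, "benchmarks/transfer/verify.py",
         "--case", str(case_path), "--solution", str(solution_path)],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    print(run_case(Path(sys.argv[1]), Path("runs/example")))
```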
Keep the `experiments/` and `runtimes/` boundary explicit:
- `experiments/` owns prompts, family configs, workspace assembly choices, and run settings (a config sketch follows this list)
- `runtimes/` owns images, installation/build logic, copied runtime assets, and custom-built agent systems when needed
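As a hypothetical illustration of that split, a family config owned by `experiments/` might carry only run-facing settings and reference a runtime by name; every field below is an assumption, not a prescribed schema:

```python
"""Hypothetical experiment family config sketch: experiments/ owns prompts
and run settings; a runtime is referenced by name, never built here."""
from dataclasses import dataclass


@dataclass(frozen=True)
class FamilyConfig:
    benchmark: str        # which benchmark the run targets
    runtime: str          # name of a runtime owned by runtimes/
    prompt_fragment: str  # shared fragment under experiments/_fragments/
    cases: int            # how many cases to run
    seed: int             # fixed seed for reproducibility


EXAMPLE = FamilyConfig(
    benchmark="transfer",
    runtime="python-agent-base",
    prompt_fragment="experiments/_fragments/transfer_prompt.md",
    cases=25,
    seed=0,
)
```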
Prompt and workspace rules for space-agent-facing runs:
- only expose the files needed to solve the current case (see the assembly sketch after this list)
- avoid benchmark, evaluation, Docker, harness, or repository-internal leakage in prompts
- do not turn the solving workspace into a mirror of the whole repository
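A minimal workspace-assembly sketch under these rules, assuming a per-case allowlist (the file names are illustrative):

```python
"""Hypothetical workspace assembly sketch: copy only allowlisted per-case
files into the solving workspace, nothing repository-internal."""
import shutil
from pathlib import Path

# Illustrative allowlist: just the case statement and its input data.
CASE_FILES = ["problem.md", "ephemeris.csv"]


def assemble_workspace(case_dir: Path, workspace: Path) -> None:
    workspace.mkdir(parents=True, exist_ok=True)
    for name in CASE_FILES:
        src = case_dir / name
        if not src.exists():
            raise FileNotFoundError(f"case file missing: {src}")
        shutil.copy2(src, workspace / name)
    # Deliberately no copying of verifiers, harness code, Docker files,
    # or anything else repository-internal.
```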
Detailed shapes, entrypoints, and CLI contracts for experiments and methods should live in `docs/*_contract.md`, not in this guide.
- Prefer focused tests over broad test runs when working on one benchmark or tool (see the example after this list).
- Prefer readable scripts over opaque one-liners for non-trivial debugging.
- Trace data flow before patching behavior.
- Avoid silent fallbacks or broad exception handling that hides root causes.
- Do not degrade repository design just to fit tooling or sandbox limitations.
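For example, a focused verifier test under `tests/benchmarks/` might look like the following; the verifier path and behavior match the hypothetical sketches above, not any real benchmark:

```python
"""Hypothetical focused test sketch for one benchmark's verifier."""
import json
import subprocess
import sys
from pathlib import Path


def test_verifier_rejects_empty_solution(tmp_path: Path) -> None:
    case = tmp_path / "case.json"
    solution = tmp_path / "solution.json"
    case.write_text(json.dumps({"id": "case_0000"}))
    solution.write_text(json.dumps({}))  # no trajectory => invalid
    result = subprocess.run(
        [sys.executable, "benchmarks/transfer/verify.py",
         "--case", str(case), "--solution", str(solution)],
        check=True, capture_output=True, text=True,
    )
    assert json.loads(result.stdout)["valid"] is False
```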
- Do not create runtime dependencies between benchmarks.
- Do not import internal functions, classes, or modules across `benchmarks/`, `experiments/`, `solvers/`, and `runtimes/`.
- Do not make `benchmarks/`, `solvers/`, or `runtimes/` call one another through CLI or executable entrypoints. Cross-layer orchestration belongs in `experiments/`.
- Do not casually edit committed datasets by hand when a generator should own the change.
- Do not leak benchmark or harness internals into space-agent prompts.
- Do not install packages system-wide for repository work.
- Start benchmark-specific work from the benchmark `README.md`.
- Treat `docs/benchmark_contract.md` as the benchmark-side contract source of truth.
- Put detailed method and experiment contracts in public `docs/*_contract.md` files.
- Keep this guide focused on repository philosophy, ownership boundaries, and working norms.