Orion Baselines

Baseline runners used to compare against Orion on three task categories.

Layout

baselines/
├── .env                       # Runtime configuration (paths + API keys). Fill in before running.
├── microscopy_baselines/      # Microscopy / CellProfiler problem-solving baselines
│   ├── biomni/                #   Biomni A1 agent runner (test_biomni.py)
│   └── react/                 #   LangChain ReAct + CellProfiler runners
│                              #     run_react.py         (training split)
│                              #     run_react_testset.py (test split)
│
├── deepresearch_benchmarks/   # Deep-research benchmark suite (MicroVQA / LabBench DbQA, LitQA2, FigQA)
│   ├── adapters/              #   Per-benchmark dataset adapters (HuggingFace loaders + answer parsing)
│   ├── runners/               #   Agent runners (Biomni, SpatialAgent)
│   ├── run_biomni_benchmark.py
│   ├── run_spatialagent_benchmark.py
│   ├── run_microvqa_llm_baselines.py   # GPT-5 / Claude / Gemini direct-LLM baselines on MicroVQA
│   ├── run_search_baselines.py         # GPT-5 / Claude with web-search tool on DbQA / LitQA2
│   └── run_script.sh                   # Convenience wrapper that runs the LLM/search baselines
│
└── JUMP_discovery/            # JUMP Cell Painting open-ended discovery task (Biomni)
    ├── run_biomni_discovery.py
    ├── prompt.md
    └── hallmark_genesets.json

Configuration

All runtime configuration lives in baselines/.env. The repository ships a template at baselines/.env.example: copy it to baselines/.env (the .env itself is gitignored) and fill in the values needed by the scripts you intend to run. Unused variables can stay blank. Every script loads that file at startup via python-dotenv and reads paths and credentials from os.environ.

Install the loader once:

pip install python-dotenv
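
The startup step described under Configuration can be sketched roughly as follows. This is a minimal illustration, not the code the runners actually use: `find_env_file` and `load_config` are hypothetical helper names, and walking up the parent directories is only one assumed way of locating .env relative to a script's own file. The only python-dotenv call used is `load_dotenv`.

```python
from pathlib import Path


def find_env_file(script_path, filename=".env"):
    """Return the nearest `filename` in the script's directory or any parent.

    Hypothetical helper: the real scripts may locate the file differently.
    """
    start = Path(script_path).resolve().parent
    for directory in (start, *start.parents):
        candidate = directory / filename
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(f"No {filename} found in or above {start}")


def load_config(script_path):
    """Load the located .env into os.environ and return its path."""
    # Imported lazily so the path lookup above works without python-dotenv.
    from dotenv import load_dotenv

    env_path = find_env_file(script_path)
    # By default load_dotenv does not override variables that are already set.
    load_dotenv(env_path)
    return env_path


# Typical use at the top of a runner script (assumed pattern):
#   load_config(__file__)
#   biomni_repo = os.environ["BIOMNI_REPO_PATH"]
```

Because `load_dotenv` leaves already-set variables alone by default, a value exported in the shell (for example `AWS_PROFILE`) would take precedence over the one in .env under this sketch.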

Variables

Variable	What to fill in
`BIOMNI_REPO_PATH`	Absolute path to a local clone of the Biomni codebase (directory containing the `biomni/` Python package).
`BIOMNI_DATA_PATH`	Absolute path to the Biomni data lake (`biomni_data` directory, ~11 GB). Biomni downloads it automatically on first run if it is missing.
`SPATIALAGENT_REPO_PATH`	Absolute path to a local clone of the SpatialAgent codebase (directory containing the `spatialagent/` Python package).
`SPATIALAGENT_DATA_PATH`	Absolute path to the SpatialAgent data directory (typically `${SPATIALAGENT_REPO_PATH}/data`).
`AWS_PROFILE`	Name of the AWS profile (configured in `~/.aws/credentials`) with Bedrock access for the Claude model. Standard AWS SDK variable.
`CELLPROFILER_CLI`	Path to the CellProfiler CLI binary. macOS default: `/Applications/CellProfiler.app/Contents/MacOS/cp`. Linux: typically `cellprofiler` on `PATH`.
`PROBLEMS_DIR`	Directory holding Biomni CellProfiler problem subfolders (each with `ProblemStatement.md` and `unit_test.py`). Used by `microscopy_baselines/biomni/test_biomni.py`.
`TRAINING_PROBLEMS_SRC`	Path to the read-only `CellProfiler-Training` dataset.
`TRAINING_PROBLEMS_WORKDIR`	Working copy directory where the ReAct agent runs training pipelines and writes results.
`TESTING_PROBLEMS_SRC`	Path to the read-only `CellProfiler-Testing` dataset.
`TESTING_PROBLEMS_WORKDIR`	Working copy directory where the ReAct agent runs testing pipelines and writes results.
`ANTHROPIC_API_KEY`	Anthropic API key for Claude models.
`AZURE_OPENAI_API_KEY_EUS2`	Azure OpenAI API key for the GPT-5 family (`run_microvqa_llm_baselines.py`, `run_search_baselines.py`).
`OPENAI_API_KEY`	OpenAI API key for GPT-5 deep-research (`run_search_baselines.py`).
`GEMINI_API_KEY`	Gemini API key for Gemini models via LiteLLM (`run_microvqa_llm_baselines.py`).
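
When deciding what to put in `CELLPROFILER_CLI`, a small helper like the one below can suggest a value based on the platform defaults listed in the table. It is only a sketch: `default_cellprofiler_cli` is a hypothetical name, not part of the repository, and it does not verify that CellProfiler is actually installed at the suggested location.

```python
import shutil
import sys

# macOS default location taken from the Variables table.
MACOS_DEFAULT = "/Applications/CellProfiler.app/Contents/MacOS/cp"


def default_cellprofiler_cli(platform=sys.platform):
    """Suggest a CELLPROFILER_CLI value for the given platform string."""
    if platform == "darwin":
        return MACOS_DEFAULT
    # Elsewhere, prefer the full path of `cellprofiler` if it is on PATH,
    # and fall back to the bare command name otherwise.
    return shutil.which("cellprofiler") or "cellprofiler"


# Example: print a line that could be pasted into baselines/.env
#   print(f"CELLPROFILER_CLI={default_cellprofiler_cli()}")
```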

Running

Run the commands below from the baselines/ directory. The scripts locate .env relative to their own file, so they also work from any other directory as long as the script path is adjusted:

# Microscopy
python microscopy_baselines/biomni/test_biomni.py
python microscopy_baselines/react/run_react.py
python microscopy_baselines/react/run_react_testset.py

# Deep-research benchmarks
python deepresearch_benchmarks/run_biomni_benchmark.py --benchmarks MicroVQA DbQA LitQA2 FigQA
python deepresearch_benchmarks/run_spatialagent_benchmark.py --benchmarks MicroVQA DbQA LitQA2 FigQA
python deepresearch_benchmarks/run_microvqa_llm_baselines.py --model claude-sonnet-4-5-20250929
python deepresearch_benchmarks/run_search_baselines.py --model gpt-5 --benchmark dbqa

# JUMP discovery
python JUMP_discovery/run_biomni_discovery.py

Any missing required variable raises KeyError: '<VAR_NAME>' at startup, naming the variable to fill in.
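
The KeyError behaviour comes from indexing os.environ directly. The sketch below illustrates it, together with an optional pre-flight check; `require_env` and `missing_env` are hypothetical helpers, not functions from the repository. One assumption worth noting: a variable left blank in .env (for example `PROBLEMS_DIR=`) is normally loaded as an empty string rather than being absent, so it would not raise KeyError by itself, which is why the second helper also treats blank values as missing.

```python
import os


def require_env(names, environ=os.environ):
    """Return {name: value} for `names`; raises KeyError naming the first unset one."""
    return {name: environ[name] for name in names}


def missing_env(names, environ=os.environ):
    """List the variables that are unset or blank, without raising."""
    return [name for name in names if not environ.get(name, "").strip()]


# Example pre-flight check before a long run (variable names from the table):
#   problems = missing_env(["BIOMNI_REPO_PATH", "BIOMNI_DATA_PATH", "PROBLEMS_DIR"])
#   if problems:
#       raise SystemExit(f"Fill in these variables in baselines/.env: {problems}")
```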
