Orion Baselines

Baseline runners used to compare against Orion on three task categories.

Layout

baselines/
├── .env                       # Runtime configuration (paths + API keys). Fill in before running.
├── microscopy_baselines/      # Microscopy / CellProfiler problem-solving baselines
│   ├── biomni/                #   Biomni A1 agent runner (test_biomni.py)
│   └── react/                 #   LangChain ReAct + CellProfiler runners
│                              #     run_react.py         (training split)
│                              #     run_react_testset.py (test split)
│
├── deepresearch_benchmarks/   # Deep-research benchmark suite (MicroVQA / LabBench DbQA, LitQA2, FigQA)
│   ├── adapters/              #   Per-benchmark dataset adapters (HuggingFace loaders + answer parsing)
│   ├── runners/               #   Agent runners (Biomni, SpatialAgent)
│   ├── run_biomni_benchmark.py
│   ├── run_spatialagent_benchmark.py
│   ├── run_microvqa_llm_baselines.py   # GPT-5 / Claude / Gemini direct-LLM baselines on MicroVQA
│   ├── run_search_baselines.py         # GPT-5 / Claude with web-search tool on DbQA / LitQA2
│   └── run_script.sh                   # Convenience wrapper that runs the LLM/search baselines
│
└── JUMP_discovery/            # JUMP Cell Painting open-ended discovery task (Biomni)
    ├── run_biomni_discovery.py
    ├── prompt.md
    └── hallmark_genesets.json

Configuration

All runtime configuration lives in baselines/.env. The repository ships a template at baselines/.env.example; copy it to baselines/.env and fill in your values (the .env file itself is gitignored). Every script loads that file at startup via python-dotenv and reads paths and credentials from os.environ. You only need to fill in the variables used by the scripts you intend to run; the rest can stay blank.

Install the loader once:

pip install python-dotenv
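
The startup pattern described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: load_dotenv is python-dotenv's real entry point, but the helper names find_env_file and load_config, and the choice to walk up parent directories until a .env is found, are assumptions made for the example.

```python
import os
from pathlib import Path
from typing import Optional


def find_env_file(start: Path, name: str = ".env") -> Optional[Path]:
    """Return the first `name` file found in `start` or any of its parents.

    Hypothetical helper: the real scripts may hard-code a relative path instead.
    """
    for directory in [start, *start.parents]:
        candidate = directory / name
        if candidate.is_file():
            return candidate
    return None


def load_config(script_file: str) -> Optional[Path]:
    """Load the nearest .env above `script_file` into os.environ."""
    # Imported lazily so find_env_file stays usable without python-dotenv installed.
    from dotenv import load_dotenv

    env_path = find_env_file(Path(script_file).resolve().parent)
    if env_path is not None:
        # By default, load_dotenv does not override variables already set in the shell.
        load_dotenv(dotenv_path=env_path)
    return env_path
```

A runner would call load_config(__file__) once at the top of the script and then read values such as os.environ["PROBLEMS_DIR"]. Resolving the path from the script's own location, rather than from the current working directory, is what lets the scripts find .env regardless of where they are launched.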

Variables

| Variable | What to fill in |
| --- | --- |
| BIOMNI_REPO_PATH | Absolute path to a local clone of the Biomni codebase (directory containing the biomni/ Python package). |
| BIOMNI_DATA_PATH | Absolute path to the Biomni data lake (biomni_data directory, ~11 GB). Biomni auto-downloads on first run if missing. |
| SPATIALAGENT_REPO_PATH | Absolute path to a local clone of the SpatialAgent codebase (directory containing the spatialagent/ Python package). |
| SPATIALAGENT_DATA_PATH | Absolute path to the SpatialAgent data directory (typically ${SPATIALAGENT_REPO_PATH}/data). |
| AWS_PROFILE | Name of the AWS profile (configured in ~/.aws/credentials) with Bedrock access for the Claude model. Standard AWS SDK variable. |
| CELLPROFILER_CLI | Path to the CellProfiler CLI binary. macOS default: /Applications/CellProfiler.app/Contents/MacOS/cp. Linux: typically cellprofiler on PATH. |
| PROBLEMS_DIR | Directory holding Biomni CellProfiler problem subfolders (each with ProblemStatement.md and unit_test.py). Used by microscopy_baselines/biomni/test_biomni.py. |
| TRAINING_PROBLEMS_SRC | Read-only CellProfiler-Training dataset. |
| TRAINING_PROBLEMS_WORKDIR | Working copy directory where the ReAct agent runs training pipelines and writes results. |
| TESTING_PROBLEMS_SRC | Read-only CellProfiler-Testing dataset. |
| TESTING_PROBLEMS_WORKDIR | Working copy directory where the ReAct agent runs testing pipelines and writes results. |
| ANTHROPIC_API_KEY | Claude models. |
| AZURE_OPENAI_API_KEY_EUS2 | GPT-5 family via Azure (run_microvqa_llm_baselines.py, run_search_baselines.py). |
| OPENAI_API_KEY | GPT-5 deep-research (run_search_baselines.py). |
| GEMINI_API_KEY | Gemini models via LiteLLM (run_microvqa_llm_baselines.py). |
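
If you are unsure what to put in CELLPROFILER_CLI, the platform defaults listed above can be probed with a short script. This is only a sketch to help fill in .env: the function name resolve_cellprofiler_cli is hypothetical, and the fallback logic is an assumption for illustration. Going by the description above, the baseline scripts themselves read the variable from os.environ rather than guessing a default.

```python
import os
import shutil
import sys
from typing import Mapping, Optional

# macOS default location quoted in the variables table.
MACOS_DEFAULT_CLI = "/Applications/CellProfiler.app/Contents/MacOS/cp"


def resolve_cellprofiler_cli(
    environ: Mapping[str, str] = os.environ,
    platform: str = sys.platform,
) -> Optional[str]:
    """Suggest a value for CELLPROFILER_CLI (hypothetical helper).

    Prefers an explicit, non-blank setting; otherwise falls back to the
    macOS default path or to a `cellprofiler` executable found on PATH.
    Returns None if nothing suitable is found.
    """
    configured = environ.get("CELLPROFILER_CLI", "").strip()
    if configured:
        return configured
    if platform == "darwin":
        return MACOS_DEFAULT_CLI
    return shutil.which("cellprofiler")


if __name__ == "__main__":
    print(resolve_cellprofiler_cli() or "CellProfiler CLI not found; set CELLPROFILER_CLI manually")
```

The printed path can be pasted into baselines/.env as the CELLPROFILER_CLI value.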

Running

The commands below assume baselines/ is the working directory. The scripts locate .env relative to their own file, so they can also be launched from elsewhere if you adjust the script paths:

# Microscopy
python microscopy_baselines/biomni/test_biomni.py
python microscopy_baselines/react/run_react.py
python microscopy_baselines/react/run_react_testset.py

# Deep-research benchmarks
python deepresearch_benchmarks/run_biomni_benchmark.py --benchmarks MicroVQA DbQA LitQA2 FigQA
python deepresearch_benchmarks/run_spatialagent_benchmark.py --benchmarks MicroVQA DbQA LitQA2 FigQA
python deepresearch_benchmarks/run_microvqa_llm_baselines.py --model claude-sonnet-4-5-20250929
python deepresearch_benchmarks/run_search_baselines.py --model gpt-5 --benchmark dbqa

# JUMP discovery
python JUMP_discovery/run_biomni_discovery.py

Any missing required variable raises KeyError: '<VAR_NAME>' at startup, naming the variable to fill in.
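
To catch configuration gaps before a long run starts, a pre-flight check along these lines may help. It is a sketch under stated assumptions: missing_vars is a hypothetical helper, and the list of names passed to it is something you would assemble yourself from the variables table for the script you plan to run. One thing worth noting is that os.environ raises KeyError only for names that are absent; a variable that is set to an empty string does not trigger it, so this check treats blank values as missing as well.

```python
import os
from typing import Iterable, List, Mapping


def missing_vars(
    names: Iterable[str],
    environ: Mapping[str, str] = os.environ,
) -> List[str]:
    """Return the names that are unset or blank in `environ` (hypothetical helper)."""
    return [name for name in names if not environ.get(name, "").strip()]


if __name__ == "__main__":
    # Example list only; adjust it to the script you intend to run.
    needed = ["BIOMNI_REPO_PATH", "BIOMNI_DATA_PATH", "PROBLEMS_DIR"]
    gaps = missing_vars(needed)
    if gaps:
        raise SystemExit("Fill in these variables in baselines/.env: " + ", ".join(gaps))
    print("All listed variables are set.")
```

Unlike the KeyError raised at startup, which names the first variable the script fails to find, this reports every gap in the list at once.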