Baseline runners used to compare against Orion on three task categories.
baselines/
├── .env                               # Runtime configuration (paths + API keys). Fill in before running.
├── microscopy_baselines/              # Microscopy / CellProfiler problem-solving baselines
│   ├── biomni/                        # Biomni A1 agent runner (test_biomni.py)
│   └── react/                         # LangChain ReAct + CellProfiler runners
│                                      #   run_react.py (training split)
│                                      #   run_react_testset.py (test split)
│
├── deepresearch_benchmarks/           # Deep-research benchmark suite (MicroVQA / LabBench DbQA, LitQA2, FigQA)
│   ├── adapters/                      # Per-benchmark dataset adapters (HuggingFace loaders + answer parsing)
│   ├── runners/                       # Agent runners (Biomni, SpatialAgent)
│   ├── run_biomni_benchmark.py
│   ├── run_spatialagent_benchmark.py
│   ├── run_microvqa_llm_baselines.py  # GPT-5 / Claude / Gemini direct-LLM baselines on MicroVQA
│   ├── run_search_baselines.py        # GPT-5 / Claude with web-search tool on DbQA / LitQA2
│   └── run_script.sh                  # Convenience wrapper that runs the LLM/search baselines
│
└── JUMP_discovery/                    # JUMP Cell Painting open-ended discovery task (Biomni)
    ├── run_biomni_discovery.py
    ├── prompt.md
    └── hallmark_genesets.json

All runtime configuration lives in baselines/.env. The repository ships a
template at baselines/.env.example; copy it to baselines/.env and fill in
your values (the .env itself is gitignored). Every script loads that file
at startup via python-dotenv and reads paths and credentials from os.environ.
Fill in only the values needed by the scripts you intend to run; unused
variables can stay blank.
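
As a rough illustration of the startup step described above, the sketch below finds baselines/.env by walking up from a script's location and hands it to python-dotenv. The helper names `find_env_file` and `load_config` are hypothetical and are not taken from the repository; the actual scripts may resolve the path differently.

```python
import os
from pathlib import Path

try:
    from dotenv import load_dotenv  # provided by the python-dotenv package
except ImportError:  # keeps this sketch importable before the package is installed
    load_dotenv = None


def find_env_file(start: Path, root_name: str = "baselines") -> Path:
    """Walk up from `start` to a directory named `root_name` and return its .env path.

    Hypothetical helper: assumes the scripts live somewhere below baselines/.
    """
    for candidate in [start, *start.parents]:
        if candidate.name == root_name:
            return candidate / ".env"
    raise FileNotFoundError(f"no '{root_name}' directory at or above {start}")


def load_config(script_file: str) -> Path:
    """Load baselines/.env into os.environ for the script at `script_file`."""
    env_path = find_env_file(Path(script_file).resolve().parent)
    if load_dotenv is not None and env_path.is_file():
        # By default load_dotenv does not override variables that are
        # already set in the process environment.
        load_dotenv(dotenv_path=env_path)
    return env_path
```

A script would call `load_config(__file__)` once near the top and then read values such as `os.environ["BIOMNI_REPO_PATH"]`.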
Install the loader once:
pip install python-dotenv

| Variable | What to fill in |
|---|---|
| BIOMNI_REPO_PATH | Absolute path to a local clone of the Biomni codebase (directory containing the biomni/ Python package). |
| BIOMNI_DATA_PATH | Absolute path to the Biomni data lake (biomni_data directory, ~11 GB). Biomni auto-downloads on first run if missing. |
| SPATIALAGENT_REPO_PATH | Absolute path to a local clone of the SpatialAgent codebase (directory containing the spatialagent/ Python package). |
| SPATIALAGENT_DATA_PATH | Absolute path to the SpatialAgent data directory (typically ${SPATIALAGENT_REPO_PATH}/data). |
| AWS_PROFILE | Name of the AWS profile (configured in ~/.aws/credentials) with Bedrock access for the Claude model. Standard AWS SDK variable. |
| CELLPROFILER_CLI | Path to the CellProfiler CLI binary. macOS default: /Applications/CellProfiler.app/Contents/MacOS/cp. Linux: typically cellprofiler on PATH. |
| PROBLEMS_DIR | Directory holding Biomni CellProfiler problem subfolders (each with ProblemStatement.md and unit_test.py). Used by microscopy_baselines/biomni/test_biomni.py. |
| TRAINING_PROBLEMS_SRC | Read-only CellProfiler-Training dataset. |
| TRAINING_PROBLEMS_WORKDIR | Working copy directory where the ReAct agent runs training pipelines and writes results. |
| TESTING_PROBLEMS_SRC | Read-only CellProfiler-Testing dataset. |
| TESTING_PROBLEMS_WORKDIR | Working copy directory where the ReAct agent runs testing pipelines and writes results. |
| ANTHROPIC_API_KEY | Claude models. |
| AZURE_OPENAI_API_KEY_EUS2 | GPT-5 family via Azure (run_microvqa_llm_baselines.py, run_search_baselines.py). |
| OPENAI_API_KEY | GPT-5 deep-research (run_search_baselines.py). |
| GEMINI_API_KEY | Gemini models via LiteLLM (run_microvqa_llm_baselines.py). |

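
Because only some variables matter for a given script, a small pre-flight check can save a failed run. The sketch below is an assumption-laden illustration: the mapping lists only the script-to-variable associations stated in the table above, a particular run may need just a subset of them depending on the model chosen, and `VARIABLES_BY_SCRIPT` and `unset_variables` are hypothetical names that do not exist in the repository.

```python
import os

# Associations copied from the table above; not an exhaustive list of what
# each script reads. Hypothetical structure, for illustration only.
VARIABLES_BY_SCRIPT = {
    "microscopy_baselines/biomni/test_biomni.py": ["PROBLEMS_DIR"],
    "deepresearch_benchmarks/run_microvqa_llm_baselines.py": [
        "AZURE_OPENAI_API_KEY_EUS2",
        "GEMINI_API_KEY",
    ],
    "deepresearch_benchmarks/run_search_baselines.py": [
        "AZURE_OPENAI_API_KEY_EUS2",
        "OPENAI_API_KEY",
    ],
}


def unset_variables(script, environ=None):
    """Return the listed variables for `script` that are absent or blank."""
    env = os.environ if environ is None else environ
    names = VARIABLES_BY_SCRIPT.get(script, [])
    return [name for name in names if not env.get(name, "").strip()]


if __name__ == "__main__":
    for script in VARIABLES_BY_SCRIPT:
        print(script, "->", unset_variables(script) or "all listed variables set")
```

Passing a plain dict as `environ` makes the check easy to try without touching the real environment.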
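The commands below are written relative to the baselines/ directory. The
scripts locate .env relative to their own file, so they can also be launched
from any other working directory if you adjust the script path: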
# Microscopy
python microscopy_baselines/biomni/test_biomni.py
python microscopy_baselines/react/run_react.py
python microscopy_baselines/react/run_react_testset.py
# Deep-research benchmarks
python deepresearch_benchmarks/run_biomni_benchmark.py --benchmarks MicroVQA DbQA LitQA2 FigQA
python deepresearch_benchmarks/run_spatialagent_benchmark.py --benchmarks MicroVQA DbQA LitQA2 FigQA
python deepresearch_benchmarks/run_microvqa_llm_baselines.py --model claude-sonnet-4-5-20250929
python deepresearch_benchmarks/run_search_baselines.py --model gpt-5 --benchmark dbqa
# JUMP discovery
python JUMP_discovery/run_biomni_discovery.py

Any missing required variable raises KeyError: '<VAR_NAME>' at startup, naming
the variable to fill in.
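
The KeyError comes from plain dictionary-style access on os.environ. The minimal sketch below reproduces that behaviour with a hypothetical `read_required` helper (not a function from the repository) and an empty stand-in environment. Note that, as far as python-dotenv's parsing goes, an entry written as `VAR=` loads as an empty string, so a blank entry is present and does not trigger the KeyError.

```python
import os


def read_required(names, environ=None):
    """Return {name: value}; the first absent name raises KeyError naming it."""
    env = os.environ if environ is None else environ
    return {name: env[name] for name in names}


# Stand-in environment with nothing set, to show what the failure looks like.
try:
    read_required(["PROBLEMS_DIR"], environ={})
except KeyError as err:
    message = f"missing variable: {err}"
    print(message)

# A blank value is still present, so no KeyError is raised for it.
blank = read_required(["PROBLEMS_DIR"], environ={"PROBLEMS_DIR": ""})
```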