Multimodal physics problem solving — caption → reason → critic.
🥇 1st Place — ICML 2025 AI for Math Workshop, Track 2: Physics Reasoning with Diagrams and Expressions
A three-stage pipeline that solves multimodal physics problems by describing the figure, deriving an answer, and then critiquing and refining that answer:
Caption (image → text) → Reason (solve) → Critic (review & correct)
Default model: Gemini-3.1-Pro on any OpenAI-compatible endpoint.
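The three stages can be read as plain function composition. The sketch below is illustrative only and is not the package's implementation: `LLM`, `caption`, `reason`, `critic`, and `solve_sketch` are hypothetical names, and the model is abstracted as any callable from prompt string to completion string. A real caption stage would send the image contents to a vision model rather than a file path.

```python
from typing import Callable, Optional

# Hypothetical abstraction: any function mapping a prompt to a completion.
LLM = Callable[[str], str]


def caption(llm: LLM, image_path: Optional[str]) -> str:
    """Stage 1: describe the figure as text (empty string if there is no figure)."""
    if image_path is None:
        return ""
    # Assumption: a real implementation would attach the image itself, not its path.
    return llm(f"Describe every physical quantity and label in the figure {image_path}.")


def reason(llm: LLM, problem: str, figure_text: str) -> str:
    """Stage 2: derive a draft answer from the problem and the figure description."""
    return llm(f"Problem: {problem}\nFigure: {figure_text}\nSolve step by step.")


def critic(llm: LLM, problem: str, figure_text: str, draft: str) -> str:
    """Stage 3: review the draft and return a corrected final answer."""
    return llm(
        f"Problem: {problem}\nFigure: {figure_text}\nDraft solution: {draft}\n"
        "Check the draft for errors and return the corrected final answer."
    )


def solve_sketch(llm: LLM, problem: str, image_path: Optional[str] = None) -> str:
    """Run caption -> reason -> critic in order and return the critic's output."""
    figure_text = caption(llm, image_path)
    draft = reason(llm, problem, figure_text)
    return critic(llm, problem, figure_text, draft)
```

Each stage only consumes text produced by the previous one, which is what allows a different model to be used per stage.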
git clone https://github.com/OpenDCAI/SciReasoner.git
cd SciReasoner
pip install -e .
export OPENAI_API_KEY=<your-key>
export OPENAI_BASE_URL=<endpoint> # optional, OpenAI-compatible proxy

As a CLI:
scireasoner solve --problem "A 2 kg block slides from rest down a 30° incline of length 5 m, μ=√3/10, g=10. Find the speed at the bottom."
scireasoner solve --problem "Find I(t)." --image circuit.png

As a Python library:
from scireasoner import solve
res = solve(problem="...", image="figure.png")
print(res.answer, res.reasoning)

Inside Claude Code (one-click install):
cd plugins/claude-code/scireasoner && bash install.sh

Inside Codex (one-click install):
cd plugins/codex/scireasoner && bash install.sh

Both plugins expose three MCP tools (scireasoner_solve, scireasoner_caption, scireasoner_reason) and auto-trigger a solve-physics-problem skill when the user asks for help with a physics problem.
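To sanity-check output from the sample incline problem in the CLI example, the expected answer can be worked out by hand. This assumes the standard kinetic-friction model, where the acceleration along the incline is a = g(sin θ − μ cos θ) and the final speed from rest is v = √(2aL). The mass cancels. The function name `speed_at_bottom` is illustrative and not part of the package.

```python
import math


def speed_at_bottom(theta_deg: float, length: float, mu: float, g: float) -> float:
    """Speed of a block released from rest after sliding `length` down a rough incline."""
    theta = math.radians(theta_deg)
    # Net acceleration along the incline: gravity component minus kinetic friction.
    a = g * (math.sin(theta) - mu * math.cos(theta))
    if a <= 0:
        # Friction balances or exceeds gravity, so a block at rest stays at rest.
        return 0.0
    return math.sqrt(2 * a * length)


# Values from the sample problem: 30 degree incline, 5 m long, mu = sqrt(3)/10, g = 10.
v = speed_at_bottom(30.0, 5.0, math.sqrt(3) / 10, 10.0)
print(f"{v:.2f} m/s")  # a = 3.5 m/s^2, so v = sqrt(35), printed as 5.92 m/s
```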
pip install -e ".[batch]"
hf download Kun-Xiang/SeePhysPro --repo-type dataset --local-dir ./data/SeePhysPro
python seephys_pro_codabench/scripts/run_v2.py \
--run v2_pub830 --split testmini --levels level1 level2 level3 level4 level5 \
--caption-model gemini-3.1-pro-preview --reason-model gemini-3.1-pro-preview \
--critic-model gemini-3.1-pro-preview --use-critic --k-samples 1 --workers 50
python seephys_pro_codabench/scripts/audit_fix.py --run output/v2_pub830
# Upload submission_audited.zip to https://www.codabench.org/competitions/16010/

Per-stage caches under output/<run>/cache/ make crash-resume automatic: re-run the same command to continue from where the previous run stopped.
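A per-stage cache of this kind can be sketched as follows. This is a minimal illustration under assumed conventions, not the repository's actual cache format: the function name `cached_stage`, the one-JSON-file-per-item layout, and the `output` key are all hypothetical.

```python
import json
from pathlib import Path
from typing import Callable


def cached_stage(cache_dir: Path, stage: str, item_id: str,
                 compute: Callable[[], str]) -> str:
    """Return the cached output for (stage, item_id), computing it only if missing."""
    path = cache_dir / stage / f"{item_id}.json"
    if path.exists():
        # Work finished before a crash is reused instead of recomputed.
        return json.loads(path.read_text(encoding="utf-8"))["output"]

    output = compute()
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file first, then rename, so an interrupted write
    # never leaves a truncated cache entry behind.
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps({"output": output}), encoding="utf-8")
    tmp.replace(path)
    return output
```

With this pattern, re-running after an interruption skips every item whose file already exists, so only unfinished items trigger new model calls.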
ICML 2025 SeePhys Challenge — 🥇 1st Place.
SeePhys Pro 2026 (Codabench 16010, public testmini) — current best:
| # | Submission | Overall | L1 | L2 | L3 | L4 | L5 |
|---|---|---|---|---|---|---|---|
| 1 | gemini baseline | 0.7651 | 0.770 | 0.810 | 0.755 | 0.700 | 0.933 |
| 4 | + L4 verbatim caption | 0.7747 | 0.765 | 0.810 | 0.760 | 0.740 | 0.933 |
| 8 | + L1 few-shot reason | 0.7771 | 0.770 | 0.800 | 0.765 | 0.750 | 0.933 |
Full iteration log: seephys_pro_codabench/output/submissions/README.md.
scireasoner/ Python package (CLI + MCP server, thin shell)
plugins/claude-code/ Claude Code one-click install
plugins/codex/ Codex one-click install
seephys_pro_codabench/ Active 2026 competition workspace (don't touch)
caption.py / answer.py Original 2025 single-script implementation
The scireasoner/ package imports stages directly from seephys_pro_codabench/scripts/run_v2.py without copying — so any improvement we push during the live competition flows to all downstream users on the next git pull.
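One way a thin shell can reuse functions from a script file without copying them is a path-based import. Whether the package uses this mechanism or an ordinary import is not stated here, so treat the following as an assumption-laden sketch: `load_script_module` and the stand-in script with its `reason_stage` function are hypothetical, and a temporary file takes the place of the real script.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_script_module(script_path: Path, module_name: str):
    """Load a Python script as a module so its functions can be reused in place."""
    spec = importlib.util.spec_from_file_location(module_name, script_path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {script_path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # executes the script's top-level code once
    return module


# Demo with a stand-in script; the real script's contents are not reproduced here.
with tempfile.TemporaryDirectory() as tmp_dir:
    script = Path(tmp_dir) / "stand_in_script.py"
    script.write_text("def reason_stage(problem):\n    return 'solved: ' + problem\n",
                      encoding="utf-8")
    stages = load_script_module(script, "stand_in_script")
    result = stages.reason_stage("Find v.")

print(result)  # solved: Find v.
```

Because the shell holds a reference to the script rather than a copy, edits to the script are picked up the next time the module is loaded.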
@article{liang2025multimodal,
title = {Multimodal Reasoning for Science: Technical Report and 1st Place
Solution to the ICML 2025 SeePhys Challenge},
author = {Liang, Hao and Wu, Ruitao and Zeng, Bohan and Niu, Junbo
and Zhang, Wentao and Dong, Bin},
journal= {arXiv preprint arXiv:2509.06079},
year = {2025}
}

GPL-3.0. See LICENSE. Thanks to the ICML 2025 AI for Math Workshop organizers and Codabench / AWS for hosting.