A parallel batch-processing orchestrator for Lean 4 repositories. It distributes a TODO list of source files across multiple git worktrees, runs codex AI on each file, and optionally validates the results through configurable gates.
shouyi stands for "守一", which means 'guarding the one' or 'maintaining oneness' in traditional Chinese Taoist meditation.
- Make sure your Lean project builds quickly by running
lake exe cache get && lake buildmanually. - Install uv, a Python package manager.
- Install codex, then configure lean-lsp-mcp in
/path/to/your/lean/repo/.codex/config.tomlas a Codex MCP server named exactlylean_lsp. - Install ripgrep (
rg) for local search and source scanning (lean_verifywarnings). - Configure shouyi
git clone <lean-repo-url> /path/to/your/lean/repo # skip if the repo already exists
cd /path/to/your/lean/repo
lake exe cache get && lake buildInstall uv for your system. On Linux/MacOS: curl -LsSf https://astral.sh/uv/install.sh | sh
install codex for your system. On Linux/MacOS: npm install -g @openai/codex
configure lean-lsp-mcp for your codex.
Important constraints for this repository:
- Put the Codex MCP config in the Lean repo itself:
/path/to/your/lean/repo/.codex/config.toml. - The MCP server name must be exactly
lean_lsp, oruv run python -m scripts.doctor --repo /path/to/your/lean/repowill fail its MCP check.
cd /path/to/your/lean/repo
mkdir -p .codex
cat << EOF >> ./.codex/config.toml
[mcp_servers.lean_lsp]
command = "uvx"
args = ["lean-lsp-mcp"]
cwd = "$PWD"
startup_timeout_sec = 120
tool_timeout_sec = 180
required = true
[mcp_servers.lean_lsp.env]
LEAN_PROJECT_PATH = "$PWD"
LEAN_LOG_LEVEL = "NONE"
EOFValidate the MCP config before continuing:
cd /path/to/your/lean/repo
codex mcp get lean_lspThis command should succeed where enabled is true, command is uvx, and args contains lean-lsp-mcp.
Install ripgrep (rg) for local search and source scanning (lean_verify warnings).
cd /path/to/shouyi
uv sync
# 1. run doctor to verify the repo is healthy
uv run python -m scripts.doctor --repo /path/to/your/lean/repo
# 2. prepare TODO.md
# Each line should be a file path relative to the Lean repo root.
cd /path/to/your/lean/repo
rg --files Mathlib/Combinatorics/ > /path/to/shouyi/TODO.md
rg -l "erw" Mathlib > /path/to/shouyi/TODO.md
cd /path/to/shouyi
# 3. run Shouyi
# If you clone into `repos/` (e.g. `repos/mathlib4`), you can use the shorthand `--repo mathlib4` instead of typing the full path every time.
uv run python shouyi.py \
--task "Combinatorics" \
--repo yuanyi_mathlib4 \
--workers 4 \
--gates lean_compile
# Use an existing repo anywhere on disk (absolute path)
uv run python shouyi.py \
--task "kill_some_erw" \
--prompt "erw" \
--repo /home/yuanyi/Documents/Lean/yuanyi_mathlib4 \
--todo my_todo.md \
--workers 4 \
--gates lean_compile| Option | Default | Description |
|---|---|---|
--task |
(required) | Task name, used for log directories and result naming |
--prompt |
golf |
Prompt name, loads prompts/<name>.md |
--repo |
mathlib4 |
Relative path → resolved under repos/ (e.g. mathlib4 → repos/mathlib4). Absolute path → used as-is, no cloning needed — point it at an existing Lean repo on disk |
--todo |
TODO.md |
Path to the TODO file (one file path per line, relative to the repo root) |
--workers |
2 |
Number of parallel workers (each worker uses one git worktree) |
--gates |
(none) | Gate config name, loads prompts/<name>.json. Omit to skip all gates. |
After a run, here's where things land:
| What | Path |
|---|---|
| Run logs | .log/<timestamp>_<task>/ (e.g. .log/202605031200_Combinatorics/) |
| codex session logs | .log/<timestamp>_<task>/codex/ |
| Gate report | .log/<timestamp>_<task>/gate_report.json |
| Accepted diffs | repos/results/<repo_name>_<task>_accept/ |
| Rejected diffs | repos/results/<repo_name>_<task>_reject/ |
| Git worktrees | /path/to/your/lean/repo/../.worktrees/ (a sibling directory of the repo) For example, if your Lean repo is located at /home/yuanyi/Documents/Lean/yuanyi_mathlib4, then the worktree directory is located at /home/yuanyi/Documents/Lean/.worktrees/. |
Each diff file is named <sanitized-file-path>.diff (e.g. Mathlib-Combinatorics-Foo.lean.diff).
Apply a diff to the repo with git apply:
git apply repos/results/yuanyi_mathlib4_kill_some_erw_accept/Some-File.lean.diff- Parse
TODO.mdto build a queue of file paths. - Create git worktrees (or reuse existing ones) under
.worktrees/adjacent to the repo. - Each worker thread takes a file from the queue, calls codex AI to process it, then runs
git diffto capture changes. - If gates are configured, each diff is validated (e.g. file still compiles, no
def/abbrevmodified). - Diffs that pass all gates land in
repos/results/<name>_accept/; failures land in<name>_reject/.
Gate configs are JSON files in prompts/. Example prompts/lean_compile.json:
{
"gates": ["lean_build_file", "definition_protection"]
}Available gates:
| Gate | Script | Check |
|---|---|---|
lean_build_file |
scripts/gates/gate_lean_build_file.py |
lake build <file> succeeds |
lake_build |
scripts/gates/gate_lake_build.py |
lake build succeeds |
definition_protection |
scripts/gates/gate_definition_protection.py |
No def or abbrev modified in diff |
theorem_statement_required |
scripts/gates/gate_theorem_statement_required.py |
— |
docstring_only_changes |
scripts/gates/gate_docstring_only_changes.py |
— |
shouyi/
├── shouyi.py # Entry point
├── TODO.md # Task list (one file path per line)
├── prompts/ # Prompt templates and gate configs
│ ├── golf.md
│ ├── erw.md
│ └── lean_compile.json
├── scripts/
│ ├── worker.py # Single-file processing logic
│ ├── thread_manager.py # Thread pool and queue orchestration
│ ├── codex.py # codex CLI wrapper and MCP checks
│ ├── doctor.py # Repo health check
│ └── gates/ # Gate implementations
│ ├── run_gates.py
│ ├── gate_lean_build_file.py
│ ├── gate_lake_build.py
│ ├── gate_definition_protection.py
│ └── ...
├── repos/
│ ├── mathlib4/ # Default target repo (git clone here)
│ └── results/ # Diff output (accept/ + reject/)
└── .log/ # Run logs and gate reports