Skip to content

frenzymath/shouyi

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Shouyi

A parallel batch-processing orchestrator for Lean 4 repositories. It distributes a TODO list of source files across multiple git worktrees, runs codex AI on each file, and optionally validates the results through configurable gates.

shouyi stands for "守一", which means 'guarding the one' or 'maintaining oneness' in traditional Chinese Taoist meditation.

Zulip channel

Setup

Overview

  1. Make sure your Lean project builds quickly by running lake exe cache get && lake build manually.
  2. Install uv, a Python package manager.
  3. Install codex, then configure lean-lsp-mcp in /path/to/your/lean/repo/.codex/config.toml as a Codex MCP server named exactly lean_lsp.
  4. Install ripgrep (rg) for local search and source scanning (lean_verify warnings).
  5. Configure shouyi

1. Prepare the target Lean repository

git clone <lean-repo-url> /path/to/your/lean/repo # skip if the repo already exists
cd /path/to/your/lean/repo
lake exe cache get && lake build

2. Install uv

Install uv for your system. On Linux/MacOS: curl -LsSf https://astral.sh/uv/install.sh | sh

3. Install codex and configure lean-lsp-mcp

install codex for your system. On Linux/MacOS: npm install -g @openai/codex

configure lean-lsp-mcp for your codex.

Important constraints for this repository:

  1. Put the Codex MCP config in the Lean repo itself: /path/to/your/lean/repo/.codex/config.toml.
  2. The MCP server name must be exactly lean_lsp, or uv run python -m scripts.doctor --repo /path/to/your/lean/repo will fail its MCP check.
cd /path/to/your/lean/repo
mkdir -p .codex
cat << EOF >> ./.codex/config.toml

[mcp_servers.lean_lsp]
command = "uvx"
args = ["lean-lsp-mcp"]
cwd = "$PWD"
startup_timeout_sec = 120
tool_timeout_sec = 180
required = true

[mcp_servers.lean_lsp.env]
LEAN_PROJECT_PATH = "$PWD"
LEAN_LOG_LEVEL = "NONE"
EOF

Validate the MCP config before continuing:

cd /path/to/your/lean/repo
codex mcp get lean_lsp

This command should succeed where enabled is true, command is uvx, and args contains lean-lsp-mcp.

4. Install ripgrep

Install ripgrep (rg) for local search and source scanning (lean_verify warnings).

5. Configure shouyi

cd /path/to/shouyi
uv sync
# 1. run doctor to verify the repo is healthy
uv run python -m scripts.doctor --repo /path/to/your/lean/repo

# 2. prepare TODO.md
# Each line should be a file path relative to the Lean repo root.
cd /path/to/your/lean/repo
rg --files Mathlib/Combinatorics/ > /path/to/shouyi/TODO.md
rg -l "erw" Mathlib > /path/to/shouyi/TODO.md
cd /path/to/shouyi

# 3. run Shouyi
# If you clone into `repos/` (e.g. `repos/mathlib4`), you can use the shorthand `--repo mathlib4` instead of typing the full path every time.
uv run python shouyi.py \
  --task "Combinatorics" \
  --repo yuanyi_mathlib4 \
  --workers 4 \
  --gates lean_compile

# Use an existing repo anywhere on disk (absolute path)
uv run python shouyi.py \
  --task "kill_some_erw" \
  --prompt "erw" \
  --repo /home/yuanyi/Documents/Lean/yuanyi_mathlib4 \
  --todo my_todo.md \
  --workers 4 \
  --gates lean_compile

CLI Options

Option Default Description
--task (required) Task name, used for log directories and result naming
--prompt golf Prompt name, loads prompts/<name>.md
--repo mathlib4 Relative path → resolved under repos/ (e.g. mathlib4repos/mathlib4).
Absolute path → used as-is, no cloning needed — point it at an existing Lean repo on disk
--todo TODO.md Path to the TODO file (one file path per line, relative to the repo root)
--workers 2 Number of parallel workers (each worker uses one git worktree)
--gates (none) Gate config name, loads prompts/<name>.json. Omit to skip all gates.

Output Locations

After a run, here's where things land:

What Path
Run logs .log/<timestamp>_<task>/ (e.g. .log/202605031200_Combinatorics/)
codex session logs .log/<timestamp>_<task>/codex/
Gate report .log/<timestamp>_<task>/gate_report.json
Accepted diffs repos/results/<repo_name>_<task>_accept/
Rejected diffs repos/results/<repo_name>_<task>_reject/
Git worktrees /path/to/your/lean/repo/../.worktrees/ (a sibling directory of the repo)
For example, if your Lean repo is located at /home/yuanyi/Documents/Lean/yuanyi_mathlib4,
then the worktree directory is located at /home/yuanyi/Documents/Lean/.worktrees/.

Each diff file is named <sanitized-file-path>.diff (e.g. Mathlib-Combinatorics-Foo.lean.diff).

Apply a diff to the repo with git apply:

git apply repos/results/yuanyi_mathlib4_kill_some_erw_accept/Some-File.lean.diff

How It Works

  1. Parse TODO.md to build a queue of file paths.
  2. Create git worktrees (or reuse existing ones) under .worktrees/ adjacent to the repo.
  3. Each worker thread takes a file from the queue, calls codex AI to process it, then runs git diff to capture changes.
  4. If gates are configured, each diff is validated (e.g. file still compiles, no def / abbrev modified).
  5. Diffs that pass all gates land in repos/results/<name>_accept/; failures land in <name>_reject/.

Gates

Gate configs are JSON files in prompts/. Example prompts/lean_compile.json:

{
  "gates": ["lean_build_file", "definition_protection"]
}

Available gates:

Gate Script Check
lean_build_file scripts/gates/gate_lean_build_file.py lake build <file> succeeds
lake_build scripts/gates/gate_lake_build.py lake build succeeds
definition_protection scripts/gates/gate_definition_protection.py No def or abbrev modified in diff
theorem_statement_required scripts/gates/gate_theorem_statement_required.py
docstring_only_changes scripts/gates/gate_docstring_only_changes.py

Project Layout

shouyi/
├── shouyi.py                  # Entry point
├── TODO.md                    # Task list (one file path per line)
├── prompts/                   # Prompt templates and gate configs
│   ├── golf.md
│   ├── erw.md
│   └── lean_compile.json
├── scripts/
│   ├── worker.py              # Single-file processing logic
│   ├── thread_manager.py      # Thread pool and queue orchestration
│   ├── codex.py               # codex CLI wrapper and MCP checks
│   ├── doctor.py              # Repo health check
│   └── gates/                 # Gate implementations
│       ├── run_gates.py
│       ├── gate_lean_build_file.py
│       ├── gate_lake_build.py
│       ├── gate_definition_protection.py
│       └── ...
├── repos/
│   ├── mathlib4/              # Default target repo (git clone here)
│   └── results/               # Diff output (accept/ + reject/)
└── .log/                      # Run logs and gate reports

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages