update README

durant42040 · durant42040 · commit 3bf5c7127375 · 2025-11-28T23:24:22.000+08:00
diff --git a/README.md b/README.md
@@ -12,10 +12,10 @@ LeanDojo-v2 is an end-to-end framework for training, evaluating, and deploying A
 4. [Requirements](#requirements)
 5. [Installation](#installation)
 6. [Environment Setup](#environment-setup)
-7. [Quick Start](#quickstart)
+7. [Quick Start](#quick-start)
 8. [Working with Agents and Trainers](#working-with-agents-and-trainers)
 9. [Tracing and Dataset Generation](#tracing-and-dataset-generation)
-10. [External APIs and LeanCopilot](#external-apis-and-leancopilot)
+10. [LeanProgress Step-Prediction](#leanprogress-step-prediction)
 11. [Testing](#testing)
 12. [Troubleshooting & Tips](#troubleshooting--tips)
 13. [Contributing](#contributing)
@@ -113,13 +113,15 @@ pip install torch torchvision torchaudio --index-url https://download.pytorch.or
 
 1. **GitHub Access Token (required)**  
    The tracing pipeline calls the GitHub API extensively. Create a personal access token and export it before running any agent:
+
    ```sh
-   export GITHUB_ACCESS_TOKEN=<your-token>
+   export GITHUB_ACCESS_TOKEN=<token>
    ```
 
 2. **Hugging Face Token (optional but needed for gated models)**  
+
    ```sh
-   export HF_TOKEN=<your-hf-token>
+   export HF_TOKEN=<hf-token>
    ```
 
 3. **Working directories**  
@@ -160,32 +162,6 @@ This example:
 3. Fine-tunes the specified Hugging Face model (optionally with LoRA).
 4. Launches an `HFProver` backed by Pantograph to search for proofs.
 
----
-
-## Working with Agents and Trainers
-
-### Supervised Fine-Tuning (`SFTTrainer`)
-
-- Accepts any Hugging Face causal LM identifier.
-- Supports LoRA by passing a `peft.LoraConfig`.
-- Key arguments: `epochs_per_repo`, `batch_size`, `max_seq_len`, `lr`, `warmup_steps`, `gradient_checkpointing`.
-- Produces checkpoints under `output_dir` that the `HFProver` consumes.
-
-### GRPO Trainer (`GRPOTrainer`)
-
-- Implements Group Relative Policy Optimization for reinforcement-style refinement.
-- Accepts `reference_model`, `reward_weights`, and `kl_beta` settings.
-- Useful for improving search policies on curated theorem batches.
-
-### Retrieval Trainer & LeanAgent
-
-- `RetrievalTrainer` trains the dense retriever that scores prior proofs.
-- `LeanAgent` wraps the trainer, maintains repository curricula, and couples it with `RetrievalProver`.
-
-Each agent inherits `BaseAgent`, so you can implement your own by overriding `_get_build_deps()` and `_setup_prover()` to register new trainer/prover pairs.
-
----
-
 ## Tracing and Dataset Generation
 
 The `lean_dojo_v2/lean_dojo/data_extraction` package powers repository tracing:
@@ -215,43 +191,81 @@ database.setup_github_repository(
 
 The generated artifacts flow into the `DynamicDatabase`, which keeps repositories sorted by difficulty and appends new sorrys without retracing everything.
 
----
+## Working with Agents and Trainers
+
+### Supervised Fine-Tuning (`SFTTrainer`)
+
+- Accepts any Hugging Face causal LM identifier.
+- Supports LoRA by passing a `peft.LoraConfig`.
+- Key arguments: `epochs_per_repo`, `batch_size`, `max_seq_len`, `lr`, `warmup_steps`, `gradient_checkpointing`.
+- Produces checkpoints under `output_dir` that the `HFProver` consumes.
 
-## External APIs and LeanCopilot
+### GRPO Trainer (`GRPOTrainer`)
 
-`lean_dojo_v2/external_api` contains Lean and Python code to expose models through LeanCopilot:
+- Implements Group Relative Policy Optimization for reinforcement-style refinement.
+- Accepts `reference_model`, `reward_weights`, and `kl_beta` settings.
+- Useful for improving search policies on curated theorem batches.
 
-- `LeanCopilot.lean` registers RPC endpoints inside Lean.
-- `python/server.py` hosts a FastAPI service with adapters for Anthropic, OpenAI, Google Generative AI, vLLM, and custom HF models.
-- Start the service with:
-  ```sh
-  cd lean_dojo_v2/external_api/python
-  pip install -r requirements.txt
-  uvicorn server:app --port 23337
-  ```
-- Point your Lean client to the running server to interactively request tactics, proofs, or completions from external models.
+### Retrieval Trainer & LeanAgent
 
-### LeanProgress Step-Prediction Workflow
+- `RetrievalTrainer` trains the dense retriever that scores prior proofs.
+- `LeanAgent` wraps the trainer, maintains repository curricula, and couples it with `RetrievalProver`.
+
+Each agent inherits `BaseAgent`, so you can implement your own by overriding `_get_build_deps()` and `_setup_prover()` to register new trainer/prover pairs.
+
+## LeanProgress Step-Prediction
 
 - Generate a JSONL dataset with remaining-step targets (or replace it with your own LeanProgress export):
+
   ```sh
   python -m lean_dojo_v2.lean_progress.create_sample_dataset --output raid/data/sample_leanprogress_dataset.jsonl
   ```
+
 - Fine-tune a regression head that predicts `steps_remaining`:
-  ```sh
-  python -m lean_dojo_v2.lean_progress.train_steps_model \
-    --dataset raid/data/sample_leanprogress_dataset.jsonl \
-    --output-dir raid/checkpoints/leanprogress_steps \
-    --model-name bert-base-uncased
+
+  ```python
+  from pathlib import Path
+
+  from lean_dojo_v2.trainer.progress_trainer import ProgressTrainer
+
+  sample_dataset_path = Path("raid/data/sample_leanprogress_dataset.jsonl")
+
+  trainer = ProgressTrainer(
+      model_name="bert-base-uncased",
+      data_path=str(sample_dataset_path),
+      output_dir="outputs-progress",
+  )
+
+  trainer.train()
   ```
-- Tell the LeanCopilot server where to find the checkpoint by exporting:
-  ```sh
-  export LEANPROGRESS_MODEL=raid/checkpoints/leanprogress_steps
-  uvicorn server:app --port 23337
+
+## Proving Theorems
+
+LeanDojo-v2 supports two methods for theorem proving:
+
+- **Whole-proof generation**: generate a complete proof in a single forward pass of the prover.
+
+  ```python
+  from lean_dojo_v2.prover import ExternalProver
+
+  theorem = "theorem my_and_comm : ∀ {p q : Prop}, And p q → And q p := by"
+  prover = ExternalProver()
+  proof = prover.generate_whole_proof(theorem)
   ```
-- Add `use_reward=true` when calling `/generate`. Each output now includes `steps_remaining` and a reward value (currently `-steps_remaining`) so agents can minimize proof length.
 
----
+- **Proof search**: generate tactics sequentially and update the goal state through interaction with Pantograph until the proof is complete.
+
+  ```python
+  from pantograph.server import Server
+  from lean_dojo_v2.prover import HFProver
+
+  server = Server()
+  prover = HFProver(ckpt_path="outputs-deepseek")
+
+  result, used_tactics = prover.search(
+      server=server, goal="∀ {p q : Prop}, p ∧ q → q ∧ p", verbose=False
+  )
+  ```
 
 ## Testing
 
@@ -264,8 +278,6 @@ export HF_TOKEN=<hf-token>     # only required for tests touching HF APIs
 pytest -v
 ```
 
----
-
 ## Troubleshooting & Tips
 
 - **401 Bad Credentials / rate limits**: Ensure `GITHUB_ACCESS_TOKEN` is exported and has `repo` + `read:org` scopes.
diff --git a/pyproject.toml b/pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "lean-dojo-v2"
-version = "1.0.3"
+version = "1.0.4"
 description = "A comprehensive library for AI-assisted theorem proving in Lean"
 readme = "README.md"
 license = {text = "MIT"}