update README

durant42040 · durant42040 · commit 1ec90a6a6d6d · 2025-11-28T23:41:07.000+08:00
diff --git a/README.md b/README.md
@@ -194,23 +194,70 @@ The generated artifacts flow into the `DynamicDatabase`, which keeps repositorie
 
 ## Working with Agents and Trainers
 
-### Supervised Fine-Tuning (`SFTTrainer`)
+### Agents
+
+Agents orchestrate the full workflow of repository setup, training, and theorem proving. Each agent pairs a trainer with a compatible prover.
+
+#### `HFAgent`
+
+Uses Hugging Face models fine-tuned with `SFTTrainer` or `GRPOTrainer` for theorem proving. Loads checkpoints locally and uses `HFProver` for proof search. Ideal for training custom models on your traced repositories. Does not build Lean dependencies by default.
+
+```python
+from lean_dojo_v2.agent.hf_agent import HFAgent
+from lean_dojo_v2.trainer.sft_trainer import SFTTrainer
+
+trainer = SFTTrainer(model_name="deepseek-ai/DeepSeek-Prover-V2-7B", ...)
+agent = HFAgent(trainer=trainer)
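+agent.setup_github_repository(url, commit)  # url and commit must be defined beforehand
+agent.train()  # fine-tune the model with the supplied trainer
+agent.prove()  # proof search with HFProver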
+```
+
+#### `ExternalAgent`
+
+Uses the Hugging Face Inference API to access large models like DeepSeek-Prover-V2-671B without local model loading. Pairs with `ExternalProver` for whole-proof generation or proof search. Best for quick experiments or when you don't have GPU resources for local inference.
+
+```python
+from lean_dojo_v2.agent.external_agent import ExternalAgent
+
+agent = ExternalAgent()
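+agent.setup_github_repository(url, commit)  # url and commit must be defined beforehand
+agent.prove()  # inference through ExternalProver; no local training step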
+```
+
+#### `LeanAgent`
+
+Implements the lifelong learning pipeline with retrieval-augmented generation. Uses `RetrievalTrainer` to train premise retrievers, then pairs with `RetrievalProver` for retrieval-augmented tactic generation. Maintains repository curricula and builds Lean dependencies by default.
+
+```python
+from lean_dojo_v2.agent.lean_agent import LeanAgent
+
+agent = LeanAgent()
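+agent.setup_github_repository(url, commit)  # url and commit must be defined beforehand
+agent.train()  # train the premise retriever with RetrievalTrainer
+agent.prove()  # retrieval-augmented tactic generation with RetrievalProver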
+```
+
+### Trainers
+
+#### Supervised Fine-Tuning (`SFTTrainer`)
 
 - Accepts any Hugging Face causal LM identifier.
 - Supports LoRA by passing a `peft.LoraConfig`.
 - Key arguments: `epochs_per_repo`, `batch_size`, `max_seq_len`, `lr`, `warmup_steps`, `gradient_checkpointing`.
 - Produces checkpoints under `output_dir` that the `HFProver` consumes.
 
-### GRPO Trainer (`GRPOTrainer`)
+#### GRPO Trainer (`GRPOTrainer`)
 
 - Implements Group Relative Policy Optimization for reinforcement-style refinement.
 - Accepts `reference_model`, `reward_weights`, and `kl_beta` settings.
 - Useful for improving search policies on curated theorem batches.
 
-### Retrieval Trainer & LeanAgent
+#### Retrieval Trainer (`RetrievalTrainer`)
 
-- `RetrievalTrainer` trains the dense retriever that scores prior proofs.
-- `LeanAgent` wraps the trainer, maintains repository curricula, and couples it with `RetrievalProver`.
+- Trains the dense retriever that scores prior proofs from the corpus.
+- Used by `LeanAgent` to build retrieval-augmented generation models.
+- Requires an indexed corpus and generator checkpoints.
 
 Each agent inherits `BaseAgent`, so you can implement your own by overriding `_get_build_deps()` and `_setup_prover()` to register new trainer/prover pairs.
 
@@ -242,6 +289,22 @@ Each agent inherits `BaseAgent`, so you can implement your own by overriding `_g
 
 ## Proving Theorems
 
+LeanDojo-v2 provides three prover implementations, each suited to a different use case:
+
+### `HFProver`
+
+Loads a fine-tuned Hugging Face model from a local checkpoint (both full models and LoRA adapters are supported) and generates tactics directly. Use it with locally trained Hugging Face models, e.g. those produced by `SFTTrainer` or `GRPOTrainer`.
+
+### `ExternalProver`
+
+Performs inference with the Hugging Face Inference API to access large models without local GPU resources. Defaults to DeepSeek-Prover-V2-671B. Supports both proof search and whole-proof generation.
+
+### `RetrievalProver`
+
+Used directly with `LeanAgent` for retrieval-augmented tactic generation.
+
+### Proof Methods
+
 LeanDojo-v2 supports two methods for theorem proving:
 
 - **Whole-proof generation**: generate complete proof in one forward pass of the prover.