Commit cb72fce

Update README.md
1 parent 601f739 commit cb72fce

1 file changed

Lines changed: 1 addition & 59 deletions

README.md

@@ -113,27 +113,14 @@ Example (edit model & TP according to your GPUs):
 
 ```bash
 trl vllm-serve \
-  --model Qwen/Qwen2.5-Math-1.5B \
+  --model cbyzju/LaPHA-Math-7B-Instruct \
   --host 0.0.0.0 \
   --port 8000 \
   --tensor-parallel-size 1 \
   --max-model-len 4096
 ```
 
 ### 2) Run eval
-
-**Single-turn (no search):**
-
-```bash
-ENGINE=vllm BASE_URL=http://localhost:8000 \
-TOKENIZER_PATH=Qwen/Qwen2.5-Math-1.5B \
-MODE=single \
-bash eval.sh math
-```
-
-**Value-guided MCTS (recommended):**
-You need a value head checkpoint (`.pt`) and a base LM path for the value function.
-
 ```bash
 ENGINE=vllm BASE_URL=http://localhost:8000 \
 TOKENIZER_PATH=/path/to/policy_model \
@@ -149,48 +136,3 @@ Outputs:
 
 * rollouts: `eval/rollouts/*.pred.jsonl`
 * scores: `eval/results/*.csv` (and logs under `eval/logs/`)
-
----
-
-## Training (LaPha RL)
-
-The training entry point is `run_dapo.py` (configured by `lapha.yaml`).
-
-### 1) Prepare training data
-
-The current dataloader (`helpers/math_dapo.py::dataloader`) expects a **Parquet** in a DAPO-like format
-(e.g., columns `prompt` and `reward_model` containing ground-truth).
-
-Update the dataset path inside `run_dapo.py`:
-
-```python
-train_dataset = dataloader_dapo('YOUR_DATA_PATH/train.parquet').shuffle()
-```
-
-### 2) Start vLLM + tool server
-
-* Start vLLM server (same as eval)
-* Start `rpc_python_server.py` (same as eval)
-
-### 3) Launch training
-
-```bash
-bash run_dapo.sh
-# or:
-accelerate launch --config_file deepspeed_zero3.yaml run_dapo.py --config lapha.yaml
-```
-
-Checkpoints are saved under `output_dir` in `lapha.yaml`.
-
-### 4) (Optional) Split value head for serving
-
-If your checkpoint directory contains a wrapper model that includes both policy + value head, use:
-
-```bash
-python helpers/split_valuehead.py \
-  --src /path/to/checkpoint-XX \
-  --out-policy /path/to/policy_model_ckptXX \
-  --out-vhead /path/to/value_head_ckptXX.pt \
-  --copy-tokenizer \
-  --trust-remote-code
-```
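
As a sanity check, the two hunk headers in this diff can be reconciled with the "1 addition & 59 deletions" summary. The sketch below is illustrative only and is not part of the commit. The helper names (`hunk_lengths`, `net_removed`) are hypothetical, and the headers and the addition count are copied by hand from this page.

```python
import re

# Hunk headers copied from the diff on this page (trailing context text omitted).
HUNK_HEADERS = [
    "@@ -113,27 +113,14 @@",
    "@@ -149,48 +136,3 @@",
]

# Unified-diff hunk header: @@ -old_start,old_len +new_start,new_len @@
HEADER_RE = re.compile(r"@@ -(\d+),(\d+) \+(\d+),(\d+) @@")


def hunk_lengths(header):
    """Return (old_length, new_length) for one hunk header (hypothetical helper)."""
    match = HEADER_RE.match(header)
    if match is None:
        raise ValueError(f"not a hunk header: {header!r}")
    _old_start, old_len, _new_start, new_len = map(int, match.groups())
    return old_len, new_len


def net_removed(headers):
    """Sum of (old length - new length) over all hunks, i.e. deletions minus additions."""
    return sum(old - new for old, new in map(hunk_lengths, headers))


additions = 1                      # taken from the page summary
net = net_removed(HUNK_HEADERS)    # (27 - 14) + (48 - 3)
deletions = net + additions
print(deletions)
```

The net shrinkage across both hunks is 58 lines. Adding back the single added line (the `--model` change) gives the 59 deletions reported in the summary.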
