docs/algengine_usage.md (44 additions & 7 deletions)
@@ -157,6 +157,39 @@ work_dirs/e2e_vadv2_50pct/
└── navtest_failures.csv # rare navtest cases only
```

#### Full Train Set Evaluation

Evaluate on the full training set (navtrain) to produce per-scenario metrics for [Rare Case Extraction](#rare-case-extraction). Because navtrain is large, the script splits it into chunks and evaluates them one at a time to avoid out-of-memory (OOM) errors.

```bash
conda activate algengine
cd projects/AlgEngine

# Chunked evaluation on navtrain (8 GPUs, 20 chunks)
bash scripts/e2e_dist_eval_navtrain_chunked.sh \
    configs/worldengine/e2e_vadv2_50pct.py \
    work_dirs/e2e_vadv2_50pct/epoch.pth \
    8 \
    20
```

**Arguments:**
1. `<config>`: Configuration file path
2. `<checkpoint>`: Model checkpoint to evaluate
3. `<num_gpus>`: Number of GPUs to use
4. `[num_chunks]` (optional, default 10): Number of chunks to split navtrain into

The script automatically:
1. Splits `navtrain.yaml` into chunks under `configs/navsim_splits/navtrain_split/chunks/`
2. Evaluates each chunk sequentially
3. Merges all chunk CSVs into a single file
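
The split and merge steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the contents of `e2e_dist_eval_navtrain_chunked.sh`: the helper names `split_into_chunks` and `merge_chunk_csvs` are hypothetical, the split is shown on a plain list rather than on `navtrain.yaml` (whose layout is not described here), and every chunk CSV is assumed to start with the same header row.

```python
import csv


def split_into_chunks(items, num_chunks):
    """Split items into num_chunks contiguous, near-equal parts (hypothetical helper)."""
    base, extra = divmod(len(items), num_chunks)
    chunks, start = [], 0
    for i in range(num_chunks):
        # The first `extra` chunks take one additional item each.
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def merge_chunk_csvs(chunk_paths, out_path):
    """Concatenate per-chunk CSVs, writing the header row only once (hypothetical helper)."""
    header_written = False
    with open(out_path, "w", newline="") as out_file:
        writer = csv.writer(out_file)
        for path in chunk_paths:
            with open(path, newline="") as in_file:
                reader = csv.reader(in_file)
                header = next(reader, None)
                if header is None:
                    continue  # skip empty chunk files
                if not header_written:
                    writer.writerow(header)
                    header_written = True
                writer.writerows(reader)
```

Evaluating one chunk at a time keeps only a fraction of navtrain in flight, and the merge restores a single CSV for downstream steps.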

**Output:**
```
experiments/worldengine/e2e_vadv2_50pct/
└── navtrain.csv   # Full train set evaluation results
```

Extract failure scenarios from evaluation results for targeted fine-tuning.

### Prerequisites

Before extracting rare cases, you **must** complete a [Full Train Set Evaluation](#full-train-set-evaluation) to generate `navtrain.csv` with per-scenario metrics. The rare case extraction script uses this CSV to identify failure scenarios.
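
A quick sanity check before extraction can catch a missing or empty CSV early. The sketch below is illustrative only: `check_navtrain_csv` is a hypothetical helper, not part of the project, and it assumes only that `navtrain.csv` has a header row followed by one row per scenario.

```python
import csv
from pathlib import Path


def check_navtrain_csv(csv_path):
    """Return the number of data rows, or raise if the CSV is missing or empty."""
    path = Path(csv_path)
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found; run the full train set evaluation first"
        )
    with path.open(newline="") as f:
        reader = csv.reader(f)
        header = next(reader, None)       # first row is assumed to be the header
        num_rows = sum(1 for _ in reader)  # remaining rows: one per scenario
    if header is None or num_rows == 0:
        raise ValueError(f"{path} has no per-scenario rows")
    return num_rows


# Example call, using the output path shown earlier in this document:
# check_navtrain_csv("experiments/worldengine/e2e_vadv2_50pct/navtrain.csv")
```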