- [Table of Contents](#table-of-contents)
- [Highlights](#highlights)
- [News](#news)
- [Benchmark](#benchmark)
- [Qualitative Results — Closed-Loop Simulation (nuPlan)](#qualitative-results--closed-loop-simulation-nuplan)
- [On-Road Deployment — Night Urban Driving](#on-road-deployment--night-urban-driving)
- [System Architecture](#system-architecture)
- [Roadmap](#roadmap)
- [Getting Started](#getting-started)

- **[2026/04/08]** Official code repository established; data release is in preparation.

## Benchmark

We compare different post-training paradigms on the nuPlan dataset, evaluating both open-loop and closed-loop metrics across common and rare driving scenarios.

> **Metric notes:**
> - **Open-loop PDMS** is aligned with the [NAVSIM v1.1](https://github.com/autonomousvision/navsim) PDM Score. *Common* denotes the standard `navtest` split; *Rare* denotes the `navtest_failures` subset: failure-prone rare-case scenarios extracted from `navtest`.
> - **Closed-loop Success Rate** is the fraction of simulated driving episodes completed without a collision or an off-road failure.
> - **Closed-loop PDMS*** is the PDM Score obtained via SimEngine closed-loop testing, in which the planner interacts with reactive agents in simulation under real-time rendering.
>
> **Training notes:**
> - **Rare logs** are failure-prone scenarios automatically extracted from `navtrain` by the pre-trained agent itself (see [Rare Case Extraction](docs/algengine_usage.md#rare-case-extraction)).
> - **Common logs** are the standard cases in `navtrain`.

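For intuition on how a PDM Score aggregates its sub-metrics, the sketch below follows the NAVSIM-style recipe: hard multiplicative penalties (no at-fault collision, drivable-area compliance) scaling a weighted average of progress, time-to-collision, and comfort sub-scores. The 5/5/2 weights mirror NAVSIM v1.1's defaults as we recall them; treat the exact weights, and the function itself, as an illustrative assumption rather than the evaluation code used here.

```python
def pdm_score(no_collision: float, drivable_area: float,
              ego_progress: float, time_to_collision: float,
              comfort: float) -> float:
    """Illustrative NAVSIM-style PDM Score (all inputs in [0, 1]).

    Hard penalties multiply the result; soft sub-scores enter a
    weighted average (assumed weights 5/5/2, normalized by their sum).
    """
    weighted = (5.0 * ego_progress + 5.0 * time_to_collision + 2.0 * comfort) / 12.0
    return no_collision * drivable_area * weighted

# A fully compliant episode with maximal progress scores 1.0;
# any at-fault collision zeroes the score outright.
print(pdm_score(1.0, 1.0, 1.0, 1.0, 1.0))  # 1.0
print(pdm_score(0.0, 1.0, 1.0, 1.0, 1.0))  # 0.0
```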

| Method | Open-loop PDMS ↑ (common) | Open-loop PDMS ↑ (rare) | Closed-loop Success Rate (%) ↑ | Closed-loop PDMS* ↑ |
|:-------|:-------------------------:|:-----------------------:|:------------------------------:|:-------------------:|
| Base model | 85.62 | 47.15 | 73.61 | 60.28 |
| Supervised fine-tuning on rare logs | 87.03 | 49.68 | 73.26 | 62.26 |
| Post-training on common logs | 86.15 | 51.49 | 64.58 | 56.66 |
| Post-training on rare logs | 89.29 | 62.56 | 74.31 | 62.55 |
| Post-training on rare synthetic replays | 88.01 | 56.62 | 76.39 | 62.11 |
| Post-training on rare rollouts w/o Behaviour WM | 88.99 | 59.69 | 85.07 | 68.29 |
| **Post-training with WorldEngine** | **88.95** | **59.83** | **88.89** | **70.12** |

**Key findings:**
- Post-training on **rare logs** substantially outperforms supervised fine-tuning (62.56 vs. 49.68 open-loop rare PDMS), demonstrating the advantage of reward-guided optimization over imitation.
- Post-training on **common logs** provides limited benefit and even degrades closed-loop performance (success rate drops from 73.61% to 64.58%), confirming that long-tail event discovery is essential.
- The full **WorldEngine** pipeline achieves the best closed-loop performance (**88.89%** success rate, **70.12** PDMS*), a **+15.28**-percentage-point gain in success rate over the base model.

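The closed-loop Success Rate reported above reduces to an episode-level count; a minimal sketch under that definition (the `Episode` fields are illustrative, not the SimEngine log schema):

```python
from dataclasses import dataclass

@dataclass
class Episode:
    collided: bool   # any collision during the rollout
    off_road: bool   # vehicle left the drivable area

def success_rate(episodes: list[Episode]) -> float:
    """Percentage of episodes completed without collision or off-road failure."""
    passed = sum(1 for e in episodes if not (e.collided or e.off_road))
    return 100.0 * passed / len(episodes)

# Toy rollout log: 3 of 4 episodes complete cleanly.
log = [Episode(False, False), Episode(True, False),
       Episode(False, False), Episode(False, False)]
print(success_rate(log))  # 75.0
```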

### Qualitative Results — Closed-Loop Simulation (nuPlan)

Each pair shows the **base model (FAIL)** vs. the **WorldEngine post-trained model (PASS)** on the same rare-case scenario. Left: front-camera rendering; right: BEV trajectory visualization.

<div align="center">
<table>
<tr>
<td><img src="docs/imgs/nuplan_1.png" width="400px"></td>
<td><img src="docs/imgs/nuplan_2.png" width="400px"></td>
</tr>
<tr>
<td><img src="docs/imgs/nuplan_3.png" width="400px"></td>
<td><img src="docs/imgs/nuplan_4.png" width="400px"></td>
</tr>
</table>
</div>

### On-Road Deployment — Night Urban Driving

Zero disengagements over 200 km of on-road testing on a mass-produced ADAS platform.

<div align="center">
<img src="docs/gif/WE_road_night_01.gif" width="270px">
<img src="docs/gif/WE_road_night_02.gif" width="270px">
<img src="docs/gif/WE_road_night_03.gif" width="270px">
</div>

## System Architecture
We acknowledge all the open-source contributors for the following projects to make this project possible:
| [](https://github.com/nerfstudio-project/nerfstudio) | Collaboration-friendly NeRF toolkit |
| [](https://github.com/open-mmlab/mmdetection3d) | 3D detection framework |
| [](https://github.com/OpenDriveLab/UniAD) | End-to-end autonomous driving framework |
| [](https://github.com/autonomousvision/navsim) | Non-reactive autonomous vehicle simulation benchmark |
| [](https://www.nuscenes.org/nuplan) | Large-scale autonomous driving dataset |
| [](https://github.com/metadriverse/metadrive) | Compositional driving simulation platform |
| [](https://github.com/ray-project/ray) | Distributed execution framework |
|