- [Table of Contents](#table-of-contents)
- [Highlights](#highlights)
- [News](#news)
- [Benchmark](#benchmark)
- [Qualitative Results — Closed-Loop Simulation (nuPlan)](#qualitative-results--closed-loop-simulation-nuplan)
- [On-Road Deployment — Night Urban Driving](#on-road-deployment--night-urban-driving)
- [System Architecture](#system-architecture)
- [Roadmap](#roadmap)
- [Getting Started](#getting-started)

- **[2026/04/08]** Official code repository established; data release is in preparation.

## Benchmark

We compare different post-training paradigms on the nuPlan dataset, evaluating both open-loop and closed-loop metrics across common and rare driving scenarios.

> **Metric notes:**
> - **Open-loop PDMS** is aligned with the [NAVSIM v1.1](https://github.com/autonomousvision/navsim) PDM Score. *Common* denotes the standard `navtest` split; *Rare* denotes the `navtest_failures` subset: failure-prone rare-case scenarios extracted from `navtest`.
> - **Closed-loop Success Rate** is the fraction of simulated driving episodes completed without a collision or an off-road failure.
> - **Closed-loop PDMS*** is the PDM Score obtained via SimEngine closed-loop testing, in which the planner interacts with reactive agents in simulation under real-time rendering.
>
> **Training notes:**
> - **Rare logs** are failure-prone scenarios automatically extracted from `navtrain` by the pre-trained agent itself (see [Rare Case Extraction](docs/algengine_usage.md#rare-case-extraction)).
> - **Common logs** are the standard cases in `navtrain`.

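For intuition on how a PDM Score aggregates its sub-metrics, the sketch below follows the NAVSIM-style recipe: hard multiplicative penalties (no at-fault collision, drivable-area compliance) scaling a weighted average of progress, time-to-collision, and comfort sub-scores. The 5/5/2 weights mirror NAVSIM v1.1's defaults as we recall them; treat the exact weights, and the function itself, as an illustrative assumption rather than the evaluation code used here.

```python
def pdm_score(no_collision: float, drivable_area: float,
              ego_progress: float, time_to_collision: float,
              comfort: float) -> float:
    """Illustrative NAVSIM-style PDM Score (all inputs in [0, 1]).

    Hard penalties multiply the result; soft sub-scores enter a
    weighted average (assumed weights 5/5/2, normalized by their sum).
    """
    weighted = (5.0 * ego_progress + 5.0 * time_to_collision + 2.0 * comfort) / 12.0
    return no_collision * drivable_area * weighted

# A fully compliant episode with maximal progress scores 1.0;
# any at-fault collision zeroes the score outright.
print(pdm_score(1.0, 1.0, 1.0, 1.0, 1.0))  # 1.0
print(pdm_score(0.0, 1.0, 1.0, 1.0, 1.0))  # 0.0
```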

| Method | Open-loop PDMS ↑ (common) | Open-loop PDMS ↑ (rare) | Closed-loop Success Rate (%) ↑ | Closed-loop PDMS* ↑ |
|:-------|:-------------------------:|:-----------------------:|:------------------------------:|:-------------------:|
| Base model | 85.62 | 47.15 | 73.61 | 60.28 |
| Supervised fine-tuning on rare logs | 87.03 | 49.68 | 73.26 | 62.26 |
| Post-training on common logs | 86.15 | 51.49 | 64.58 | 56.66 |
| Post-training on rare logs | 89.29 | 62.56 | 74.31 | 62.55 |
| Post-training on rare synthetic replays | 88.01 | 56.62 | 76.39 | 62.11 |
| Post-training on rare rollouts w/o Behaviour WM | 88.99 | 59.69 | 85.07 | 68.29 |
| **Post-training with WorldEngine** | **88.95** | **59.83** | **88.89** | **70.12** |

**Key findings:**
- Post-training on **rare logs** substantially outperforms supervised fine-tuning (62.56 vs. 49.68 open-loop rare PDMS), demonstrating the advantage of reward-guided optimization over imitation.
- Post-training on **common logs** provides limited benefit and even degrades closed-loop performance (success rate drops from 73.61% to 64.58%), confirming that long-tail event discovery is essential.
- The full **WorldEngine** pipeline achieves the best closed-loop performance (**88.89%** success rate, **70.12** PDMS*), a **+15.28**-percentage-point gain in success rate over the base model.

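The closed-loop Success Rate reported above reduces to an episode-level count; a minimal sketch under that definition (the `Episode` fields are illustrative, not the SimEngine log schema):

```python
from dataclasses import dataclass

@dataclass
class Episode:
    collided: bool   # any collision during the rollout
    off_road: bool   # vehicle left the drivable area

def success_rate(episodes: list[Episode]) -> float:
    """Percentage of episodes completed without collision or off-road failure."""
    passed = sum(1 for e in episodes if not (e.collided or e.off_road))
    return 100.0 * passed / len(episodes)

# Toy rollout log: 3 of 4 episodes complete cleanly.
log = [Episode(False, False), Episode(True, False),
       Episode(False, False), Episode(False, False)]
print(success_rate(log))  # 75.0
```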

### Qualitative Results — Closed-Loop Simulation (nuPlan)

Each pair shows the **base model (FAIL)** vs. the **WorldEngine post-trained model (PASS)** on the same rare-case scenario. Left: front-camera rendering; right: BEV trajectory visualization.

<div align="center">
<table>
<tr>
<td><img src="docs/imgs/nuplan_1.png" width="400px"></td>
<td><img src="docs/imgs/nuplan_2.png" width="400px"></td>
</tr>
<tr>
<td><img src="docs/imgs/nuplan_3.png" width="400px"></td>
<td><img src="docs/imgs/nuplan_4.png" width="400px"></td>
</tr>
</table>
</div>

### On-Road Deployment — Night Urban Driving

Zero disengagements over 200 km of on-road testing on a mass-produced ADAS platform.

<div align="center">
<img src="docs/gif/WE_road_night_01.gif" width="270px">
<img src="docs/gif/WE_road_night_02.gif" width="270px">
<img src="docs/gif/WE_road_night_03.gif" width="270px">
</div>

## System Architecture
We acknowledge all the open-source contributors for the following projects to make this project possible:
| [](https://github.com/nerfstudio-project/nerfstudio) | Collaboration-friendly NeRF toolkit |
| [](https://github.com/open-mmlab/mmdetection3d) | 3D detection framework |
| [](https://github.com/OpenDriveLab/UniAD) | End-to-end autonomous driving framework |
| [](https://github.com/autonomousvision/navsim) | Non-reactive autonomous vehicle simulation benchmark |
| [](https://www.nuscenes.org/nuplan) | Large-scale autonomous driving dataset |
| [](https://github.com/metadriverse/metadrive) | Compositional driving simulation platform |
| [](https://github.com/ray-project/ray) | Distributed execution framework |
|