| 1 | +# WorldModel Gym v0.1.0 |
| 2 | + |
| 3 | +Initial public release of the WorldModel Gym benchmark platform. |
| 4 | + |
| 5 | +## Highlights |
| 6 | +- Added three procedural long-horizon benchmark environments: |
| 7 | + - `MemoryMaze` (key-door sparse-reward POMDP) |
| 8 | + - `SwitchQuest` (hidden-order switch chaining) |
| 9 | + - `CraftLite` (resource/crafting dependency graph) |
| 10 | +- Added `rgb` and `symbolic` observation modes, plus a combined `both` mode, and stable semantic episode traces. |
| 11 | +- Added deterministic evaluation harness with train/test seed tracks and continual-shift track. |
| 12 | +- Added metrics pipeline: |
| 13 | + - `success_rate`, `mean_return`, `median_steps_to_success` |
| 14 | + - achievement completion counts |
| 15 | + - planning cost (`wall_clock_ms_per_step`, imagined transitions, peak memory) |
| 16 | + - model fidelity (k-step reward prediction error) |
| 17 | + - generalization gap and continual transfer metrics |
| 18 | +- Added planners: |
| 19 | + - MCTS with simulation/depth budgets and tree diagnostics |
| 20 | + - MPC-CEM with population/iteration/horizon budgets and score distributions |
| 21 | +- Added baseline agent suite: |
| 22 | + - random, greedy-oracle, planner-only oracle |
| 23 | + - imagination agent (online world model + MPC) |
| 24 | + - search MCTS skeleton |
| 25 | + - model-free PPO placeholder API |
| 26 | +- Added world model baselines: |
| 27 | + - deterministic latent model |
| 28 | + - stochastic latent model (RSSM-style simplified) |
| 29 | + - ensemble wrapper for uncertainty proxy |
| 30 | +- Added backend platform: |
| 31 | + - FastAPI API for runs and uploads |
| 32 | + - SQLite persistence via SQLAlchemy |
| 33 | + - leaderboard/task/run endpoints |
| 34 | +- Added UI clients: |
| 35 | + - Next.js dashboard (home/tasks/leaderboard/run viewer) |
| 36 | + - Expo mobile viewer (tasks/leaderboard/run summary) |
| 37 | +- Added reproducibility stack: |
| 38 | + - Dockerfiles and `docker-compose` |
| 39 | + - CI workflow and pre-commit gates |
| 40 | + - `Makefile` automation for setup/lint/test/demo/paper |
| 41 | +- Added paper artifacts: |
| 42 | + - imported draft PDF |
| 43 | + - LaTeX manuscript + BibTeX references |
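
To make the headline metrics in the list above concrete, here is a minimal sketch of how `success_rate`, `mean_return`, `median_steps_to_success`, and a generalization gap could be computed from per-episode results. The `Episode` record, the `summarize` and `generalization_gap` helpers, and the "train minus test" definition of the gap are assumptions made for illustration; the release notes do not specify how the actual pipeline defines them.

```python
from dataclasses import dataclass
from statistics import mean, median
from typing import Optional, Sequence


@dataclass(frozen=True)
class Episode:
    """Hypothetical per-episode record; field names are illustrative."""
    success: bool
    episode_return: float
    steps: int


def summarize(episodes: Sequence[Episode]) -> dict:
    """Aggregate one seed track into the three headline metrics."""
    if not episodes:
        raise ValueError("need at least one episode")
    steps_on_success = [e.steps for e in episodes if e.success]
    median_steps: Optional[float] = (
        median(steps_on_success) if steps_on_success else None
    )
    return {
        "success_rate": len(steps_on_success) / len(episodes),
        "mean_return": mean([e.episode_return for e in episodes]),
        # Undefined when no episode succeeded, so report None rather than 0.
        "median_steps_to_success": median_steps,
    }


def generalization_gap(train: dict, test: dict, key: str = "success_rate") -> float:
    """Assumed definition: train-track metric minus test-track metric."""
    return train[key] - test[key]
```

Reporting `None` for the median when no episode succeeds avoids conflating "never solved" with "solved in zero steps"; a real pipeline may handle that case differently.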
| 44 | + |
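
For readers unfamiliar with the MPC-CEM planner mentioned under Highlights, the sketch below shows the general cross-entropy method with population, iteration, and horizon budgets. It is a generic, standard-library illustration and not the project's planner: `cem_plan`, its parameters, the `elite_frac` default, and `score_fn` (standing in for a world-model rollout that returns predicted return) are all hypothetical names.

```python
import random
from statistics import mean, pstdev
from typing import Callable, List, Sequence


def cem_plan(
    score_fn: Callable[[Sequence[float]], float],
    horizon: int,
    population: int = 64,
    iterations: int = 5,
    elite_frac: float = 0.125,
    seed: int = 0,
) -> List[float]:
    """Return the mean action sequence after CEM refinement.

    score_fn is a hypothetical stand-in for rolling a candidate action
    sequence through a learned world model and summing predicted reward.
    """
    rng = random.Random(seed)  # seeded so planning is reproducible
    mu = [0.0] * horizon
    sigma = [1.0] * horizon
    n_elite = max(2, int(population * elite_frac))
    for _ in range(iterations):
        # Sample a population of action sequences from per-step Gaussians.
        candidates = [
            [rng.gauss(m, s) for m, s in zip(mu, sigma)]
            for _ in range(population)
        ]
        # Keep the highest-scoring sequences and refit the Gaussians to them.
        elites = sorted(candidates, key=score_fn, reverse=True)[:n_elite]
        mu = [mean(col) for col in zip(*elites)]
        sigma = [pstdev(col) + 1e-6 for col in zip(*elites)]
    return mu
```

In a receding-horizon (MPC) loop, only the first action of the returned sequence would be executed before replanning from the next observation.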
| 45 | +## Validation |
| 46 | +- `make lint` passes. |
| 47 | +- `make test` passes. |
| 48 | +- `make demo` runs benchmark + upload flow. |
| 49 | +- `make paper` builds the paper PDF (with a fallback build path when the TeX toolchain is unavailable). |
| 50 | + |
| 51 | +## Breaking Changes |
| 52 | +- None (first public release). |