WorldModel Gym v0.1.0
Initial public release of the WorldModel Gym benchmark platform.
Highlights
- Added three procedural long-horizon benchmark environments:
- MemoryMaze (key-door sparse-reward POMDP)
- SwitchQuest (hidden-order switch chaining)
- CraftLite (resource/crafting dependency graph)
- Added dual observation modes (rgb, symbolic, both) and stable semantic episode traces.
- Added deterministic evaluation harness with train/test seed tracks and continual-shift track.
- Added metrics pipeline:
- success_rate, mean_return, median_steps_to_success
- achievement completion counts
- planning cost (wall_clock_ms_per_step, imagined transitions, peak memory)
- model fidelity (k-step reward prediction error)
- generalization gap and continual transfer metrics
- Added planners:
- MCTS with simulation/depth budgets and tree diagnostics
- MPC-CEM with population/iteration/horizon budgets and score distributions
- Added baseline agent suite:
- random, greedy-oracle, planner-only oracle
- imagination agent (online world model + MPC)
- search MCTS skeleton
- model-free PPO placeholder API
- Added world model baselines:
- deterministic latent model
- stochastic latent model (RSSM-style simplified)
- ensemble wrapper for uncertainty proxy
- Added backend platform:
- FastAPI runs API and uploads
- SQLite persistence via SQLAlchemy
- leaderboard/task/run endpoints
- Added UI clients:
- Next.js dashboard (home/tasks/leaderboard/run viewer)
- Expo mobile viewer (tasks/leaderboard/run summary)
- Added reproducibility stack:
- Dockerfiles and docker-compose
- CI workflow and pre-commit gates
- Makefile automation for setup/lint/test/demo/paper
- Added paper artifacts:
- imported draft PDF
- LaTeX manuscript + BibTeX references
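The evaluation-harness bullet above does not say how train and test seeds are assigned. The sketch below shows one common approach. It assumes the two seed lists are drawn disjointly from a single master seed. The function name and seed range are hypothetical and do not come from WorldModel Gym.

```python
import random
from typing import List, Tuple


def make_seed_tracks(master_seed: int, n_train: int, n_test: int) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: derive disjoint, reproducible train/test seed lists."""
    rng = random.Random(master_seed)  # private RNG, so global random state is untouched
    # sample() draws without replacement, so the two tracks never share a seed.
    seeds = rng.sample(range(2**31), n_train + n_test)
    return seeds[:n_train], seeds[n_train:]
```

The RNG is seeded locally, so repeated calls with the same arguments return identical tracks within a given Python version. This is what keeps evaluation runs comparable.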
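The metrics bullet above names success_rate, mean_return, median_steps_to_success and a generalization gap but does not define them. The sketch below is illustrative only and rests on three assumptions:

- Each episode is summarized by a success flag, a return and a step count.
- The median is taken over successful episodes only.
- The gap is train-seed success rate minus test-seed success rate.

EpisodeResult, summarize and generalization_gap are hypothetical names, not the benchmark's schema.

```python
from dataclasses import dataclass
from statistics import mean, median
from typing import Optional, Sequence


@dataclass
class EpisodeResult:
    # Hypothetical per-episode summary; not the WorldModel Gym trace format.
    success: bool
    episode_return: float
    steps: int


def summarize(episodes: Sequence[EpisodeResult]) -> dict:
    """Aggregate success_rate, mean_return and median_steps_to_success."""
    if not episodes:
        raise ValueError("need at least one episode")
    successful_steps = [e.steps for e in episodes if e.success]
    # Assumption: median over successful episodes only; None if none succeeded.
    median_steps: Optional[float] = median(successful_steps) if successful_steps else None
    return {
        "success_rate": sum(e.success for e in episodes) / len(episodes),
        "mean_return": mean(e.episode_return for e in episodes),
        "median_steps_to_success": median_steps,
    }


def generalization_gap(train: Sequence[EpisodeResult], test: Sequence[EpisodeResult]) -> float:
    """Assumed definition: train-seed success rate minus test-seed success rate."""
    return summarize(train)["success_rate"] - summarize(test)["success_rate"]
```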
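The model-fidelity metric is described above only as a "k-step reward prediction error". The minimal sketch below assumes it is the mean absolute difference between open-loop k-step reward predictions and logged rewards, taken over every length-k window of a trajectory. The benchmark may instead use squared error or a different windowing. The rollout interface and function name are hypothetical.

```python
from typing import Callable, List, Sequence

# Hypothetical interface: given a start time t and the next k actions, the
# world model rolls out open-loop from t and returns k predicted rewards.
RewardRollout = Callable[[int, Sequence[int]], List[float]]


def k_step_reward_error(
    rollout: RewardRollout,
    actions: Sequence[int],
    rewards: Sequence[float],
    k: int,
) -> float:
    """Mean absolute error of open-loop k-step reward predictions."""
    if len(actions) != len(rewards):
        raise ValueError("actions and rewards must align")
    errors: List[float] = []
    for t in range(len(rewards) - k + 1):
        predicted = rollout(t, actions[t:t + k])
        if len(predicted) != k:
            raise ValueError("rollout must return exactly k predictions")
        errors.extend(abs(p - r) for p, r in zip(predicted, rewards[t:t + k]))
    if not errors:
        raise ValueError("trajectory is shorter than k")
    return sum(errors) / len(errors)
```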
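The MPC-CEM planner above is listed with population, iteration and horizon budgets. The sketch below, for illustration only, shows a cross-entropy method over per-step categorical action distributions. It assumes a discrete action space and a caller-supplied scoring function that returns the imagined return of an action sequence. cem_plan, elite_frac and smoothing are hypothetical names, not the benchmark's API.

```python
import random
from typing import Callable, List, Sequence

# Hypothetical scorer: imagined return of an action sequence under a learned model.
Scorer = Callable[[Sequence[int]], float]


def cem_plan(
    score: Scorer,
    num_actions: int,
    horizon: int,
    population: int = 64,
    iterations: int = 5,
    elite_frac: float = 0.125,
    smoothing: float = 0.1,
    seed: int = 0,
) -> List[int]:
    """Cross-entropy method over per-step categorical action distributions."""
    rng = random.Random(seed)
    actions = range(num_actions)
    # probs[h][a] is the probability of choosing action a at step h; start uniform.
    probs = [[1.0 / num_actions] * num_actions for _ in range(horizon)]
    n_elite = max(1, int(population * elite_frac))
    best_seq: List[int] = []
    best_score = float("-inf")
    for _ in range(iterations):
        candidates = [
            [rng.choices(actions, weights=probs[h])[0] for h in range(horizon)]
            for _ in range(population)
        ]
        scored = sorted(((score(c), c) for c in candidates), key=lambda sc: sc[0], reverse=True)
        if scored[0][0] > best_score:
            best_score, best_seq = scored[0]
        elites = [c for _, c in scored[:n_elite]]
        # Refit each step's distribution to elite action frequencies, mixed with the
        # previous distribution so that (for smoothing > 0) no probability hits zero.
        for h in range(horizon):
            counts = [0] * num_actions
            for elite in elites:
                counts[elite[h]] += 1
            probs[h] = [
                (1 - smoothing) * counts[a] / n_elite + smoothing * probs[h][a]
                for a in actions
            ]
    return best_seq
```

In an MPC loop, only the first action of the returned sequence would be executed before replanning at the next step.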
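The ensemble wrapper above is described as an "uncertainty proxy" without further detail. The sketch below uses one common choice: it averages member predictions and treats their spread (population standard deviation) as the uncertainty signal. It assumes each member maps a (state, action) pair to a predicted reward. The class and interface are hypothetical.

```python
from statistics import mean, pstdev
from typing import Callable, List, Sequence, Tuple

# Hypothetical member interface: (state, action) -> predicted reward.
RewardModel = Callable[[Sequence[float], int], float]


class EnsembleRewardModel:
    """Prediction is the member mean; uncertainty is the members' disagreement."""

    def __init__(self, members: List[RewardModel]) -> None:
        if len(members) < 2:
            raise ValueError("an ensemble needs at least two members")
        self.members = members

    def predict(self, state: Sequence[float], action: int) -> Tuple[float, float]:
        preds = [m(state, action) for m in self.members]
        # Population std of member predictions serves as the uncertainty proxy.
        return mean(preds), pstdev(preds)
```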
Validation
- make lint passes.
- make test passes.
- make demo runs benchmark + upload flow.
- make paper builds paper PDF (with fallback build path when TeX toolchain is unavailable).
Breaking Changes
- None (first public release).