WorldModel Gym v0.1.0

@biru-codeastromer biru-codeastromer released this 21 Feb 05:57
· 8 commits to main since this release

Initial public release of the WorldModel Gym benchmark platform.

Highlights

  • Added three procedural long-horizon benchmark environments:
    • MemoryMaze (key-door sparse-reward POMDP)
    • SwitchQuest (hidden-order switch chaining)
    • CraftLite (resource/crafting dependency graph)
  • Added rgb and symbolic observation modes (selectable as rgb, symbolic, or both) and stable semantic episode traces.
  • Added deterministic evaluation harness with train/test seed tracks and continual-shift track.
  • Added metrics pipeline:
    • success_rate, mean_return, median_steps_to_success
    • achievement completion counts
    • planning cost (wall_clock_ms_per_step, imagined transitions, peak memory)
    • model fidelity (k-step reward prediction error)
    • generalization gap and continual transfer metrics
  • Added planners:
    • MCTS with simulation/depth budgets and tree diagnostics
    • MPC-CEM with population/iteration/horizon budgets and score distributions
  • Added baseline agent suite:
    • random, greedy-oracle, planner-only oracle
    • imagination agent (online world model + MPC)
    • MCTS search agent skeleton
    • model-free PPO placeholder API
  • Added world model baselines:
    • deterministic latent model
    • stochastic latent model (RSSM-style simplified)
    • ensemble wrapper for uncertainty proxy
  • Added backend platform:
    • FastAPI API for runs and uploads
    • SQLite persistence via SQLAlchemy
    • leaderboard/task/run endpoints
  • Added UI clients:
    • Next.js dashboard (home/tasks/leaderboard/run viewer)
    • Expo mobile viewer (tasks/leaderboard/run summary)
  • Added reproducibility stack:
    • Dockerfiles and docker-compose
    • CI workflow and pre-commit gates
    • Makefile automation for setup/lint/test/demo/paper
  • Added paper artifacts:
    • imported draft PDF
    • LaTeX manuscript + BibTeX references
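
As a rough illustration of the headline metrics listed above, here is a minimal sketch of how success_rate, mean_return, and median_steps_to_success could be computed from per-episode results. The EpisodeResult record, the summarize and generalization_gap helpers, and the definition of the gap as train success rate minus test success rate are all assumptions made for this example; they are not taken from the WorldModel Gym codebase and may differ from the released pipeline.

```python
from dataclasses import dataclass
from statistics import mean, median
from typing import Optional, Sequence


@dataclass(frozen=True)
class EpisodeResult:
    """Hypothetical per-episode record (not the project's actual schema)."""
    episode_return: float
    steps: int
    success: bool


def summarize(episodes: Sequence[EpisodeResult]) -> dict:
    """Aggregate the three headline metrics over a batch of episodes."""
    if not episodes:
        raise ValueError("need at least one episode")
    # Only successful episodes contribute to steps-to-success.
    success_steps = [e.steps for e in episodes if e.success]
    median_steps: Optional[float] = median(success_steps) if success_steps else None
    return {
        "success_rate": len(success_steps) / len(episodes),
        "mean_return": mean(e.episode_return for e in episodes),
        "median_steps_to_success": median_steps,
    }


def generalization_gap(train: dict, test: dict) -> float:
    """Assumed definition: train success rate minus test success rate."""
    return train["success_rate"] - test["success_rate"]


# Toy data standing in for the train and test seed tracks.
train_episodes = [
    EpisodeResult(1.0, 10, True),
    EpisodeResult(0.0, 50, False),
    EpisodeResult(1.0, 20, True),
    EpisodeResult(1.0, 30, True),
]
test_episodes = [
    EpisodeResult(1.0, 12, True),
    EpisodeResult(0.0, 50, False),
]

train_summary = summarize(train_episodes)
test_summary = summarize(test_episodes)
gap = generalization_gap(train_summary, test_summary)
```

In this sketch, median_steps_to_success is None when no episode succeeds, so a failed run is not confused with a fast one. The released harness may handle that case differently.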

Validation

  • make lint passes.
  • make test passes.
  • make demo runs the benchmark and upload flow.
  • make paper builds the paper PDF (with a fallback build path when a TeX toolchain is unavailable).

Breaking Changes

  • None (first public release).