Commit 9b4de05

chore: harden workflow, linting, and fallback build paths
1 parent c29bbe8 commit 9b4de05

27 files changed: +342 -69 lines changed

CHANGELOG.md

Lines changed: 15 additions & 1 deletion
@@ -5,4 +5,18 @@ All notable changes to this project will be documented in this file.
 The format is based on Keep a Changelog and this project follows Semantic Versioning.

 ## [Unreleased]
-- Initial scaffold in progress.
+- Pending release metadata and follow-up benchmark extensions.
+
+## [0.1.0] - 2026-02-21
+### Added
+- Monorepo scaffold with Python packages for `core`, `agents`, `planners`, `worldmodels`, and `server`.
+- Three procedural long-horizon environments: MemoryMaze, SwitchQuest, and CraftLite.
+- Stable episode trace schema, deterministic evaluation harness, CLI runner, and continual track metrics.
+- Planner implementations: MCTS, MPC-CEM, and trajectory sampling baselines.
+- World model baselines: deterministic latent, stochastic latent, and ensemble uncertainty wrapper.
+- Agent baselines: random, oracle, planner-oracle, imagination MPC, search MCTS skeleton, and PPO placeholder.
+- FastAPI server with runs CRUD, artifact uploads, leaderboard queries, tasks listing, and trace/metrics downloads.
+- Next.js dashboard with pages for home/tasks/leaderboard/run viewer.
+- Expo mobile viewer for task, leaderboard, and run summary browsing.
+- Docker compose stack, demo upload script, CI workflow, and pre-commit tooling.
+- Paper artifacts: imported draft PDF, LaTeX manuscript recreation, BibTeX references, and build workflow.

Makefile

Lines changed: 21 additions & 6 deletions
@@ -1,24 +1,39 @@
-PYTHON ?= python3
-PIP ?= pip3
+VENV ?= .venv
+PYTHON ?= $(VENV)/bin/python
+PIP ?= $(VENV)/bin/pip

 .PHONY: setup test lint demo paper

 setup:
+	python3 -m venv $(VENV)
+	$(PIP) install --upgrade pip
 	$(PIP) install -r requirements-dev.txt
 	$(PIP) install -e core -e planners -e worldmodels -e agents -e server
 	cd web && npm install
 	cd mobile && npm install

 test:
-	pytest
+	$(PYTHON) -m pytest

 lint:
-	ruff check .
-	ruff format --check .
+	$(PYTHON) -m ruff check .
+	$(PYTHON) -m ruff format --check .

 demo:
-	docker compose up -d --build
+	@if command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then \
+		docker compose up -d --build; \
+	else \
+		echo "Docker daemon unavailable; starting local API fallback on :8000"; \
+		mkdir -p .tmp; \
+		$(PYTHON) -m uvicorn worldmodel_server.main:app --host 127.0.0.1 --port 8000 > .tmp/demo-server.log 2>&1 & \
+		echo $$! > .tmp/demo-server.pid; \
+		sleep 2; \
+	fi
 	$(PYTHON) scripts/demo_run.py
+	@if [ -f .tmp/demo-server.pid ]; then \
+		kill `cat .tmp/demo-server.pid` || true; \
+		rm -f .tmp/demo-server.pid; \
+	fi

 paper:
 	$(MAKE) -C paper paper
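
The new `demo` recipe picks between two start paths depending on whether a Docker daemon is reachable. As a rough illustration only, the same decision could be sketched in Python as below. The helper names `docker_available` and `demo_start_command` are hypothetical and do not appear in the commit; the commands themselves are copied from the recipe.

```python
from __future__ import annotations

import shutil
import subprocess
from typing import Callable


def docker_available(
    which: Callable[[str], str | None] = shutil.which,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> bool:
    """Mirror `command -v docker && docker info`: binary on PATH and daemon reachable."""
    if which("docker") is None:
        return False
    try:
        result = run(
            ["docker", "info"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
    except OSError:
        # The binary vanished or is not executable; treat as unavailable.
        return False
    return result.returncode == 0


def demo_start_command(use_docker: bool, python: str = ".venv/bin/python") -> list[str]:
    """Return the command the demo target launches for the chosen path."""
    if use_docker:
        return ["docker", "compose", "up", "-d", "--build"]
    # Fallback path from the recipe: serve the API locally on port 8000.
    return [
        python, "-m", "uvicorn", "worldmodel_server.main:app",
        "--host", "127.0.0.1", "--port", "8000",
    ]
```

The `which` and `run` parameters are injected only so the check can be exercised without Docker installed; the Makefile itself runs the shell commands directly and also records the fallback server's PID so it can be killed after `scripts/demo_run.py` finishes.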

README.md

Lines changed: 33 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -1,28 +1,52 @@
11
# worldmodel-gym
22

3-
WorldModel Gym is a long-horizon planning benchmark for imagination-based agents.
3+
WorldModel Gym is a reproducible long-horizon planning benchmark + evaluation platform for imagination-based agents.
44

5-
## 30-second demo
5+
## Quickstart (30 seconds)
66

77
```bash
88
make setup
99
make demo
1010
```
1111

12-
Then open:
12+
`make demo` will:
13+
- start the API + web stack with Docker when available
14+
- fall back to local API execution when Docker daemon is unavailable
15+
- run one benchmark evaluation
16+
- upload artifacts and populate leaderboard data
1317

14-
- Web UI: <http://localhost:3000>
15-
- API docs: <http://localhost:8000/docs>
18+
Open:
19+
- [http://localhost:3000](http://localhost:3000) (web dashboard)
20+
- [http://localhost:8000/docs](http://localhost:8000/docs) (FastAPI docs)
1621

17-
## Run one evaluation
22+
## Run a single evaluation
1823

1924
```bash
20-
python -m worldmodel_gym.eval.run --agent random --env memory_maze --track test --seeds 101,102 --max-episodes 2
25+
.venv/bin/python -m worldmodel_gym.eval.run \
26+
--agent random \
27+
--env memory_maze \
28+
--track test \
29+
--seeds 211,223 \
30+
--max-episodes 2
2131
```
2232

23-
Artifacts are written to `runs/<run_id>/`.
33+
Artifacts are written to `runs/<run_id>/`:
34+
- `metrics.json`
35+
- `trace.jsonl`
36+
- `config.yaml`
2437

25-
## Developer targets
38+
## Monorepo layout
39+
40+
- `core/`: environments, traces, eval harness
41+
- `planners/`: MCTS, MPC-CEM, trajectory sampling
42+
- `worldmodels/`: deterministic/stochastic/ensemble latent models
43+
- `agents/`: baseline agents and registry
44+
- `server/`: FastAPI leaderboard + run artifact service
45+
- `web/`: Next.js dashboard
46+
- `mobile/`: Expo viewer
47+
- `paper/`: draft PDF + LaTeX sources
48+
49+
## Dev targets
2650

2751
```bash
2852
make lint

agents/pyproject.toml

Lines changed: 0 additions & 1 deletion
@@ -9,7 +9,6 @@ requires-python = ">=3.11"
 dependencies = [
     "numpy>=1.26.4",
     "torch>=2.3.1",
-    "stable-baselines3>=2.5.0",
 ]

 [tool.setuptools.packages.find]

agents/worldmodel_agents/imagination_mpc.py

Lines changed: 5 additions & 3 deletions
@@ -3,17 +3,19 @@
 import copy

 import numpy as np
+from worldmodel_models.registry import create_world_model
+from worldmodel_planners.mpc_cem import MPCCEMPlanner

 from worldmodel_agents.base import AgentConfig, BaseAgent
-from worldmodel_planners.mpc_cem import MPCCEMPlanner
-from worldmodel_models.registry import create_world_model


 class ImaginationMPCAgent(BaseAgent):
     def __init__(self, config: AgentConfig | None = None):
         super().__init__(config=config)
         self.world_model = create_world_model("ensemble")
-        self.planner = MPCCEMPlanner(action_space_n=self.config.action_space_n, horizon=10, population=64, iterations=3)
+        self.planner = MPCCEMPlanner(
+            action_space_n=self.config.action_space_n, horizon=10, population=64, iterations=3
+        )
         self.latent = self.world_model.init_state(batch_size=1)
         self.buffer: list[dict] = []
         self.rng = np.random.default_rng(0)

agents/worldmodel_agents/oracle_agent.py

Lines changed: 5 additions & 2 deletions
@@ -2,9 +2,10 @@

 import copy

-from worldmodel_agents.base import AgentConfig, BaseAgent
 from worldmodel_planners.mcts import MCTSPlanner

+from worldmodel_agents.base import AgentConfig, BaseAgent
+

 def _move_toward(agent_pos: list[int], target_pos: list[int]) -> int:
     ar, ac = agent_pos
@@ -64,7 +65,9 @@ def act(self, obs, info: dict) -> int:
 class PlannerOnlyOracleAgent(BaseAgent):
     def __init__(self, config: AgentConfig | None = None):
         super().__init__(config=config)
-        self.planner = MCTSPlanner(action_space_n=self.config.action_space_n, num_simulations=48, max_depth=18)
+        self.planner = MCTSPlanner(
+            action_space_n=self.config.action_space_n, num_simulations=48, max_depth=18
+        )

     def act(self, obs, info: dict) -> int:
         del obs

agents/worldmodel_agents/search_mcts.py

Lines changed: 8 additions & 4 deletions
@@ -3,19 +3,21 @@
 import copy

 import numpy as np
-
-from worldmodel_agents.base import AgentConfig, BaseAgent
 from worldmodel_models.registry import create_world_model
 from worldmodel_planners.mcts import MCTSPlanner

+from worldmodel_agents.base import AgentConfig, BaseAgent
+

 class SearchMCTSAgent(BaseAgent):
     """Minimal MuZero-style skeleton: learned model + MCTS planning."""

     def __init__(self, config: AgentConfig | None = None):
         super().__init__(config=config)
         self.world_model = create_world_model("deterministic")
-        self.planner = MCTSPlanner(action_space_n=self.config.action_space_n, num_simulations=56, max_depth=14)
+        self.planner = MCTSPlanner(
+            action_space_n=self.config.action_space_n, num_simulations=56, max_depth=14
+        )
         self.latent = self.world_model.init_state(batch_size=1)
         self.buffer: list[dict] = []
         self.rng = np.random.default_rng(0)
@@ -32,7 +34,9 @@ def act(self, obs, info: dict) -> int:
         self.latent = self.world_model.observe(self.latent, obs)

         def transition_fn(state, action):
-            next_state, _pred_obs, pred_reward, pred_done, _aux = self.world_model.predict(state, int(action))
+            next_state, _pred_obs, pred_reward, pred_done, _aux = self.world_model.predict(
+                state, int(action)
+            )
             return next_state, float(pred_reward), bool(pred_done)

         result = self.planner.plan(
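
The `transition_fn` closure in this hunk adapts the world model's five-value `predict` result to the `(next_state, reward, done)` triple that the planner consumes. A minimal sketch of that adapter pattern follows. `ToyWorldModel` and `make_transition_fn` are hypothetical stand-ins invented for illustration; only the tuple shape and the `float`/`bool` coercions are taken from the diff.

```python
from dataclasses import dataclass


@dataclass
class ToyWorldModel:
    """Hypothetical stand-in: the state is an int counter, done once it reaches `goal`."""

    goal: int = 3

    def predict(self, state: int, action: int):
        next_state = state + action
        pred_obs = [next_state]
        pred_done = next_state >= self.goal
        pred_reward = 1.0 if pred_done else 0.0
        aux: dict = {}
        # Same five-slot shape the diff unpacks from world_model.predict.
        return next_state, pred_obs, pred_reward, pred_done, aux


def make_transition_fn(world_model):
    """Wrap a world model so a planner sees only (next_state, reward, done)."""

    def transition_fn(state, action):
        next_state, _pred_obs, pred_reward, pred_done, _aux = world_model.predict(
            state, int(action)
        )
        return next_state, float(pred_reward), bool(pred_done)

    return transition_fn


# Roll the adapter forward with a fixed action until the model predicts termination.
transition_fn = make_transition_fn(ToyWorldModel(goal=3))
state, total_reward, done = 0, 0.0, False
while not done:
    state, reward, done = transition_fn(state, 1)
    total_reward += reward
```

Keeping the adapter this thin means the planner never depends on predicted observations or auxiliary outputs, so a different world model can be swapped in as long as `predict` keeps the same return shape.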

core/worldmodel_gym/envs/base.py

Lines changed: 4 additions & 2 deletions
@@ -1,7 +1,7 @@
 from __future__ import annotations

-from dataclasses import dataclass
 import copy
+from dataclasses import dataclass
 from typing import Any

 import gymnasium as gym
@@ -46,7 +46,9 @@ def __init__(self, config: BaseEnvConfig):
         self.action_space = spaces.Discrete(8)
         self._rng = np.random.default_rng(0)

-        symbolic_space = spaces.Box(low=0.0, high=1.0, shape=(16, self.grid_size, self.grid_size), dtype=np.float32)
+        symbolic_space = spaces.Box(
+            low=0.0, high=1.0, shape=(16, self.grid_size, self.grid_size), dtype=np.float32
+        )
         rgb_space = spaces.Box(low=0, high=255, shape=(64, 64, 3), dtype=np.uint8)

         if self.config.obs_mode == "rgb":

core/worldmodel_gym/envs/memory_maze.py

Lines changed: 6 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -71,7 +71,12 @@ def _step_state(self, action: int) -> tuple[float, bool, list[str]]:
7171
self.has_key = True
7272
events.append("found_key")
7373

74-
if action == 5 and self.has_key and self._adjacent(self.agent_pos, self.door_pos) and not self.door_open:
74+
if (
75+
action == 5
76+
and self.has_key
77+
and self._adjacent(self.agent_pos, self.door_pos)
78+
and not self.door_open
79+
):
7580
self.door_open = True
7681
events.append("opened_door")
7782

core/worldmodel_gym/eval/continual.py

Lines changed: 9 additions & 3 deletions
@@ -9,12 +9,18 @@ class ContinualSchedule:
     shift_strength: float = 0.05


-def apply_shift_kwargs(base_kwargs: dict, env_id: str, shift_idx: int, shift_strength: float) -> dict:
+def apply_shift_kwargs(
+    base_kwargs: dict, env_id: str, shift_idx: int, shift_strength: float
+) -> dict:
     kwargs = dict(base_kwargs)
     if env_id == "memory_maze":
-        kwargs["wall_density"] = min(0.35, kwargs.get("wall_density", 0.16) + shift_idx * shift_strength)
+        kwargs["wall_density"] = min(
+            0.35, kwargs.get("wall_density", 0.16) + shift_idx * shift_strength
+        )
     elif env_id == "switch_quest":
-        kwargs["wall_density"] = min(0.25, kwargs.get("wall_density", 0.1) + shift_idx * shift_strength)
+        kwargs["wall_density"] = min(
+            0.25, kwargs.get("wall_density", 0.1) + shift_idx * shift_strength
+        )
     elif env_id == "craft_lite":
         kwargs["rock_count"] = int(kwargs.get("rock_count", 5) + shift_idx)
     return kwargs
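
To make the effect of the shift schedule concrete, the sketch below copies `apply_shift_kwargs` from the post-change side of this hunk and evaluates it for a few shift indices. The calling code underneath is illustrative only and assumes an empty `base_kwargs` with the default `shift_strength` of 0.05 shown in `ContinualSchedule`.

```python
def apply_shift_kwargs(
    base_kwargs: dict, env_id: str, shift_idx: int, shift_strength: float
) -> dict:
    # Copied from the diff: work on a shallow copy so the caller's dict is untouched.
    kwargs = dict(base_kwargs)
    if env_id == "memory_maze":
        kwargs["wall_density"] = min(
            0.35, kwargs.get("wall_density", 0.16) + shift_idx * shift_strength
        )
    elif env_id == "switch_quest":
        kwargs["wall_density"] = min(
            0.25, kwargs.get("wall_density", 0.1) + shift_idx * shift_strength
        )
    elif env_id == "craft_lite":
        kwargs["rock_count"] = int(kwargs.get("rock_count", 5) + shift_idx)
    return kwargs


# MemoryMaze: wall density starts at 0.16 and rises by 0.05 per shift until capped at 0.35.
densities = [
    apply_shift_kwargs({}, "memory_maze", shift_idx, 0.05)["wall_density"]
    for shift_idx in range(6)
]

# CraftLite: the rock count grows by one per shift and ignores shift_strength.
craft = apply_shift_kwargs({"rock_count": 5}, "craft_lite", 3, 0.05)
```

With these inputs the MemoryMaze density hits its 0.35 cap from the fourth shift onward, so later shifts in a long continual run no longer make that environment harder. An unrecognized `env_id` returns an unchanged copy of the base arguments.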
