Commit 6e9bb25

📝 docs: improve the Lab 6 reinforcement learning tutorial
1 parent ee64799 commit 6e9bb25

4 files changed

Lines changed: 154 additions & 84 deletions

codes/practices/quadruped/cs123/exercises/lab_6_rl_pupper/README.md

Lines changed: 3 additions & 3 deletions
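@@ -26,7 +26,7 @@

 ## Starting point and TODO map

-The instructor version of `starter.py` already contains the PPO configuration, training loop, rendering pipeline, and plotting. Students only fill in the three TODOs in `envs/pupper_env.py`.
+The repository currently keeps the instructor versions of `envs/pupper_env.py` and `starter.py`, so the tests can be run and the artifacts generated directly. The student exercise version only needs the three TODOs in the environment filled in; the corresponding hints are kept in `starter_todo.py` under TODO 1–3.

 | TODO | task | what to write |
 |---|---|---|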
@@ -45,7 +45,7 @@

 ## MuJoCo scene

-Reuses `lab4/models/pupper_v3_floating.xml`, already validated in Lab 4 / Lab 5 (floating base + checkerboard floor + skybox + spotlight + tracking_cam). No separate MJCF is created.
+Reuses `shared/models/pupper_v3_floating.xml`, already validated in Lab 4 / Lab 5 (floating base + checkerboard floor + skybox + spotlight + tracking_cam). No separate MJCF is created.

 ## Rubric

@@ -92,5 +92,5 @@ bash shared/rl/fetch_policies.sh # download test_policy.json to
 uv run python lab_6_rl_pupper/tests.py # 4 assertions
 uv run python lab_6_rl_pupper/train_ppo.py # 30–60 min of CPU training
 uv run python lab_6_rl_pupper/eval_commands.py # load the ckpt, record the GIF, and plot
-uv run python lab_6_rl_pupper/make_artifacts.py # one-shot pipeline
+uv run python lab_6_rl_pupper/make_artifacts.py # one-shot pipeline: training, GIF, comparison GIF, and curves
```

codes/practices/quadruped/cs123/exercises/lab_6_rl_pupper/envs/pupper_env.py

Lines changed: 1 addition & 4 deletions
@@ -154,10 +154,7 @@ def _get_obs(self) -> np.ndarray:
         gravity = base_local_gravity(self.model, self.data, self._base_id)
         qpos = self.data.qpos[self._joint_qpos_ids].copy()
         qvel = self.data.qvel[self._joint_qvel_ids].copy()
-        foot_contact = np.array([
-            1.0 if self.data.cfrc_ext[bid, 2] > 0.5 else 0.0
-            for bid in self._foot_body_ids
-        ], dtype=np.float32)
+        foot_contact = foot_contact_indicator(self.model, self.data, self._foot_body_ids)
         obs = np.concatenate([
             base_omega,
             gravity,
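
The diff does not show the body of `foot_contact_indicator`, so the following is only a rough illustration of how a helper of this kind could turn a list of active contacts into per-foot 0/1 flags. It uses plain Python lists instead of MuJoCo objects so that it runs standalone. The function name `contact_flags_from_pairs`, its arguments, and the sample ids are all hypothetical and not taken from the repository. In MuJoCo, the geom pairs would typically be read from `data.contact[i].geom1` and `data.contact[i].geom2` for `i < data.ncon`, and the geom-to-body map from `model.geom_bodyid`.

```python
from __future__ import annotations

from typing import Iterable, Sequence


def contact_flags_from_pairs(
    contact_geom_pairs: Iterable[tuple[int, int]],
    geom_bodyid: Sequence[int],
    foot_body_ids: Sequence[int],
) -> list[float]:
    """Return 1.0 for each foot body that owns a geom in any active contact."""
    touching_bodies: set[int] = set()
    for geom_a, geom_b in contact_geom_pairs:
        # Map both geoms of the contact back to the bodies they belong to.
        touching_bodies.add(geom_bodyid[geom_a])
        touching_bodies.add(geom_bodyid[geom_b])
    return [1.0 if body in touching_bodies else 0.0 for body in foot_body_ids]


# Hypothetical toy scene: geom 0 is the floor (body 0),
# geoms 1-4 are the feet, attached to bodies 5-8.
geom_bodyid = [0, 5, 6, 7, 8]
foot_body_ids = [5, 6, 7, 8]
active_contacts = [(0, 1), (0, 4)]  # the floor touches the first and last foot

flags = contact_flags_from_pairs(active_contacts, geom_bodyid, foot_body_ids)
print(flags)  # [1.0, 0.0, 0.0, 1.0]
```

This sketch counts any contact involving a foot geom, including self-collision. A stricter variant would also require the other geom to be the floor, or would threshold the contact force as the removed code attempted to do.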

codes/practices/quadruped/cs123/exercises/lab_6_rl_pupper/make_artifacts.py

Lines changed: 11 additions & 5 deletions
@@ -1,4 +1,4 @@
-"""One-shot: train → eval → plot → write portfolio."""
+"""One-shot: train → command GIF → comparison GIF → plot → write portfolio."""

 from __future__ import annotations

@@ -17,6 +17,7 @@
     GIF_WIDTH,
     PORTFOLIO_DIR,
     render_command_demo,
+    render_comparison_gif,
     render_velocity_tracking,
     save_reward_curve,
     save_velocity_tracking,
@@ -28,7 +29,7 @@ def main() -> None:
     PORTFOLIO_DIR.mkdir(parents=True, exist_ok=True)

     print("=" * 60)
-    print("Step 1/4: PPO training")
+    print("Step 1/5: PPO training")
     print("=" * 60)
     t0 = time.time()
     ckpt = train_ppo()
@@ -37,7 +38,7 @@ def main() -> None:
     print(f"Training finished: {train_wall / 60:.1f} min, checkpoint {ckpt_mb:.1f} MB")

     print("=" * 60)
-    print("Step 2/4: recording command-sequence GIF")
+    print("Step 2/5: recording command-sequence GIF")
     print("=" * 60)
     frames = render_command_demo()
     gif_path = PORTFOLIO_DIR / "rl_pupper_commands.gif"
@@ -53,13 +54,18 @@ def main() -> None:
     print(f"GIF: {gif_path} ({gif_mb:.2f} MB)")

     print("=" * 60)
-    print("Step 3/4: velocity tracking plot")
+    print("Step 3/5: recording side-by-side comparison GIF")
+    print("=" * 60)
+    render_comparison_gif(ckpt)
+
+    print("=" * 60)
+    print("Step 4/5: velocity tracking plot")
     print("=" * 60)
     results = render_velocity_tracking()
     save_velocity_tracking(results)

     print("=" * 60)
-    print("Step 4/4: training curve")
+    print("Step 5/5: training curve")
     print("=" * 60)
     save_reward_curve()
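
Adding one step meant renumbering every hard-coded `Step k/4` banner in `main()` by hand. One possible way to avoid that, shown here as a suggestion rather than as anything the commit does, is to keep the steps in a list and derive both the index and the total from it. `run_steps` and the placeholder actions below are hypothetical; in `make_artifacts.py` they would wrap calls such as `train_ppo` and `render_command_demo`.

```python
from __future__ import annotations

from typing import Callable


def run_steps(steps: list[tuple[str, Callable[[], None]]]) -> list[str]:
    """Run each step in order and print a banner numbered from the list itself."""
    banners: list[str] = []
    total = len(steps)
    for index, (title, action) in enumerate(steps, start=1):
        banner = f"Step {index}/{total}: {title}"
        print("=" * 60)
        print(banner)
        print("=" * 60)
        action()
        banners.append(banner)
    return banners


# Placeholder actions stand in for the real training and rendering calls.
executed: list[str] = []
banners = run_steps([
    ("PPO training", lambda: executed.append("train")),
    ("recording command-sequence GIF", lambda: executed.append("commands")),
    ("recording side-by-side comparison GIF", lambda: executed.append("comparison")),
    ("velocity tracking plot", lambda: executed.append("tracking")),
    ("training curve", lambda: executed.append("curve")),
])
```

The sketch ignores data flow between steps. The real `main()` passes `ckpt` from training into `render_comparison_gif`, so an actual refactor would need closures or a small shared state object.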
