Hi @daochenzha, @billh0420, @mjudell, and RLCard contributors,
I'm training a DMC model for the Doudizhu environment (action space ~30,000; state dim 602 with a 3x15 card matrix) on a MacBook Pro M1 Pro in CPU mode (num_actors=5). My goal is to reach top-tier performance (mean_episode_return_0 ≥ 0.20, win rate ~60% against strong opponents). The latest training stats and my questions are below.
Current Training Status (at ~1.09B frames):
[INFO:26344 trainer:367 2025-07-15 17:36:17,632] After 1091958400 frames: @ 5116.7 fps Stats:
{'loss_0': 16.268564224243164, 'loss_1': 17.48981475830078, 'mean_episode_return_0': -0.0586562342941761, 'mean_episode_return_1': 0.08355457335710526}
[INFO:26344 trainer:367 2025-07-15 17:36:22,636] After 1091977600 frames: @ 3836.7 fps Stats:
{'loss_0': 13.136116027832031, 'loss_1': 14.300411224365234, 'mean_episode_return_0': -0.05587960407137871, 'mean_episode_return_1': 0.07381904870271683}
[INFO:26344 trainer:367 2025-07-15 17:36:27,641] After 1091996800 frames: @ 3836.3 fps Stats:
{'loss_0': 14.247025489807129, 'loss_1': 14.807906150817871, 'mean_episode_return_0': -0.05250845104455948, 'mean_episode_return_1': 0.07472439110279083}
[INFO:26344 trainer:367 2025-07-15 17:36:32,647] After 1092022400 frames: @ 5114.9 fps Stats:
{'loss_0': 14.721667289733887, 'loss_1': 13.552979469299316, 'mean_episode_return_0': -0.060450516641139984, 'mean_episode_return_1': 0.08199676871299744}
[INFO:26344 trainer:367 2025-07-15 17:36:37,649] After 1092041600 frames: @ 3838.3 fps Stats:
{'loss_0': 13.114018440246582, 'loss_1': 13.420849800109863, 'mean_episode_return_0': -0.08091682195663452, 'mean_episode_return_1': 0.10203389823436737}
[INFO:26344 trainer:367 2025-07-15 17:36:42,655] After 1092067200 frames: @ 5114.9 fps Stats:
{'loss_0': 13.74947738647461, 'loss_1': 14.324023246765137, 'mean_episode_return_0': -0.07837879657745361, 'mean_episode_return_1': 0.10123736411333084}
Observations:
Loss fluctuates between ~13 and ~17 with no clear trend, which suggests near-convergence.
mean_episode_return_0 is low (-0.08 to -0.05, win rate ~46-47.5%), far from top-tier (0.20, ~60%).
FPS averages ~4500, alternating between ~3840 and ~5115. That seems high for CPU-only training, so the run may be misconfigured.
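The figures in these observations can be reproduced with a few lines of Python. This is only a sketch: the FPS values are copied from the log lines above, and the return-to-win-rate mapping assumes every game pays +1 for a win and -1 for a loss with no bomb multipliers, which is the assumption implied by pairing 0.20 with ~60%. The helper names are hypothetical and not part of RLCard.

```python
# FPS values copied from the six log lines above.
fps_samples = [5116.7, 3836.7, 3836.3, 5114.9, 3838.3, 5114.9]
mean_fps = sum(fps_samples) / len(fps_samples)


def win_rate_from_return(mean_return):
    """Convert a mean episode return to a win rate.

    Assumption: payoff is +1 for a win and -1 for a loss, so
    mean_return = p * 1 + (1 - p) * (-1) = 2p - 1.
    """
    return (mean_return + 1) / 2


def days_for_frames(frames, fps):
    """Wall-clock days needed to generate `frames` at a constant `fps`."""
    seconds = frames / fps
    return seconds / 86400


if __name__ == "__main__":
    print(f"mean fps: {mean_fps:.1f}")
    for ret in (-0.08, -0.05, 0.05, 0.10, 0.20):
        print(f"return {ret:+.2f} -> win rate {win_rate_from_return(ret):.1%}")
    for extra in (0.5e9, 1e9, 2e9):
        print(f"{extra:.1e} frames at 4500 fps: {days_for_frames(extra, 4500):.2f} days")
```

If the environment's payoffs are not symmetric around zero, the win-rate conversion does not apply and the raw returns should be compared directly.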
Questions:
Convergence Time: At 1.09B frames, how many more frames are needed to reach basic convergence (return_0 ~0.05-0.10, win rate ~52.5-55%)? My estimate is 0.5-1B additional frames (~1.3-2.6 days at ~4500 FPS). Is this reasonable for Doudizhu's ~30,000 action space?
Top-Tier Performance: To achieve return_0 ≥ 0.20 (win rate ~60%), how many frames are typically required? Based on the DouZero paper, I estimate 2-3B frames total (1-2B more, ~2.6-5.1 days at ~4500 FPS). Any insights on optimizing this?
CPU Training: I'm training on CPU (M1 Pro, num_actors=5) at ~4500 FPS. I expected CPU to be slower, so the logs might reflect MPS instead. Is num_actors=5 optimal for CPU? Should I adjust it for better convergence speed?
3x15 Matrix Issue: The 3x15 encoding cannot distinguish a bomb (4 cards) from a triplet (3 cards), since both encode as [1,1,1]. I'm considering switching to 4x15 (797-dim state). Any advice on implementing this in doudizhu.py (_cards2array), or on retraining afterwards?
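To make the 4x15 idea concrete, here is a minimal sketch of a count-based ("thermometer") encoding in which row i of a rank's column is 1 when the hand holds more than i cards of that rank. It is not RLCard's actual `_cards2array`: the function name `cards_to_4x15`, the helper `column`, and the rank order string are assumptions for illustration, and the hand is assumed to be a string of rank characters.

```python
from collections import Counter

# Assumed rank order: 3..K, A, 2, then black joker (B) and red joker (R).
RANKS = "3456789TJQKA2BR"
MAX_COPIES = 4  # a bomb uses all four copies of one rank


def cards_to_4x15(cards):
    """Encode a hand (string of rank characters) as a 4x15 matrix of 0/1.

    Column = rank, row i = 1 if the hand holds at least i + 1 copies.
    """
    counts = Counter(cards)
    unknown = set(counts) - set(RANKS)
    if unknown:
        raise ValueError(f"unknown rank(s): {sorted(unknown)}")
    matrix = [[0] * len(RANKS) for _ in range(MAX_COPIES)]
    for col, rank in enumerate(RANKS):
        copies = counts.get(rank, 0)
        if copies > MAX_COPIES:
            raise ValueError(f"more than {MAX_COPIES} copies of {rank!r}")
        for row in range(copies):
            matrix[row][col] = 1
    return matrix


def column(matrix, rank):
    """Return the 4-element column for one rank (for inspection only)."""
    col = RANKS.index(rank)
    return [row[col] for row in matrix]


if __name__ == "__main__":
    print("triplet:", column(cards_to_4x15("333"), "3"))
    print("bomb:   ", column(cards_to_4x15("3333"), "3"))
```

With four rows the triplet column is [1, 1, 1, 0] and the bomb column is [1, 1, 1, 1], so the two no longer collide. Changing the encoding changes the network's input size, so a checkpoint trained on the 3x15 layout would not load into the new model without modification.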