Title: Estimating Convergence Time for Doudizhu DMC Model on CPU (Current 1.09B Frames, return_0 ~ -0.06) #329

@andy2024

Hi @daochenzha, @billh0420, @mjudell, and RLCard contributors,

I'm training a DMC model for the Doudizhu environment (action space ~30,000; state dim 602, using a 3x15 card matrix encoding) on a MacBook Pro (M1 Pro), configured for CPU mode with num_actors=5. My goal is top-tier performance (mean_episode_return_0 ≥ 0.20, i.e. a win rate of ~60% against strong opponents). Below are the latest training stats and my questions:

Current Training Status (at ~1.09B frames):

[INFO:26344 trainer:367 2025-07-15 17:36:17,632] After 1091958400 frames: @ 5116.7 fps Stats:
{'loss_0': 16.268564224243164, 'loss_1': 17.48981475830078, 'mean_episode_return_0': -0.0586562342941761, 'mean_episode_return_1': 0.08355457335710526}
[INFO:26344 trainer:367 2025-07-15 17:36:22,636] After 1091977600 frames: @ 3836.7 fps Stats:
{'loss_0': 13.136116027832031, 'loss_1': 14.300411224365234, 'mean_episode_return_0': -0.05587960407137871, 'mean_episode_return_1': 0.07381904870271683}
[INFO:26344 trainer:367 2025-07-15 17:36:27,641] After 1091996800 frames: @ 3836.3 fps Stats:
{'loss_0': 14.247025489807129, 'loss_1': 14.807906150817871, 'mean_episode_return_0': -0.05250845104455948, 'mean_episode_return_1': 0.07472439110279083}
[INFO:26344 trainer:367 2025-07-15 17:36:32,647] After 1092022400 frames: @ 5114.9 fps Stats:
{'loss_0': 14.721667289733887, 'loss_1': 13.552979469299316, 'mean_episode_return_0': -0.060450516641139984, 'mean_episode_return_1': 0.08199676871299744}
[INFO:26344 trainer:367 2025-07-15 17:36:37,649] After 1092041600 frames: @ 3838.3 fps Stats:
{'loss_0': 13.114018440246582, 'loss_1': 13.420849800109863, 'mean_episode_return_0': -0.08091682195663452, 'mean_episode_return_1': 0.10203389823436737}
[INFO:26344 trainer:367 2025-07-15 17:36:42,655] After 1092067200 frames: @ 5114.9 fps Stats:
{'loss_0': 13.74947738647461, 'loss_1': 14.324023246765137, 'mean_episode_return_0': -0.07837879657745361, 'mean_episode_return_1': 0.10123736411333084}
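
For anyone who wants to average these numbers rather than eyeball them, here is a minimal parsing sketch. It assumes the two-line layout shown above (a header line with the frame count and FPS, followed by a Python-dict stats line). `parse_log` and `HEADER` are hypothetical helper names, not part of RLCard, and only the first two entries are embedded to keep it short.

```python
import ast
import re
from statistics import mean

# First two log entries from the post, verbatim (header line + stats line).
LOG = """\
[INFO:26344 trainer:367 2025-07-15 17:36:17,632] After 1091958400 frames: @ 5116.7 fps Stats:
{'loss_0': 16.268564224243164, 'loss_1': 17.48981475830078, 'mean_episode_return_0': -0.0586562342941761, 'mean_episode_return_1': 0.08355457335710526}
[INFO:26344 trainer:367 2025-07-15 17:36:22,636] After 1091977600 frames: @ 3836.7 fps Stats:
{'loss_0': 13.136116027832031, 'loss_1': 14.300411224365234, 'mean_episode_return_0': -0.05587960407137871, 'mean_episode_return_1': 0.07381904870271683}
"""

# Matches the "After <frames> frames: @ <fps> fps" part of a header line.
HEADER = re.compile(r"After (\d+) frames: @ ([\d.]+) fps")


def parse_log(text):
    """Pair each header line with the stats dict on the following line."""
    records, pending = [], None
    for line in text.splitlines():
        match = HEADER.search(line)
        if match:
            pending = {"frames": int(match.group(1)), "fps": float(match.group(2))}
        elif pending is not None and line.lstrip().startswith("{"):
            # The stats line is a plain Python dict literal, so literal_eval is enough.
            pending.update(ast.literal_eval(line.strip()))
            records.append(pending)
            pending = None
    return records


records = parse_log(LOG)
summary = {
    key: mean(record[key] for record in records)
    for key in ("fps", "loss_0", "mean_episode_return_0")
}
print(summary)
```

Feeding the full log file into `parse_log` instead of the embedded sample would give averages over a longer window, which is probably more informative than six consecutive five-second snapshots.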

Observations:
Loss is stable (~13-17, fluctuation ~1.5), which I take as a sign of near-convergence.
mean_episode_return_0 is low (-0.08 to -0.05, i.e. a win rate of ~46-47.5%), far from top-tier (0.20, ~60%).
FPS averages ~4500 (the logs alternate between ~3840 and ~5115). That looks GPU-like; I would expect CPU-only training to be slower, so the run may be misconfigured.
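
The win-rate and wall-clock figures in this post rest on two small conversions, sketched below. Both are assumptions inferred from the numbers quoted here: that every game pays +1 for a win and -1 for a loss (so the mean return equals 2p - 1 for win probability p), and that throughput stays constant at the observed FPS. The function names are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def win_rate_from_return(mean_return):
    """Win probability p under the assumption E[return] = 2p - 1 (+1 win, -1 loss)."""
    return (1.0 + mean_return) / 2.0


def days_for_frames(extra_frames, fps):
    """Wall-clock days needed to process extra_frames at a constant fps."""
    return extra_frames / fps / SECONDS_PER_DAY


for r in (-0.08, -0.05, 0.05, 0.10, 0.20):
    print(f"return {r:+.2f} -> win rate {win_rate_from_return(r):.1%}")

for extra in (0.5e9, 1e9, 2e9):
    days = days_for_frames(extra, 4500)
    print(f"{extra / 1e9:.1f}B frames at 4500 fps -> {days:.1f} days")
```

If the environment's payoff is not ±1 per game (for example 0/1, or scaled by bombs), the win-rate mapping would need to be adjusted accordingly.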
Questions:

Convergence Time: At 1.09B frames, how many more frames are needed to reach basic convergence (return_0 ~0.05-0.10, win rate ~52.5-55%)? My estimate is 0.5-1B additional frames (~1.3-2.6 days at ~4500 FPS). Is this reasonable for Doudizhu's ~30,000 action space?
Top-Tier Performance: To achieve return_0 ≥ 0.20 (win rate ~60%), how many frames are typically required? Based on DouZero papers, I estimate 2-3B frames total (1-2B more, ~2.6-5.1 days at ~4500 FPS). Any insights on speeding this up?
CPU Training: I'm training on CPU (M1 Pro, num_actors=5) at ~4500 FPS, but I would expect CPU to be slower, so the logs might actually reflect MPS. Is num_actors=5 optimal for CPU? Should I adjust it for better convergence speed?
3x15 Matrix Issue: With the 3x15 encoding, a bomb (4 cards) and a triplet (3 cards) of the same rank are indistinguishable, since both encode as [1,1,1]. I'm considering switching to 4x15 (797-dim state). Any advice on implementing this in doudizhu.py (_cards2array) or on retraining?
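
To make the bomb/triplet collision concrete, here is a minimal sketch of a count-based ("thermometer") encoding with a configurable number of rows. It is not RLCard's actual `_cards2array`. The 15-rank string, the function names, and the plain-list matrix are assumptions chosen so the example runs without dependencies.

```python
from collections import Counter

# Assumed rank order for the 15 columns; B and R stand for the two jokers.
RANKS = "3456789TJQKA2BR"


def cards_to_matrix(cards, rows):
    """Encode rank counts as a rows x 15 matrix: row i is 1 if count > i."""
    counts = Counter(cards)
    matrix = [[0] * len(RANKS) for _ in range(rows)]
    for col, rank in enumerate(RANKS):
        # Counts above `rows` are clipped, which is where information is lost.
        for row in range(min(counts[rank], rows)):
            matrix[row][col] = 1
    return matrix


def column(matrix, rank):
    """Return the column of the matrix that belongs to one rank."""
    col = RANKS.index(rank)
    return [row[col] for row in matrix]


triplet, bomb = "777", "7777"

# With 3 rows, the triplet and the bomb produce the same column.
print(column(cards_to_matrix(triplet, 3), "7"), column(cards_to_matrix(bomb, 3), "7"))

# With 4 rows, the fourth row separates them.
print(column(cards_to_matrix(triplet, 4), "7"), column(cards_to_matrix(bomb, 4), "7"))
```

The jump from 602 to 797 dimensions is 195 = 13 × 15, which would be consistent with 13 such matrices each gaining one row, though that breakdown is only inferred from the numbers in the post. Changing the state shape also changes the network's input layer size, so existing checkpoints would presumably not load without modification.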
