
Improvements in training #329

Merged
alejandromarcu merged 2 commits into main from v2_b on Jan 28, 2026

Improvements in training#329
alejandromarcu merged 2 commits intomainfrom
v2_b

Conversation


@alejandromarcu alejandromarcu commented Jan 28, 2026

Logging more to wandb:

  • policy, value, and total loss
  • time spent training, sampling, and waiting for data to train on
  • model lag: i.e. how many model versions ago the self-play started. If all values were 0, we'd basically be doing epochs. If we use a lot of workers, the number will increase. E.g. a model lag of 5 means that self-play started in a process with a model 5 versions older than the model that is current once the self-play finished.
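The metrics above could be assembled roughly like this. This is an illustrative sketch only: the metric keys, function name, and arguments are hypothetical, not the ones actually used in this PR.

```python
def step_metrics(policy_loss, value_loss, train_time, sample_time, wait_time,
                 current_model_version, selfplay_start_version):
    """Build the per-step metrics dict to hand to wandb.log (names illustrative)."""
    # model lag: how many model versions behind the self-play data is.
    # If this were 0 everywhere, we'd effectively be training in epochs.
    model_lag = current_model_version - selfplay_start_version
    return {
        "loss/policy": policy_loss,
        "loss/value": value_loss,
        "loss/total": policy_loss + value_loss,
        "time/train": train_time,
        "time/sample": sample_time,
        "time/wait_for_data": wait_time,
        "model_lag": model_lag,
    }

# e.g. model started self-play at version 7, trainer is now at version 12
metrics = step_metrics(0.8, 0.3, 1.2, 4.0, 0.5, 12, 7)
# metrics would then be passed to wandb.log(metrics)
```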

Other:

  • games_per_training_step: this parameter controls the ratio between games played and training steps. Training will wait as needed to keep the ratio. If we run on a machine with a lot of processes, training may not be able to catch up. We'll cross that bridge when we get to it.
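The gating condition implied by games_per_training_step can be sketched as below. This is an assumed formulation (the parameter name is from this PR, but the helper functions and exact rule are hypothetical).

```python
def games_needed(training_steps_done, games_per_training_step):
    """Games that must be completed before the next training step may run."""
    return games_per_training_step * (training_steps_done + 1)

def can_train(games_completed, training_steps_done, games_per_training_step):
    """True when the games-to-steps ratio allows another training step.
    The training loop would poll this and sleep (counting the wait as
    'time waiting for data') until it returns True."""
    return games_completed >= games_needed(training_steps_done, games_per_training_step)
```

For example, with games_per_training_step=4, step number 4 (after 3 steps done) requires at least 16 completed games.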

Experiments in Hypercloud

I used an 8xA6000 to see how the base B5W3 experiment performs.
I ran with different numbers of workers and parallel games and got this:
[image: plot comparing the runs]

The runs are:

  • cucu-28a-8xA6000: 240 workers, 8 parallel games
  • cucu-28a-8xA6000-120: 120 workers, 8 parallel games
  • cucu-28a-8xA6000-120-16: 120 workers, 16 parallel games
  • cucu-28a-8xA6000-60-16: 60 workers, 16 parallel games

The one with 240 workers took more than 3 minutes to start completing games, and the slope is about the same as the 120-worker run, so there seems to be no advantage in using that many workers. I can see that the GPUs are at 99%, so that's the bottleneck.
The one that performed best is 120 workers with 16 parallel games, but only a bit better than 60 workers. I'll try increasing the number of parallel games anyway. In any case, I'd expect the optimal setup to be highly dependent on the board size, mcts_n, network config, etc., so I just want to get a rough idea; I don't think this would translate to other runs.

For the longer run, I can see the P2 wins:
[image: P2 win rate]
So it's winning 100% of the time, even against simple with a big branch factor and depth 6.

@alejandromarcu alejandromarcu merged commit fea75cd into main Jan 28, 2026
1 check passed
@alejandromarcu alejandromarcu deleted the v2_b branch January 28, 2026 23:30