
Improvements in training #329

Merged
alejandromarcu merged 2 commits into main from v2_b on Jan 28, 2026

Improvements in training#329
alejandromarcu merged 2 commits intomainfrom
v2_b

Conversation


@alejandromarcu alejandromarcu commented Jan 28, 2026

Logging more to wandb:

  • policy, value, and total loss
  • time spent training, sampling, and waiting for data to train on
  • model lag: i.e. how many model versions ago the self-play started. If all values were 0, we'd basically be doing epochs. If we use a lot of workers, the number will increase. E.g. a model lag of 5 means that self-play started in a process with a model 5 versions older than the model that is current once the self-play finished.
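The metrics above could be assembled roughly like this. This is an illustrative sketch only: the metric keys, function name, and arguments are hypothetical, not the ones actually used in this PR.

```python
def step_metrics(policy_loss, value_loss, train_time, sample_time, wait_time,
                 current_model_version, selfplay_start_version):
    """Build the per-step metrics dict to hand to wandb.log (names illustrative)."""
    # model lag: how many model versions behind the self-play data is.
    # If this were 0 everywhere, we'd effectively be training in epochs.
    model_lag = current_model_version - selfplay_start_version
    return {
        "loss/policy": policy_loss,
        "loss/value": value_loss,
        "loss/total": policy_loss + value_loss,
        "time/train": train_time,
        "time/sample": sample_time,
        "time/wait_for_data": wait_time,
        "model_lag": model_lag,
    }

# e.g. model started self-play at version 7, trainer is now at version 12
metrics = step_metrics(0.8, 0.3, 1.2, 4.0, 0.5, 12, 7)
# metrics would then be passed to wandb.log(metrics)
```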

Other:

  • games_per_training_step: this parameter controls the ratio between games played and training steps. Training will wait as needed to keep the ratio. If we run on a machine with a lot of processes, training may not be able to catch up. We'll cross that bridge when we get to it.
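The gating condition implied by games_per_training_step can be sketched as below. This is an assumed formulation (the parameter name is from this PR, but the helper functions and exact rule are hypothetical).

```python
def games_needed(training_steps_done, games_per_training_step):
    """Games that must be completed before the next training step may run."""
    return games_per_training_step * (training_steps_done + 1)

def can_train(games_completed, training_steps_done, games_per_training_step):
    """True when the games-to-steps ratio allows another training step.
    The training loop would poll this and sleep (counting the wait as
    'time waiting for data') until it returns True."""
    return games_completed >= games_needed(training_steps_done, games_per_training_step)
```

For example, with games_per_training_step=4, step number 4 (after 3 steps done) requires at least 16 completed games.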

Experiments in Hypercloud

I used an 8xA6000 to see how the base B5W3 experiment performs.
I ran with different numbers of workers and parallel games and got this:
[image: plot comparing the runs]

The runs are:

  • cucu-28a-8xA6000: 240 workers, 8 parallel games
  • cucu-28a-8xA6000-120: 120 workers, 8 parallel games
  • cucu-28a-8xA6000-120-16: 120 workers, 16 parallel games
  • cucu-28a-8xA6000-60-16: 60 workers, 16 parallel games

The one with 240 workers took more than 3 minutes to start completing games, and the slope is about the same as the 120-worker run, so there seems to be no advantage in using that many workers. I can see that the GPUs are at 99%, so that's the bottleneck.
The one that performed best is 120 workers with 16 parallel games, but only a bit better than 60 workers. I'll try increasing the number of parallel games anyway. In any case, I'd expect the optimal setup to be highly dependent on the board size, mcts_n, network config, etc., so I just want to get a rough idea; I don't think this would translate to other runs.

For the longer run, I can see the P2 wins:
[image: P2 win rate]
So it's winning 100% of the time, even against simple with a big branch factor and depth 6.

@alejandromarcu alejandromarcu merged commit fea75cd into main Jan 28, 2026
1 check passed
@alejandromarcu alejandromarcu deleted the v2_b branch January 28, 2026 23:30