An implementation of the AlphaZero algorithm for the game of Gomoku (Five in a Row), featuring self-play reinforcement learning and Monte Carlo Tree Search.
Training loss curve over 22 iterations (4 days on a single RTX 3090 Ti):
MCTS simulations=2000, cpuct=4.0
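For context, cpuct controls the exploration term in the PUCT rule that MCTS uses to pick which child node to explore next. A minimal sketch of that rule (a hypothetical helper for illustration, not the project's actual code):

```python
import math

def puct_score(q, p, parent_n, child_n, cpuct=4.0):
    """PUCT score for selecting a child node during MCTS search.

    q: mean value of the child, p: prior probability from the policy
    network, parent_n / child_n: visit counts, cpuct: exploration
    constant (4.0 matches the setting quoted above).
    """
    # Exploitation term (Q) plus an exploration bonus that is large for
    # high-prior, rarely visited children and shrinks as visits grow.
    return q + cpuct * p * math.sqrt(parent_n) / (1 + child_n)

# A child with few visits and a high prior gets a large exploration bonus.
print(puct_score(q=0.1, p=0.5, parent_n=100, child_n=1, cpuct=4.0))
```

A larger cpuct biases the search toward exploring the policy network's suggestions; a smaller one trusts the accumulated Q values more.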
TODOs:
- Refactor this project and improve code quality to serve as nano alphazero
- ...
Features:
- Complete AlphaZero algorithm implementation with MCTS and policy-value network
- Self-play training with experience replay buffer
- 1cycle learning rate scheduling for stable training
- Arena evaluation mechanism for model selection
- Support for different board sizes (9x9, 15x15)
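The 1cycle schedule mentioned above ramps the learning rate up from a low value to a peak and then anneals it back down over the run. A minimal sketch of that shape (a hypothetical helper; the project's actual schedule may differ in warmup fraction and final value):

```python
import math

def one_cycle_lr(step, total_steps, min_lr=1e-4, max_lr=1e-2, pct_warmup=0.3):
    """1cycle schedule: linear warmup from min_lr to max_lr, then
    cosine annealing back down to min_lr."""
    warmup_steps = int(total_steps * pct_warmup)
    if step < warmup_steps:
        # Linear warmup phase.
        frac = step / max(1, warmup_steps)
        return min_lr + frac * (max_lr - min_lr)
    # Cosine annealing phase back toward min_lr.
    frac = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * frac))

# LR peaks at max_lr after warmup, then decays back toward min_lr.
print(one_cycle_lr(30, 100), one_cycle_lr(100, 100))
```

PyTorch ships an equivalent built-in, `torch.optim.lr_scheduler.OneCycleLR`, which is likely what a PyTorch implementation would use in practice.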
Installation:
pip install torch numpy tqdm wandb pygame

You can download the best pre-trained model so far from my huggingface repo and put it in the temp folder.

Play against the trained model:
python alphazero.py --play \
--round=2 \
--player1=human \
--player2=alphazero \
--ckpt_file=best.pth.tar \
--verbose

Train from scratch (with optional wandb logging):
python alphazero.py --train --wandb

Key hyperparameters:
- numMCTSSims: Number of MCTS simulations per move (default: 400)
- numEps: Number of self-play games per iteration (default: 100)
- maxlenOfQueue: Size of replay buffer (default: 200000)
- cpuct: Exploration constant in MCTS (default: 1.0)
- min_lr/max_lr: Learning rate bounds for 1cycle schedule (1e-4 to 1e-2)
- board_size: Board size, directly affects model size & training speed (default: 9)
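The replay buffer bounded by maxlenOfQueue can be pictured as a fixed-size queue that silently discards the oldest self-play examples once full. A tiny illustration using Python's `collections.deque` (hypothetical sketch, not the project's actual buffer code):

```python
from collections import deque

# A bounded deque drops the oldest examples once maxlen is reached.
maxlenOfQueue = 5  # tiny bound for illustration (project default: 200000)
buffer = deque(maxlen=maxlenOfQueue)

# Each self-play step would append (board_state, policy_target, value_target).
for step in range(8):
    buffer.append((f"board_{step}", [0.1] * 81, 1.0))

# Only the 5 most recent examples remain; board_0..board_2 were evicted.
print(len(buffer), buffer[0][0])  # → 5 board_3
```

Keeping the buffer bounded means training always draws from recent, stronger self-play games rather than stale early-iteration data.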
Project files:
- alphazero.py: Main implementation including MCTS and neural network
- game.py: Gomoku game logic & rules, including board state rendering, move generation, and game end detection
This project is inspired by and has greatly benefited from the work of schinger/AlphaZero.

