A PyTorch implementation of supervised and reinforcement learning for game playing, following AlphaZero.
We will be using Miniconda as the environment manager, but you can adapt the steps for any similar tool you might prefer.
Ensure Miniconda is installed on your system. If not installed, you can download it from Miniconda's official website. This project is developed using Python 3.11, so it is advisable to use a compatible version of Miniconda.
Navigate to the root directory of the project, where you can find environment.yml. This file lists all the necessary packages and their specific versions required to run the application.
Create the Conda environment using the following command:
conda env create -f environment.yml
Activate the environment:
conda activate sigmazero
If PyTorch does not work, try reinstalling it with the following command from the PyTorch website:
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
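A quick sanity check (a minimal sketch, not part of the repository) can confirm that PyTorch is installed and can see your GPU:

```python
# illustrative sanity check for the PyTorch install (not a project script)
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```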
Run the Application:
Start the application using Streamlit by running:
streamlit run Home.py
- Dependency Errors: If you encounter errors related to missing packages or version conflicts, ensure that the environment.yml file includes all necessary dependencies with correct versions, and that you are using the right version of Streamlit:
pip install streamlit==1.33.0
- Environment Activation: Make sure you activate the correct Conda environment before attempting to run the application. If the environment name is incorrect, check the name specified in the environment.yml file.
Download Standard Chess data for players rated above 2000 ELO from FICS (Blitz and Lightning Chess data will contain non-optimal moves).
Place the downloaded .pgn.bz2 file in the saves folder, then set the file path and the number of games to generate in generate_training_supervised.py and run:
python generate_training_supervised.py
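The exact preprocessing lives in generate_training_supervised.py; the sketch below only illustrates, under the assumption that python-chess is used, how the compressed FICS dump can be iterated (the file name and game count are placeholders):

```python
# illustrative sketch: iterate games from a FICS .pgn.bz2 dump
import bz2
import chess.pgn

PGN_PATH = "saves/fics_standard_2000.pgn.bz2"  # placeholder path
NUM_GAMES = 1000                               # placeholder game count

with bz2.open(PGN_PATH, "rt", encoding="utf-8", errors="ignore") as handle:
    for _ in range(NUM_GAMES):
        game = chess.pgn.read_game(handle)
        if game is None:  # end of file
            break
        board = game.board()
        for move in game.mainline_moves():
            # each (position, move) pair can become a supervised training sample
            board.push(move)
```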
Train the supervised learning model:
python train_supervised.py
Reinforcement learning can be run with the following:
python train_RL.py
Hyperparameters for training can be set with the args dictionary in the file:
args = {
    'C': 2,                          # exploration constant c in the UCB formula
    'num_searches': 100,             # MCTS searches per move
    'num_iterations': 3,             # self-play + training iterations
    'num_selfPlay_iterations': 500,  # self-play games per iteration
    'num_epochs': 30,                # training epochs per iteration
    'batch_size': 128,               # training batch size
    'start_epoch': 0,                # epoch to resume training from
    'chess960': True,                # use Chess960 (Fischer random) starting positions
}
We test our models against each level of Stockfish in a best-of-5 format; results can be found in logs/log.txt. Set the path to your model weights and to your Stockfish engine in eval.py.
You can download the model weights here.
python eval.py
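The match loop itself lives in eval.py; as a rough sketch of the Stockfish side only (assuming python-chess's UCI engine API and a local Stockfish binary, both of which are assumptions here), a single engine move at a given skill level looks like this:

```python
# illustrative sketch: ask Stockfish for a move at a given skill level
import chess
import chess.engine

STOCKFISH_PATH = "/path/to/stockfish"  # placeholder; set the real path in eval.py

board = chess.Board()
engine = chess.engine.SimpleEngine.popen_uci(STOCKFISH_PATH)
engine.configure({"Skill Level": 3})                              # skill level 0-20
result = engine.play(board, chess.engine.Limit(time=1, depth=5))  # time/depth limits
board.push(result.move)
engine.quit()
```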
Our network was trained on 15,000 games of >2000 ELO Standard Chess data for 60 epochs, and the best results are shown in the table below.
The results are obtained from best-of-5 games against different levels of Stockfish; a win is awarded 1 point and a draw 0.5 points. The Sigmazero AI and Stockfish take turns playing White, and Sigmazero advances to the next level if it scores 2.5 points or more.
Stockfish Skill Level | Time Limit (s) | Search Depth | Estimated ELO |
---|---|---|---|
0 | 1 | 5 | 1376 |
1 | 1 | 5 | 1462 |
2 | 1 | 5 | 1547 |
3 | 1 | 5 | 1596 |
4 | 1 | 5 | 1718 |
5 | 1 | 5 | 1804 |
6 | 1 | 5 | 1993 |
7 | 1 | 5 | 2012 |
8 | 1 | 6 | 2127 |
9 | 2 | 7 | 2270 |
20 | 10 | 50 | 3100 |
Model | MCTS Simulations | SF Level | Estimated ELO | Model Wins | Model Losses | Model Draws | Game Sequence | Score |
---|---|---|---|---|---|---|---|---|
48k_supervised | 800 | 3 | 1596 | 3 | 1 | 0 | WLWW | 3.0 |
48k_supervised | 800 | 4 | 1718 | 3 | 0 | 0 | WWW | 3.0 |
48k_supervised | 800 | 5 | 1804 | 2 | 0 | 1 | DWW | 2.5 |
48k_supervised | 800 | 6 | 1993 | 2 | 0 | 1 | DWW | 2.5 |
48k_supervised | 800 | 7 | 2012 | 3 | 1 | 0 | WLWW | 3.0 |
48k_supervised | 800 | 8 | 2127 | 2 | 1 | 1 | WLDW | 2.5 |
48k_supervised | 800 | 9 | 2270 | 0 | 3 | 2 | LDLDL | 1.0 |
Coming Soon
- $N_i$: the number of times the node has been selected, i.e. the number of times the node has been through the simulation (integer)
- $W_i$: the sum of expected values of the node (not necessarily an integer; informally, "the number of wins for the node")
- $p$: the policy values of the child nodes
- $s$: the representation of the board state (an 8x8xN tensor)
- Selection: Start from the root node (the current game state) and select successive child nodes based on the Upper Confidence Bound (UCB) criterion until a leaf node L (any node that has a potential child from which no simulation has yet been initiated) or a terminal node is reached (a code sketch of this selection rule follows the list). The UCB is computed as
  $$\text{UCB} = \frac{w_i}{n_i} + p_i c \frac{\sqrt{N_i}}{1+n_i}$$
  where $c$ is a constant, $p_i$ is the policy value of the child node, $n_i$ is its simulation count, and $N_i$ is the simulation count of its parent.
- Expansion: Unless L ends the game decisively for either player, randomly initialize an unexplored child node.
- Backpropagation: Using the value generated by the neural network $f_\theta$, update the N and W values of the current node and all of its parent nodes.
- Repeat steps 1 to 3 for N iterations.
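As an illustration of the selection rule only (the attribute and function names below are hypothetical, not the repo's), UCB-based child selection can be sketched as:

```python
import math

def select_child(children, c=2.0):
    """Pick the child node with the highest UCB score (illustrative sketch).

    Each child is assumed to expose n (visit count), w (total value)
    and p (prior policy value from the network).
    """
    parent_visits = sum(child.n for child in children)  # visit count of the parent

    def ucb(child):
        q = child.w / child.n if child.n > 0 else 0.0               # exploitation: mean value
        u = child.p * c * math.sqrt(parent_visits) / (1 + child.n)  # exploration term
        return q + u

    return max(children, key=ucb)
```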
- Self-Play until the game ends using MCTS and $f_\theta$.
- Store the action chosen at each state and the value of the node (-1, 0, 1), depending on the player and whether they won or lost the game. One training sample contains: (board state $s$, the chosen action $\pi$, the value of the node $z$).
- Minimize the loss function over the training samples in the batch (see the sketch below).
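The loss here is the standard AlphaZero objective: cross-entropy between the predicted policy and the MCTS target $\pi$, plus mean squared error between the predicted value and the game outcome $z$. A minimal PyTorch sketch (function and argument names are illustrative, not the repo's):

```python
import torch.nn.functional as F

def alphazero_loss(policy_logits, value_pred, pi_target, z_target):
    """Policy cross-entropy + value MSE (illustrative sketch).

    policy_logits: (B, 4672) raw policy head outputs
    value_pred:    (B, 1)   value head outputs in [-1, 1]
    pi_target:     (B, 4672) MCTS visit-count distribution
    z_target:      (B, 1)   game outcome in {-1, 0, 1}
    """
    policy_loss = -(pi_target * F.log_softmax(policy_logits, dim=1)).sum(dim=1).mean()
    value_loss = F.mse_loss(value_pred, z_target)
    return policy_loss + value_loss
```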
From each player's perspective, the tensor looks as follows; the board orientation changes according to the current player.
White's View:
Black's View:
The board is represented as a (119, 8, 8) tensor, calculated as MT + L, where M = 14, T = 8, and L = 7.
M represents the number of planes recorded per board state. In our implementation, we mimic AlphaZero's approach of keeping track of all 12 piece types plus 2 repetition planes. The order of the planes is as follows (a minimal sketch of filling the piece planes follows this list):
- White Pawns
- White Knights
- White Bishops
- White Rooks
- White Queens
- White King
- Planes 7 to 12 are the same as 1 to 6, but for black pieces.
- 1-fold repetition plane, a constant binary value
- 2-fold repetition plane
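As a minimal sketch of how the 12 piece planes for one half-move could be filled with python-chess (the plane order follows the list above, but the orientation and exact layout are assumptions; the repo's implementation may differ):

```python
import numpy as np
import chess

PIECE_ORDER = [chess.PAWN, chess.KNIGHT, chess.BISHOP,
               chess.ROOK, chess.QUEEN, chess.KING]

def piece_planes(board: chess.Board) -> np.ndarray:
    """Return a (12, 8, 8) array: 6 white piece planes followed by 6 black ones."""
    planes = np.zeros((12, 8, 8), dtype=np.float32)
    for colour_offset, colour in enumerate([chess.WHITE, chess.BLACK]):
        for plane, piece_type in enumerate(PIECE_ORDER):
            for square in board.pieces(piece_type, colour):
                row = chess.square_rank(square)
                col = chess.square_file(square)
                planes[colour_offset * 6 + plane, row, col] = 1.0
    return planes
```

The full (119, 8, 8) input stacks 8 such snapshots together with their repetition planes and appends the 7 constant L planes described below.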
T represents the number of half-moves that are kept track of. In this case we keep track of 8 half-moves, or 4 full turns, with the most recent half-move recorded in the first group of planes.
L is not time-tracked; it is a constant set of 7 planes that represent special properties of the board regardless of time. The order is as follows:
- Current player's color
- Total Moves that have been played to understand depth
- White kingside castling rights
- White queenside castling rights
- Black kingside castling rights
- Black queenside castling rights
- No progress plane, for 50-move rule
The actions are represented with an 8x8x73 tensor, which can be flattened into a 4672-dimensional vector. The 8x8 spatial dimensions identify the square from which the chess piece is picked up, while the 73 planes encode the type of move (see the sketch after this list).
- The first 8x7 = 56 channels/planes represent queen-style moves: the direction (8 directions) and the number of squares to move (1 to 7) for the queen, rook, bishop, pawn, and king. (A pawn move from the 7th rank is assumed to be a promotion to a queen.)
- The next 8 channels/planes represent the direction to move the knight
- The last 9 channels represent underpromotions of the pawn to a knight, bishop, or rook respectively (via a one-square push from the 7th rank or a diagonal capture from the 7th rank).
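As a small illustration of the index arithmetic only (the actual ordering of the 73 move-type planes and the flattening convention are defined in the repo, so the helpers below are hypothetical):

```python
# illustrative sketch of flattening a (row, col, move-plane) action to a single index
NUM_MOVE_PLANES = 56 + 8 + 9           # queen-style + knight + underpromotion planes = 73
ACTION_SIZE = 8 * 8 * NUM_MOVE_PLANES  # 4672 possible actions

def encode_action(row: int, col: int, plane: int) -> int:
    """Map the pick-up square (row, col) and move-type plane to a flat index."""
    return (row * 8 + col) * NUM_MOVE_PLANES + plane

def decode_action(index: int) -> tuple[int, int, int]:
    """Inverse of encode_action."""
    square, plane = divmod(index, NUM_MOVE_PLANES)
    return square // 8, square % 8, plane
```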