Blackjack: Reinforcement Learning Approach

Overview

This repository contains the class for an agent that is trained in a simplified Blackjack environment, which is described by the dealing class. The agent is trained using three different policy iteration approaches:

Q-Learning (QL)
State-action-reward-state-action (SARSA)
Temporal Difference (TD)
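
As a rough illustration of how the first two methods differ, the sketch below shows the textbook tabular update rules for Q-learning and SARSA. It is not taken from agent.py: the state encoding (the agent's hand total), the step size `alpha`, the discount `gamma` and the function names are all assumptions made for the example. The third (TD) variant is not sketched because the text does not specify which form it takes.

```python
from collections import defaultdict

ACTIONS = ("Hit", "Stick")


def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=1.0, terminal=False):
    """Off-policy update: bootstrap from the best action in the next state."""
    best_next = 0.0 if terminal else max(Q[(s_next, b)] for b in ACTIONS)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=1.0, terminal=False):
    """On-policy update: bootstrap from the action actually taken next."""
    next_value = 0.0 if terminal else Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (r + gamma * next_value - Q[(s, a)])


Q = defaultdict(float)  # unseen (state, action) pairs start at 0
# Hypothetical transition: hand total 12, Hit, reach 19, no reward yet.
q_learning_update(Q, s=12, a="Hit", r=0.0, s_next=19)
# Sticking on 19 ends the round with a quadratic score of 19**2 = 361.
q_learning_update(Q, s=19, a="Stick", r=361.0, s_next=None, terminal=True)
```

The only difference between the two rules is the bootstrap target: Q-learning uses the maximum Q-value in the next state, whereas SARSA uses the Q-value of the action the agent actually takes next.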

The results of training for 10000 games, split into 5000 games of exploration and 5000 of exploitation, for each policy iteration method are presented below.

Averaged over the policy search methods and the multiples of the standard deck trained upon, the agent optimised its decision process to achieve win, draw and loss rates of 48.37, 33.21 and 20.91 percent respectively in terms of dealer-agent interactions.

Environment Details

The game play is simplified in the sense that the dealer does not draw any more cards after the initial two dealt at the start of each round. The agent is trained to achieve the highest in-round score, $\xi$ , which is determined as
$\xi = \big( \sum c_{i} \big)^2$
if the agent's collective hand value,
$\sum c_{i}$ ,
where the $c_{i}$ represent the numerical values of the cards in the agent's hand, is greater than the dealer's. Otherwise, the in-round score is set to
$\xi = 0$ .
The dynamics of the game play are as follows: two cards are dealt from a pre-specified multiple of the standard deck of 52 cards to the agent and to the passive dealer. The agent is then able to choose from the action space
$\mathbb{A}(s) = \{\text{Hit}, \text{Stick}\}$ .
The agent is encouraged to maximise the over-game total score, defined as
$\Xi = \sum\limits_{i=1}^{S} \xi_{i}$ ,
where $S$ is the total number of times the agent chose to stick.
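
The scoring rules above can be written out as a short worked example. This is a minimal sketch, not the repository's code: the function names are hypothetical, cards are passed in as plain numerical values, and ace and bust handling are left out because the text above does not describe them.

```python
def round_score(agent_cards, dealer_cards):
    """Return xi: the squared hand total if it beats the dealer's, else 0."""
    agent_total = sum(agent_cards)
    dealer_total = sum(dealer_cards)
    return agent_total ** 2 if agent_total > dealer_total else 0


def game_score(rounds):
    """Return Xi: the sum of xi over every round in which the agent stuck."""
    return sum(round_score(agent, dealer) for agent, dealer in rounds)


rounds = [
    ([10, 9], [10, 7]),   # 19 beats 17 -> xi = 361
    ([10, 6], [10, 8]),   # 16 loses to 18 -> xi = 0
    ([5, 6, 9], [9, 9]),  # 20 beats 18 -> xi = 400
]
print(game_score(rounds))  # 761
```

Because the reward is quadratic in the hand total, a win on 20 is worth noticeably more than a win on 19, which pushes the agent towards higher totals rather than merely beating the dealer.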

The figure below shows the agent's average over-game quadratic score, $\Xi$ , for the different deck multiples and policy iteration methods.

File Details

agent.py:
This script contains a class that describes the agent's decision process, i.e. how it chooses its action and how it updates the Q-values for a given policy search method.
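
One common way to implement such an action choice is epsilon-greedy selection, sketched below. Whether agent.py uses exactly this rule is an assumption; the function name, the `epsilon` parameter and the dictionary layout of the Q-values are hypothetical.

```python
import random

ACTIONS = ("Hit", "Stick")


def choose_action(Q, state, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    # Greedy choice; ties resolve to the first action in ACTIONS.
    return max(ACTIONS, key=lambda a: Q.get((state, a), 0.0))


Q = {(16, "Hit"): -1.0, (16, "Stick"): 2.5}
print(choose_action(Q, 16, epsilon=0.0))  # Stick
```

Setting `epsilon` to 1 gives pure exploration and setting it to 0 gives pure exploitation, which is one way to realise separate exploration and exploitation periods.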

dealing.py:
This script contains the class for the Blackjack game play. The actions taken by the agent interact with an instance of this class in a self-perpetuating manner; you only need to call the newRound method to begin the game play. All other aspects of the game play (score, ace count, etc.) are determined from the agent's actions, and the class attributes update as a consequence.
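
To make the idea concrete, here is a toy stand-in for such a class. It is not the code in dealing.py: only the method name newRound comes from the description above, while the class name, the other attributes, reshuffling the shoe every round and counting an ace as 11 (demoted to 1 on a bust) are assumptions made for illustration.

```python
import random

# Card values for one 52-card deck: 2-10, three face cards worth 10, ace as 11.
ONE_DECK = [2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10, 11] * 4


class ToyDealing:
    """Toy stand-in for the game-play class (not the repository's code)."""

    def __init__(self, n_decks=1, seed=None):
        self.n_decks = n_decks
        self.rng = random.Random(seed)
        self.newRound()

    def newRound(self):
        """Reshuffle the shoe and deal two cards each to agent and dealer."""
        self.shoe = ONE_DECK * self.n_decks
        self.rng.shuffle(self.shoe)
        self.agent_hand = [self.shoe.pop(), self.shoe.pop()]
        self.dealer_hand = [self.shoe.pop(), self.shoe.pop()]
        self._refresh()

    def hit(self):
        """Agent draws one card; the passive dealer never draws."""
        self.agent_hand.append(self.shoe.pop())
        self._refresh()

    def _refresh(self):
        """Recompute ace count and score, demoting aces from 11 to 1 on bust."""
        self.ace_count = self.agent_hand.count(11)
        self.score = sum(self.agent_hand)
        soft_aces = self.ace_count
        while self.score > 21 and soft_aces:
            self.score -= 10
            soft_aces -= 1
```

In this sketch every action refreshes the derived attributes (`score`, `ace_count`), so the caller never has to compute them directly.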

training.py:
This script contains a function that trains the agent for a given deck size, method, exploration period and exploitation period. Counts of wins, losses and draws are kept for the exploration and exploitation periods and returned.
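
The overall shape of such a training function might look like the sketch below. The signature is hypothetical and does not mirror training.py: `play_round` is an assumed callback that plays one round (updating the Q-values as it goes) and reports the outcome as a string.

```python
from collections import Counter


def train(play_round, n_explore, n_exploit):
    """Run an exploration phase then an exploitation phase, tallying outcomes.

    play_round(explore) is a hypothetical callback that plays one round,
    updates the Q-values, and returns "win", "draw" or "loss".
    """
    tallies = {"exploration": Counter(), "exploitation": Counter()}
    for _ in range(n_explore):
        tallies["exploration"][play_round(explore=True)] += 1
    for _ in range(n_exploit):
        tallies["exploitation"][play_round(explore=False)] += 1
    return tallies


# Dummy callback so the sketch runs on its own.
def dummy_round(explore):
    return "draw" if explore else "win"


print(train(dummy_round, n_explore=3, n_exploit=2))
```

Keeping the two tallies separate makes it possible to compare how the agent performs while it is still exploring with how it performs once it acts greedily.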

trainingWithAllMethods.py:
This script trains the agent for ten different regimes, 1-10 decks, for each policy search method. The Q-tables for these are then saved as CSV files.
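
A Q-table can be written to CSV with the standard library alone, as in the sketch below. The column layout (state, action, q_value) and the function name are assumptions; the files in QTables.zip may be organised differently.

```python
import csv
import io


def write_q_table(Q, handle):
    """Write one row per (state, action) pair: state, action, value."""
    writer = csv.writer(handle)
    writer.writerow(["state", "action", "q_value"])
    for (state, action), value in sorted(Q.items()):
        writer.writerow([state, action, value])


Q = {(12, "Hit"): 3.5, (12, "Stick"): 1.0, (19, "Stick"): 36.1}
buffer = io.StringIO()  # stands in for open("some_name.csv", "w", newline="")
write_q_table(Q, buffer)
print(buffer.getvalue())
```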

validating.py:
This script allows one to reduce the parameter space and have the agent play the dealer using the optimal policy found from the corresponding QTables. Please make sure that the compressed QTables file is extracted and placed in the working directory before running this script.
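
Reading a saved Q-table back and extracting a greedy policy could look roughly as follows. This is a sketch under the assumption that each CSV row holds a state, an action and a Q-value; the column names and function names are hypothetical and may not match the actual QTables files or validating.py.

```python
import csv
import io


def read_q_table(handle):
    """Read rows of state, action, q_value into a {(state, action): value} dict."""
    return {
        (int(row["state"]), row["action"]): float(row["q_value"])
        for row in csv.DictReader(handle)
    }


def greedy_policy(Q):
    """Map each state to the action with the highest stored Q-value."""
    policy = {}
    for (state, action), value in Q.items():
        if state not in policy or value > Q[(state, policy[state])]:
            policy[state] = action
    return policy


sample = io.StringIO("state,action,q_value\n12,Hit,3.5\n12,Stick,1.0\n19,Stick,36.1\n")
print(greedy_policy(read_q_table(sample)))  # {12: 'Hit', 19: 'Stick'}
```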

QTables.zip:
This compressed file contains the QTables for the various regimes the agent was trained on and can be downloaded here. Please unzip this file and place it in the working directory before running the validating.py script.

Further Development and Reading

The next task is to separate the dealer from the dealing class into its own class, which will allow dealer actions to be implemented. Furthermore, there is a clear need to find ways to reduce the size of the Q-tables. For a formal review of this project, please see my ResearchGate account.
