
Blackjack, Markov, and Two Dice

[Figure: Agent inference]

There is nothing better than beating the casino, especially at Blackjack, where you literally play against the dealer. So we decided to train a Q-Learning agent that can make the dealer uneasy.

Installation

Before installing the project, make sure that you have Poetry installed for dependency management.

Also, configure Poetry to create the local virtual environment .venv inside the project folder:

poetry config virtualenvs.in-project true

After configuring Poetry, clone the repository and run the installation:

git clone [email protected]:Makkarik/dice-blackjack-mdp.git
cd dice-blackjack-mdp
poetry install

Training Environment

The Blackjack game uses a pair of dice instead of cards as the source of randomness. To get familiar with the game rules, see the original source at this link.

The Dice Blackjack has a state vector $s = [p, d_1, d_2] \in \mathbb{S}$,

where:

$p \in [0, 27] \subset \mathbb{N}_0$ - the sum of the previous rolls;

$d_1, d_2 \in [0, 6] \subset \mathbb{N}_0$ - the rolled values of the first and second dice (a value of 0 marks a terminal state in which no further rolls are available).

The action space consists of 6 actions:

0 - hit the first die (H1);
1 - hit the second die (H2);
2 - hit the sum (HΣ);
3 - stack the first die (S1);
4 - stack the second die (S2);
5 - stack the sum (SΣ).

The game ends with one of four possible rewards:

-1 - the player got busted (scored more than 21 points) or got fewer points than the dealer;

0 - the game ended with a tie;

1 - the player won over the dealer or the dealer got busted;

2 - the player rolled a Blackjack combination (two doubles in the first two rolls).

You can play the game yourself by running the environment file src/env.py directly.
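
For example, from the repository root:

poetry run python src/env.py

If you prefer to drive the environment programmatically, the sketch below shows the general shape of such a loop. It assumes a Gymnasium-style reset/step interface; the class name DiceBlackjack and the import path are illustrative guesses, not taken from the repository.

import random

# Hypothetical import: the class name and module path are assumptions.
from src.env import DiceBlackjack

env = DiceBlackjack()
state, info = env.reset()              # state = [p, d1, d2]
total_reward, done = 0.0, False

while not done:
    action = random.randrange(6)       # 0..5: H1, H2, HΣ, S1, S2, SΣ
    state, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    done = terminated or truncated

print("Episode reward:", total_reward)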

Training

You can reproduce the results by running the Training.ipynb notebook.

We trained the agent for 500,000 episodes with a non-linear $\varepsilon$ decay.
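
The notebook contains the actual training pipeline. As a rough illustration of the idea (tabular Q-learning with an exponentially decaying exploration rate), here is a minimal sketch; the hyperparameters, the decay schedule, and the DiceBlackjack import are assumptions for illustration, not the exact ones used in Training.ipynb.

import random
import numpy as np
from collections import defaultdict

# Hypothetical import, as in the sketch above.
from src.env import DiceBlackjack

# Hyperparameters are illustrative assumptions, not the notebook's values.
ALPHA, GAMMA = 0.1, 1.0
EPS_START, EPS_END, EPS_DECAY = 1.0, 0.05, 1e-5
N_EPISODES, N_ACTIONS = 500_000, 6

env = DiceBlackjack()
q_table = defaultdict(lambda: np.zeros(N_ACTIONS))

for episode in range(N_EPISODES):
    # Non-linear (exponential) epsilon decay: exploration drops fast, then flattens.
    epsilon = EPS_END + (EPS_START - EPS_END) * np.exp(-EPS_DECAY * episode)

    state, _ = env.reset()
    state, done = tuple(state), False
    while not done:
        if random.random() < epsilon:
            action = random.randrange(N_ACTIONS)        # explore
        else:
            action = int(np.argmax(q_table[state]))     # exploit

        next_state, reward, terminated, truncated, _ = env.step(action)
        next_state = tuple(next_state)
        done = terminated or truncated

        # One-step Q-learning update.
        target = reward + (0.0 if done else GAMMA * float(np.max(q_table[next_state])))
        q_table[state][action] += ALPHA * (target - q_table[state][action])
        state = next_state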

[Figure: Training statistics]

The learned policy is strong enough to obtain a positive expected return from the game (the average reward is greater than 0). To make the policy human-readable, we converted the Q-table into a Dice Blackjack cheatsheet covering all possible cases.

The cells colored gray designate states with either no available actions or states that were never encountered during training (e.g., the heatmap for a score of 5).
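
Conceptually, building such a cheatsheet amounts to taking the greedy (argmax) action for each state and arranging it on a score-by-dice grid. The sketch below illustrates this under the same assumptions as above (states keyed by (p, d1, d2), six actions); it is not the notebook's actual plotting code.

import numpy as np

ACTION_LABELS = ["H1", "H2", "HΣ", "S1", "S2", "SΣ"]

def q_table_to_cheatsheet(q_table):
    """Map each reachable state (p, d1, d2) to its greedy action label.

    Returns {score p: 6x6 grid indexed by (d1 - 1, d2 - 1)}; states never
    visited during training stay blank, matching the gray cells on the heatmaps.
    """
    cheatsheet = {}
    for score in range(28):                              # p in [0, 27]
        grid = np.full((6, 6), "", dtype=object)
        for d1 in range(1, 7):
            for d2 in range(1, 7):
                state = (score, d1, d2)
                if state in q_table:
                    grid[d1 - 1, d2 - 1] = ACTION_LABELS[int(np.argmax(q_table[state]))]
        cheatsheet[score] = grid
    return cheatsheet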

[Figure: Policy cheatsheet]

You can use the cheatsheet to play the game manually and check whether the policy is good enough.

Code Style

The repository is equipped with pre-commit hooks for automatic code linting. All the code style requirements are listed in the [tool.ruff.lint] section of the pyproject.toml file.

For a better experience, use the VS Code IDE with the Ruff extension installed.
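
Assuming pre-commit is available among the project's development dependencies, you can set up and run the hooks locally with:

poetry run pre-commit install
poetry run pre-commit run --all-files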
