
Blackjack, Markov, and Two Dice

[Figure: Agent inference]

There is nothing better than beating the casino, especially at Blackjack, where you literally play against the dealer. So we decided to train a Q-Learning agent that can make the dealer uneasy.

Installation

Before installing the project, make sure that you have Poetry installed for dependency management.

Also, configure Poetry to create the local virtual environment .venv inside the project folder:

poetry config virtualenvs.in-project true

After configuring Poetry, clone the repository and run the installation:

git clone [email protected]:Makkarik/dice-blackjack-mdp.git
cd dice-blackjack-mdp
poetry install

Training Environment

The Blackjack game uses a pair of dice instead of cards as the source of randomness. To get familiar with the game rules, see the original source at this link.

The Dice Blackjack has a state vector $s = [p, d_1, d_2] \in \mathbb{S}$,

where:

$p \in [0, 27] \subset \mathbb{N}_0$ - the sum of the previous rolls;

$d_1, d_2 \in [0, 6] \subset \mathbb{N}_0$ - the rolled values of the first and second dice (a value of 0 marks a terminal state in which no further rolls are available).

The action space consists of 6 actions:

0 - hit the first die (H1);
1 - hit the second die (H2);
2 - hit the sum (HΣ);
3 - stack the first die (S1);
4 - stack the second die (S2);
5 - stack the sum (SΣ).

The game ends with one of four possible rewards:

-1 - the player got busted (scored more than 21 points) or got fewer points than the dealer;

0 - the game ended with a tie;

1 - the player won over the dealer or the dealer got busted;

2 - the player rolled a Blackjack combination (two doubles in the first two rolls).

You can play the game yourself by running the environment file src/env.py directly.
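
For example, from the repository root:

poetry run python src/env.py

If you prefer to drive the environment programmatically, the sketch below shows the general shape of such a loop. It assumes a Gymnasium-style reset/step interface; the class name DiceBlackjack and the import path are illustrative guesses, not taken from the repository.

import random

# Hypothetical import: the class name and module path are assumptions.
from src.env import DiceBlackjack

env = DiceBlackjack()
state, info = env.reset()              # state = [p, d1, d2]
total_reward, done = 0.0, False

while not done:
    action = random.randrange(6)       # 0..5: H1, H2, HΣ, S1, S2, SΣ
    state, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    done = terminated or truncated

print("Episode reward:", total_reward)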

Training

You can reproduce the results by running the Training.ipynb notebook.

We trained the agent for 500,000 episodes with a non-linear $\varepsilon$ decay.
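
The notebook contains the actual training pipeline. As a rough illustration of the idea (tabular Q-learning with an exponentially decaying exploration rate), here is a minimal sketch; the hyperparameters, the decay schedule, and the DiceBlackjack import are assumptions for illustration, not the exact ones used in Training.ipynb.

import random
import numpy as np
from collections import defaultdict

# Hypothetical import, as in the sketch above.
from src.env import DiceBlackjack

# Hyperparameters are illustrative assumptions, not the notebook's values.
ALPHA, GAMMA = 0.1, 1.0
EPS_START, EPS_END, EPS_DECAY = 1.0, 0.05, 1e-5
N_EPISODES, N_ACTIONS = 500_000, 6

env = DiceBlackjack()
q_table = defaultdict(lambda: np.zeros(N_ACTIONS))

for episode in range(N_EPISODES):
    # Non-linear (exponential) epsilon decay: exploration drops fast, then flattens.
    epsilon = EPS_END + (EPS_START - EPS_END) * np.exp(-EPS_DECAY * episode)

    state, _ = env.reset()
    state, done = tuple(state), False
    while not done:
        if random.random() < epsilon:
            action = random.randrange(N_ACTIONS)        # explore
        else:
            action = int(np.argmax(q_table[state]))     # exploit

        next_state, reward, terminated, truncated, _ = env.step(action)
        next_state = tuple(next_state)
        done = terminated or truncated

        # One-step Q-learning update.
        target = reward + (0.0 if done else GAMMA * float(np.max(q_table[next_state])))
        q_table[state][action] += ALPHA * (target - q_table[state][action])
        state = next_state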

[Figure: Training statistics]

The learned policy is strong enough to obtain a positive expected return from the game (the average reward is greater than 0). To make the policy human-readable, we converted the Q-table into a Dice Blackjack cheatsheet covering all possible cases.

The cells colored gray designate states with either no available actions or states that were never encountered during training (e.g., the heatmap for a score of 5).
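
Conceptually, building such a cheatsheet amounts to taking the greedy (argmax) action for each state and arranging it on a score-by-dice grid. The sketch below illustrates this under the same assumptions as above (states keyed by (p, d1, d2), six actions); it is not the notebook's actual plotting code.

import numpy as np

ACTION_LABELS = ["H1", "H2", "HΣ", "S1", "S2", "SΣ"]

def q_table_to_cheatsheet(q_table):
    """Map each reachable state (p, d1, d2) to its greedy action label.

    Returns {score p: 6x6 grid indexed by (d1 - 1, d2 - 1)}; states never
    visited during training stay blank, matching the gray cells on the heatmaps.
    """
    cheatsheet = {}
    for score in range(28):                              # p in [0, 27]
        grid = np.full((6, 6), "", dtype=object)
        for d1 in range(1, 7):
            for d2 in range(1, 7):
                state = (score, d1, d2)
                if state in q_table:
                    grid[d1 - 1, d2 - 1] = ACTION_LABELS[int(np.argmax(q_table[state]))]
        cheatsheet[score] = grid
    return cheatsheet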

[Figure: Policy cheatsheet]

You can use the cheatsheet to play the game manually and check whether the policy is good enough.

Code Style

The repository is equipped with pre-commit hooks for automatic code linting. All the code style requirements are listed in the [tool.ruff.lint] section of the pyproject.toml file.

For a better experience, use the VS Code IDE with the Ruff extension installed.
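
Assuming pre-commit is available among the project's development dependencies, you can set up and run the hooks locally with:

poetry run pre-commit install
poetry run pre-commit run --all-files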
