# Blackjack, Markov and two dice

[Figure: Agent inference]

There is nothing better than beating the casino, especially in Blackjack, where you literally play against the dealer. So we decided to train a Q-learning agent that can make the dealer feel uneasy.

## Installation

Before installing the project, make sure that you have Poetry installed for dependency management.

Also make sure that Poetry creates the local virtual environment `.venv` in your project's folder:

```bash
poetry config virtualenvs.in-project true
```

After configuring Poetry, clone the repository and run the installation:

```bash
git clone [email protected]:Makkarik/dice-blackjack-mdp.git
cd dice-blackjack-mdp
poetry install
```

## Training Environment

Dice Blackjack is a Blackjack variant that uses a pair of dice instead of cards as its source of randomness. To get familiar with the game rules, you may consult the original source at this link.

Dice Blackjack has the state vector $s = [p, d_1, d_2] \in \mathbb{S}$, where:

- $p \in [0, 27] \subset \mathbb{N}_0$ - the sum of the previous rolls;
- $d_1, d_2 \in [0, 6] \subset \mathbb{N}_0$ - the rolled values of the first and second die (the value 0 indicates a final state, where no further roll is available).
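Both components are small, so the problem is comfortably tabular. As a quick back-of-the-envelope count (illustrative, not code from the repository):

```python
# 28 possible running sums (0..27) times 7 values per die (0..6,
# where 0 marks a final state) bounds the raw tabular state space.
n_sums, n_faces = 28, 7
n_states = n_sums * n_faces * n_faces
print(n_states)  # 1372 raw states - easily small enough for a Q-table
```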

The action space consists of 6 actions:

- 0 - hit the first die (H1);
- 1 - hit the second die (H2);
- 2 - hit the sum of both dice;
- 3 - stack the first die (S1);
- 4 - stack the second die (S2);
- 5 - stack the sum of both dice.

The game ends with one of four possible rewards:

- -1 - the player got busted (scored more than 21 points) or scored fewer points than the dealer;
- 0 - the game ended in a tie;
- 1 - the player beat the dealer, or the dealer got busted;
- 2 - the player rolled a Blackjack combination (two doubles in the first two rolls).

You may play the game by launching the environment file `/src/env.py` directly.
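You can also drive the environment from code. Below is a minimal random-rollout sketch, assuming the environment follows the Gymnasium `reset`/`step` API; the class name `DiceBlackjack` is illustrative and may differ from the one actually exported by `/src/env.py`:

```python
import random

from src.env import DiceBlackjack  # hypothetical name - check /src/env.py

env = DiceBlackjack()
state, info = env.reset()                 # state = [p, d1, d2]
terminated = False
while not terminated:
    action = random.randrange(6)          # 0..5: H1, H2, hit sum, S1, S2, stack sum
    state, reward, terminated, truncated, info = env.step(action)
print("final reward:", reward)            # -1, 0, 1 or 2
```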

## Training

You may reproduce the results by launching the `Training.ipynb` notebook.

We have trained the agent for 500,000 episodes with a non-linear $\varepsilon$ decay.
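The exact schedule lives in the notebook; as a sketch of what a non-linear decay can look like, here is a common exponential variant with a floor (all constants illustrative):

```python
import math

def epsilon(episode: int, eps_start: float = 1.0, eps_min: float = 0.05,
            decay: float = 1e-5) -> float:
    """Exponentially decay the exploration rate toward a floor."""
    return eps_min + (eps_start - eps_min) * math.exp(-decay * episode)

print(epsilon(0))        # 1.0   - explore almost uniformly at the start
print(epsilon(500_000))  # ~0.056 - nearly greedy by the last episode
```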

[Figure: Training statistics]

It is noticeable that the acquired policy is strong enough to receive positive feedback from the game (an average reward greater than 0). To make the policy human-readable, we have converted the Q-table into a Dice Blackjack cheatsheet covering all possible cases.

The cells colored gray designate states that either have no available actions or were never encountered during training (see the heatmap for a score of 5).

[Figure: Policy cheatsheet]

You may use the cheatsheet while playing the game manually to check whether the policy is good enough.
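The conversion from Q-table to cheatsheet is just a per-state greedy argmax. A minimal sketch, assuming the Q-table is stored as a dict mapping state tuples to arrays of six action values (the storage format in the notebook may differ):

```python
import numpy as np

ACTIONS = ["H1", "H2", "hit sum", "S1", "S2", "stack sum"]

def cheatsheet(q_table: dict) -> dict:
    """Map each visited state (p, d1, d2) to its greedy action label."""
    return {state: ACTIONS[int(np.argmax(q))] for state, q in q_table.items()}

# States absent from q_table end up as the gray cells in the heatmap:
# either terminal (d1 = d2 = 0) or never visited during training.
```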

## Code style

The repository is equipped with pre-commit hooks for automatic code linting. All code style requirements are listed in the `[tool.ruff.lint]` section of the `pyproject.toml` file.

For a better experience, use the VS Code IDE with the Ruff extension installed.
