Overview:

Reinforcement Learning

At first, the agent plays random moves, storing the states and the rewards received in a bounded queue (the replay memory). At the end of each episode (game), the agent trains itself (using a neural network) on a random sample drawn from the replay memory. As more games are played, the agent becomes smarter and achieves higher scores.
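A minimal sketch of such a bounded replay memory, assuming each entry is a (state, next_state, reward, done) tuple. The class name ReplayMemory and its methods are hypothetical and not taken from the project's code. When the queue is full, adding a new transition discards the oldest one.

```python
import random
from collections import deque


class ReplayMemory:
    """Hypothetical bounded queue of (state, next_state, reward, done) tuples.

    Once `maxlen` is reached, appending a new transition silently
    discards the oldest one.
    """

    def __init__(self, maxlen):
        self.buffer = deque(maxlen=maxlen)

    def add(self, state, next_state, reward, done):
        self.buffer.append((state, next_state, reward, done))

    def sample(self, batch_size):
        # Draw without replacement; never ask for more than is stored.
        k = min(batch_size, len(self.buffer))
        return random.sample(list(self.buffer), k)

    def __len__(self):
        return len(self.buffer)


memory = ReplayMemory(maxlen=3)
for step in range(5):
    memory.add(state=step, next_state=step + 1, reward=0, done=False)
print(len(memory))          # 3
print(memory.buffer[0][0])  # 2 (transitions 0 and 1 were evicted)
```

At the end of an episode, a call such as memory.sample(batch_size) would provide the random training sample.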

In reinforcement learning, an agent that discovers a good 'path' tends to stick with it. To counter this, an exploration variable (which decreases over time) was also introduced, so that the agent sometimes picks a random action instead of the one it considers best. This way, it can discover new 'paths' that lead to higher scores.
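The exploration variable described above is commonly implemented as an epsilon-greedy policy. The sketch below assumes a linear decay per episode with a lower bound; the function names, the decay schedule, and the parameter values are illustrative assumptions, not the project's actual settings.

```python
import random


def choose_action(actions, values, epsilon, rng=random):
    """Epsilon-greedy: explore with probability epsilon, else exploit.

    `actions` and `values` are parallel lists; `values[i]` is the
    predicted score of taking `actions[i]`.
    """
    if rng.random() < epsilon:
        return rng.choice(actions)  # explore: random move
    best = max(range(len(actions)), key=lambda i: values[i])
    return actions[best]  # exploit: best-rated move


def decay_epsilon(epsilon, decay, epsilon_min):
    """Linearly shrink epsilon (assumed schedule), never below epsilon_min."""
    return max(epsilon_min, epsilon - decay)


epsilon = 1.0
for episode in range(3):
    epsilon = decay_epsilon(epsilon, decay=0.25, epsilon_min=0.1)
print(epsilon)  # 0.25
```

Early on, epsilon is close to 1 and almost every move is random; as it shrinks, the agent relies more and more on its own predictions.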

Training

The training is based on the Q-learning algorithm. Instead of training the network on just the current state and the reward obtained, Q-learning (which considers the transition from the current state to the next one) is used to estimate the best possible score reachable from each state, taking future rewards into account; i.e., the algorithm is not greedy. This allows the agent to make moves that give no immediate reward in order to obtain a bigger one later on (e.g. waiting to clear multiple lines instead of a single one).
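A small worked example of why a non-greedy agent can prefer to wait. The reward values (1 point for a single line, 16 points for four lines at once) and the discount of 0.95 are hypothetical numbers chosen for illustration, not the game's actual scoring.

```python
def discounted_return(rewards, discount):
    """Sum of rewards, each weighted by discount**t (t = steps from now)."""
    return sum(r * discount**t for t, r in enumerate(rewards))


# Hypothetical rewards: 1 point for a single-line clear,
# 16 points for clearing four lines at once.
greedy = [1, 0, 0]    # clear one line now, then nothing
patient = [0, 0, 16]  # place two pieces first, then clear four lines
print(discounted_return(greedy, 0.95))   # 1.0
print(discounted_return(patient, 0.95))  # roughly 14.44
```

Even after discounting, the delayed four-line clear is worth far more than the immediate single-line clear, so an agent that accounts for future rewards learns to wait.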

The neural network is updated with the following target, for a play that yields a reward (reward) and moves from state to next_state, where Q_next_state is the expected value of next_state as predicted by the neural network:

if state is not terminal (not the last round):
    Q_state = reward + discount × Q_next_state
else:
    Q_state = reward
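The update rule above can be sketched as a function that turns a sampled batch of transitions into (state, target) training pairs. Here predict is a hypothetical stand-in for the neural network's prediction over a list of states; with a Keras model, the resulting states and targets would then be passed to model.fit. The names build_targets and fake_predict, and the discount of 0.5, are illustrative assumptions.

```python
def build_targets(batch, predict, discount):
    """Turn sampled transitions into (state, target) training pairs.

    `batch` holds (state, next_state, reward, done) tuples and
    `predict` maps a list of states to a list of predicted Q values
    (a hypothetical stand-in for the neural network's prediction).
    """
    next_q = predict([next_state for _, next_state, _, _ in batch])
    pairs = []
    for (state, _, reward, done), q_next in zip(batch, next_q):
        if not done:
            target = reward + discount * q_next
        else:
            target = reward  # terminal state: no future reward
        pairs.append((state, target))
    return pairs


def fake_predict(states):
    # Hypothetical network: predicts Q = 10 for every state.
    return [10.0 for _ in states]


batch = [("s0", "s1", 1, False), ("s1", "s2", -5, True)]
print(build_targets(batch, fake_predict, discount=0.5))
# [('s0', 6.0), ('s1', -5)]
```

Predicting all next states in one batch, rather than one at a time, is usually much faster with a neural network.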

Best Action

Most deep Q-learning strategies output a vector of values for a given state. Each position of the vector maps to an action (e.g. left, right, ...), and the position with the highest value is selected.

However, the strategy implemented here is slightly different. In each round of Tetris, the states resulting from all possible moves are collected. Each state is fed to the neural network to predict the score it leads to, and the action whose state yields the highest predicted value is played.
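A minimal sketch of this state-evaluation strategy. It assumes the game can report, for each possible move, the board state that move would produce; the (column, rotation) action format, the hole-count "state", and toy_predict are all hypothetical placeholders for the real game data and neural network.

```python
def best_action(next_states, predict):
    """Pick the move whose resulting state gets the highest predicted value.

    `next_states` maps each possible action (here, a hypothetical
    (column, rotation) pair) to the state the board would be in after
    playing it. `predict` scores a list of states in one batch.
    """
    actions = list(next_states)
    scores = predict([next_states[a] for a in actions])
    best = max(range(len(actions)), key=lambda i: scores[i])
    return actions[best]


def toy_predict(states):
    # Hypothetical scorer: each state is a hole count; fewer holes is better.
    return [-holes for holes in states]


candidates = {(0, 0): 3, (4, 1): 0, (7, 2): 1}
print(best_action(candidates, toy_predict))  # (4, 1)
```

Unlike the vector-output approach, the network only needs a single output (the value of a state), and the set of legal moves can change from piece to piece without changing the network's shape.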

Requirement:

TensorFlow (probably v2.5)

Usage:

1. Edit "cm.py" and choose one of the modes: "human_player", "ai_player_training" or "ai_player_watching".

2. To add or delete tetrominoes, edit "tetrominoes.py" -> create_pool(cls): -> elif GAME_TYPE == 'extra':

3. Run "tetris_ai.py".

Note that training can be CPU-intensive.
