Policy Gradient Flappy Bird

A policy-gradient deep reinforcement learning agent for Flappy Bird, built on sourabhv's FlapPyBird implementation.

Requirements

Python 3, with packages:
- NumPy
- Multiprocessing (part of the Python standard library)
- Matplotlib
- PyGame
- PyTorch

File Description

controller.py - Trains the agent. This is where all of the meaty code worth looking at lives, and it is heavily commented.
demo.py - Runs 10 episodes by default using the pre-trained model model/trained-model.pt.
flappy.py - Runs and displays the game.

Flappy Bird Modifications

To make use of sourabhv's repository for an RL agent, I first had to modify the files so that input would no longer come from button presses but could instead be fed directly from an agent. To do this, I created input and output queues using multiprocessing.Queue(), and replaced key presses in flappy.py with commands taken from the input queue. Once the action has been completed, the new state, reward, and done status are returned to the agent via the output queue.
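The queue hand-off described above can be sketched roughly as follows. This is a minimal illustration and not the code in flappy.py or controller.py: toy_game_loop, run_episode, the dictionary used as a state, the None shutdown sentinel, and the toy reward are all hypothetical stand-ins for the real game logic.

```python
import multiprocessing as mp


def toy_game_loop(input_queue, output_queue, max_steps=3):
    """Hypothetical stand-in for the game process: consume actions, emit transitions."""
    step = 0
    while True:
        action = input_queue.get()      # blocks until the agent sends an action
        if action is None:              # assumed sentinel: agent asks the game to stop
            break
        step += 1
        done = step >= max_steps        # toy rule: the episode ends after max_steps
        reward = -1 if done else 0      # toy reward, not the real game's logic
        state = {"step": step, "flapped": bool(action)}
        output_queue.put((state, reward, done))
        if done:
            step = 0                    # start a fresh episode


def run_episode(input_queue, output_queue, policy):
    """Agent side: send actions until the game reports that the episode is over."""
    transitions = []
    done = False
    while not done:
        input_queue.put(policy())
        state, reward, done = output_queue.get()
        transitions.append((state, reward, done))
    return transitions


if __name__ == "__main__":
    input_queue, output_queue = mp.Queue(), mp.Queue()
    game = mp.Process(target=toy_game_loop, args=(input_queue, output_queue))
    game.start()
    episode = run_episode(input_queue, output_queue, policy=lambda: 1)
    input_queue.put(None)               # tell the game process to exit
    game.join()
    print(len(episode), "transitions, final reward", episode[-1][1])
```

Keeping the game in its own process means the agent only ever sees (state, reward, done) tuples, so the game code does not need to know anything about the learning code.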

In addition, for training purposes I replaced the background and ground with black textures, removed random pipe and bird colours, and removed all sounds. I also removed the FPS limit in training mode, leaving it at 30 FPS in test mode for viewability.

I also fixed a bug where flying too high up allowed the user to score infinite points with ease (thanks to my agent finding it through exploration).

Rewards

I've currently defined the rewards returned by flappy.py like so:

+1 for scoring a point (passing through a set of pipes)
-1 for dying

These were chosen kind of arbitrarily, and it'd be interesting to see how modifying these affects training.

Pre-trained Model

Training can be performed using controller.py, but a pre-trained model is also provided for convenience.

The pre-trained model, model/trained-model.pt, was trained using a 4-layer network (all fully-connected):

72*100 x 300, with ReLU activation
300 x 300, with ReLU activation
300 x 300, with ReLU activation
300 x 1, with sigmoid activation
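For reference, the layer sizes listed above could be written in PyTorch roughly like this. It is only a sketch of the stated shapes: build_policy_network is a hypothetical name, the default bias terms and weight initialisation of nn.Linear are assumptions, and the actual definition in controller.py may be organised differently.

```python
import torch
from torch import nn

INPUT_SIZE = 72 * 100  # flattened 72 x 100 state


def build_policy_network():
    """Four fully-connected layers: three with ReLU, then a sigmoid output."""
    return nn.Sequential(
        nn.Linear(INPUT_SIZE, 300), nn.ReLU(),
        nn.Linear(300, 300), nn.ReLU(),
        nn.Linear(300, 300), nn.ReLU(),
        nn.Linear(300, 1), nn.Sigmoid(),   # one output squashed into [0, 1]
    )


policy = build_policy_network()
batch = torch.zeros(2, INPUT_SIZE)         # two dummy all-black states
print(policy(batch).shape)                 # torch.Size([2, 1])
```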

The model was trained for approx. 7,500 episodes before stopping due to time constraints, using the Adam optimiser with a learning rate of 1e-4, a discount factor of gamma = 0.99, a batch size of 25, and negative log loss. The model with the best median batch performance was selected.
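To make the role of the discount factor concrete, here is a small sketch of how discounted returns are commonly computed in policy gradient methods. The function name discounted_returns and the example reward sequence are illustrative assumptions, and controller.py may compute or normalise returns differently.

```python
import numpy as np


def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_{t+1} for every time step of one episode."""
    returns = np.zeros(len(rewards), dtype=np.float64)
    running = 0.0
    # Walk backwards so each step can reuse the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Made-up episode: one point scored, then a death two steps later.
print(discounted_returns([0, 1, 0, -1]))
```

With gamma = 0.99 a reward still carries most of its weight 100 steps earlier (0.99^100 is roughly 0.37), which matters here because rewards only arrive when a pipe is passed or the bird dies.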

The state is retrieved from PyGame as a 288 x 512 matrix, cropped to 288 x 400 to remove the ground below the pipes, and then downsampled by a factor of 4 along each axis, giving 72 x 100.
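The shape arithmetic can be checked with a short NumPy sketch. The function name preprocess is hypothetical, and two details are assumptions: that the ground occupies the last 112 entries of the second axis, and that downsampling is done by keeping every fourth pixel. The actual code may crop or downsample differently.

```python
import numpy as np

GROUND_CUTOFF = 400   # assumed: keep the first 400 of 512 entries on the second axis
DOWNSAMPLE = 4        # 288 x 400 -> 72 x 100


def preprocess(frame):
    """Crop the ground, downsample, and flatten a 288 x 512 frame."""
    cropped = frame[:, :GROUND_CUTOFF]
    small = cropped[::DOWNSAMPLE, ::DOWNSAMPLE]   # keep every 4th pixel per axis
    return small.astype(np.float32).ravel()       # 72 * 100 = 7200 values


frame = np.zeros((288, 512))
print(preprocess(frame).shape)  # (7200,)
```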

The output is the probability of flapping given the input state.
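As a rough illustration of how a single probability becomes an action and a loss term, the sketch below samples flap or no-flap from the output and computes the negative log-likelihood of the chosen action. Both function names are hypothetical, and reading "negative log loss" as this per-action negative log-likelihood (which a policy gradient method would then weight by the return) is an assumption rather than something confirmed by controller.py.

```python
import math
import random


def sample_action(flap_probability, rng=random):
    """Return 1 (flap) with the given probability, otherwise 0 (do nothing)."""
    return 1 if rng.random() < flap_probability else 0


def negative_log_likelihood(flap_probability, action):
    """-log of the probability the policy assigned to the action actually taken."""
    p = flap_probability if action == 1 else 1.0 - flap_probability
    return -math.log(p)


p_flap = 0.9
action = sample_action(p_flap)
print(action, negative_log_likelihood(p_flap, action))
```

Sampling rather than always picking the more likely action is what gives the agent its exploration during training.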

Here's the graph of survival time vs number of episodes during training (training-graph.png), with a logarithmic y-axis and the orange line indicating the per-batch median score:

We can see from this figure that the first hurdle our agent had to overcome was surviving for more than 100 time steps, which is where the first pipe appears. After overcoming this initial hurdle, it gradually starts to survive for longer and longer, with the best-performing model surviving for approximately 500 time steps on average, corresponding to approximately 10 points.

It's clear from the above plot that the model has yet to converge, so with more training the performance will almost undoubtedly improve.

Demo

A demo can be run using the pre-trained model by running the demo.py file. By default the demo will run 10 episodes.

Here's an example of how one episode might look:

An example of how to load model/trained-model.pt can also be seen in demo.py.

Future Work

In future I'd like to:

Experiment with different gamma values
Experiment with different network parameters
Experiment with different rewards
Try with higher difficulty (less distance between pipes)
See how using background & ground sprites affects learning
See how using random bird & pipe colours affects learning
Try using other architectures, such as convnets
