Udacity Deep Reinforcement Learning Nanodegree Reacher Project
This is a solution for the second project of the Udacity Deep Reinforcement Learning Nanodegree. It includes a script to train an agent using the TD3 algorithm; the models are trained with the Stable Baselines3 library.
The agent is an arm with two joints, and the environment contains a sphere that rotates around the agent. The goal is to keep the arm touching the sphere for as long as possible during an episode of 1000 timesteps.
- Rewards:
- +0.04 for each timestep the agent touches the sphere
- Input state:
- 33 continuous variables corresponding to the position, rotation, velocity, and angular velocity of the arm
- Actions:
- 4 continuous variables, corresponding to the torques applied to the two joints, with values in [-1.0, 1.0]
- Goal:
- Get an average score of at least +30 over 100 consecutive episodes
- Environment:
- The environment used is the single-agent version provided by Udacity.
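Stable Baselines3 trains against Gym-style environments, so the Unity environment has to be wrapped before it can be used. Below is a minimal sketch of such an adapter, assuming the unityagents v0.4 API used by the Udacity course; the class name and constructor arguments are illustrative, and exact signatures (e.g. `base_port`, or the reset/step return shapes across gym and Stable Baselines3 versions) may differ:

```python
import numpy as np
import gym
from gym import spaces
from unityagents import UnityEnvironment


class ReacherGymWrapper(gym.Env):
    """Minimal Gym adapter around the Udacity Unity Reacher environment (sketch)."""

    def __init__(self, executable_path, port=5005):
        # base_port is an assumption; some unityagents releases take worker_id instead.
        self.unity_env = UnityEnvironment(file_name=executable_path, base_port=port)
        self.brain_name = self.unity_env.brain_names[0]
        # 33 continuous observations, 4 continuous torque actions in [-1.0, 1.0].
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(33,), dtype=np.float32)
        self.action_space = spaces.Box(-1.0, 1.0, shape=(4,), dtype=np.float32)

    def reset(self):
        info = self.unity_env.reset(train_mode=True)[self.brain_name]
        return info.vector_observations[0].astype(np.float32)

    def step(self, action):
        info = self.unity_env.step(action)[self.brain_name]
        obs = info.vector_observations[0].astype(np.float32)
        return obs, info.rewards[0], info.local_done[0], {}

    def close(self):
        self.unity_env.close()
```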
The problem is solved with TD3 using the Stable Baselines3 framework.
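With a Gym-compatible environment in hand, a Stable Baselines3 TD3 run looks roughly like the sketch below. The hyperparameter values mirror the defaults listed further down, and `ReacherGymWrapper` is the hypothetical adapter from the previous sketch:

```python
from stable_baselines3 import TD3

# ReacherGymWrapper is the hypothetical adapter sketched above.
env = ReacherGymWrapper("Reacher.exe", port=5005)

model = TD3(
    "MlpPolicy",
    env,
    learning_rate=0.0003,                      # --learning-rate default
    buffer_size=100_000,                       # --buffer-size default
    batch_size=100,                            # --batch-size default
    gamma=0.99,                                # --gamma default
    policy_kwargs=dict(net_arch=[400, 300]),   # --policy-layers default
    verbose=1,
)
model.learn(total_timesteps=100_000)           # --total-timesteps default
model.save("td3_reacher")
```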
Unfortunately, the Unity ML-Agents environment used by Udacity for this project is a very early version (v0.4) that is several years old. This makes it extremely difficult to set things up, particularly on Windows. Please see the separate guide I have provided on how to do this: setup project.
I have implemented an experiment-based framework that allows exploration of different hyperparameters when training a model. The parameters you can specify are listed below, followed by a sketch of how they might be parsed:
- `learning-rate`: the learning rate to use for training the model. Default is 0.0003.
- `batch-size`: the size of the batches that are sampled from the replay buffer to train the model. Default is 100.
- `buffer-size`: the size of the replay buffer. Default is 100,000.
- `total-timesteps`: the total number of timesteps to train for. Default is 100,000.
- `seed`: the random seed to use. Default is -1, which means a random value is generated.
- `environment-port`: the port number used for communication with the Unity environment. If you want to run more than one agent at the same time, specify a different port for each. Default is 5005.
- `policy-layers`: the hidden layers to use in the neural network, specified as a comma-separated list. Default is "400,300".
- `algorithm`: the name of the algorithm used to train the agent. At the moment only `td3` is supported, and this is the default value.
- `executable-path`: the path to the Unity executable to run. Please see the setup project notes for help with this.
- `experiments-root`: the root folder that experiment output is written to.
- `experiment-name`: the name of the experiment. A subfolder with this name is created under the `experiments-root` folder, and all output from the experiment is written to it.
- `reward-threshold`: the average reward the solution must attain over 100 consecutive episodes to count as solved. Default is 30.0.
- `gamma`: the discount factor applied to future rewards. Default is 0.99.
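As a rough sketch of how these flags could be wired up with argparse (the actual argument handling in train_agent.py may well differ), note in particular the comma-separated `policy-layers` value and the -1 seed convention:

```python
import argparse
import random


def parse_layers(spec):
    """Turn a comma-separated string such as "400,300" into a list of ints."""
    return [int(width) for width in spec.split(",")]


parser = argparse.ArgumentParser()
parser.add_argument("--learning-rate", type=float, default=0.0003)
parser.add_argument("--batch-size", type=int, default=100)
parser.add_argument("--buffer-size", type=int, default=100_000)
parser.add_argument("--total-timesteps", type=int, default=100_000)
parser.add_argument("--seed", type=int, default=-1)
parser.add_argument("--environment-port", type=int, default=5005)
parser.add_argument("--policy-layers", type=parse_layers, default=[400, 300])
parser.add_argument("--algorithm", default="td3", choices=["td3"])
parser.add_argument("--reward-threshold", type=float, default=30.0)
parser.add_argument("--gamma", type=float, default=0.99)
args = parser.parse_args()

# A seed of -1 means "generate a random value", per the parameter list above.
seed = random.randrange(2**31) if args.seed == -1 else args.seed
```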
Here is an example command line:
```sh
python train_agent.py --experiment-name td3-lr_0_0005-128_128_128_ts_600K --learning-rate 0.0005 --total-timesteps 600000 --policy-layers "128,128,128"
```
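For reference, the solved criterion (an average score of at least the `reward-threshold` over 100 consecutive episodes) can be checked with a simple rolling window. This is only a sketch, where `episode_scores` is a hypothetical list of per-episode returns; it is not necessarily how the experiment framework implements the check:

```python
from collections import deque

episode_scores = []  # hypothetical: filled with per-episode returns during evaluation
scores_window = deque(maxlen=100)  # returns of the 100 most recent episodes

for episode, score in enumerate(episode_scores, start=1):
    scores_window.append(score)
    mean_score = sum(scores_window) / len(scores_window)
    # Only declare the environment solved once a full 100-episode window exists.
    if len(scores_window) == 100 and mean_score >= 30.0:
        print(f"Solved in {episode} episodes (mean score {mean_score:.2f})")
        break
```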