Commit

update docs
cpnota committed Mar 8, 2024
1 parent ceb9928 commit 2ad4062
Showing 2 changed files with 21 additions and 14 deletions.
7 changes: 4 additions & 3 deletions README.md
@@ -21,10 +21,11 @@ Additionally, we provide an [example project](https://github.com/cpnota/all-exam

## High-Quality Reference Implementations

The `autonomous-learning-library` separates reinforcement learning agents into two modules: `all.agents`, which provides flexible, high-level implementations of many common algorithms that can be adapted to new problems and environments, and `all.presets`, which provides specific instantiations of these agents tuned for particular sets of environments, including Atari games, classic control tasks, and PyBullet robotics simulations. Some benchmark results, which are on par with published results, can be found below:
The `autonomous-learning-library` separates reinforcement learning agents into two modules: `all.agents`, which provides flexible, high-level implementations of many common algorithms that can be adapted to new problems and environments, and `all.presets`, which provides specific instantiations of these agents tuned for particular sets of environments, including Atari games, classic control tasks, and MuJoCo/PyBullet robotics simulations. Some benchmark results, which are on par with published results, can be found below:

![atari40](benchmarks/atari40.png)
![pybullet](benchmarks/pybullet.png)
![atari40](benchmarks/atari_40m.png)
![mujoco](benchmarks/mujoco_v4.png)
![pybullet](benchmarks/pybullet_v0.png)
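
As a rough illustration of the `all.agents` / `all.presets` split described above, a preset can be paired with an environment and handed to the experiment runner. The sketch below follows the library's documented quickstart pattern; the exact module paths, preset call, and `run_experiment` signature are assumptions and may differ between versions:

```python
# Sketch only: names and signatures follow the documented quickstart pattern
# and may differ between versions of the library.
from all.experiments import run_experiment
from all.environments import GymEnvironment
from all.presets import classic_control

# A preset bundles an agent from all.agents with hyperparameters tuned
# for a family of environments (here, classic control tasks).
env = GymEnvironment("CartPole-v0", device="cpu")
run_experiment(classic_control.dqn(), env, 20000)  # train for 20,000 frames
```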

As of today, `all` contains implementations of the following deep RL algorithms:

28 changes: 17 additions & 11 deletions docs/source/guide/benchmark_performance.rst
@@ -28,7 +28,7 @@ Additionally, we use the following agent "bodies":

The results were as follows:

.. image:: ../../../benchmarks/atari40.png
.. image:: ../../../benchmarks/atari_40m.png

For comparison, we look at the results published in the paper, `Rainbow: Combining Improvements in Deep Reinforcement Learning <https://arxiv.org/abs/1710.02298>`_:

@@ -40,23 +40,29 @@ Our ``dqn`` and ``ddqn`` in particular were better almost across the board.
While there are some minor implementation differences (for example, we use ``Adam`` for most algorithms instead of ``RMSprop``),
our agents achieved very similar behavior to the agents tested by DeepMind.
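
For reference, the optimizer difference mentioned above amounts to a one-line change when the agent's optimizer is constructed. A minimal PyTorch sketch, using a placeholder network and placeholder learning rates rather than the presets' actual hyperparameters:

.. code-block:: python

    from torch import nn, optim

    # Placeholder Q-network; the real presets define their own architectures.
    q_network = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))

    # The original DQN paper used RMSprop...
    rmsprop = optim.RMSprop(q_network.parameters(), lr=2.5e-4)

    # ...while most of these presets default to Adam instead.
    adam = optim.Adam(q_network.parameters(), lr=1e-4)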

MuJoCo Benchmark
------------------

`MuJoCo <https://mujoco.org>`_ is "a free and open source physics engine that aims to facilitate research and development in robotics, biomechanics, graphics and animation, and other areas where fast and accurate simulation is needed."
The MuJoCo Gym environments are a common benchmark in RL research for evaluating agents with continuous action spaces.
We ran each continuous preset for 5 million timesteps (in this case, timesteps are equal to frames).
The learning rate was decayed over the course of training using cosine annealing.
The results were as follows:

.. image:: ../../../benchmarks/mujoco_v4.png

These results are similar to those reported elsewhere, and in some cases better.
However, results can vary based on hyperparameter tuning, implementation specifics, and the random seed.
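
The cosine annealing mentioned above can be expressed with PyTorch's built-in scheduler. A minimal sketch, with a placeholder model, learning rate, and update frequency rather than the presets' actual values:

.. code-block:: python

    from torch import nn, optim
    from torch.optim.lr_scheduler import CosineAnnealingLR

    # Placeholder policy network and learning rate.
    model = nn.Sequential(nn.Linear(17, 64), nn.ReLU(), nn.Linear(64, 6))
    optimizer = optim.Adam(model.parameters(), lr=3e-4)

    # Anneal the learning rate toward zero over the 5 million timestep run
    # (assuming one update per timestep, which is a simplification).
    scheduler = CosineAnnealingLR(optimizer, T_max=5_000_000)

    for step in range(5_000_000):
        ...  # compute loss, backpropagate, optimizer.step(), optimizer.zero_grad()
        scheduler.step()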

PyBullet Benchmark
------------------

`PyBullet <https://pybullet.org/wordpress/>`_ provides a free alternative to the popular MuJoCo robotics environments.
While MuJoCo historically required a license key that could be difficult for independent researchers to afford, PyBullet has always been free and open.
Additionally, the PyBullet environments are widely considered more challenging, making them a more discriminating test bed.
For these reasons, we chose to benchmark the ``all.presets.continuous`` presets using PyBullet.

Similar to the Atari benchmark, we ran each agent for 10 million timesteps (in this case, timesteps are equal to frames).
We ran each agent for 5 million timesteps (in this case, timesteps are equal to frames).
The learning rate was decayed over the course of training using cosine annealing.
To reduce the variance of the updates, we added an extra time feature to the state (t * 0.001, where t is the current timestep).
The results were as follows:

.. image:: ../../../benchmarks/pybullet.png

PPO was omitted from the plot for Humanoid because it achieved very large negative returns which interfered with the scale of the graph.
Note, however, that our implementation of soft actor-critic (SAC) is able to solve even this difficult environment.
.. image:: ../../../benchmarks/pybullet_v0.png

Because most research papers still use MuJoCo, direct comparisons are difficult to come by.
However, George Sung helpfully benchmarked TD3 and DDPG on several PyBullet environments `here <https://github.com/georgesung/TD3>`_.
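
The time feature mentioned in the PyBullet setup above (t * 0.001 appended to the state) can be illustrated with a small Gym observation wrapper. This is a sketch of the idea only, not the library's own implementation:

.. code-block:: python

    import gym
    import numpy as np

    class TimeFeatureWrapper(gym.ObservationWrapper):
        """Append a scaled timestep (t * scale) to each observation (illustrative only)."""

        def __init__(self, env, scale=0.001):
            super().__init__(env)
            self._scale = scale
            self._t = 0
            low = np.append(env.observation_space.low, 0.0).astype(np.float32)
            high = np.append(env.observation_space.high, np.inf).astype(np.float32)
            self.observation_space = gym.spaces.Box(low=low, high=high, dtype=np.float32)

        def reset(self, **kwargs):
            self._t = 0
            return super().reset(**kwargs)

        def step(self, action):
            self._t += 1
            return super().step(action)

        def observation(self, observation):
            return np.append(observation, self._t * self._scale).astype(np.float32)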
