
Commit 2ad4062

update docs
1 parent ceb9928 commit 2ad4062

2 files changed: +21 −14 lines changed

README.md

Lines changed: 4 additions & 3 deletions
@@ -21,10 +21,11 @@ Additionally, we provide an [example project](https://github.com/cpnota/all-exam
 
 ## High-Quality Reference Implementations
 
-The `autonomous-learning-library` separates reinforcement learning agents into two modules: `all.agents`, which provides flexible, high-level implementations of many common algorithms that can be adapted to new problems and environments, and `all.presets`, which provides specific instantiations of these agents tuned for particular sets of environments, including Atari games, classic control tasks, and PyBullet robotics simulations. Some benchmark results on par with published results can be found below:
+The `autonomous-learning-library` separates reinforcement learning agents into two modules: `all.agents`, which provides flexible, high-level implementations of many common algorithms that can be adapted to new problems and environments, and `all.presets`, which provides specific instantiations of these agents tuned for particular sets of environments, including Atari games, classic control tasks, and MuJoCo/PyBullet robotics simulations. Some benchmark results on par with published results can be found below:
 
-![atari40](benchmarks/atari40.png)
-![pybullet](benchmarks/pybullet.png)
+![atari40](benchmarks/atari_40m.png)
+![mujoco](benchmarks/mujoco_v4.png)
+![pybullet](benchmarks/pybullet_v0.png)
 
 As of today, `all` contains implementations of the following deep RL algorithms:
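The README paragraph above describes the split between `all.agents` (algorithm implementations) and `all.presets` (tuned instantiations). As a rough sketch of how that split is used in practice, the snippet below runs a tuned Atari preset through the library's experiment runner; the `all.presets.atari.dqn`, `AtariEnvironment`, and `run_experiment` names are assumptions based on the project's quickstart and example project, and exact names and signatures may differ between versions.

```python
# Hypothetical usage sketch; exact names and signatures may vary by library version.
from all.experiments import run_experiment      # assumed experiment runner
from all.environments import AtariEnvironment   # assumed Atari environment wrapper
from all.presets import atari                   # presets tuned for Atari (dqn, ddqn, rainbow, ...)

# A preset bundles an agent from `all.agents` together with hyperparameters
# tuned for a family of environments, so benchmarks can be rerun as-is.
agents = [atari.dqn]
envs = [AtariEnvironment("Breakout", device="cuda")]

# Run the 40-million-frame Atari benchmark referenced above.
run_experiment(agents, envs, frames=40_000_000)
```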
docs/source/guide/benchmark_performance.rst

Lines changed: 17 additions & 11 deletions
@@ -28,7 +28,7 @@ Additionally, we use the following agent "bodies":
 
 The results were as follows:
 
-.. image:: ../../../benchmarks/atari40.png
+.. image:: ../../../benchmarks/atari_40m.png
 
 For comparison, we look at the results published in the paper, `Rainbow: Combining Improvements in Deep Reinforcement Learning <https://arxiv.org/abs/1710.02298>`_:
 
@@ -40,23 +40,29 @@ Our ``dqn`` and ``ddqn`` in particular were better almost across the board.
 While there are some minor implementation differences (for example, we use ``Adam`` for most algorithms instead of ``RMSprop``),
 our agents achieved very similar behavior to the agents tested by DeepMind.
 
+MuJoCo Benchmark
+------------------
+
+`MuJoCo <https://mujoco.org>`_ is "a free and open source physics engine that aims to facilitate research and development in robotics, biomechanics, graphics and animation, and other areas where fast and accurate simulation is needed."
+The MuJoCo Gym environments are a common benchmark in RL research for evaluating agents with continuous action spaces.
+We ran each continuous preset for 5 million timesteps (in this case, timesteps are equal to frames).
+The learning rate was decayed over the course of training using cosine annealing.
+The results were as follows:
+
+.. image:: ../../../benchmarks/mujoco_v4.png
+
+These results are similar to results found elsewhere, and in some cases better.
+However, results can vary based on hyperparameter tuning, implementation specifics, and the random seed.
+
 PyBullet Benchmark
 ------------------
 
 `PyBullet <https://pybullet.org/wordpress/>`_ provides a free alternative to the popular MuJoCo robotics environments.
-While MuJoCo requires a license key and can be difficult for independent researchers to afford, PyBullet is free and open.
-Additionally, the PyBullet environments are widely considered more challenging, making them a more discriminant test bed.
-For these reasons, we chose to benchmark the ``all.presets.continuous`` presets using PyBullet.
-
-Similar to the Atari benchmark, we ran each agent for 10 million timesteps (in this case, timesteps are equal to frames).
+We ran each agent for 5 million timesteps (in this case, timesteps are equal to frames).
 The learning rate was decayed over the course of training using cosine annealing.
-To reduce the variance of the updates, we added an extra time feature to the state (t * 0.001, where t is the current timestep).
 The results were as follows:
 
-.. image:: ../../../benchmarks/pybullet.png
-
-PPO was omitted from the plot for Humanoid because it achieved very large negative returns which interfered with the scale of the graph.
-Note, however, that our implementation of soft actor-critic (SAC) is able to solve even this difficult environment.
+.. image:: ../../../benchmarks/pybullet_v0.png
 
 Because most research papers still use MuJoCo, direct comparisons are difficult to come by.
 However, George Sung helpfully benchmarked TD3 and DDPG on several PyBullet environments `here <https://github.com/georgesung/TD3>`_.
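Both new benchmark sections note that the learning rate was decayed over the course of training using cosine annealing. As a general illustration of that schedule (not necessarily how the library implements it internally), plain PyTorch provides `CosineAnnealingLR`, which sweeps the learning rate along a half cosine from its initial value toward zero when stepped once per update; the model, optimizer, and update count below are illustrative placeholders.

```python
import torch

# Toy network and optimizer standing in for an agent's models; values are illustrative.
model = torch.nn.Linear(4, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

total_updates = 10_000  # e.g. the number of gradient updates over a 5M-timestep run
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=total_updates)

for step in range(total_updates):
    loss = model(torch.randn(32, 4)).pow(2).mean()  # placeholder loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()  # anneals the learning rate from 1e-3 toward 0 over total_updates steps
```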
