README.md (+4 -3)
@@ -21,10 +21,11 @@ Additionally, we provide an [example project](https://github.com/cpnota/all-exam
## High-Quality Reference Implementations
- The `autonomous-learning-library` separates reinforcement learning agents into two modules: `all.agents`, which provides flexible, high-level implementations of many common algorithms that can be adapted to new problems and environments, and `all.presets`, which provides specific instantiations of these agents tuned for particular sets of environments, including Atari games, classic control tasks, and PyBullet robotics simulations. Some benchmark results showing performance on par with published results can be found below:
+ The `autonomous-learning-library` separates reinforcement learning agents into two modules: `all.agents`, which provides flexible, high-level implementations of many common algorithms that can be adapted to new problems and environments, and `all.presets`, which provides specific instantiations of these agents tuned for particular sets of environments, including Atari games, classic control tasks, and MuJoCo/PyBullet robotics simulations. Some benchmark results showing performance on par with published results can be found below:
- (two benchmark result images)
+ (three updated benchmark result images)
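The paragraph above describes the split between `all.agents` and `all.presets`. As a rough sketch of how that split shows up in user code (the module and function names below are assumed from the project's example code and may differ between library versions, so treat this as an illustration rather than a reference):

```python
# Rough sketch of the agents/presets split described above. The names below
# (all.presets.atari, AtariEnvironment, run_experiment) are assumed from the
# project's example code and may differ between library versions.
from all.presets import atari
from all.environments import AtariEnvironment
from all.experiments import run_experiment

# A preset pairs an agent implementation from `all.agents` with hyperparameters
# tuned for a particular family of environments (here, Atari games).
run_experiment(
    [atari.dqn()],                   # agents, built from a tuned preset
    [AtariEnvironment('Breakout')],  # environments to train on
    1e6,                             # training frames
)
```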
As of today, `all` contains implementations of the following deep RL algorithms:
docs/source/guide/benchmark_performance.rst (+17 -11)
@@ -28,7 +28,7 @@ Additionally, we use the following agent "bodies":
The results were as follows:
- .. image:: ../../../benchmarks/atari40.png
+ .. image:: ../../../benchmarks/atari_40m.png
For comparison, we look at the results published in the paper, `Rainbow: Combining Improvements in Deep Reinforcement Learning <https://arxiv.org/abs/1710.02298>`_:
@@ -40,23 +40,29 @@ Our ``dqn`` and ``ddqn`` in particular were better almost across the board.
While there are some minor implementation differences (for example, we use ``Adam`` for most algorithms instead of ``RMSprop``),
our agents achieved very similar behavior to the agents tested by DeepMind.
+ MuJoCo Benchmark
+ ------------------
+
+ `MuJoCo <https://mujoco.org>`_ is "a free and open source physics engine that aims to facilitate research and development in robotics, biomechanics, graphics and animation, and other areas where fast and accurate simulation is needed."
+ The MuJoCo Gym environments are a common benchmark in RL research for evaluating agents with continuous action spaces.
+ We ran each continuous preset for 5 million timesteps (in this case, timesteps are equal to frames).
+ The learning rate was decayed over the course of training using cosine annealing.
+ The results were as follows:
+
+ .. image:: ../../../benchmarks/mujoco_v4.png
+
+ These results are similar to results found elsewhere, and in some cases better.
+ However, results can vary based on hyperparameter tuning, implementation specifics, and the random seed.
+
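The cosine annealing schedule mentioned above can be illustrated with PyTorch's built-in scheduler (a minimal sketch; the actual presets configure their own optimizers and schedules internally):

.. code-block:: python

    import torch.nn as nn
    import torch.optim as optim

    # Toy network and optimizer; the actual presets construct their own models.
    policy = nn.Linear(17, 6)
    optimizer = optim.Adam(policy.parameters(), lr=1e-3)

    # Anneal the learning rate from 1e-3 toward 0 over the full training run.
    # T_max is the total number of scheduler steps (e.g., one per update).
    scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=5_000_000)

    for _ in range(1000):  # placeholder training loop (no real gradients here)
        optimizer.step()
        scheduler.step()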
PyBullet Benchmark
------------------
`PyBullet <https://pybullet.org/wordpress/>`_ provides a free alternative to the popular MuJoCo robotics environments.
- While MuJoCo requires a license key and can be difficult for independent researchers to afford, PyBullet is free and open.
- Additionally, the PyBullet environments are widely considered more challenging, making them a more discriminant test bed.
- For these reasons, we chose to benchmark the ``all.presets.continuous`` presets using PyBullet.
-
- Similar to the Atari benchmark, we ran each agent for 10 million timesteps (in this case, timesteps are equal to frames).
+ We ran each agent for 5 million timesteps (in this case, timesteps are equal to frames).
The learning rate was decayed over the course of training using cosine annealing.
- To reduce the variance of the updates, we added an extra time feature to the state (t * 0.001, where t is the current timestep).
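For context on the removed line above, a hypothetical sketch of such a time-feature wrapper (an illustration only, not the library's implementation, written against the classic Gym API):

.. code-block:: python

    import numpy as np
    import gym

    class TimeFeatureWrapper(gym.Wrapper):
        """Hypothetical sketch: append a scaled timestep (t * 0.001) to each observation."""

        def __init__(self, env):
            super().__init__(env)
            self._t = 0

        def reset(self, **kwargs):
            self._t = 0
            return np.append(self.env.reset(**kwargs), self._t * 0.001)

        def step(self, action):
            obs, reward, done, info = self.env.step(action)
            self._t += 1
            return np.append(obs, self._t * 0.001), reward, done, info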
The results were as follows:
- .. image:: ../../../benchmarks/pybullet.png
-
- PPO was omitted from the plot for Humanoid because it achieved very large negative returns which interfered with the scale of the graph.
- Note, however, that our implementation of soft actor-critic (SAC) is able to solve even this difficult environment.
+ .. image:: ../../../benchmarks/pybullet_v0.png
Because most research papers still use MuJoCo, direct comparisons are difficult to come by.
However, George Sung helpfully benchmarked TD3 and DDPG on several PyBullet environments `here <https://github.com/georgesung/TD3>`_.