Commit

update docs
cpnota committed Mar 8, 2024
1 parent ceb9928 commit 2ad4062
Showing 2 changed files with 21 additions and 14 deletions.
7 changes: 4 additions & 3 deletions README.md
@@ -21,10 +21,11 @@ Additionally, we provide an [example project](https://github.com/cpnota/all-exam

## High-Quality Reference Implementations

The `autonomous-learning-library` separates reinforcement learning agents into two modules: `all.agents`, which provides flexible, high-level implementations of many common algorithms that can be adapted to new problems and environments, and `all.presets`, which provides specific instantiations of these agents tuned for particular sets of environments, including Atari games, classic control tasks, and PyBullet robotics simulations. Some benchmark results, which are on par with published results, can be found below:
The `autonomous-learning-library` separates reinforcement learning agents into two modules: `all.agents`, which provides flexible, high-level implementations of many common algorithms that can be adapted to new problems and environments, and `all.presets`, which provides specific instantiations of these agents tuned for particular sets of environments, including Atari games, classic control tasks, and MuJoCo/PyBullet robotics simulations. Some benchmark results, which are on par with published results, can be found below:

![atari40](benchmarks/atari40.png)
![pybullet](benchmarks/pybullet.png)
![atari40](benchmarks/atari_40m.png)
![mujoco](benchmarks/mujoco_v4.png)
![pybullet](benchmarks/pybullet_v0.png)
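
As a rough illustration of the `all.agents` / `all.presets` split described above, a preset can be paired with an environment and handed to the experiment runner. The sketch below follows the library's documented quickstart pattern; the exact module paths, preset call, and `run_experiment` signature are assumptions and may differ between versions:

```python
# Sketch only: names and signatures follow the documented quickstart pattern
# and may differ between versions of the library.
from all.experiments import run_experiment
from all.environments import GymEnvironment
from all.presets import classic_control

# A preset bundles an agent from all.agents with hyperparameters tuned
# for a family of environments (here, classic control tasks).
env = GymEnvironment("CartPole-v0", device="cpu")
run_experiment(classic_control.dqn(), env, 20000)  # train for 20,000 frames
```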

As of today, `all` contains implementations of the following deep RL algorithms:

28 changes: 17 additions & 11 deletions docs/source/guide/benchmark_performance.rst
@@ -28,7 +28,7 @@ Additionally, we use the following agent "bodies":

The results were as follows:

.. image:: ../../../benchmarks/atari40.png
.. image:: ../../../benchmarks/atari_40m.png

For comparison, we look at the results published in the paper, `Rainbow: Combining Improvements in Deep Reinforcement Learning <https://arxiv.org/abs/1710.02298>`_:

@@ -40,23 +40,29 @@ Our ``dqn`` and ``ddqn`` in particular were better almost across the board.
While there are some minor implementation differences (for example, we use ``Adam`` for most algorithms instead of ``RMSprop``),
our agents achieved very similar behavior to the agents tested by DeepMind.
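
For reference, the optimizer difference mentioned above amounts to a one-line change when the agent's optimizer is constructed. A minimal PyTorch sketch, using a placeholder network and placeholder learning rates rather than the presets' actual hyperparameters:

.. code-block:: python

    from torch import nn, optim

    # Placeholder Q-network; the real presets define their own architectures.
    q_network = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))

    # The original DQN paper used RMSprop...
    rmsprop = optim.RMSprop(q_network.parameters(), lr=2.5e-4)

    # ...while most of these presets default to Adam instead.
    adam = optim.Adam(q_network.parameters(), lr=1e-4)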

MuJoCo Benchmark
------------------

`MuJoCo <https://mujoco.org>`_ is "a free and open source physics engine that aims to facilitate research and development in robotics, biomechanics, graphics and animation, and other areas where fast and accurate simulation is needed."
The MuJoCo Gym environments are a common benchmark in RL research for evaluating agents with continuous action spaces.
We ran each continuous preset for 5 million timesteps (in this case, timesteps are equal to frames).
The learning rate was decayed over the course of training using cosine annealing.
The results were as follows:

.. image:: ../../../benchmarks/mujoco_v4.png

These results are similar to those reported elsewhere, and in some cases better.
However, results can vary based on hyperparameter tuning, implementation specifics, and the random seed.
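
The cosine annealing mentioned above can be expressed with PyTorch's built-in scheduler. A minimal sketch, with a placeholder model, learning rate, and update frequency rather than the presets' actual values:

.. code-block:: python

    from torch import nn, optim
    from torch.optim.lr_scheduler import CosineAnnealingLR

    # Placeholder policy network and learning rate.
    model = nn.Sequential(nn.Linear(17, 64), nn.ReLU(), nn.Linear(64, 6))
    optimizer = optim.Adam(model.parameters(), lr=3e-4)

    # Anneal the learning rate toward zero over the 5 million timestep run
    # (assuming one update per timestep, which is a simplification).
    scheduler = CosineAnnealingLR(optimizer, T_max=5_000_000)

    for step in range(5_000_000):
        ...  # compute loss, backpropagate, optimizer.step(), optimizer.zero_grad()
        scheduler.step()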

PyBullet Benchmark
------------------

`PyBullet <https://pybullet.org/wordpress/>`_ provides a free alternative to the popular MuJoCo robotics environments.
While MuJoCo historically required a license key that could be difficult for independent researchers to afford, PyBullet has always been free and open.
Additionally, the PyBullet environments are widely considered more challenging, making them a more discriminating test bed.
For these reasons, we chose to benchmark the ``all.presets.continuous`` presets using PyBullet.

Similar to the Atari benchmark, we ran each agent for 10 million timesteps (in this case, timesteps are equal to frames).
We ran each agent for 5 million timesteps (in this case, timesteps are equal to frames).
The learning rate was decayed over the course of training using cosine annealing.
To reduce the variance of the updates, we added an extra time feature to the state (t * 0.001, where t is the current timestep).
The results were as follows:

.. image:: ../../../benchmarks/pybullet.png

PPO was omitted from the plot for Humanoid because it achieved very large negative returns which interfered with the scale of the graph.
Note, however, that our implementation of soft actor-critic (SAC) is able to solve even this difficult environment.
.. image:: ../../../benchmarks/pybullet_v0.png

Because most research papers still use MuJoCo, direct comparisons are difficult to come by.
However, George Sung helpfully benchmarked TD3 and DDPG on several PyBullet environments `here <https://github.com/georgesung/TD3>`_.
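
The time feature mentioned in the PyBullet setup above (t * 0.001 appended to the state) can be illustrated with a small Gym observation wrapper. This is a sketch of the idea only, not the library's own implementation:

.. code-block:: python

    import gym
    import numpy as np

    class TimeFeatureWrapper(gym.ObservationWrapper):
        """Append a scaled timestep (t * scale) to each observation (illustrative only)."""

        def __init__(self, env, scale=0.001):
            super().__init__(env)
            self._scale = scale
            self._t = 0
            low = np.append(env.observation_space.low, 0.0).astype(np.float32)
            high = np.append(env.observation_space.high, np.inf).astype(np.float32)
            self.observation_space = gym.spaces.Box(low=low, high=high, dtype=np.float32)

        def reset(self, **kwargs):
            self._t = 0
            return super().reset(**kwargs)

        def step(self, action):
            self._t += 1
            return super().step(action)

        def observation(self, observation):
            return np.append(observation, self._t * self._scale).astype(np.float32)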
