v0.9.1 - Gymnasium and Mujoco #324

Merged
merged 28 commits on Mar 17, 2024
Changes from all commits
Commits
a56ccc8
change opencv dependency to headless and upgrade to version 4 (#275)
cpnota Oct 3, 2022
9b1e2f9
Merge branch 'master' into develop
cpnota Nov 29, 2023
2052d16
switch to gymnasium and update imports
cpnota Nov 29, 2023
8e88660
update state and state test
cpnota Nov 29, 2023
abeece2
Feature/gymnasium (#278)
cpnota Dec 6, 2023
ca1ba1d
merge develop
cpnota Dec 7, 2023
c390592
Feature/mujoco (#279)
cpnota Dec 8, 2023
b978310
Refactor/scripts-folder (#286)
cpnota Jan 26, 2024
3d4c258
add __call__ method to Builder API and unit tests (#287)
cpnota Jan 26, 2024
07cd33c
Feature/episode length (#289)
cpnota Feb 11, 2024
851a520
add entropy_backups hyperparameter to SAC (#296)
cpnota Feb 11, 2024
5ecb76c
Refactor/formatting (#299)
cpnota Feb 11, 2024
73ac02a
Fix key error warnings (#300)
cpnota Feb 25, 2024
1a2265c
finish docstring for nn aggregation (#301)
cpnota Feb 25, 2024
2ccda12
Bugfix/publish workflow (#303)
cpnota Feb 25, 2024
dbcf5ed
Add save_freq argument and refactor scripts (#305)
cpnota Feb 25, 2024
1119aad
Hyperparameter Logging (#308)
cpnota Feb 27, 2024
dc295ab
remove env name from hparams tag (#309)
cpnota Feb 27, 2024
9d06482
SAC/DDPG tweaks (#312)
cpnota Mar 2, 2024
0871882
fix duplicate env handling (#314)
cpnota Mar 2, 2024
c2d02ed
Upgrade dependencies (#315)
cpnota Mar 4, 2024
a12a828
fix plotter and log final summary at end of training (#320)
cpnota Mar 5, 2024
dec247d
add swig setup dependency and remove unrar/swig from github scripts (…
cpnota Mar 7, 2024
379b72a
Feature/benchmarks (#317)
cpnota Mar 8, 2024
67ca98e
bump read the docs python version number
cpnota Mar 16, 2024
dbb0d96
update readthedocs.yml
cpnota Mar 16, 2024
ac4c444
Update documentation (#323)
cpnota Mar 17, 2024
f8073e5
update version number to 0.9.1
cpnota Mar 17, 2024
7 changes: 3 additions & 4 deletions .github/workflows/python-package.yml
@@ -15,7 +15,7 @@ jobs:
runs-on: ubuntu-latest
strategy:
matrix:
python-version: [3.8, 3.9]
python-version: [3.8, 3.11]

steps:
- uses: actions/checkout@v2
@@ -25,9 +25,8 @@ jobs:
python-version: ${{ matrix.python-version }}
- name: Install dependencies
run: |
sudo apt-get install swig
sudo apt-get install unrar
pip install torch~=1.11 --extra-index-url https://download.pytorch.org/whl/cpu
python -m pip install --upgrade pip
pip install torch~=2.0 --extra-index-url https://download.pytorch.org/whl/cpu
make install
- name: Lint code
run: |
35 changes: 18 additions & 17 deletions .github/workflows/python-publish.yml
@@ -1,33 +1,34 @@
# This workflow will upload a Python Package using Twine when a release is created
# For more information see: https://help.github.com/en/actions/language-and-framework-guides/using-python-with-github-actions#publishing-to-package-registries
# For more information see: https://docs.github.com/en/actions/automating-builds-and-tests/building-and-testing-python#publishing-to-package-registries

name: Upload Python Package

on:
release:
types: [created]
types: [published]

permissions:
contents: read

jobs:
deploy:

runs-on: ubuntu-latest

environment: deployment

environment: publish
permissions:
id-token: write
steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v3
- name: Set up Python
uses: actions/setup-python@v2
uses: actions/setup-python@v3
with:
python-version: '3.x'
python-version: 3.11
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install setuptools wheel twine
- name: Build and publish
env:
TWINE_USERNAME: ${{ secrets.PYPI_USERNAME }}
TWINE_PASSWORD: ${{ secrets.PYPI_PASSWORD }}
run: |
python setup.py sdist bdist_wheel
twine upload dist/*
pip install torch~=2.0 --extra-index-url https://download.pytorch.org/whl/cpu
pip install setuptools wheel
make install
- name: Build package
run: make build
- name: Publish package
uses: pypa/gh-action-pypi-publish@release/v1
24 changes: 7 additions & 17 deletions .readthedocs.yml
@@ -1,26 +1,16 @@
# .readthedocs.yml
# Read the Docs configuration file
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details

# Required
version: 2

# Build documentation in the docs/ directory with Sphinx
sphinx:
configuration: docs/source/conf.py

# Build documentation with MkDocs
#mkdocs:
# configuration: mkdocs.yml
build:
os: "ubuntu-22.04"
tools:
python: "3.11"

# Optionally build your docs in additional formats such as PDF and ePub
formats: all

# Optionally set the version of Python and requirements required to build your docs
python:
version: 3.7
install:
- method: pip
path: .
extra_requirements:
- docs

sphinx:
configuration: docs/source/conf.py
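
For context, the `extra_requirements: [docs]` entry in the new Read the Docs config corresponds to a pip "extra" declared in the project's packaging metadata, so the docs build runs the equivalent of `pip install .[docs]`. The snippet below is a hypothetical sketch of how such an extra is typically declared in a setup.py; the package names are illustrative assumptions, not taken from this PR.

```python
# Hypothetical sketch only: a "docs" extra in setup.py so that
# `pip install .[docs]` (what Read the Docs runs per the config above)
# pulls in the documentation toolchain. Package names are assumptions.
from setuptools import find_packages, setup

setup(
    name="autonomous-learning-library",
    packages=find_packages(),
    extras_require={
        "docs": [
            "sphinx",            # assumed: builds docs/source/conf.py
            "sphinx-rtd-theme",  # assumed theme
        ],
    },
)
```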
7 changes: 5 additions & 2 deletions Makefile
@@ -11,10 +11,13 @@ integration-test:
python -m unittest discover -s integration -p "*test.py"

lint:
flake8 --ignore "E501,E731,E74,E402,F401,W503,E128" all
black --check all benchmarks examples integration setup.py
isort --profile black --check all benchmarks examples integration setup.py
flake8 --select "F401" all benchmarks examples integration setup.py

format:
autopep8 --in-place --aggressive --aggressive --ignore "E501,E731,E74,E402,F401,W503,E128" -r all
black all benchmarks examples integration setup.py
isort --profile black all benchmarks examples integration setup.py

tensorboard:
tensorboard --logdir runs
7 changes: 4 additions & 3 deletions README.md
@@ -21,10 +21,11 @@ Additionally, we provide an [example project](https://github.com/cpnota/all-examples

## High-Quality Reference Implementations

The `autonomous-learning-library` separates reinforcement learning agents into two modules: `all.agents`, which provides flexible, high-level implementations of many common algorithms which can be adapted to new problems and environments, and `all.presets` which provides specific instansiations of these agents tuned for particular sets of environments, including Atari games, classic control tasks, and PyBullet robotics simulations. Some benchmark results showing results on-par with published results can be found below:
The `autonomous-learning-library` separates reinforcement learning agents into two modules: `all.agents`, which provides flexible, high-level implementations of many common algorithms which can be adapted to new problems and environments, and `all.presets` which provides specific instansiations of these agents tuned for particular sets of environments, including Atari games, classic control tasks, and MuJoCo/Pybullet robotics simulations. Some benchmark results showing results on-par with published results can be found below:

![atari40](benchmarks/atari40.png)
![pybullet](benchmarks/pybullet.png)
![atari40](benchmarks/atari_40m.png)
![atari40](benchmarks/mujoco_v4.png)
![pybullet](benchmarks/pybullet_v0.png)

As of today, `all` contains implementations of the following deep RL algorithms:

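
To make the `all.agents` / `all.presets` split described in the README excerpt concrete, here is an illustrative sketch of the pattern, not the library's actual API: the agent class is generic, while the preset function fixes hyperparameters for one domain. The names `GenericDQN` and `atari_dqn_preset` are invented for this example.

```python
# Illustrative sketch of the all.agents / all.presets split described above.
# GenericDQN and atari_dqn_preset are invented names, not the library's API.
from dataclasses import dataclass


@dataclass
class GenericDQN:
    """Algorithm implementation: environment-agnostic (the all.agents role)."""

    learning_rate: float
    discount_factor: float
    exploration: float

    def act(self, observation):
        # placeholder action selection
        return 0


def atari_dqn_preset():
    """Preset: binds the generic agent to hyperparameters tuned for one
    domain (the all.presets role)."""
    return GenericDQN(learning_rate=1e-4, discount_factor=0.99, exploration=0.02)


agent = atari_dqn_preset()
print(agent.act(observation=None))
```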
34 changes: 12 additions & 22 deletions all/__init__.py
@@ -1,26 +1,16 @@
import all.agents
import all.approximation
import all.core
import all.environments
import all.logging
import all.memory
import all.nn
import all.optim
import all.policies
import all.presets
from all.core import State, StateArray

__all__ = [
'agents',
'approximation',
'core',
'environments',
'logging',
'memory',
'nn',
'optim',
'policies',
'presets',
'State',
'StateArray'
"agents",
"approximation",
"core",
"environments",
"logging",
"memory",
"nn",
"optim",
"policies",
"presets",
"State",
"StateArray",
]
1 change: 0 additions & 1 deletion all/agents/__init__.py
@@ -15,7 +15,6 @@
from .vqn import VQN, VQNTestAgent
from .vsarsa import VSarsa, VSarsaTestAgent


__all__ = [
# Agent interfaces
"Agent",
1 change: 1 addition & 0 deletions all/agents/_agent.py
@@ -1,4 +1,5 @@
from abc import ABC, abstractmethod

from all.optim import Schedulable


1 change: 1 addition & 0 deletions all/agents/_multiagent.py
@@ -1,4 +1,5 @@
from abc import ABC, abstractmethod

from all.optim import Schedulable


1 change: 1 addition & 0 deletions all/agents/_parallel_agent.py
@@ -1,4 +1,5 @@
from abc import ABC, abstractmethod

from all.optim import Schedulable


31 changes: 17 additions & 14 deletions all/agents/a2c.py
@@ -1,7 +1,8 @@
import torch
from torch.nn.functional import mse_loss

from all.logging import DummyLogger
from all.memory import NStepAdvantageBuffer

from ._agent import Agent
from ._parallel_agent import ParallelAgent

@@ -28,15 +29,15 @@ class A2C(ParallelAgent):
"""

def __init__(
self,
features,
v,
policy,
discount_factor=0.99,
entropy_loss_scaling=0.01,
n_envs=None,
n_steps=4,
logger=DummyLogger()
self,
features,
v,
policy,
discount_factor=0.99,
entropy_loss_scaling=0.01,
n_envs=None,
n_steps=4,
logger=DummyLogger(),
):
if n_envs is None:
raise RuntimeError("Must specify n_envs.")
@@ -80,7 +81,9 @@ def _train(self, next_states):
value_loss = mse_loss(values, targets)
policy_gradient_loss = -(distribution.log_prob(actions) * advantages).mean()
entropy_loss = -distribution.entropy().mean()
policy_loss = policy_gradient_loss + self.entropy_loss_scaling * entropy_loss
policy_loss = (
policy_gradient_loss + self.entropy_loss_scaling * entropy_loss
)
loss = value_loss + policy_loss

# backward pass
@@ -90,16 +93,16 @@ def _train(self, next_states):
self.features.step()

# record metrics
self.logger.add_info('entropy', -entropy_loss)
self.logger.add_info('normalized_value_error', value_loss / targets.var())
self.logger.add_info("entropy", -entropy_loss)
self.logger.add_info("normalized_value_error", value_loss / targets.var())

def _make_buffer(self):
return NStepAdvantageBuffer(
self.v,
self.features,
self.n_steps,
self.n_envs,
discount_factor=self.discount_factor
discount_factor=self.discount_factor,
)


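
The reformatted `_train` body above computes the A2C loss in a handful of lines. The standalone sketch below reproduces that arithmetic with placeholder tensors so the pieces can be inspected in isolation; the random data and the advantage estimate are assumptions, while the loss formulas match the diff.

```python
# Standalone sketch of the A2C loss shown in _train above. The tensors are
# random placeholders (in the library they come from the feature network,
# value head, policy head, and NStepAdvantageBuffer).
import torch
from torch.nn.functional import mse_loss

entropy_loss_scaling = 0.01

values = torch.randn(8, requires_grad=True)    # predicted state values
targets = torch.randn(8)                       # n-step value targets
advantages = (targets - values).detach()       # assumed advantage estimate
logits = torch.randn(8, 4, requires_grad=True) # policy logits for 4 actions
distribution = torch.distributions.Categorical(logits=logits)
actions = distribution.sample()

# loss terms as written in the diff
value_loss = mse_loss(values, targets)
policy_gradient_loss = -(distribution.log_prob(actions) * advantages).mean()
entropy_loss = -distribution.entropy().mean()
policy_loss = policy_gradient_loss + entropy_loss_scaling * entropy_loss
loss = value_loss + policy_loss

loss.backward()  # gradients flow to both the value and policy parameters
```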
39 changes: 22 additions & 17 deletions all/agents/c51.py
@@ -1,6 +1,8 @@
import torch
import numpy as np
import torch

from all.logging import DummyLogger

from ._agent import Agent


@@ -26,16 +28,16 @@ class C51(Agent):
"""

def __init__(
self,
q_dist,
replay_buffer,
discount_factor=0.99,
eps=1e-5,
exploration=0.02,
minibatch_size=32,
replay_start_size=5000,
update_frequency=1,
logger=DummyLogger(),
self,
q_dist,
replay_buffer,
discount_factor=0.99,
eps=1e-5,
exploration=0.02,
minibatch_size=32,
replay_start_size=5000,
update_frequency=1,
logger=DummyLogger(),
):
# objects
self.q_dist = q_dist
@@ -81,7 +83,9 @@ def _best_actions(self, probs):
def _train(self):
if self._should_train():
# sample transitions from buffer
states, actions, rewards, next_states, weights = self.replay_buffer.sample(self.minibatch_size)
states, actions, rewards, next_states, weights = self.replay_buffer.sample(
self.minibatch_size
)
# forward pass
dist = self.q_dist(states, actions)
# compute target distribution
@@ -100,14 +104,15 @@ def _train(self):

def _should_train(self):
self._frames_seen += 1
return self._frames_seen > self.replay_start_size and self._frames_seen % self.update_frequency == 0
return (
self._frames_seen > self.replay_start_size
and self._frames_seen % self.update_frequency == 0
)

def _compute_target_dist(self, states, rewards):
actions = self._best_actions(self.q_dist.no_grad(states))
dist = self.q_dist.target(states, actions)
shifted_atoms = (
rewards.view((-1, 1)) + self.discount_factor * self.q_dist.atoms
)
shifted_atoms = rewards.view((-1, 1)) + self.discount_factor * self.q_dist.atoms
return self.q_dist.project(dist, shifted_atoms)

def _kl(self, dist, target_dist):
Expand All @@ -117,7 +122,7 @@ def _kl(self, dist, target_dist):


class C51TestAgent(Agent):
def __init__(self, q_dist, n_actions, exploration=0.):
def __init__(self, q_dist, n_actions, exploration=0.0):
self.q_dist = q_dist
self.n_actions = n_actions
self.exploration = exploration
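
The reformatted `_compute_target_dist` above shifts the fixed atom support by the observed rewards before projecting the target distribution back onto it, and `_should_train` gates updates on replay warm-up and update frequency. The sketch below reproduces those two pieces with an assumed 51-atom support; the projection step (`q_dist.project`) is not shown here.

```python
# Sketch of the Bellman-shifted support used by _compute_target_dist above.
# The support bounds and rewards are placeholder assumptions; the projection
# back onto the fixed atoms (q_dist.project) is not reproduced.
import torch

discount_factor = 0.99
atoms = torch.linspace(-10.0, 10.0, 51)   # assumed fixed value support
rewards = torch.tensor([1.0, 0.0, -1.0])  # one reward per transition

# r + gamma * z_i, one row of shifted atoms per transition (shape: [3, 51])
shifted_atoms = rewards.view((-1, 1)) + discount_factor * atoms


def should_train(frames_seen, replay_start_size=5000, update_frequency=1):
    # mirrors the reformatted _should_train predicate in the diff
    return (
        frames_seen > replay_start_size
        and frames_seen % update_frequency == 0
    )


print(shifted_atoms.shape)  # torch.Size([3, 51])
print(should_train(5001))   # True
```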