Releases: DLR-RM/stable-baselines3
v2.6.0: New `LogEveryNTimesteps` callback and `has_attr` method, refactored hyperparameter optimization
SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo
Stable-Baselines Jax (SBX): https://github.com/araffin/sbx
To upgrade:
pip install stable_baselines3 sb3_contrib rl_zoo3 --upgrade
New Features:
- Added `has_attr` method for `VecEnv` to check if an attribute exists (see the sketch below)
- Added `LogEveryNTimesteps` callback to dump logs every N timesteps (note: you need to pass `log_interval=None` to avoid any interference)
- Added Gymnasium v1.1 support
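A minimal sketch of the two additions (assuming `LogEveryNTimesteps` is importable from `stable_baselines3.common.callbacks` and takes an `n_steps` argument):

```python
from stable_baselines3 import PPO
from stable_baselines3.common.callbacks import LogEveryNTimesteps
from stable_baselines3.common.env_util import make_vec_env

vec_env = make_vec_env("CartPole-v1", n_envs=2)
# has_attr checks whether the wrapped envs expose a given attribute
assert vec_env.has_attr("action_space")

model = PPO("MlpPolicy", vec_env, verbose=1)
# Dump logs every 1000 steps; log_interval=None disables the default
# episode-based logging so it does not interfere with the callback
callback = LogEveryNTimesteps(n_steps=1_000)
model.learn(total_timesteps=5_000, callback=callback, log_interval=None)
```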
Bug fixes:
- `SubprocVecEnv` will now exit gracefully (without a big traceback) when using `KeyboardInterrupt`
SB3-Contrib
- Renamed `_dump_logs()` to `dump_logs()`
- Fixed issues with `SubprocVecEnv` and `MaskablePPO` by using `vec_env.has_attr()` (pickling issues, mask function not present)
RL Zoo
- Refactored hyperparameter optimization. The Optuna Journal storage backend is now supported (recommended default) and you can easily load tuned hyperparameters via the new `--trial-id` argument of `train.py`.
- Save the exact command line used to launch a training
- Added support for special vectorized envs (e.g. Brax, IsaacSim) by allowing to override the `VecEnv` class used to instantiate the env in the `ExperimentManager`
- Allow disabling auto-logging by passing `--log-interval -2` (useful when logging things manually)
- Added Gymnasium v1.1 support
- Fixed use of old HF API in `get_hf_trained_models()`
SBX (SB3 + Jax)
- Updated PPO to support `net_arch`, and additional fixes
- Fixed entropy coefficient wrongly logged for SAC and derivatives
- Fixed PPO `predict()` for envs that were not normalized (action spaces with limits != [-1, 1])
- PPO now logs the standard deviation
Deprecations:
- `algo._dump_logs()` is deprecated in favor of `algo.dump_logs()` and will be removed in SB3 v2.7.0
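For example (a small sketch; the call simply replaces the old private method):

```python
from stable_baselines3 import SAC

model = SAC("MlpPolicy", "Pendulum-v1", verbose=1)
model.learn(total_timesteps=1_000)
model.dump_logs()  # previously model._dump_logs(), which will be removed in v2.7.0
```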
Others:
- Updated black from v24 to v25
- Improved error messages when checking Box space equality (loading `VecNormalize`)
- Updated test to reflect how `set_wrapper_attr` should be used now
Documentation:
- Clarify the use of Gym wrappers with `make_vec_env` in the section on Vectorized Environments (@pstahlhofen), see the sketch below
- Updated callback doc for `EveryNTimesteps`
- Added doc on how to set env attributes via `VecEnv` calls
- Added ONNX export example for `MultiInputPolicy` (@darkopetrovic)
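Illustrative sketch of applying a Gym wrapper to each sub-environment via `make_vec_env` (the wrapper used here is only an example):

```python
import gymnasium as gym
from stable_baselines3.common.env_util import make_vec_env

# wrapper_class is applied to every individual env before vectorization
vec_env = make_vec_env(
    "Pendulum-v1",
    n_envs=4,
    wrapper_class=gym.wrappers.TimeLimit,
    wrapper_kwargs=dict(max_episode_steps=100),
)
```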
New Contributors
- @pstahlhofen made their first contribution in #2079
- @darkopetrovic made their first contribution in #2098
Full Changelog: v2.5.0...v2.6.0
v2.5.0: New algorithm (SimBa in SBX) and NumPy 2.0 support
SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo
Stable-Baselines Jax (SBX): https://github.com/araffin/sbx
To upgrade:
pip install stable_baselines3 sb3_contrib rl_zoo3 --upgrade
Breaking Changes:
- Increased minimum required version of PyTorch to 2.3.0
- Removed support for Python 3.8
New Features:
- Added support for NumPy v2.0: `VecNormalize` now casts normalized rewards to float32; the bit flipping env was also updated to avoid overflow issues (see the sketch below)
- Added official support for Python 3.12
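A small sketch checking the normalized reward dtype (assumed behavior, as described above):

```python
import numpy as np
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import VecNormalize

vec_env = VecNormalize(make_vec_env("CartPole-v1", n_envs=2), norm_reward=True)
vec_env.reset()
_, rewards, _, _ = vec_env.step(np.array([0, 1]))
print(rewards.dtype)  # expected: float32
```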
SBX (SB3 + Jax)
- Added SimBa Policy: Simplicity Bias for Scaling Up Parameters in DRL
- Added support for parameter resets
Others:
- Updated Dockerfile
Documentation:
- Added Decisions and Dragons to resources. (@jmacglashan)
- Updated PyBullet example, now compatible with Gymnasium
- Added link to policies for `policy_kwargs` parameter (@kplers)
- Add FootstepNet Envs to the project page (@cgaspard3333)
- Added FRASA to the project page (@MarcDcls)
- Fixed atari example (@chrisgao99)
- Add a note about `Discrete` action spaces with `start!=0`
- Update doc for massively parallel simulators (Isaac Lab, Brax, ...)
- Add dm_control example
New Contributors
- @jmacglashan made their first contribution in #2044
- @kplers made their first contribution in #2050
- @MarcDcls made their first contribution in #2059
- @cgaspard3333 made their first contribution in #2058
- @sanowl made their first contribution in #2064
- @chrisgao99 made their first contribution in #2071
Full Changelog: v2.4.0...v2.5.0
Stable-Baselines3 v2.4.1: Fix for `VecVideoRecorder`
Bug Fixes
- Fixed a bug introduced in v2.4.0 where the `VecVideoRecorder` would override videos
Full Changelog: v2.4.0...v2.4.1
Stable-Baselines3 v2.4.0: New algorithm (CrossQ in SB3-Contrib) and Gymnasium v1.0 support
Warning
Stable-Baselines3 (SB3) v2.4.0 will be the last one supporting Python 3.8 (end of life in October 2024)
and PyTorch < 2.3.
We highly recommend upgrading to Python >= 3.9 and PyTorch >= 2.3 (compatible with NumPy v2).
SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo
Stable-Baselines Jax (SBX): https://github.com/araffin/sbx
To upgrade:
pip install stable_baselines3 sb3_contrib rl_zoo3 --upgrade
Note
DQN (and QR-DQN) models saved with SB3 < 2.4.0 will show a warning about truncation of optimizer state when loaded with SB3 >= 2.4.0.
To suppress the warning, simply save the model again.
You can find more info in PR #1963
Breaking Changes:
- Increased minimum required version of Gymnasium to 0.29.1
New Features:
- Added support for `pre_linear_modules` and `post_linear_modules` in `create_mlp` (useful for adding normalization layers, like in DroQ or CrossQ), see the sketch below
- Enabled np.ndarray logging for TensorBoardOutputFormat as histogram (see GH#1634) (@iwishwasaneagle)
- Updated env checker to warn users when using multi-dim array to define `MultiDiscrete` spaces
- Added support for Gymnasium v1.0
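A minimal sketch (assuming `create_mlp` lives in `stable_baselines3.common.torch_layers` and that the new arguments take lists of module classes, instantiated with each layer's dimension):

```python
import torch.nn as nn
from stable_baselines3.common.torch_layers import create_mlp

# Insert a LayerNorm before every Linear layer (DroQ/CrossQ-style normalization)
layers = create_mlp(
    input_dim=8,
    output_dim=2,
    net_arch=[256, 256],
    pre_linear_modules=[nn.LayerNorm],
)
q_net = nn.Sequential(*layers)
print(q_net)
```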
Bug Fixes:
- Fixed memory leak when loading learner from storage, `set_parameters()` does not try to load the object data anymore and only loads the PyTorch parameters (@peteole)
- Cast type in compute GAE method to avoid error when using torch compile (@amjames)
- `CallbackList` now sets the `.parent` attribute of child callbacks to its own `.parent` (@will-maclean)
- Fixed error when loading a model that has `net_arch` manually set to `None` (@jak3122)
- Set requirement numpy<2.0 until PyTorch is compatible (pytorch/pytorch#107302)
- Updated DQN optimizer input to only include q_network parameters, removing the target_q_network ones (@corentinlger)
- Fixed `test_buffers.py::test_device` which was not actually checking the device of tensors (@rhaps0dy)
SB3-Contrib
- Added `CrossQ` algorithm, from the "Batch Normalization in Deep Reinforcement Learning" paper (@danielpalen), see the sketch below
- Added `BatchRenorm` PyTorch layer used in `CrossQ` (@danielpalen)
- Updated QR-DQN optimizer input to only include quantile_net parameters (@corentinlger)
- Fixed a bug where loading QR-DQN changed `target_update_interval` (@jak3122)
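Illustrative usage (assuming `CrossQ` is exposed at the top level of `sb3_contrib` with the usual SB3 API):

```python
from sb3_contrib import CrossQ

model = CrossQ("MlpPolicy", "Pendulum-v1", verbose=1)
model.learn(total_timesteps=5_000)
```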
RL Zoo
- Updated default hyperparameters for TQC/SAC on Swimmer-v4 (decreased gamma for more consistent results)
SBX (SB3 + Jax)
- Added CNN support for DQN
- Bug fix for SAC and related algorithms, optimize log of ent coeff to be consistent with SB3
Others:
- Fixed various typos (@cschindlbeck)
- Remove unnecessary SDE noise resampling in PPO update (@brn-dev)
- Updated PyTorch version on CI to 2.3.1
- Added a warning to recommend using CPU with on-policy algorithms (A2C/PPO) and `MlpPolicy`
- Switched to uv to download packages faster on GitHub CI
- Updated dependencies for read the doc
- Removed unnecessary `copy_obs_dict` method for `SubprocVecEnv`, removed the use of ordered dict and renamed `flatten_obs` to `stack_obs`
Documentation:
- Updated PPO doc to recommend using CPU with `MlpPolicy`
- Clarified documentation about planned features and citing software
- Added a note about the fact we are optimizing log of ent coeff for SAC
New Contributors
- @amjames made their first contribution in #1922
- @cschindlbeck made their first contribution in #1926
- @peteole made their first contribution in #1908
- @jak3122 made their first contribution in #1937
- @will-maclean made their first contribution in #1939
- @brn-dev made their first contribution in #1933
- @chsahit made their first contribution in #1962
- @Dev1nW made their first contribution in #2017
Full Changelog: v2.3.2...v2.4.0
Stable-Baselines3 v2.3.2: Hotfix for PyTorch 1.13
Bug fixes
- Reverted `torch.load()` to be called with `weights_only=False` as it caused loading issues with old versions of PyTorch (#1913)
- Cast learning_rate to float lambda for pickle safety when doing model.load by @markscsmith in #1901
Documentation
- Fix typo in changelog by @araffin in #1882
- Fixed broken link in ppo.rst by @chaitanyabisht in #1884
- Adding ER-MRL to community project by @corentinlger in #1904
- Fix tensorboard video slow numpy->torch conversion by @NickLucche in #1910
New Contributors
- @chaitanyabisht made their first contribution in #1884
- @markscsmith made their first contribution in #1901
- @NickLucche made their first contribution in #1910
Full Changelog: v2.3.0...v2.3.2
Stable-Baselines3 v2.3.0: New default hyperparameters for DDPG, TD3 and DQN
Warning
Because of `weights_only=True`, this release breaks loading of policies when using PyTorch 1.13.
Please upgrade to PyTorch >= 2.0 or upgrade SB3 version (we reverted the change in SB3 2.3.2)
SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo
Stable-Baselines Jax (SBX): https://github.com/araffin/sbx
To upgrade:
pip install stable_baselines3 sb3_contrib --upgrade
or simply (rl zoo depends on SB3 and SB3 contrib):
pip install rl_zoo3 --upgrade
Breaking Changes:
- The default hyperparameters of `TD3` and `DDPG` have been changed to be more consistent with `SAC`

```python
# SB3 < 2.3.0 default hyperparameters
# model = TD3("MlpPolicy", env, train_freq=(1, "episode"), gradient_steps=-1, batch_size=100)
# SB3 >= 2.3.0:
model = TD3("MlpPolicy", env, train_freq=1, gradient_steps=1, batch_size=256)
```
Note
Two inconsistencies remain: the default network architecture for TD3/DDPG is `[400, 300]` instead of `[256, 256]` for SAC (for backward compatibility reasons, see the report on the influence of the network size), and the default learning rate is 1e-3 instead of 3e-4 for SAC (for performance reasons, see the W&B report on the influence of the lr).
- The default `learning_starts` parameter of `DQN` has been changed to be consistent with the other off-policy algorithms

```python
# SB3 < 2.3.0 default hyperparameters, 50_000 corresponded to Atari default hyperparameters
# model = DQN("MlpPolicy", env, learning_starts=50_000)
# SB3 >= 2.3.0:
model = DQN("MlpPolicy", env, learning_starts=100)
```
- For safety, `torch.load()` is now called with `weights_only=True` when loading torch tensors; policy `load()` still uses `weights_only=False` as gymnasium imports are required for it to work
- When using `huggingface_sb3`, you will now need to set `TRUST_REMOTE_CODE=True` when downloading models from the hub, as `pickle.load` is not safe.
New Features:
- Log success rate `rollout/success_rate` when available for on-policy algorithms (@corentinlger)
Bug Fixes:
- Fixed `monitor_wrapper` argument that was not passed to the parent class, and dones argument that wasn't passed to `_update_info_buffer` (@corentinlger)
SB3-Contrib
- Added `rollout_buffer_class` and `rollout_buffer_kwargs` arguments to MaskablePPO
- Fixed `train_freq` type annotation for tqc and qrdqn (@Armandpl)
- Fixed `sb3_contrib/common/maskable/*.py` type annotations
- Fixed `sb3_contrib/ppo_mask/ppo_mask.py` type annotations
- Fixed `sb3_contrib/common/vec_env/async_eval.py` type annotations
- Add some additional notes about `MaskablePPO` (evaluation and multi-process) (@icheered)
RL Zoo
- Updated defaults hyperparameters for TD3/DDPG to be more consistent with SAC
- Upgraded MuJoCo envs hyperparameters to v4 (pre-trained agents need to be updated)
- Added test dependencies to `setup.py` (@power-edge)
- Simplify dependencies of `requirements.txt` (remove duplicates from `setup.py`)
SBX (SB3 + Jax)
- Added support for `MultiDiscrete` and `MultiBinary` action spaces to PPO
- Added support for large values for gradient_steps to SAC, TD3, and TQC
- Fix `train()` signature and update type hints
- Fix replay buffer device at load time
- Added flatten layer
- Added `CrossQ`
Others:
- Updated black from v23 to v24
- Updated ruff to >= v0.3.1
- Updated env checker for (multi)discrete spaces with non-zero start.
Documentation:
- Added a paragraph on modifying vectorized environment parameters via setters (@fracapuano)
- Updated callback code example
- Updated export to ONNX documentation, it is now much simpler to export SB3 models with newer ONNX Opset!
- Added video link to "Practical Tips for Reliable Reinforcement Learning" video
- Added `render_mode="human"` in the README example (@marekm4)
- Fixed docstring signature for sum_independent_dims (@StagOverflow)
- Updated docstring description for `log_interval` in the base class (@rushitnshah)
Full Changelog: v2.2.1...v2.3.0
Stable-Baselines3 v2.2.1: Support for options at reset, bug fixes and better error messages
SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo
Stable-Baselines Jax (SBX): https://github.com/araffin/sbx
To upgrade:
pip install stable_baselines3 sb3_contrib --upgrade
or simply (rl zoo depends on SB3 and SB3 contrib):
pip install rl_zoo3 --upgrade
Note
Stable-Baselines3 (SB3) v2.2.0 was yanked after a breaking change was found in GH#1751.
Please use SB3 v2.2.1 and not v2.2.0.
Breaking Changes:
- Switched to `ruff` for sorting imports (isort is no longer needed); a minimum version of black and ruff is now required
- Dropped `x is False` in favor of `not x`, which means that callbacks that wrongly returned None (instead of a boolean) will cause the training to stop (@iwishiwasaneagle)
New Features:
- Improved error message of the `env_checker` for envs wrongly detected as GoalEnv (`compute_reward()` is defined)
- Improved error message when mixing Gym API with VecEnv API (see GH#1694)
- Add support for setting `options` at reset with VecEnv via the `set_options()` method. As with the seeds logic, options are reset at the end of an episode (@ReHoss), see the sketch below
- Added `rollout_buffer_class` and `rollout_buffer_kwargs` arguments to on-policy algorithms (A2C and PPO)
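A minimal sketch of `set_options()` (assuming it mirrors the seeding logic, i.e. the options are applied at the next `reset()`):

```python
from stable_baselines3.common.env_util import make_vec_env

vec_env = make_vec_env("CartPole-v1", n_envs=2)
# Options are forwarded to each env's reset(options=...) at the next reset
# (CartPole uses "low"/"high" to bound the initial state)
vec_env.set_options({"low": -0.01, "high": 0.01})
obs = vec_env.reset()
```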
Bug Fixes:
- Prevents using squash_output and not use_sde in ActorCriticPolicy (@PatrickHelm)
- Performs unscaling of actions in collect_rollout in OnPolicyAlgorithm (@PatrickHelm)
- Moves VectorizedActionNoise into `_setup_learn()` in OffPolicyAlgorithm (@PatrickHelm)
- Prevents out of bound error on Windows if no seed is passed (@PatrickHelm)
- Calls `callback.update_locals()` before `callback.on_rollout_end()` in OnPolicyAlgorithm (@PatrickHelm)
- Fixed replay buffer device after loading in OffPolicyAlgorithm (@PatrickHelm)
- Fixed `render_mode` which was not properly loaded when using `VecNormalize.load()`
- Fixed success reward dtype in `SimpleMultiObsEnv` (@NixGD)
- Fixed check_env for Sequence observation space (@corentinlger)
- Prevents instantiating BitFlippingEnv with conflicting observation spaces (@kylesayrs)
- Fixed ResourceWarning when loading and saving models (files were not closed); note that only paths are closed automatically, the behavior stays the same for tempfiles (they need to be closed manually), and the behavior is now consistent when loading/saving replay buffers
SB3-Contrib
- Added `set_options` for `AsyncEval`
- Added `rollout_buffer_class` and `rollout_buffer_kwargs` arguments to TRPO
RL Zoo
- Removed `gym` dependency, the package is still required for some pretrained agents.
- Added `--eval-env-kwargs` to `train.py` (@Quentin18)
- Added `ppo_lstm` to hyperparams_opt.py (@technocrat13)
- Upgraded to `pybullet_envs_gymnasium>=0.4.0`
- Removed old hacks (for instance, limiting off-policy algorithms to one env at test time)
- Updated docker image, removed support for X server
- Replaced deprecated `optuna.suggest_uniform(...)` by `optuna.suggest_float(..., low=..., high=...)`
SBX (SB3 + Jax)
- Added `DDPG` and `TD3` algorithms
Others:
- Fixed `stable_baselines3/common/callbacks.py` type hints
- Fixed `stable_baselines3/common/utils.py` type hints
- Fixed `stable_baselines3/common/vec_env/vec_transpose.py` type hints
- Fixed `stable_baselines3/common/vec_env/vec_video_recorder.py` type hints
- Fixed `stable_baselines3/common/save_util.py` type hints
- Updated docker images to Ubuntu Jammy using micromamba 1.5
- Fixed `stable_baselines3/common/buffers.py` type hints
- Fixed `stable_baselines3/her/her_replay_buffer.py` type hints
- Buffers do not call an additional `.copy()` when storing new transitions
- Fixed `ActorCriticPolicy.extract_features()` signature by adding an optional `features_extractor` argument
- Update dependencies (accept newer Shimmy/Sphinx version and remove `sphinx_autodoc_typehints`)
- Fixed `stable_baselines3/common/off_policy_algorithm.py` type hints
- Fixed `stable_baselines3/common/distributions.py` type hints
- Fixed `stable_baselines3/common/vec_env/vec_normalize.py` type hints
- Fixed `stable_baselines3/common/vec_env/__init__.py` type hints
- Switched to PyTorch 2.1.0 in the CI (fixes type annotations)
- Fixed `stable_baselines3/common/policies.py` type hints
- Switched to `mypy` only for checking types
- Added tests to check consistency when saving/loading files
Documentation:
- Updated RL Tips and Tricks (include recommendation for evaluation, added links to DroQ, ARS and SBX).
- Fixed various typos and grammar mistakes
Full changelog: v2.1.0...v2.2.1
Stable-Baselines3 v2.1.0: Float64 actions, Gymnasium 0.29 support and bug fixes
SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo
Stable-Baselines Jax (SBX): https://github.com/araffin/sbx
To upgrade:
pip install stable_baselines3 sb3_contrib --upgrade
or simply (rl zoo depends on SB3 and SB3 contrib):
pip install rl_zoo3 --upgrade
Breaking Changes:
- Removed Python 3.7 support
- SB3 now requires PyTorch >= 1.13
New Features:
- Added Python 3.11 support
- Added Gymnasium 0.29 support (@pseudo-rnd-thoughts)
SB3-Contrib
- Fixed MaskablePPO ignoring `stats_window_size` argument
- Added Python 3.11 support
RL Zoo
- Upgraded to Huggingface-SB3 >= 2.3
- Added Python 3.11 support
Bug Fixes:
- Relaxed check in logger, that was causing issue on Windows with colorama
- Fixed off-policy algorithms with continuous float64 actions (see #1145) (@tobirohrer)
- Fixed `env_checker.py` warning messages for out of bounds in complex observation spaces (@Gabo-Tor)
Others:
- Updated GitHub issue templates
- Fix typo in gym patch error message (@lukashass)
- Refactor `test_spaces.py` tests
Documentation:
- Fixed callback example (@BertrandDecoster)
- Fixed policy network example (@kyle-he)
- Added mobile-env as new community project (@stefanbschneider)
- Added DeepNetSlice to community projects (@AlexPasqua)
Full Changelog: v2.0.0...v2.1.0
Stable-Baselines3 v2.0.0: Gymnasium Support
Warning
Stable-Baselines3 (SB3) v2.0 will be the last one supporting Python 3.7 (end of life in June 2023).
We highly recommend upgrading to Python >= 3.8.
SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo
Stable-Baselines Jax (SBX): https://github.com/araffin/sbx
To upgrade:
pip install stable_baselines3 sb3_contrib rl_zoo3 --upgrade
or simply (rl zoo depends on SB3 and SB3 contrib):
pip install rl_zoo3 --upgrade
Breaking Changes:
- Switched to Gymnasium as primary backend, Gym 0.21 and 0.26 are still supported via the `shimmy` package (@carlosluis, @arjun-kg, @tlpss)
- The deprecated `online_sampling` argument of `HerReplayBuffer` was removed
- Removed deprecated `stack_observation_space` method of `StackedObservations`
- Renamed environment output observations in `evaluate_policy` to prevent shadowing the input observations during callbacks (@npit)
- Upgraded wrappers and custom environment to Gymnasium
- Refined the `HumanOutputFormat` file check: now it verifies if the object is an instance of `io.TextIOBase` instead of only checking for the presence of a `write` method.
- Because of the new Gym API (0.26+), the random seed passed to `vec_env.seed(seed=seed)` will only be effective after the `env.reset()` call.
New Features:
- Added Gymnasium support (Gym 0.21 and 0.26 are supported via the `shimmy` package), see the sketch below
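A minimal sketch of training directly on a Gymnasium environment with SB3 v2.0:

```python
import gymnasium as gym
from stable_baselines3 import PPO

env = gym.make("CartPole-v1")
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10_000)
```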
SB3-Contrib
- Fixed QRDQN update interval for multi envs
RL Zoo
- Gym 0.26+ patches to continue working with pybullet and TimeLimit wrapper
- Renamed `CarRacing-v1` to `CarRacing-v2` in hyperparameters
- Huggingface push to hub now accepts a `--n-timesteps` argument to adjust the length of the video
- Fixed `record_video` steps (before it was stepping in a closed env)
- Dropped Gym 0.21 support
Bug Fixes:
- Fixed `VecExtractDictObs` not handling terminal observations (@WeberSamuel)
- Set NumPy version to `>=1.20` due to use of `numpy.typing` (@troiganto)
- Fixed a bug where loading DQN changed `target_update_interval` (@tobirohrer)
- Fixed env checker to properly reset the env before calling `step()` when checking for `Inf` and `NaN` (@lutogniew)
- Fixed HER `truncate_last_trajectory()` (@lbergmann1)
- Fixed HER desired and achieved goal order in reward computation (@JonathanKuelz)
Others:
- Fixed `stable_baselines3/a2c/*.py` type hints
- Fixed `stable_baselines3/ppo/*.py` type hints
- Fixed `stable_baselines3/sac/*.py` type hints
- Fixed `stable_baselines3/td3/*.py` type hints
- Fixed `stable_baselines3/common/base_class.py` type hints
- Fixed `stable_baselines3/common/logger.py` type hints
- Fixed `stable_baselines3/common/envs/*.py` type hints
- Fixed `stable_baselines3/common/vec_env/vec_monitor|vec_extract_dict_obs|util.py` type hints
- Fixed `stable_baselines3/common/vec_env/base_vec_env.py` type hints
- Fixed `stable_baselines3/common/vec_env/vec_frame_stack.py` type hints
- Fixed `stable_baselines3/common/vec_env/dummy_vec_env.py` type hints
- Fixed `stable_baselines3/common/vec_env/subproc_vec_env.py` type hints
- Upgraded docker images to use mamba/micromamba and CUDA 11.7
- Updated env checker to reflect what subset of Gymnasium is supported and improve GoalEnv checks
- Improve type annotation of wrappers
- Tests envs are now checked too
- Added render test for `VecEnv` and `VecEnvWrapper`
- Update issue templates and env info saved with the model
- Changed `seed()` method return type from `List` to `Sequence`
- Updated env checker doc and requirements for tuple spaces/goal envs
Documentation:
- Added Deep RL Course link to the Deep RL Resources page
- Added documentation about `VecEnv` API vs Gym API
- Upgraded tutorials to Gymnasium API
- Make it more explicit when using `VecEnv` vs Gym env
- Added UAV_Navigation_DRL_AirSim to the project page (@heleidsn)
- Added `EvalCallback` example (@sidney-tio)
- Update custom env documentation
- Added `pink-noise-rl` to projects page
- Fix custom policy example, `ortho_init` was ignored
- Added SBX page
Full Changelog: v1.8.0...v2.0.0
Stable-Baselines3 v1.8.0: Multi-env HerReplayBuffer, Open RL Benchmark, Improved env checker
Warning
Stable-Baselines3 (SB3) v1.8.0 will be the last one to use Gym as a backend.
Starting with v2.0.0, Gymnasium will be the default backend (though SB3 will have compatibility layers for Gym envs).
You can find a migration guide here.
If you want to try the SB3 v2.0 alpha version, you can take a look at PR #1327.
SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo
To upgrade:
pip install stable_baselines3 sb3_contrib rl_zoo3 --upgrade
or simply (rl zoo depends on SB3 and SB3 contrib):
pip install rl_zoo3 --upgrade
Breaking Changes:
- Removed shared layers in `mlp_extractor` (@AlexPasqua)
- Refactored `StackedObservations` (it now handles dict obs, `StackedDictObservations` was removed)
- You must now explicitly pass a `features_extractor` parameter when calling `extract_features()`
- Dropped offline sampling for `HerReplayBuffer`
- As `HerReplayBuffer` was refactored to support multiprocessing, previous replay buffers are incompatible with this new version
- `HerReplayBuffer` doesn't require a `max_episode_length` anymore
New Features:
- Added `repeat_action_probability` argument in `AtariWrapper`.
- Only use `NoopResetEnv` and `MaxAndSkipEnv` when needed in `AtariWrapper`
- Added support for dict/tuple observation spaces for `VecCheckNan`, the check is now active in the `env_checker()` (@DavyMorgan)
- Added multiprocessing support for `HerReplayBuffer`
- `HerReplayBuffer` now supports all datatypes supported by `ReplayBuffer`
- Provide more helpful failure messages when validating the `observation_space` of custom gym environments using `check_env` (@FieteO)
- Added `stats_window_size` argument to control smoothing in rollout logging (@jonasreiher), see the sketch below
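Illustrative sketch of `stats_window_size` (the default window is assumed to be 100 episodes):

```python
from stable_baselines3 import PPO

# Average rollout/ep_rew_mean and rollout/ep_len_mean over the last 10 episodes
# instead of the default window
model = PPO("MlpPolicy", "CartPole-v1", stats_window_size=10, verbose=1)
model.learn(total_timesteps=10_000)
```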
SB3-Contrib
- Added warning about potential crashes caused by `check_env` in the `MaskablePPO` docs (@AlexPasqua)
- Fixed `sb3_contrib/qrdqn/*.py` type hints
- Removed shared layers in `mlp_extractor` (@AlexPasqua)
RL Zoo
- Open RL Benchmark
- Upgraded to new HerReplayBuffer implementation that supports multiple envs
- Removed TimeFeatureWrapper for Panda and Fetch envs, as the new replay buffer should handle timeout.
- Tuned hyperparameters for RecurrentPPO on Swimmer
- Documentation is now built using Sphinx and hosted on read the doc
- Removed use_auth_token for push to hub util
- Reverted from v3 to v2 for HumanoidStandup, Reacher, InvertedPendulum and InvertedDoublePendulum since they were not part of the mujoco refactoring (see openai/gym#1304)
- Fixed gym-minigrid policy (from MlpPolicy to MultiInputPolicy)
- Replaced deprecated optuna.suggest_loguniform(...) by optuna.suggest_float(..., log=True)
- Switched to ruff and pyproject.toml
- Removed online_sampling and max_episode_length argument when using HerReplayBuffer
Bug Fixes:
- Fixed Atari wrapper that missed the reset condition (@luizapozzobon)
- Added the argument `dtype` (default to `float32`) to the noise for consistency with gym action (@sidney-tio)
- Fixed PPO train/n_updates metric not accounting for early stopping (@adamfrly)
- Fixed loading of normalized image-based environments
- Fixed `DictRolloutBuffer.add` with multidimensional action space (@younik)
Others:
- Fixed `tests/test_tensorboard.py` type hint
- Fixed `tests/test_vec_normalize.py` type hint
- Fixed `stable_baselines3/common/monitor.py` type hint
- Added tests for StackedObservations
- Removed Gitlab CI file
- Moved from `setup.cfg` to `pyproject.toml` configuration file
- Switched from `flake8` to `ruff`
- Upgraded AutoROM to latest version
- Fixed `stable_baselines3/dqn/*.py` type hints
- Added `extra_no_roms` option for package installation without Atari Roms
Documentation:
- Renamed `load_parameters` to `set_parameters` (@DavyMorgan)
- Clarified documentation about subproc multiprocessing for A2C (@Bonifatius94)
- Fixed typo in `A2C` docstring (@AlexPasqua)
- Renamed timesteps to episodes for `log_interval` description (@theSquaredError)
- Removed note about gif creation for Atari games (@harveybellini)
- Added information about default network architecture
- Update information about Gymnasium support