TorchRL v0.13.0

TorchRL 0.13.0 is a broad release focused on recurrent RL throughput, MuJoCo-native environments, macro-control workflows, multi-agent training utilities, and release-aligned cleanup of previously warned deprecations. It also introduces optional Linux CUDA wheels for users who want CUDA-based prioritized replay-buffer kernels, while keeping the standard PyPI wheel as the default for CPU prioritized replay buffers and users who do not need prioritized replay. The release refreshes compatibility with optional and older dependency stacks used by TorchRL's wider environment coverage.

Merged PRs included in v0.13.0

This release includes the following merged PRs in the v0.12.0..v0.13.0 first-parent range:

Release, documentation, and CI

#3817 — Documented optional CUDA TorchRL wheels for CUDA-based prioritized replay-buffer kernels while keeping the standard PyPI wheel as the default install.
#3814 — Fixed release-CI dependency checks, including stable TensorDict selection, TorchCodec CPU wheels, and mixed-device spec validation.
#3813 — Refreshed the README for the 0.13 release.
#3805 — Enacted the v0.13 deprecations and behavior changes.
#3796 — Added collector internals documentation.
#3803 — Bumped TorchRL version metadata to 0.13.0.
#3787 — Added uv local development setup.
#3747 — Split large test files into per-concept files.
#3741 — Added automatic insertion of new release versions into gh-pages versions.html.
#3739 — Fixed olddeps / opt-deps / gym smoke tests broken by #3738 and #3704.
#3710 — Bumped the olddeps TensorDict default to 0.12.
#3705 — Fixed benchmark workflows.
#3706 — Moved stdlib and TorchRL local imports in tests to module top.
#3696 — Added NestedKey and Literal contribution guidance.
#3686 — Added repository contribution rules for AI agents.

Recurrent RL, value estimation, and world models

#3816 — Added RNN reset rollout benchmarks for TorchRL LSTMModule/GRUModule, covering intermediate resets across cuDNN, scan, and Triton backends with eager and compiled runs.
#3815 — Kept GRU scan split sizes concrete for compiled recurrent rollouts while preserving old/optional dependency compatibility.
#3780 — Added a dynamic value-estimator registry across loss modules.
#3807 — Gated scan RNN backward support for compatible environments.
#3792 — Added the recurrent state lifecycle guide.
#3793 — Added recurrent integration coverage.
#3797 — Added chunked TensorDict support for value estimators.
#3785 — Fixed the Triton RNN kernel.
#3784 — Simplified the shifted value-estimator budget.
#3782 — Added the compact-drop shifted value backend.
#3744 — Sanitized NaN next observations in value-estimator forwards.
#3738 — Added the Triton backend for GRU/LSTM with intermediate resets.
#3712 — Fixed LSTMModule padding.
#3707 — Stabilized the RSSMPosteriorV3 gradient test.
#3695 — Improved sequence-RL composability.
#3621 — Implemented DreamerV3 world-model objectives and components.

Environments, transforms, and examples

#3781 — Surfaced Isaac Lab per-index reset and reset_to via IsaacLabWrapper.
#3811 — Fixed MuJoCo macro shapes and Gym Atari setup.
#3779 — Fixed a hidden multiprocessing import in the coding PPO tutorial.
#3806 — Added the MuJoCo macro tutorial environment.
#3802 — Added satellite MuJoCo SAC examples.
#3800 — Added Isaac Lab headless rendered evaluation.
#3700 — Added MuJoCo custom environments with selectable physics backends.
#3801 — Fixed the unused frame_skip in the PPO tutorial.
#3777 — Added the NextObservationDelta environment transform.
#3766 — Added the FlattenAction transform.
#3765 — Added the ActionScaling transform.
#3727 — Fixed a KeyError in the PettingZoo action mask with ParallelEnv and done_on_any=False.
#3743 — Added the NextStateReconstructor replay-buffer transform.
#3742 — Added compact_obs support to DataCollector.
#3698 — Fixed GymEnv reward and action shapes across num-env configurations.
#3689 — Added Safety-Gymnasium environment wrappers.
#3682 — Fixed PettingZoo state handling and added an encoding regression test.
#3676 — Added the ExpandAs transform.

Collectors, replay buffers, and performance

#3810 — Made collector weight synchronization idempotent.
#3734 — Added HERReplayBuffer and HindsightStrategy to torchrl.data.
#3749 — Dispatched RandomSampler and SliceSampler to without-replacement variants via replacement=False.
#3729 — Auto-created inner SharedMem schemes for Ray/RPC and policy_factory.
#3728 — Made per-worker SharedMem schemes opt-in for policy_factory tests.
#3714 — Added async prioritized replay-buffer writes.
#3680 — Gated the profiling decorator on TORCHRL_PROFILING.
#3685 — Improved CUDA prioritized replay-buffer ergonomics.
#3672 — Added data-collector hooks.
#3677 — Added CUDA support for prioritized replay sampling, available through the optional CUDA wheel builds.
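
To make the replacement=False change in #3749 concrete, here is a minimal sketch of the difference between sampling with and without replacement. It is plain Python, not TorchRL code: WithoutReplacementSampler and sample_with_replacement are hypothetical names, and the wrap-around behavior at the end of a pass is an assumption, since these notes do not say how TorchRL's samplers handle an incomplete final batch.

```python
import random


class WithoutReplacementSampler:
    """Toy index sampler: every index is drawn once per pass over the storage."""

    def __init__(self, size, seed=0):
        self.size = size
        self._rng = random.Random(seed)
        self._pending = []  # indices not yet drawn in the current pass

    def sample(self, batch_size):
        batch = []
        while len(batch) < batch_size:
            if not self._pending:
                # Assumption: start a new pass with a fresh random permutation.
                self._pending = list(range(self.size))
                self._rng.shuffle(self._pending)
            batch.append(self._pending.pop())
        return batch


def sample_with_replacement(size, batch_size, rng):
    """Independent uniform draws: the same index may appear more than once."""
    return [rng.randrange(size) for _ in range(batch_size)]


sampler = WithoutReplacementSampler(size=8)
first_pass = sampler.sample(4) + sampler.sample(4)
with_replacement = sample_with_replacement(8, 8, random.Random(0))
```

In the sketch, two batches of 4 from a storage of 8 cover every index exactly once, which is the property that matters for epoch-style passes over collected data. The with-replacement draws carry no such guarantee.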

Objectives, trainers, and multi-agent learning

#3773 — Added CrossGroupCritic.
#3748 — Added MAPPOLoss, IPPOLoss, MultiAgentGAE, and ValueNorm.
#3750 — Fixed the cross-entropy reduction parameter in discrete objectives.
#3694 — Added QMix, VDN, and IQL support to the DQN trainer.
#3699 — Improved Hydra config parity for environments, losses, and loggers.
#3692 — Added an early-stopping trainer hook.
#3693 — Audited TrainerConfig/Trainer parity and added auto_log_optim_steps plumbing.
#3683 — Made DDPG, PPO, and SAC trainers multi-agent-friendly.
#3691 — Added trainer hook configs.
#3639 — Added ACTModel and ACTLoss for robot learning.
#3667 — Added a behavior-cloning loss.
#3679 — Fixed PPOTrainer gamma/lambda defaults, removed dead code, and removed a wildcard import.

Compatibility, bug fixes, and cleanup

#3809 — Fixed GRPO runtime issues for vLLM and SGLang backends.
#3812 — Fixed Robohive, Gym, PettingZoo, and setup-test CI failures.
#3808 — Fixed LLM assistant action masks.
#3799 — Fixed old- and optional-dependency workflows.
#3709 — Updated the REINFORCE value-net test for new torch autograd error wording.
#3704 — Forwarded generator kwargs through ProbabilisticActor.
#3688 — Added setup and shutdown hook points.
#3684 — Split environment transforms into per-category modules.

Highlights

Faster recurrent RL and clearer recurrent state handling

Added Triton and scan recurrent backends for GRU/LSTM reset handling, including fixes for recurrent matmul kernels and backward support.
Expanded recurrent integration coverage and documentation for hidden-state lifecycle management.
Added RNN reset rollout benchmarks for LSTMModule and GRUModule across cuDNN, scan, and Triton backends, including compiled TensorDict-input runs.
Improved compact and shifted value-estimator flows, including chunked forwards, cleaner sequence layouts, and a dynamic value-estimator registry across loss modules.
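
For readers new to the term, an "intermediate reset" is an episode boundary that falls inside a sequence handed to a recurrent module. The sketch below shows the reference semantics with a scalar tanh RNN in plain Python. The weights, the is_init flag name, and the zero initial state are illustrative assumptions, and nothing here reflects how the cuDNN, scan, or Triton backends are implemented.

```python
import math


def rnn_rollout(inputs, is_init, w_x=0.5, w_h=0.9, h0=0.0):
    """Scalar tanh RNN; is_init[t]=True resets the hidden state before step t."""
    h = h0
    outputs = []
    for x, reset in zip(inputs, is_init):
        if reset:
            h = h0  # a new episode starts here: discard the carried state
        h = math.tanh(w_x * x + w_h * h)
        outputs.append(h)
    return outputs


inputs = [1.0, -0.5, 0.25, 2.0, -1.0]
# One reset in the middle of the sequence (step 3 begins a new episode).
with_reset = rnn_rollout(inputs, [True, False, False, True, False])
# Running the second episode on its own gives the same values.
second_episode = rnn_rollout(inputs[3:], [True, False])
# Without the reset, state from the first episode leaks into the second.
no_reset = rnn_rollout(inputs, [True, False, False, False, False])
```

Any backend that handles resets correctly should match the per-episode result from step 3 onward, which is what with_reset does here and no_reset does not.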

MuJoCo environments and macro-control policies

Added native custom MuJoCo environments with selectable physics backends, including MujocoEnv, locomotion tasks, SatelliteEnv, and CubeBowlEnv.
Added satellite MuJoCo SAC examples and a macro-control tutorial environment for cube-to-bowl manipulation.
Introduced macro-action helpers and transforms for low-frequency semantic control over multi-step low-level action sequences.
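
The macro-control idea above can be sketched without MuJoCo: one high-level decision expands into several low-level steps, and the rewards from those steps are accumulated. CounterEnv, MACROS, and macro_step are hypothetical names for this illustration only, and summing rewards is an assumption; the actual helpers and transforms may aggregate differently.

```python
class CounterEnv:
    """Hypothetical 1-D environment: each low-level action moves the position."""

    def __init__(self):
        self.position = 0
        self.steps = 0

    def step(self, action):
        self.position += action
        self.steps += 1
        reward = -abs(self.position - 3)  # closer to the target at 3 is better
        return self.position, reward


# Each macro action names a fixed sequence of low-level actions.
MACROS = {
    "advance": [1, 1, 1],
    "retreat": [-1, -1],
    "hold": [0],
}


def macro_step(env, macro_name):
    """Run one macro action; return the last observation and the summed reward."""
    total_reward = 0
    observation = env.position
    for low_level_action in MACROS[macro_name]:
        observation, reward = env.step(low_level_action)
        total_reward += reward
    return observation, total_reward


env = CounterEnv()
obs, reward = macro_step(env, "advance")
```

The policy in this setup makes one decision while the environment advances three low-level steps, which is the low-frequency control pattern the highlight describes.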

Multi-agent, imitation, and model-based objectives

Added MAPPO/IPPO losses, MultiAgentGAE, and value-normalization utilities for multi-agent training.
Added behavior-cloning and action-chunking losses/modules through BCLoss, ACTLoss, and ACTModel.
Added DreamerV3 losses, RSSM V3 components, and world-model utility functions.
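
As background for MultiAgentGAE, the following is a sketch of standard generalized advantage estimation applied independently to each agent. The function names and the dictionary layout are hypothetical, and the shared done flag is an assumption: these notes do not describe how MultiAgentGAE treats team rewards, per-agent termination, or batching.

```python
def gae(rewards, values, next_values, dones, gamma=0.99, lmbda=0.95):
    """Generalized advantage estimation over one trajectory, computed backwards."""
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD error; bootstrapping is cut at episode ends.
        delta = rewards[t] + gamma * next_values[t] * not_done - values[t]
        running = delta + gamma * lmbda * not_done * running
        advantages[t] = running
    return advantages


def per_agent_gae(rewards, values, next_values, dones, **kwargs):
    """Apply the estimator independently to each agent's trajectory."""
    return {
        agent: gae(rewards[agent], values[agent], next_values[agent], dones, **kwargs)
        for agent in rewards
    }


zeros = [0.0, 0.0, 0.0]
result = per_agent_gae(
    rewards={"agent_0": [1.0, 1.0, 1.0], "agent_1": [0.0, 0.0, 2.0]},
    values={"agent_0": zeros, "agent_1": zeros},
    next_values={"agent_0": zeros, "agent_1": zeros},
    dones=[False, False, True],
    gamma=1.0,
    lmbda=1.0,
)
```

With gamma and lambda both set to 1 and a zero value function, each advantage reduces to the undiscounted reward-to-go, which makes the example easy to check by hand.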

Replay buffers, collectors, and action transforms

Added HER replay-buffer support via HERReplayBuffer and HindsightStrategy.
Added new transforms, including the action-space transforms FlattenAction and ActionScaling, as well as ExpandAs and NextObservationDelta.
Improved collectors and replay buffers with async prioritized writes, ordered read/write APIs, optional trajectory IDs, compact observations, safer policy-version handling, CUDA prioritized replay sampling, and optional CUDA wheels that ship those CUDA kernels.
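
For context on hindsight experience replay, the sketch below relabels a failed episode with the goal it actually reached and recomputes a sparse reward. It uses plain dictionaries and a hypothetical relabel_with_final_goal helper. The "final" relabeling rule and the 0 / -1 reward are assumptions chosen for a deterministic example; the strategies that HindsightStrategy offers are not listed in these notes.

```python
def relabel_with_final_goal(trajectory, tolerance=0.0):
    """Return a copy of the trajectory whose goal is the last achieved state."""
    hindsight_goal = trajectory[-1]["achieved"]
    relabeled = []
    for step in trajectory:
        reached = abs(step["achieved"] - hindsight_goal) <= tolerance
        relabeled.append(
            {
                "observation": step["observation"],
                "action": step["action"],
                "achieved": step["achieved"],
                "goal": hindsight_goal,
                # Sparse reward recomputed against the substituted goal.
                "reward": 0.0 if reached else -1.0,
            }
        )
    return relabeled


# A failed episode: the agent wanted to reach 5.0 but ended at 2.0.
episode = [
    {"observation": 0.0, "action": 1, "achieved": 1.0, "goal": 5.0, "reward": -1.0},
    {"observation": 1.0, "action": 1, "achieved": 2.0, "goal": 5.0, "reward": -1.0},
]
hindsight = relabel_with_final_goal(episode)
```

The relabeled copy turns an all-failure episode into one with a successful final step, while the original transitions stay untouched and can be stored alongside it.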

CUDA prioritized replay-buffer sampling. On the TorchRL CI benchmark (batch size 65,536, sampling from 1M and 10M priorities), the CUDA implementation reduces mean sampling latency from 9.99 ms to 0.45 ms at 1M priorities and from 18.68 ms to 0.93 ms at 10M priorities, approximately 22.3× and 20.2× faster than the CPU path. The CUDA wheel is therefore most useful for GPU-heavy prioritized replay workloads; users who do not use prioritized replay, or who run prioritized replay on CPU, can keep the standard install.
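
For readers unfamiliar with the operation being benchmarked, here is a minimal sketch of proportional prioritized sampling: each index is drawn with probability proportional to its priority. This is the textbook formulation using a cumulative sum and binary search, written in plain Python. It is not a description of TorchRL's CPU or CUDA kernels, whose data structures these notes do not cover, and sample_proportional is a hypothetical name.

```python
import bisect
import itertools
import random


def sample_proportional(priorities, batch_size, rng):
    """Draw indices with probability proportional to their priority."""
    cumulative = list(itertools.accumulate(priorities))
    total = cumulative[-1]
    last = len(priorities) - 1
    indices = []
    for _ in range(batch_size):
        # Pick a point in [0, total) and find the segment it falls into.
        point = rng.random() * total
        # min() guards against a floating-point product that rounds up to total.
        indices.append(min(bisect.bisect_right(cumulative, point), last))
    return indices


rng = random.Random(0)
drawn = sample_proportional([1.0, 0.0, 3.0], batch_size=1000, rng=rng)
```

Every draw is a search over the priority structure, so a batch of 65,536 draws over millions of priorities is a natural candidate for parallel execution on a GPU.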

Broader environment and training coverage

Added Safety-Gymnasium wrappers and additional Isaac Lab reset/evaluation support.
Improved trainer hooks, Hydra config parity, trainer hook configs, and multi-agent trainer ergonomics.
Strengthened old Gym, Atari, PettingZoo, Robohive, optional-dependency, vLLM, SGLang, and setup CI coverage.

Installation note: optional CUDA wheels

The default install remains:

pip install torchrl

Use this standard PyPI wheel if you do not use prioritized replay buffers, or if your prioritized replay buffers run on CPU. For users who want the CUDA-based prioritized replay-buffer implementations introduced in this release, TorchRL 0.13 also publishes Linux CUDA wheels. Install the CUDA wheel from the PyTorch wheel index that matches your PyTorch CUDA runtime, for example:

pip install "torchrl==0.13.0+cu128" --extra-index-url https://download.pytorch.org/whl/cu128

Replace cu128 with the CUDA build matching your PyTorch installation. The CUDA wheel is optional and only needed for CUDA prioritized replay-buffer kernels.
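
The mapping from a CUDA runtime version to the wheel tag can be sketched as a small helper. In an environment with PyTorch installed, torch.version.cuda reports the runtime as a string such as "12.8", or None for CPU-only builds; the helper below takes that string as an argument so it runs without PyTorch. The function names are hypothetical, and a tag produced this way is not guaranteed to have a published TorchRL wheel, since these notes do not list the available CUDA builds.

```python
def cuda_wheel_tag(cuda_version):
    """Turn a CUDA runtime version such as "12.8" into a tag such as "cu128"."""
    major, minor = cuda_version.split(".")[:2]
    return f"cu{major}{minor}"


def pip_command(cuda_version, torchrl_version="0.13.0"):
    """Build the install command from the note above for a given CUDA runtime."""
    if cuda_version is None:
        # CPU-only PyTorch build: the standard wheel is the right choice.
        return "pip install torchrl"
    tag = cuda_wheel_tag(cuda_version)
    return (
        f'pip install "torchrl=={torchrl_version}+{tag}" '
        f"--extra-index-url https://download.pytorch.org/whl/{tag}"
    )


command = pip_command("12.8")
cpu_command = pip_command(None)
```

Checking the printed command against the wheel index before running it is advisable, because the helper only formats strings and does not verify that the wheel exists.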

New public API exports

The following symbols were added to TorchRL package __init__.py exports between release/0.12.0 (2b2bac743) and v0.13.0 (30a143c21). Code links are permalinks to the v0.13.0 release commit export lines; docs links point to the stable reference pages. Duplicate root-level re-exports are collapsed so each symbol appears once, using the deepest available import path.

Environments, transforms, and MuJoCo

torchrl.envs.custom.mujoco

torchrl.envs.libs

torchrl.envs.transforms

Modules, models, and recurrent controls

torchrl.modules

torchrl.modules.models

torchrl.modules.tensordict_module

torchrl.modules.utils

torchrl.modules.utils.get_env_transforms_from_module (docs)

Objectives, value estimators, and utilities

torchrl.objectives

torchrl.objectives.multiagent

torchrl.objectives.value

torchrl.objectives.value.MultiAgentGAE (docs)

Data and replay buffers

torchrl.data.replay_buffers

Trainers and Hydra configs

torchrl.trainers

torchrl.trainers.EarlyStopping (docs)

torchrl.trainers.algorithms.configs

Deprecations and compatibility

Enacted v0.13 behavior changes for the deprecations and future-default warnings scheduled for this release.
Updated the release dependency constraints for TorchRL 0.13, including the TensorDict 0.13 pin/range.
Added optional Linux CUDA wheels for CUDA-based prioritized replay-buffer kernels; the standard PyPI wheel remains the default for CPU prioritized replay and for users who do not use prioritized replay.
Kept compatibility work focused on optional and older dependency stacks, including old Gym/Atari, Robohive, PettingZoo, vLLM, SGLang, and opt-deps workflows.

Bug fixes and quality

Fixed recurrent kernel issues, LSTM padding, recurrent policy auto-registration, value-estimator edge cases, async collector key preservation, and shared replay-buffer validation.
Improved collector weight synchronization idempotency and diagnostics.
Fixed tutorial/runtime issues including hidden multiprocessing imports, PPO tutorial doc mismatches, GRPO vLLM/SGLang runtime issues, and setup/test workflow failures.
Fixed release-CI issues surfaced during the 0.13 dry run: stable TensorDict source checks, TorchCodec CPU wheel selection, mixed-device check_env_specs validation, and GRU scan splitting under Dynamo.
Added local development setup improvements, a refreshed top-level README, broader CI/benchmark workflow fixes, and RNN reset rollout benchmarks across recurrent backends.

Upgrade note

Upgrade TorchRL and TensorDict together for this release series. Code that relied on APIs with v0.13-targeted FutureWarning or DeprecationWarning messages should be checked against the enacted 0.13 behavior before training jobs are resumed.
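
One way to find affected call sites before upgrading is to escalate the two warning categories to errors while still running the previous release, where the warnings are still emitted. The sketch below uses only the standard-library warnings module. legacy_call is a hypothetical stand-in for any API that warned ahead of 0.13, and find_flagged_calls is an illustrative helper, not part of TorchRL.

```python
import warnings


def legacy_call():
    """Hypothetical stand-in for an API that warned before the 0.13 change."""
    warnings.warn("this default changes in v0.13", FutureWarning, stacklevel=2)
    return "old behavior"


def find_flagged_calls(callables):
    """Run each callable with the two warning categories escalated to errors."""
    flagged = []
    with warnings.catch_warnings():
        warnings.simplefilter("error", FutureWarning)
        warnings.simplefilter("error", DeprecationWarning)
        for fn in callables:
            try:
                fn()
            except (FutureWarning, DeprecationWarning) as exc:
                flagged.append((fn.__name__, str(exc)))
    return flagged


flagged = find_flagged_calls([legacy_call, lambda: None])
```

The same escalation is available for a whole script through the interpreter's -W option, for example python -W error::FutureWarning, which can be simpler for a one-off audit of a training entry point.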
