TorchRL v0.13.0
TorchRL 0.13.0 is a broad release focused on recurrent RL throughput, MuJoCo-native environments, macro-control workflows, multi-agent training utilities, and release-aligned cleanup of previously warned deprecations. It also introduces optional Linux CUDA wheels for users who want CUDA-based prioritized replay-buffer kernels, while keeping the standard PyPI wheel as the default for CPU prioritized replay buffers and users who do not need prioritized replay. The release refreshes compatibility with optional and older dependency stacks used by TorchRL's wider environment coverage.
Merged PRs included in v0.13.0
This release includes the following merged PRs in the v0.12.0..v0.13.0 first-parent range:
Release, documentation, and CI
- #3817 — Documented optional CUDA TorchRL wheels for CUDA-based prioritized replay-buffer kernels while keeping the standard PyPI wheel as the default install.
- #3814 — Fixed release-CI dependency checks, including stable TensorDict selection, TorchCodec CPU wheels, and mixed-device spec validation.
- #3813 — Refreshed the README for the 0.13 release.
- #3805 — Enacted the v0.13 deprecations and behavior changes.
- #3796 — Added collector internals documentation.
- #3803 — Bumped TorchRL version metadata to 0.13.0.
- #3787 — Added uv local development setup.
- #3747 — Split large test files into per-concept files.
- #3741 — Added automatic insertion of new release versions into gh-pages versions.html.
- #3739 — Fixed olddeps / opt-deps / gym smoke tests broken by #3738 and #3704.
- #3710 — Bumped the olddeps TensorDict default to 0.12.
- #3705 — Fixed benchmark workflows.
- #3706 — Moved stdlib and TorchRL local imports in tests to module top.
- #3696 — Added NestedKey and Literal contribution guidance.
- #3686 — Added repository contribution rules for AI agents.
Recurrent RL, value estimation, and world models
- #3816 — Added RNN reset rollout benchmarks for TorchRL LSTMModule/GRUModule, covering intermediate resets across cuDNN, scan, and Triton backends with eager and compiled runs.
- #3815 — Kept GRU scan split sizes concrete for compiled recurrent rollouts while preserving old/optional dependency compatibility.
- #3780 — Added a dynamic value-estimator registry across loss modules.
- #3807 — Gated scan RNN backward support for compatible environments.
- #3792 — Added the recurrent state lifecycle guide.
- #3793 — Added recurrent integration coverage.
- #3797 — Added chunked TensorDict support for value estimators.
- #3785 — Fixed the Triton RNN kernel.
- #3784 — Simplified the shifted value-estimator budget.
- #3782 — Added the compact-drop shifted value backend.
- #3744 — Sanitized NaN next observations in value-estimator forwards.
- #3738 — Added the Triton backend for GRU/LSTM with intermediate resets.
- #3712 — Fixed LSTMModule padding.
- #3707 — Stabilized the RSSMPosteriorV3 gradient test.
- #3695 — Improved sequence-RL composability.
- #3621 — Implemented DreamerV3 world-model objectives and components.
Environments, transforms, and examples
- #3781 — Surfaced Isaac Lab per-index reset and reset_to via IsaacLabWrapper.
- #3811 — Fixed MuJoCo macro shapes and Gym Atari setup.
- #3779 — Fixed a hidden multiprocessing import in the coding PPO tutorial.
- #3806 — Added the MuJoCo macro tutorial environment.
- #3802 — Added satellite MuJoCo SAC examples.
- #3800 — Added IsaacLab headless rendered eval.
- #3700 — Added MuJoCo custom environments with selectable physics backends.
- #3801 — Fixed unused frame_skip in PPO tutorial.
- #3777 — Added the NextObservationDelta environment transform.
- #3766 — Added the FlattenAction transform.
- #3765 — Added the ActionScaling transform.
- #3727 — Fixed KeyError in PettingZoo action mask with ParallelEnv and done_on_any=False.
- #3743 — Added the NextStateReconstructor replay-buffer transform.
- #3742 — Added compact_obs support to DataCollector.
- #3698 — Fixed GymEnv reward and action shapes across num-env configurations.
- #3689 — Added Safety-Gymnasium environment wrappers.
- #3682 — Fixed PettingZoo state handling and added an encoding regression test.
- #3676 — Added the ExpandAs transform.
Collectors, replay buffers, and performance
- #3810 — Made collector weight synchronization idempotent.
- #3734 — Added HERReplayBuffer and HindsightStrategy to torchrl.data.
- #3749 — Dispatched RandomSampler and SliceSampler to without-replacement variants via replacement=False.
- #3729 — Auto-created inner SharedMem schemes for Ray/RPC and policy_factory.
- #3728 — Made per-worker SharedMem schemes opt-in for policy_factory tests.
- #3714 — Added async prioritized replay-buffer writes.
- #3680 — Gated the profiling decorator on TORCHRL_PROFILING.
- #3685 — Improved CUDA prioritized replay-buffer ergonomics.
- #3672 — Added data-collector hooks.
- #3677 — Added CUDA support for prioritized replay sampling, available through the optional CUDA wheel builds.
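As a rough illustration of what sampling without replacement means for a replay buffer (the behavior #3749 exposes through replacement=False), the sketch below shuffles all indices once per pass and hands them out in batches, so no index repeats until the pass is exhausted. The class name and its arguments are hypothetical and plain Python; this is not TorchRL's sampler implementation.

```python
import random


class WithoutReplacementSampler:
    """Hand out index batches so that no index repeats within one pass."""

    def __init__(self, size: int, seed: int = 0) -> None:
        self._size = size
        self._rng = random.Random(seed)
        self._order: list[int] = []

    def sample(self, batch_size: int) -> list[int]:
        # Start a freshly shuffled pass once the previous one is used up.
        if not self._order:
            self._order = list(range(self._size))
            self._rng.shuffle(self._order)
        # The last batch of a pass may be shorter than batch_size.
        batch = self._order[:batch_size]
        self._order = self._order[batch_size:]
        return batch


sampler = WithoutReplacementSampler(size=10)
first_pass = [i for _ in range(5) for i in sampler.sample(2)]
```

Five batches of two cover all ten indices exactly once; a sampler with replacement would typically return duplicates over the same number of draws.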
Objectives, trainers, and multi-agent learning
- #3773 — Added CrossGroupCritic.
- #3748 — Added MAPPOLoss, IPPOLoss, MultiAgentGAE, and ValueNorm.
- #3750 — Fixed the cross-entropy reduction parameter in discrete objectives.
- #3694 — Added QMix, VDN, and IQL support to the DQN trainer.
- #3699 — Improved Hydra config parity for environments, losses, and loggers.
- #3692 — Added an early-stopping trainer hook.
- #3693 — Audited TrainerConfig/Trainer parity and added auto_log_optim_steps plumbing.
- #3683 — Made DDPG, PPO, and SAC trainers multi-agent-friendly.
- #3691 — Added trainer hook configs.
- #3639 — Added ACTModel and ACTLoss for robot learning.
- #3667 — Added behavior-cloning loss.
- #3679 — Fixed PPOTrainer gamma/lambda defaults, removed dead code, and removed a wildcard import.
Compatibility, bug fixes, and cleanup
- #3809 — Fixed GRPO runtime issues for vLLM and SGLang backends.
- #3812 — Fixed Robohive, Gym, PettingZoo, and setup-test CI failures.
- #3808 — Fixed LLM assistant action masks.
- #3799 — Fixed old- and optional-dependency workflows.
- #3709 — Updated the REINFORCE value-net test for new torch autograd error wording.
- #3704 — Forwarded generator kwargs through ProbabilisticActor.
- #3688 — Added setup and shutdown hook points.
- #3684 — Split environment transforms into per-category modules.
Highlights
Faster recurrent RL and clearer recurrent state handling
- Added Triton and scan recurrent backends for GRU/LSTM reset handling, including fixes for recurrent matmul kernels and backward support.
- Expanded recurrent integration coverage and documentation for hidden-state lifecycle management.
- Added RNN reset rollout benchmarks for LSTMModule and GRUModule across cuDNN, scan, and Triton backends, including compiled TensorDict-input runs.
- Improved compact and shifted value-estimator flows, including chunked forwards, cleaner sequence layouts, and a dynamic value-estimator registry across loss modules.
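To make "intermediate resets" concrete, here is a minimal sketch of a recurrent rollout in which the hidden state is zeroed whenever a new episode starts in the middle of a sequence. It uses a scalar tanh recurrence in plain Python; the `is_init` flag name, the weights, and the function name are illustrative assumptions, and the code says nothing about how the cuDNN, scan, or Triton backends are implemented.

```python
import math


def rollout_with_resets(inputs, is_init, w_in=0.5, w_rec=0.9):
    """Run a scalar tanh RNN, zeroing the hidden state where is_init is True."""
    hidden = 0.0
    outputs = []
    for x, reset in zip(inputs, is_init):
        if reset:
            # A new episode starts at this step: drop the carried state.
            hidden = 0.0
        hidden = math.tanh(w_in * x + w_rec * hidden)
        outputs.append(hidden)
    return outputs


inputs = [1.0, 1.0, 1.0, 1.0]
no_reset = rollout_with_resets(inputs, [True, False, False, False])
mid_reset = rollout_with_resets(inputs, [True, False, True, False])
```

With the reset at the third step, `mid_reset[2]` equals `mid_reset[0]`: the recurrence restarts as if the sequence had just begun, which is the behavior a backend must reproduce without splitting the batch into separate sequences.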
MuJoCo environments and macro-control policies
- Added native custom MuJoCo environments with selectable physics backends, including MujocoEnv, locomotion tasks, SatelliteEnv, and CubeBowlEnv.
- Added satellite MuJoCo SAC examples and a macro-control tutorial environment for cube-to-bowl manipulation.
- Introduced macro-action helpers and transforms for low-frequency semantic control over multi-step low-level action sequences.
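The idea of low-frequency semantic control can be sketched with a toy environment: the policy picks one macro action, and a helper expands it into several low-level steps while accumulating reward. Everything here (`LineWorld`, the `MACROS` table, `macro_step`) is hypothetical and only illustrates the pattern; it is not the API of the macro-action transforms added in this release.

```python
from dataclasses import dataclass


@dataclass
class LineWorld:
    """Toy 1-D environment: the agent walks along a line toward a goal cell."""

    position: int = 0
    goal: int = 4

    def step(self, move: int) -> float:
        self.position += move
        # Reward 1.0 only on the step that lands on the goal.
        return 1.0 if self.position == self.goal else 0.0


# Hypothetical macro table: one semantic action maps to a low-level sequence.
MACROS = {
    "advance": [1, 1],
    "retreat": [-1, -1],
    "wait": [0],
}


def macro_step(env: LineWorld, macro: str) -> tuple[float, int]:
    """Run every low-level move of a macro; return summed reward and step count."""
    total = 0.0
    moves = MACROS[macro]
    for move in moves:
        total += env.step(move)
    return total, len(moves)


env = LineWorld()
reward_1, steps_1 = macro_step(env, "advance")
reward_2, steps_2 = macro_step(env, "advance")
```

The policy makes two decisions while the environment advances four low-level steps, which is the sense in which macro control runs at a lower frequency than the underlying actions.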
Multi-agent, imitation, and model-based objectives
- Added MAPPO/IPPO losses, MultiAgentGAE, and value-normalization utilities for multi-agent training.
- Added behavior-cloning and action-chunking losses/modules through BCLoss, ACTLoss, and ACTModel.
- Added DreamerV3 losses, RSSM V3 components, and world-model utility functions.
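Two of the world-model utilities commonly associated with DreamerV3 are the symlog transform and two-hot encoding. The sketch below gives scalar, plain-Python versions of the standard definitions: symlog(x) = sign(x) ln(|x| + 1), its inverse symexp, and a two-hot encoding that splits a value linearly between its two neighbouring bins. The `scalar_` function names and the example bins are assumptions made for illustration; TorchRL's tensor-based functions and their signatures are not shown in these notes.

```python
import math


def scalar_symlog(x: float) -> float:
    """sign(x) * ln(|x| + 1): compresses large magnitudes, keeps the sign."""
    return math.copysign(math.log1p(abs(x)), x)


def scalar_symexp(x: float) -> float:
    """Inverse of symlog: sign(x) * (exp(|x|) - 1)."""
    return math.copysign(math.expm1(abs(x)), x)


def scalar_two_hot_encode(value: float, bins: list[float]) -> list[float]:
    """Spread a scalar over its two neighbouring bins with linear weights."""
    value = min(max(value, bins[0]), bins[-1])  # clamp to the bin range
    weights = [0.0] * len(bins)
    for k in range(len(bins) - 1):
        low, high = bins[k], bins[k + 1]
        if low <= value <= high:
            upper = (value - low) / (high - low)
            weights[k] = 1.0 - upper
            weights[k + 1] = upper
            break
    return weights


def scalar_two_hot_decode(weights: list[float], bins: list[float]) -> float:
    """Expected bin value under the weights."""
    return sum(w * b for w, b in zip(weights, bins))


bins = [-2.0, -1.0, 0.0, 1.0, 2.0]
weights = scalar_two_hot_encode(0.25, bins)
decoded = scalar_two_hot_decode(weights, bins)
round_trip = scalar_symexp(scalar_symlog(-7.5))
```

Encoding 0.25 puts weight 0.75 on the bin at 0.0 and 0.25 on the bin at 1.0, and decoding recovers 0.25; in a DreamerV3-style setup the encoded quantity is usually the symlog of the target rather than the raw value.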
Replay buffers, collectors, and action transforms
- Added HER replay-buffer support via HERReplayBuffer and HindsightStrategy.
- Added action-space transforms such as FlattenAction, ActionScaling, ExpandAs, and NextObservationDelta.
- Improved collector and replay-buffer performance with async prioritized writes, ordered read/write APIs, optional trajectory IDs, compact observations, safer policy-version handling, CUDA prioritized replay sampling, and optional CUDA wheels for those CUDA kernels.
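For readers new to hindsight experience replay, the following sketch shows the general relabeling idea with the "final" strategy: a failed episode is stored a second time with its goal replaced by the state the agent actually reached, and the sparse reward is recomputed against that substituted goal. The dictionary layout, the 0/-1 reward convention, and the function name are assumptions for illustration; this is not the interface of HERReplayBuffer or HindsightStrategy.

```python
def relabel_with_final_goal(episode):
    """Return a copy of the episode whose goal is the last achieved state."""
    new_goal = episode[-1]["achieved"]
    relabeled = []
    for step in episode:
        relabeled.append(
            {
                "obs": step["obs"],
                "action": step["action"],
                "achieved": step["achieved"],
                "goal": new_goal,
                # Sparse reward recomputed against the substituted goal.
                "reward": 0.0 if step["achieved"] == new_goal else -1.0,
            }
        )
    return relabeled


# A failed episode: the agent wanted to reach 5 but ended at 2.
episode = [
    {"obs": 0, "action": 1, "achieved": 1, "goal": 5, "reward": -1.0},
    {"obs": 1, "action": 1, "achieved": 2, "goal": 5, "reward": -1.0},
]
hindsight = relabel_with_final_goal(episode)
```

The relabeled copy ends with a successful transition (reward 0.0), which gives the learner a positive signal even though the original goal was never reached; the original episode is left unchanged.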
CUDA prioritized replay-buffer sampling. On the TorchRL CI benchmark below (batch size 65,536, sampling from 1M and 10M priorities), the CUDA implementation reduces mean sampling latency from 9.99 ms to 0.45 ms at 1M priorities and from 18.68 ms to 0.93 ms at 10M priorities — approximately 22.3× and 20.2× faster than the CPU path. The CUDA wheel is therefore most useful for GPU-heavy prioritized replay workloads; users who do not use prioritized replay, or who use CPU prioritized replay, can keep the standard install.
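The quoted speedups can be sanity-checked from the rounded latencies in the paragraph above. This small script only divides the numbers as printed; it does not rerun any benchmark.

```python
# Mean sampling latencies in milliseconds, copied from the paragraph above.
latencies_ms = {
    "1M": {"cpu": 9.99, "cuda": 0.45},
    "10M": {"cpu": 18.68, "cuda": 0.93},
}

speedups = {
    size: timing["cpu"] / timing["cuda"] for size, timing in latencies_ms.items()
}

for size, ratio in speedups.items():
    print(f"{size} priorities: {ratio:.1f}x faster")
```

The rounded inputs give roughly 22.2× and 20.1×, slightly below the quoted 22.3× and 20.2×. That gap is consistent with the quoted ratios having been computed from unrounded measurements, although the notes do not say so explicitly.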
Broader environment and training coverage
- Added Safety-Gymnasium wrappers and additional Isaac Lab reset/evaluation support.
- Improved trainer hooks, Hydra config parity, trainer hook configs, and multi-agent trainer ergonomics.
- Strengthened old Gym, Atari, PettingZoo, Robohive, optional-dependency, vLLM, SGLang, and setup CI coverage.
Installation note: optional CUDA wheels
The default install remains:
    pip install torchrl
Use this standard PyPI wheel if you do not use prioritized replay buffers, or if your prioritized replay buffers run on CPU. For users who want the CUDA-based prioritized replay-buffer implementations introduced in this release, TorchRL 0.13 also publishes Linux CUDA wheels. Install the CUDA wheel from the PyTorch wheel index that matches your PyTorch CUDA runtime, for example:
    pip install "torchrl==0.13.0+cu128" --extra-index-url https://download.pytorch.org/whl/cu128
Replace cu128 with the CUDA build matching your PyTorch installation. The CUDA wheel is optional and only needed for CUDA prioritized replay-buffer kernels.
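One way to pick the matching tag is to derive it from the CUDA version PyTorch reports. The helper below is a sketch: it takes a version string in the format of `torch.version.cuda` (for example "12.8", or None on a CPU-only build) and formats the corresponding command. The function name is hypothetical, and the notes do not list which CUDA tags actually have published wheels, so treat the output as a candidate command to verify rather than a guarantee.

```python
from typing import Optional


def torchrl_install_command(
    cuda_version: Optional[str], torchrl_version: str = "0.13.0"
) -> str:
    """Build the pip command for the default wheel or a CUDA wheel.

    cuda_version is a string such as "12.8" (the format of torch.version.cuda),
    or None for a CPU-only PyTorch build.
    """
    if cuda_version is None:
        return "pip install torchrl"
    major, minor = cuda_version.split(".")[:2]
    tag = f"cu{major}{minor}"  # assumed tag scheme, e.g. "12.8" -> "cu128"
    return (
        f'pip install "torchrl=={torchrl_version}+{tag}" '
        f"--extra-index-url https://download.pytorch.org/whl/{tag}"
    )


print(torchrl_install_command("12.8"))
```

In an environment with PyTorch installed, passing `torch.version.cuda` as the argument would select the tag for the local build; the CUDA wheels are described as Linux-only, so other platforms should stay on the default command.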
New public API exports
The following symbols were added to TorchRL package __init__.py exports between release/0.12.0 (2b2bac743) and v0.13.0 (30a143c21). Code links are permalinks to the v0.13.0 release commit export lines; docs links point to the stable reference pages. Duplicate root-level re-exports are collapsed so each symbol appears once, using the deepest available import path.
Environments, transforms, and MuJoCo
torchrl.envs.custom.mujoco
- torchrl.envs.custom.mujoco.AntEnv (docs)
- torchrl.envs.custom.mujoco.CubeBowlEnv (docs)
- torchrl.envs.custom.mujoco.HopperEnv (docs)
- torchrl.envs.custom.mujoco.HumanoidEnv (docs)
- torchrl.envs.custom.mujoco.HumanoidMacroAction (docs)
- torchrl.envs.custom.mujoco.MujocoEnv (docs)
- torchrl.envs.custom.mujoco.SatelliteAttitudeTransform (docs)
- torchrl.envs.custom.mujoco.SatelliteEnv (docs)
- torchrl.envs.custom.mujoco.SatelliteMacroAction (docs)
- torchrl.envs.custom.mujoco.Walker2dEnv (docs)
torchrl.envs.libs
torchrl.envs.transforms
- torchrl.envs.transforms.ActionScaling (docs)
- torchrl.envs.transforms.ExpandAs (docs)
- torchrl.envs.transforms.FlattenAction (docs)
- torchrl.envs.transforms.MacroAction (docs)
- torchrl.envs.transforms.MacroPrimitive (docs)
- torchrl.envs.transforms.MacroPrimitiveTransform (docs)
- torchrl.envs.transforms.NextObservationDelta (docs)
- torchrl.envs.transforms.NextStateReconstructor (docs)
- torchrl.envs.transforms.RobotMacroAction (docs)
- torchrl.envs.transforms.RobotMacroActionMode (docs)
- torchrl.envs.transforms.TargetMacroAction (docs)
- torchrl.envs.transforms.TerminateTransform (docs)
- torchrl.envs.transforms.URScriptPrimitive (docs)
- torchrl.envs.transforms.URScriptPrimitiveTransform (docs)
Modules, models, and recurrent controls
torchrl.modules
- torchrl.modules.PopArtValueNorm (docs)
- torchrl.modules.RunningValueNorm (docs)
- torchrl.modules.ValueNorm (docs)
torchrl.modules.models
- torchrl.modules.models.ACTModel (docs)
- torchrl.modules.models.CrossCriticGroupSpec (docs)
- torchrl.modules.models.CrossGroupCritic (docs)
- torchrl.modules.models.RSSMPosteriorV3 (docs)
- torchrl.modules.models.RSSMPriorV3 (docs)
- torchrl.modules.models.RSSMRolloutV3 (docs)
torchrl.modules.tensordict_module
- torchrl.modules.tensordict_module.get_recurrent_matmul_precision (docs)
- torchrl.modules.tensordict_module.RecurrentMatmulPrecision (docs)
- torchrl.modules.tensordict_module.RecurrentMatmulPrecisionUserMode (docs)
- torchrl.modules.tensordict_module.set_recurrent_matmul_precision (docs)
torchrl.modules.utils
Objectives, value estimators, and utilities
torchrl.objectives
- torchrl.objectives.ACTLoss (docs)
- torchrl.objectives.BCLoss (docs)
- torchrl.objectives.categorical_kl_balanced (docs)
- torchrl.objectives.DreamerV3ActorLoss (docs)
- torchrl.objectives.DreamerV3ModelLoss (docs)
- torchrl.objectives.DreamerV3ValueLoss (docs)
- torchrl.objectives.symexp (docs)
- torchrl.objectives.symlog (docs)
- torchrl.objectives.two_hot_decode (docs)
- torchrl.objectives.two_hot_encode (docs)
torchrl.objectives.multiagent
torchrl.objectives.value
Data and replay buffers
torchrl.data.replay_buffers
- torchrl.data.replay_buffers.HERReplayBuffer (docs)
- torchrl.data.replay_buffers.HindsightStrategy (docs)
Trainers and Hydra configs
torchrl.trainers
torchrl.trainers.algorithms.configs
- torchrl.trainers.algorithms.configs.BatchSubSamplerConfig (docs)
- torchrl.trainers.algorithms.configs.ClearCudaCacheConfig (docs)
- torchrl.trainers.algorithms.configs.CountFramesLogConfig (docs)
- torchrl.trainers.algorithms.configs.EarlyStoppingConfig (docs)
- torchrl.trainers.algorithms.configs.HookConfig (docs)
- torchrl.trainers.algorithms.configs.LogScalarConfig (docs)
- torchrl.trainers.algorithms.configs.LogTimingConfig (docs)
- torchrl.trainers.algorithms.configs.QMixerLossConfig (docs)
- torchrl.trainers.algorithms.configs.QMixerNetworkConfig (docs)
- torchrl.trainers.algorithms.configs.RewardNormalizerConfig (docs)
- torchrl.trainers.algorithms.configs.SelectKeysConfig (docs)
- torchrl.trainers.algorithms.configs.TrackioLoggerConfig (docs)
- torchrl.trainers.algorithms.configs.VDNMixerNetworkConfig (docs)
Deprecations and compatibility
- Enacted v0.13 behavior changes for the deprecations and future-default warnings scheduled for this release.
- Updated the release dependency constraints for TorchRL 0.13, including the TensorDict 0.13 pin/range.
- Added optional Linux CUDA wheels for CUDA-based prioritized replay-buffer kernels; the standard PyPI wheel remains the default for CPU/no-prioritized-replay use cases.
- Kept compatibility work focused on optional and older dependency stacks, including old Gym/Atari, Robohive, PettingZoo, vLLM, SGLang, and opt-deps workflows.
Bug fixes and quality
- Fixed recurrent kernel issues, LSTM padding, recurrent policy auto-registration, value-estimator edge cases, async collector key preservation, and shared replay-buffer validation.
- Improved collector weight synchronization idempotency and diagnostics.
- Fixed tutorial/runtime issues including hidden multiprocessing imports, PPO tutorial doc mismatches, GRPO vLLM/SGLang runtime issues, and setup/test workflow failures.
- Fixed release-CI issues surfaced during the 0.13 dry run: stable TensorDict source checks, TorchCodec CPU wheel selection, mixed-device check_env_specs validation, and GRU scan splitting under Dynamo.
- Added local development setup improvements, a refreshed top-level README, broader CI/benchmark workflow fixes, and RNN reset rollout benchmarks across recurrent backends.
Upgrade note
Upgrade TorchRL and TensorDict together for this release series. Code that relied on APIs with v0.13-targeted FutureWarning or DeprecationWarning messages should be checked against the enacted 0.13 behavior before training jobs are resumed.
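One way to find affected call sites is to run the relevant code paths on the pre-upgrade install, where the warnings still fire, and record every FutureWarning and DeprecationWarning. The sketch below uses only Python's standard `warnings` module; `collect_upgrade_warnings` and `legacy_call` are hypothetical names, with `legacy_call` standing in for user code.

```python
import warnings


def collect_upgrade_warnings(fn, *args, **kwargs):
    """Call fn and return (result, messages) for Future/Deprecation warnings."""
    with warnings.catch_warnings(record=True) as caught:
        # Make sure nothing is suppressed by existing filters.
        warnings.simplefilter("always")
        result = fn(*args, **kwargs)
    messages = [
        str(w.message)
        for w in caught
        if issubclass(w.category, (FutureWarning, DeprecationWarning))
    ]
    return result, messages


def legacy_call():
    # Stand-in for user code that hits a deprecated code path.
    warnings.warn("this default changes in v0.13", FutureWarning)
    return 42


result, messages = collect_upgrade_warnings(legacy_call)
```

Each recorded message points at behavior that may differ after the upgrade, so the list can serve as a checklist before resuming training jobs on 0.13.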