Release TorchRL v0.13.1 · pytorch/rl

TorchRL v0.13.1 is a maintenance release for the 0.13 line. It includes RNN backend fixes and performance improvements made since 0.13.0, torch.compile compatibility fixes, dependency refreshes for the SOTA examples, and documentation improvements.

Merged PR inventory

Recurrent modules and RNN backends

#3818 improves Triton RNN recurrent-matmul robustness with large-hidden tiling, 64-bit offsets, and faster autotune behavior.
#3752 adds recompute backward support and narrow RNN canonicalization to reduce learner memory pressure when multiple recurrent modules share a batch.

Compile and conversion stability

#3819 avoids a graph break caused by a to_module FutureWarning under torch.compile, while keeping the existing state-preserving conversion behavior.

SOTA implementation dependency refreshes

#3708 updates the GRPO SOTA implementation to vllm 0.20.0.
#3601 updates the expert-iteration SOTA implementation to transformers 5.0.0rc3.

Documentation

#3821 fixes non-resolving API cross-references across docs and tutorials.
#3822 migrates the docs to pytorch_sphinx_theme2 and fixes tutorial Colab, Notebook, and GitHub links.
#3745 adds a memory-efficient RL training tutorial and cross-references for layout and recurrent-training guidance.

Newly exported public symbols

Utilities

torchrl.cuda_memory_profile: context manager/decorator for scoped CUDA memory profiling.
torchrl.cuda_memory_stats: helper for reading current and peak CUDA allocation/reservation statistics.
torchrl.reset_cuda_peak_stats: helper for resetting CUDA peak memory counters.
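
The signatures of these helpers are not given in these notes, so the sketch below does not call them. It only illustrates the underlying idea of scoped peak-memory measurement using plain torch.cuda functions. The names read_cuda_stats and scoped_cuda_peak are hypothetical and are not part of TorchRL; on a machine without PyTorch or without CUDA the sketch reports zeros.

```python
from contextlib import contextmanager

try:
    import torch
except ImportError:  # keep the sketch importable without PyTorch
    torch = None


def _cuda_ready():
    """True only when PyTorch is installed and a CUDA device is usable."""
    return torch is not None and torch.cuda.is_available()


def read_cuda_stats():
    """Hypothetical helper: current and peak allocation/reservation, in bytes."""
    if not _cuda_ready():
        return {"allocated": 0, "peak_allocated": 0,
                "reserved": 0, "peak_reserved": 0}
    return {
        "allocated": torch.cuda.memory_allocated(),
        "peak_allocated": torch.cuda.max_memory_allocated(),
        "reserved": torch.cuda.memory_reserved(),
        "peak_reserved": torch.cuda.max_memory_reserved(),
    }


@contextmanager
def scoped_cuda_peak():
    """Hypothetical context manager: reset peak counters, run the body,
    then fill `report` with the statistics observed for that scope."""
    report = {}
    if _cuda_ready():
        torch.cuda.synchronize()  # also initializes CUDA lazily
        torch.cuda.reset_peak_memory_stats()
    try:
        yield report
    finally:
        if _cuda_ready():
            torch.cuda.synchronize()  # wait for pending kernels
        report.update(read_cuda_stats())


if __name__ == "__main__":
    with scoped_cuda_peak() as report:
        pass  # a training step would go here
    print(report)
```

Resetting the peak counters at scope entry is what makes the reported peak attributable to the wrapped code rather than to earlier allocations in the process.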

Modules

torchrl.modules.tensordict_module.canonicalize_rnn_subset (also re-exported as torchrl.modules.canonicalize_rnn_subset): canonicalizes only the recurrent keys used by the selected RNN modules.

Highlights

More robust Triton RNN recurrent matrix multiplication for large hidden sizes and backend autotuning.
Lower-memory recurrent learner updates through recompute backward and subset canonicalization.
Cleaner torch.compile behavior for state-preserving module conversion.
Updated docs theme and repaired generated API and tutorial links.
New memory-efficient RL training tutorial.
Refreshed dependency pins for GRPO and expert-iteration SOTA examples.

Installation

pip install torchrl==0.13.1

For CUDA wheel variants, follow the install index documented in the TorchRL README for the desired CUDA runtime.
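
After installing, the pinned version can be confirmed from Python. This is a minimal sketch using only the standard library. The helper names are hypothetical, and stripping a "+..." local version suffix is an assumption that some wheel variants may report one.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def base_version(version_string):
    """Drop a PEP 440 local segment, e.g. '0.13.1+cu126' becomes '0.13.1'."""
    return version_string.split("+", 1)[0]


def matches_pin(found, expected):
    """Compare an installed version (possibly None) against the expected pin."""
    return found is not None and base_version(found) == expected


if __name__ == "__main__":
    found = installed_version("torchrl")
    if matches_pin(found, "0.13.1"):
        print(f"torchrl {found} is installed")
    else:
        print(f"expected torchrl 0.13.1, found {found}")
```

The same check works for the refreshed example dependencies, for instance comparing installed_version("vllm") against "0.20.0".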

Full changelog

v0.13.0...v0.13.1
