TorchRL v0.13.1
TorchRL v0.13.1 is a maintenance release for the 0.13 line. It carries post-0.13.0 RNN backend fixes and performance improvements, compile-friendliness fixes, SOTA example dependency refreshes, and documentation improvements.
Merged PR inventory
Recurrent modules and RNN backends
- #3818 improves Triton RNN recurrent-matmul robustness with large-hidden tiling, 64-bit offsets, and faster autotune behavior.
- #3752 adds recompute backward support and narrow RNN canonicalization to reduce learner memory pressure when multiple recurrent modules share a batch.
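To illustrate why 64-bit offsets matter for large recurrent workloads, the arithmetic can be sketched as follows. This is not the Triton kernel's indexing code; the helper names and the tensor shapes are hypothetical and only show when a flat element offset no longer fits in a signed 32-bit integer.

```python
INT32_MAX = 2**31 - 1


def max_flat_offset(batch, seq_len, hidden):
    """Largest element offset in a contiguous [batch, seq_len, hidden] tensor."""
    return batch * seq_len * hidden - 1


def needs_int64_offsets(batch, seq_len, hidden):
    """True when the largest offset cannot be represented as a signed int32."""
    return max_flat_offset(batch, seq_len, hidden) > INT32_MAX


def wrap_to_int32(value):
    """Emulate two's-complement wraparound of a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value >= 2**31 else value


# Hypothetical shapes: the first fits in int32, the second does not.
print(needs_int64_offsets(64, 512, 1024))
print(needs_int64_offsets(4096, 2048, 512))
# What a 32-bit offset would silently become for the last element.
print(wrap_to_int32(max_flat_offset(4096, 2048, 512)))
```

A wrapped offset turns into a negative or otherwise wrong index, which is the class of failure that wider offsets avoid.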
Compile and conversion stability
- #3819 avoids a to_module FutureWarning graph break under torch.compile while keeping the existing state-preserving conversion behavior.
SOTA implementation dependency refreshes
- #3708 updates the GRPO SOTA implementation to vllm 0.20.0.
- #3601 updates the expert-iteration SOTA implementation to transformers 5.0.0rc3.
Documentation
- #3821 fixes non-resolving API cross-references across docs and tutorials.
- #3822 migrates the docs to pytorch_sphinx_theme2 and fixes tutorial Colab, Notebook, and GitHub links.
- #3745 adds a memory-efficient RL training tutorial and cross-references for layout and recurrent-training guidance.
Newly exported public symbols
Utilities
- torchrl.cuda_memory_profile: context manager/decorator for scoped CUDA memory profiling.
- torchrl.cuda_memory_stats: helper for reading current and peak CUDA allocation/reservation statistics.
- torchrl.reset_cuda_peak_stats: helper for resetting CUDA peak memory counters.
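The signatures of the new TorchRL helpers are not given in these notes, so the sketch below does not call them. It uses only the underlying torch.cuda counters to show the kind of measurement described above: reset the peak, run a scope, then read current and peak allocation and reservation. The names read_cuda_memory and scoped_peak are hypothetical, and the code falls back to zeros on machines without CUDA.

```python
import contextlib

import torch


def read_cuda_memory():
    """Return current and peak allocated/reserved bytes (zeros without CUDA)."""
    if not torch.cuda.is_available():
        return {"allocated": 0, "reserved": 0, "peak_allocated": 0, "peak_reserved": 0}
    return {
        "allocated": torch.cuda.memory_allocated(),
        "reserved": torch.cuda.memory_reserved(),
        "peak_allocated": torch.cuda.max_memory_allocated(),
        "peak_reserved": torch.cuda.max_memory_reserved(),
    }


@contextlib.contextmanager
def scoped_peak(report):
    """Reset peak counters on entry and store the statistics in `report` on exit."""
    if torch.cuda.is_available():
        torch.cuda.synchronize()
        torch.cuda.reset_peak_memory_stats()
    try:
        yield report
    finally:
        if torch.cuda.is_available():
            torch.cuda.synchronize()
        report.update(read_cuda_memory())


device = "cuda" if torch.cuda.is_available() else "cpu"
report = {}
with scoped_peak(report):
    # Hypothetical workload: a temporary buffer that raises the peak on CUDA.
    scratch = torch.zeros(1024, 1024, device=device)
    del scratch
print(report)
```

Resetting the peak before the scope is what makes the reported peak attributable to that scope rather than to earlier allocations.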
Modules
- torchrl.modules.tensordict_module.canonicalize_rnn_subset (also re-exported as torchrl.modules.canonicalize_rnn_subset): canonicalizes only the recurrent keys used by selected RNN modules.
Highlights
- More robust Triton RNN recurrent matrix multiplication for large hidden sizes and backend autotuning.
- Lower-memory recurrent learner updates through recompute backward and subset canonicalization.
- Cleaner torch.compile behavior for state-preserving module conversion.
- Updated docs theme and repaired generated API and tutorial links.
- New memory-efficient RL training tutorial.
- Refreshed dependency pins for GRPO and expert-iteration SOTA examples.
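The general idea behind the recompute-backward highlight is to trade extra compute for memory: intermediate activations are dropped after the forward pass and rebuilt during backward. The sketch below shows that idea with PyTorch's generic torch.utils.checkpoint on a small GRU. It is an assumption-laden illustration of the technique, not the TorchRL RNN backend implementation, and the module sizes are arbitrary.

```python
import torch
from torch.utils.checkpoint import checkpoint

torch.manual_seed(0)
gru = torch.nn.GRU(input_size=4, hidden_size=8, batch_first=True)
x = torch.randn(2, 5, 4, requires_grad=True)


def run_gru(inp):
    """Forward pass that returns only the output sequence."""
    out, _ = gru(inp)
    return out


# Regular forward/backward: activations are kept until backward runs.
out_plain = run_gru(x)
out_plain.sum().backward()
grad_plain = x.grad.clone()

# Clear gradients before the second pass so the two results are comparable.
x.grad = None
gru.zero_grad()

# Checkpointed forward/backward: activations are recomputed during backward.
out_ckpt = checkpoint(run_gru, x, use_reentrant=False)
out_ckpt.sum().backward()
grad_ckpt = x.grad.clone()

print(torch.allclose(grad_plain, grad_ckpt))
```

Both paths should produce matching outputs and gradients; only the memory held between forward and backward differs.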
Installation
pip install torchrl==0.13.1

For CUDA wheel variants, follow the install index documented in the TorchRL README for the desired CUDA runtime.
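After installing, the active environment can be checked against the expected release. The snippet below is a minimal sketch using only the standard library; check_pin is a hypothetical helper, and stripping a local version suffix (the part after "+") is an assumption about how CUDA wheel variants may be labeled.

```python
from importlib.metadata import PackageNotFoundError, version


def check_pin(package, expected):
    """Compare the installed version of `package` with `expected`."""
    try:
        installed = version(package)
    except PackageNotFoundError:
        return "missing"
    # Ignore a local version label such as "+cu121" when comparing.
    public = installed.split("+")[0]
    return "ok" if public == expected else f"mismatch: {installed}"


print(check_pin("torchrl", "0.13.1"))
```

The same helper could be pointed at the dependency pins used by the SOTA examples.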