TorchRL v0.13.1

vmoens released this 08 Jun 12:05
· 0 commits to b74ec798a97dc98636297857d8a3e84a26b4c628 since this release

TorchRL v0.13.1 is a maintenance release for the 0.13 line. It carries post-0.13.0 RNN backend fixes and performance improvements, compile-friendliness fixes, SOTA example dependency refreshes, and documentation improvements.

Merged PR inventory

Recurrent modules and RNN backends

  • #3818 improves Triton RNN recurrent-matmul robustness with large-hidden tiling, 64-bit offsets, and faster autotune behavior.
  • #3752 adds recompute backward support and narrow RNN canonicalization to reduce learner memory pressure when multiple recurrent modules share a batch.
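
As background for the recompute backward item above, the general idea of trading compute for memory can be sketched with PyTorch's activation checkpointing. This is not the TorchRL implementation from #3752; it only assumes a plain torch.nn.GRU and torch.utils.checkpoint, and the names run_gru, grad_ref, and grad_ckpt are illustrative.

```python
import torch
from torch import nn
from torch.utils.checkpoint import checkpoint

torch.manual_seed(0)
gru = nn.GRU(input_size=4, hidden_size=8, batch_first=True)
x = torch.randn(2, 5, 4, requires_grad=True)


def run_gru(inp):
    # Return only the output sequence; the final hidden state is dropped.
    out, _ = gru(inp)
    return out


# Reference: intermediate activations are stored for the backward pass.
loss_ref = run_gru(x).sum()
(grad_ref,) = torch.autograd.grad(loss_ref, x)

# Recompute variant: activations inside run_gru are not kept and are
# recomputed during backward, which lowers peak memory at extra compute cost.
loss_ckpt = checkpoint(run_gru, x, use_reentrant=False).sum()
(grad_ckpt,) = torch.autograd.grad(loss_ckpt, x)

# Both paths should produce the same gradients.
print(torch.allclose(grad_ref, grad_ckpt))
```

The memory saving grows with sequence length and hidden size; on a toy input like this one the sketch only demonstrates that the gradients agree.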

Compile and conversion stability

  • #3819 avoids a to_module FutureWarning graph break under torch.compile while preserving the previous state-preserving conversion behavior.

SOTA implementation dependency refreshes

  • #3708 updates the GRPO SOTA implementation to vllm 0.20.0.
  • #3601 updates the expert-iteration SOTA implementation to transformers 5.0.0rc3.

Documentation

  • #3821 fixes non-resolving API cross-references across docs and tutorials.
  • #3822 migrates the docs to pytorch_sphinx_theme2 and fixes tutorial Colab, Notebook, and GitHub links.
  • #3745 adds a memory-efficient RL training tutorial and cross-references for layout and recurrent-training guidance.

Newly exported public symbols

Utilities

  • torchrl.cuda_memory_profile: context manager/decorator for scoped CUDA memory profiling.
  • torchrl.cuda_memory_stats: helper for reading current and peak CUDA allocation/reservation statistics.
  • torchrl.reset_cuda_peak_stats: helper for resetting CUDA peak memory counters.
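
The signatures of these helpers are not given in these notes, so the sketch below does not call them. It shows the kind of scoped measurement they describe using only the underlying torch.cuda counters; cuda_memory_snapshot and scoped_peak_memory are hypothetical names introduced for illustration, and the code falls back to zeros when no CUDA device is available.

```python
import contextlib

import torch


def cuda_memory_snapshot():
    """Return current and peak CUDA memory counters in bytes (zeros without CUDA)."""
    if not torch.cuda.is_available():
        return {
            "allocated": 0,
            "reserved": 0,
            "peak_allocated": 0,
            "peak_reserved": 0,
        }
    return {
        "allocated": torch.cuda.memory_allocated(),
        "reserved": torch.cuda.memory_reserved(),
        "peak_allocated": torch.cuda.max_memory_allocated(),
        "peak_reserved": torch.cuda.max_memory_reserved(),
    }


@contextlib.contextmanager
def scoped_peak_memory(report):
    """Reset peak counters on entry and write a snapshot into `report` on exit."""
    if torch.cuda.is_available():
        torch.cuda.reset_peak_memory_stats()
    try:
        yield report
    finally:
        if torch.cuda.is_available():
            # Wait for queued kernels so the counters reflect the whole scope.
            torch.cuda.synchronize()
        report.update(cuda_memory_snapshot())


report = {}
with scoped_peak_memory(report):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    x = torch.ones(1024, 1024, device=device)
    y = x @ x

print(report)
```

Resetting the peak counters at the start of the scope is what makes the reported peak attributable to the wrapped code rather than to earlier work in the same process.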

Modules

  • torchrl.modules.tensordict_module.canonicalize_rnn_subset (also re-exported as torchrl.modules.canonicalize_rnn_subset): canonicalizes only the recurrent keys used by selected RNN modules.

Highlights

  • More robust Triton RNN recurrent matrix multiplication for large hidden sizes and backend autotuning.
  • Lower-memory recurrent learner updates through recompute backward and subset canonicalization.
  • Cleaner torch.compile behavior for state-preserving module conversion.
  • Updated docs theme and repaired generated API and tutorial links.
  • New memory-efficient RL training tutorial.
  • Refreshed dependency pins for GRPO and expert-iteration SOTA examples.

Installation

pip install torchrl==0.13.1

For CUDA wheel variants, follow the install index documented in the TorchRL README for the desired CUDA runtime.
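
After installing, the version that is actually in the environment can be checked with the standard library. This is a minimal sketch: installed_version is a hypothetical helper, and stripping a "+..." local version suffix is an assumption about how a CUDA wheel variant might be labelled, not something these notes state.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string for `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version("torchrl")
if found is None:
    print("torchrl is not installed")
elif found.split("+")[0] == "0.13.1":
    # Ignore any local version suffix when comparing against the release number.
    print(f"torchrl {found} matches the 0.13.1 release")
else:
    print(f"torchrl {found} is installed, expected 0.13.1")
```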

Full changelog

v0.13.0...v0.13.1