Releases: pytorch/rl

TorchRL v0.13.2

17 Jun 06:58

TorchRL 0.13.2 is a patch release focused on regression fixes, release reliability, and CI stability. It intentionally avoids new feature backports.

Highlights

  • Fixed Isaac Lab reset regressions, including Direct-env autoreset opt-in behavior, observation-key normalization, and reset_to edge cases.
  • Fixed SliceSampler and EnvBase.step regressions affecting compile compatibility and batch-locked environments.
  • Fixed MultiSyncCollector worker-output gathering and improved preemption throughput without busy-waiting.
  • Made the vLLM FP32 override opt-in so importing TorchRL no longer changes host vLLM behavior.
  • Restored and hardened nightly/release publishing checks, including PyPI wheel filtering and release-build symbol stripping.

Backported fixes and maintenance

  • #3869 Fix IsaacLab reset regressions.
  • #3868 Make the vLLM FP32 plugin opt-in.
  • #3861 Fix FP32 override registration in the vLLM plugin.
  • #3858 Fix _skip_tensordict handling in EnvBase.step.
  • 4679d0a Fix SliceSampler compile compatibility.
  • #3856 Fix flaky Pendulum spec and collector preemption-ordering tests.
  • #3843, #3841 Fix and optimize MultiSyncCollector preemption.
  • #3846, #3845, #3844 Harden nightly version checks and uploads.
  • #3859, #3855, #3842, #3825, 12093ae Improve release, benchmark, auto-tag, and docs CI reliability.
  • #3852 Fix tutorial code links.
  • #3849, #3848, #3847 Refresh SOTA dependency pins.

No new public API exports are intended in this patch release.

TorchRL v0.13.1

08 Jun 12:05

TorchRL v0.13.1 is a maintenance release for the 0.13 line. It carries post-0.13.0 RNN backend fixes and performance improvements, compile-friendliness fixes, SOTA example dependency refreshes, and documentation improvements.

Merged PR inventory

Recurrent modules and RNN backends

  • #3818 improves Triton RNN recurrent-matmul robustness with large-hidden tiling, 64-bit offsets, and faster autotune behavior.
  • #3752 adds recompute backward support and narrow RNN canonicalization to reduce learner memory pressure when multiple recurrent modules share a batch.

Compile and conversion stability

  • #3819 avoids a to_module FutureWarning graph break under torch.compile while preserving the previous state-preserving conversion behavior.

SOTA implementation dependency refreshes

  • #3708 updates the GRPO SOTA implementation to vllm 0.20.0.
  • #3601 updates the expert-iteration SOTA implementation to transformers 5.0.0rc3.

Documentation

  • #3821 fixes non-resolving API cross-references across docs and tutorials.
  • #3822 migrates the docs to pytorch_sphinx_theme2 and fixes tutorial Colab, Notebook, and GitHub links.
  • #3745 adds a memory-efficient RL training tutorial and cross-references for layout and recurrent-training guidance.

Newly exported public symbols

Utilities

  • torchrl.cuda_memory_profile (code, docs) — context manager/decorator for scoped CUDA memory profiling.
  • torchrl.cuda_memory_stats (code, docs) — helper for reading current and peak CUDA allocation/reservation statistics.
  • torchrl.reset_cuda_peak_stats (code, docs) — helper for resetting CUDA peak memory counters.

Modules

  • torchrl.modules.tensordict_module.canonicalize_rnn_subset (also re-exported as torchrl.modules.canonicalize_rnn_subset; specific export, package export, docs) — canonicalizes only the recurrent keys used by selected RNN modules.

Highlights

  • More robust Triton RNN recurrent matrix multiplication for large hidden sizes and backend autotuning.
  • Lower-memory recurrent learner updates through recompute backward and subset canonicalization.
  • Cleaner torch.compile behavior for state-preserving module conversion.
  • Updated docs theme and repaired generated API and tutorial links.
  • New memory-efficient RL training tutorial.
  • Refreshed dependency pins for GRPO and expert-iteration SOTA examples.

Installation

pip install torchrl==0.13.1

For CUDA wheel variants, follow the install index documented in the TorchRL README for the desired CUDA runtime.

Full changelog

v0.13.0...v0.13.1

TorchRL v0.13.0

05 Jun 13:53
30a143c

TorchRL 0.13.0 is a broad release focused on recurrent RL throughput, MuJoCo-native environments, macro-control workflows, multi-agent training utilities, and release-aligned cleanup of previously warned deprecations. It also introduces optional Linux CUDA wheels for users who want CUDA-based prioritized replay-buffer kernels, while keeping the standard PyPI wheel as the default for CPU prioritized replay buffers and users who do not need prioritized replay. The release refreshes compatibility with optional and older dependency stacks used by TorchRL's wider environment coverage.

Merged PRs included in v0.13.0

This release includes the following merged PRs in the v0.12.0..v0.13.0 first-parent range:

Release, documentation, and CI

  • #3817 — Documented optional CUDA TorchRL wheels for CUDA-based prioritized replay-buffer kernels while keeping the standard PyPI wheel as the default install.
  • #3814 — Fixed release-CI dependency checks, including stable TensorDict selection, TorchCodec CPU wheels, and mixed-device spec validation.
  • #3813 — Refreshed the README for the 0.13 release.
  • #3805 — Enacted the v0.13 deprecations and behavior changes.
  • #3796 — Added collector internals documentation.
  • #3803 — Bumped TorchRL version metadata to 0.13.0.
  • #3787 — Added uv local development setup.
  • #3747 — Split large test files into per-concept files.
  • #3741 — Added automatic insertion of new release versions into gh-pages versions.html.
  • #3739 — Fixed olddeps / opt-deps / gym smoke tests broken by #3738 and #3704.
  • #3710 — Bumped the olddeps TensorDict default to 0.12.
  • #3705 — Fixed benchmark workflows.
  • #3706 — Moved stdlib and TorchRL local imports in tests to module top.
  • #3696 — Added NestedKey and Literal contribution guidance.
  • #3686 — Added repository contribution rules for AI agents.

Recurrent RL, value estimation, and world models

  • #3816 — Added RNN reset rollout benchmarks for TorchRL LSTMModule/GRUModule, covering intermediate resets across cuDNN, scan, and Triton backends with eager and compiled runs.
  • #3815 — Kept GRU scan split sizes concrete for compiled recurrent rollouts while preserving old/optional dependency compatibility.
  • #3780 — Added a dynamic value-estimator registry across loss modules.
  • #3807 — Gated scan RNN backward support for compatible environments.
  • #3792 — Added the recurrent state lifecycle guide.
  • #3793 — Added recurrent integration coverage.
  • #3797 — Added chunked TensorDict support for value estimators.
  • #3785 — Fixed the Triton RNN kernel.
  • #3784 — Simplified the shifted value-estimator budget.
  • #3782 — Added the compact-drop shifted value backend.
  • #3744 — Sanitized NaN next observations in value-estimator forwards.
  • #3738 — Added the Triton backend for GRU/LSTM with intermediate resets.
  • #3712 — Fixed LSTMModule padding.
  • #3707 — Stabilized the RSSMPosteriorV3 gradient test.
  • #3695 — Improved sequence-RL composability.
  • #3621 — Implemented DreamerV3 world-model objectives and components.

Environments, transforms, and examples

  • #3781 — Surfaced Isaac Lab per-index reset and reset_to via IsaacLabWrapper.
  • #3811 — Fixed MuJoCo macro shapes and Gym Atari setup.
  • #3779 — Fixed a hidden multiprocessing import in the coding PPO tutorial.
  • #3806 — Added the MuJoCo macro tutorial environment.
  • #3802 — Added satellite MuJoCo SAC examples.
  • #3800 — Added IsaacLab headless rendered eval.
  • #3700 — Added MuJoCo custom environments with selectable physics backends.
  • #3801 — Fixed unused frame_skip in PPO tutorial.
  • #3777 — Added the NextObservationDelta environment transform.
  • #3766 — Added the FlattenAction transform.
  • #3765 — Added the ActionScaling transform.
  • #3727 — Fixed KeyError in PettingZoo action mask with ParallelEnv and done_on_any=False.
  • #3743 — Added the NextStateReconstructor replay-buffer transform.
  • #3742 — Added compact_obs support to DataCollector.
  • #3698 — Fixed GymEnv reward and action shapes across num-env configurations.
  • #3689 — Added Safety-Gymnasium environment wrappers.
  • #3682 — Fixed PettingZoo state handling and added an encoding regression test.
  • #3676 — Added the ExpandAs transform.

Collectors, replay buffers, and performance

  • #3810 — Made collector weight synchronization idempotent.
  • #3734 — Added HERReplayBuffer and HindsightStrategy to torchrl.data.
  • #3749 — Dispatched RandomSampler and SliceSampler to without-replacement variants via replacement=False.
  • #3729 — Auto-created inner SharedMem schemes for Ray/RPC and policy_factory.
  • #3728 — Made per-worker SharedMem schemes opt-in for policy_factory tests.
  • #3714 — Added async prioritized replay-buffer writes.
  • #3680 — Gated the profiling decorator on TORCHRL_PROFILING.
  • #3685 — Improved CUDA prioritized replay-buffer ergonomics.
  • #3672 — Added data-collector hooks.
  • #3677 — Added CUDA support for prioritized replay sampling, available through the optional CUDA wheel builds.
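
As a rough illustration of the gating described in #3680, a decorator can check the environment variable once and hand back the original function when profiling is off. The sketch below is not TorchRL's implementation: the decorator name `profiled`, the accepted values (`1` or `true`), and the choice to check at decoration time are all assumptions made for this example.

```python
import functools
import os
import time


def profiled(fn):
    """Time calls to fn only when TORCHRL_PROFILING is enabled (hypothetical helper)."""
    # Assumption: "1" or "true" enables profiling; the real accepted values may differ.
    enabled = os.environ.get("TORCHRL_PROFILING", "0").lower() in ("1", "true")
    if not enabled:
        return fn  # gate checked at decoration time: no wrapper, no overhead

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            # Accumulate wall-clock time even if fn raises.
            wrapper.total_time += time.perf_counter() - start
            wrapper.calls += 1

    wrapper.total_time = 0.0
    wrapper.calls = 0
    return wrapper


def step(x):
    return x + 1


os.environ["TORCHRL_PROFILING"] = "0"
plain_step = profiled(step)  # returned unchanged

os.environ["TORCHRL_PROFILING"] = "1"
timed_step = profiled(step)  # wrapped with a timer
result = timed_step(41)
```

Checking the variable when the decorator is applied keeps the disabled path free of any per-call cost, at the price of ignoring later changes to the environment.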

Objectives, trainers, and multi-agent learning

  • #3773 — Added CrossGroupCritic.
  • #3748 — Added MAPPOLoss, IPPOLoss, MultiAgentGAE, and ValueNorm.
  • #3750 — Fixed the cross-entropy reduction parameter in discrete objectives.
  • #3694 — Added QMix, VDN, and IQL support to the DQN trainer.
  • #3699 — Improved Hydra config parity for environments, losses, and loggers.
  • #3692 — Added an early-stopping trainer hook.
  • #3693 — Audited TrainerConfig/Trainer parity and added auto_log_optim_steps plumbing.
  • #3683 — Made DDPG, PPO, and SAC trainers multi-agent-friendly.
  • #3691 — Added trainer hook configs.
  • #3639 — Added ACTModel and ACTLoss for robot learning.
  • #3667 — Added behavior-cloning loss.
  • #3679 — Fixed PPOTrainer gamma/lambda defaults, removed dead code, and removed a wildcard import.

Compatibility, bug fixes, and cleanup

  • #3809 — Fixed GRPO runtime issues for vLLM and SGLang backends.
  • #3812 — Fixed Robohive, Gym, PettingZoo, and setup-test CI failures.
  • #3808 — Fixed LLM assistant action masks.
  • #3799 — Fixed old- and optional-dependency workflows.
  • #3709 — Updated the REINFORCE value-net test for new torch autograd error wording.
  • #3704 — Forwarded generator kwargs through ProbabilisticActor.
  • #3688 — Added setup and shutdown hook points.
  • #3684 — Split environment transforms into per-category modules.

Highlights

Faster recurre...

TorchRL v0.12.0

27 Apr 17:23
2b2bac7

TorchRL v0.12.0 Release Notes

Highlights

  • New algorithms. Five new config-based trainers — DQN, DDPG, IQL, CQL, and TD3 — are built on a new configuration system for reproducible algorithm setups (@vmoens, @bsprenger). PILCO (Probabilistic Inference for Learning Control) is now available as a built-in algorithm (@PSXBRosa, @vmoens). For diffusion-based behavioral cloning, a new DDPMModule diffusion actor and DiffusionBCLoss are included (@theap06). Async PPO infrastructure overlaps data collection and optimization (@vmoens).

  • Collector and data-flow improvements. A new high-throughput auto-batching inference server automatically batches requests from multiple environments, with pluggable transport backends (threading, multiprocessing, Ray, Monarch) and built-in weight-sync integration. Paired with the new AsyncBatchedCollector, it enables asynchronous data collection with automatic batching for maximum GPU utilization (@vmoens). The new TrajectoryBatcher and AsyncTrajectoryBatcher assemble trajectories efficiently from streaming environment transitions, including variable-length trajectories and padding (@theap06). On the parallel environment side, shared-memory done flags replace mp.Event for lower-latency step synchronization, and a fast-path device-transfer optimization reduces overhead in step_and_maybe_reset (@vmoens).

  • Inference backends. This release adds full SGLang integration alongside vLLM, with an SGLangWrapper policy module, an AsyncSGLang server-based inference path, NCCL weight synchronization, and GRPO support (@vmoens).

  • Replay buffer. StoreStorage is a new Redis/Dragonfly-backed storage backend that lets replay buffers share experience across processes and nodes (@vmoens).

  • Evaluation. A new Evaluator class provides a unified API for synchronous and asynchronous policy evaluation during training, with a process backend, collector-based stepping, weight sync via WeightSyncScheme, multi-model support, and a RayEvalWorker for distributed evaluation (@vmoens).

  • Environments and platform support. A new GenesisEnv wrapper integrates the Genesis physics simulator (@ParamThakkar123). Dreamer now supports pre-vectorized environments and ships with an IsaacLab environment factory, training script, and integration guide (@vmoens). MPS support improves through float64-to-float32 downcasting in ParallelEnv, SerialEnv, and collectors, fixing previously broken Apple Silicon GPU workflows (@bsprenger).

Installation

pip install torchrl==0.12.0

Requires PyTorch >= 2.1 and TensorDict >= 0.12.0.


Breaking Changes

  • Remove v0.12 deprecated APIs (#3670) @vmoens
    • The local_init_rb parameter has been removed from Collector and MultiCollector. Storage-level initialization is now the only behavior.
    • TransformedEnv(env=...) now raises TypeError. Use TransformedEnv(base_env=...) instead.
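
The keyword removal above follows a common pattern: the old name is rejected with a `TypeError` that points at its replacement. A minimal sketch of that pattern, using a hypothetical stand-in class rather than `torchrl.envs.TransformedEnv` itself (the class name, error message, and string placeholder for the environment are assumptions):

```python
class TransformedEnvSketch:
    """Hypothetical stand-in used only to illustrate the keyword removal."""

    def __init__(self, base_env=None, transform=None, **kwargs):
        if "env" in kwargs:
            # The removed keyword fails loudly and names its replacement.
            raise TypeError(
                "TransformedEnvSketch got the removed keyword 'env'; "
                "pass base_env=... instead."
            )
        if kwargs:
            raise TypeError(f"unexpected keyword arguments: {sorted(kwargs)}")
        self.base_env = base_env
        self.transform = transform


# New-style call: accepted.
ok = TransformedEnvSketch(base_env="my_env")

# Old-style call: rejected with a TypeError.
try:
    TransformedEnvSketch(env="my_env")
    message = None
except TypeError as err:
    message = str(err)
```

Code that still passes `env=` needs only the keyword renamed to `base_env=`; positional calls are unaffected in this sketch.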

New Features

Auto-batching Inference Server

A new inference server that automatically batches requests from multiple environments for efficient GPU inference. This is a key building block for scaling RL training with many parallel environments.

  • Core server and transport protocol (#3492)
  • Threading transport (#3493)
  • Multiprocessing transport (#3494)
  • Ray transport (#3495)
  • Monarch transport (#3496)
  • Weight sync integration (#3497)
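
To make the auto-batching idea concrete, here is a minimal thread-based sketch: callers submit single requests, and a background thread groups whatever arrives within a short window into one batched call. It is not TorchRL's server. The class name `TinyBatchingServer`, its parameters, and the use of `concurrent.futures.Future` are assumptions chosen for illustration, and a plain Python function stands in for the policy.

```python
import queue
import threading
from concurrent.futures import Future


class TinyBatchingServer:
    """Serve single requests in batches (illustrative only)."""

    def __init__(self, batch_fn, max_batch_size=8, timeout=0.01):
        self.batch_fn = batch_fn              # maps a list of inputs to a list of outputs
        self.max_batch_size = max_batch_size
        self.timeout = timeout                # seconds to wait for each extra request
        self.batch_sizes = []                 # recorded so batching can be inspected
        self._requests = queue.Queue()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._serve, daemon=True)
        self._thread.start()

    def submit(self, item):
        """Queue one request and return a Future for its result."""
        fut = Future()
        self._requests.put((item, fut))
        return fut

    def _serve(self):
        while not self._stop.is_set():
            try:
                first = self._requests.get(timeout=0.05)
            except queue.Empty:
                continue
            batch = [first]
            # Keep pulling requests until the batch is full or the queue stays empty.
            while len(batch) < self.max_batch_size:
                try:
                    batch.append(self._requests.get(timeout=self.timeout))
                except queue.Empty:
                    break
            self.batch_sizes.append(len(batch))
            try:
                outputs = self.batch_fn([item for item, _ in batch])
            except Exception as exc:
                for _, fut in batch:
                    fut.set_exception(exc)
                continue
            for (_, fut), out in zip(batch, outputs):
                fut.set_result(out)

    def shutdown(self):
        self._stop.set()
        self._thread.join()


server = TinyBatchingServer(lambda xs: [2 * x for x in xs], max_batch_size=4)
futures = [server.submit(i) for i in range(10)]
results = [f.result(timeout=5) for f in futures]
server.shutdown()
```

The two knobs trade latency for batch size: a larger `max_batch_size` or `timeout` yields fuller batches but makes the first request in each batch wait longer.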

AsyncBatchedCollector

A new collector that combines async environments with the auto-batching inference server for maximum throughput.

  • Async envs + auto-batching inference (#3498)
  • Coordinator loop and direct submission mode (#3499)
  • Backend params and performance optimizations (#3511)

Trajectory Batcher

  • TrajectoryBatcher for assembling trajectories from streaming transitions (#3584) @theap06
  • AsyncTrajectoryBatcher for asynchronous trajectory assembly (#3592) @theap06
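
The core bookkeeping behind trajectory assembly can be sketched in a few lines: buffer transitions per environment, emit a trajectory when its `done` flag arrives, then pad to a common length with a validity mask. This is a simplified illustration, not the `TrajectoryBatcher` API. The function names, the `(env_id, obs, done)` tuple layout, and the list-based padding are assumptions.

```python
from collections import defaultdict


def assemble(transitions):
    """Group interleaved (env_id, obs, done) transitions into finished trajectories."""
    open_trajs = defaultdict(list)
    finished = []
    for env_id, obs, done in transitions:
        open_trajs[env_id].append(obs)
        if done:
            # The episode ended: hand over the buffered trajectory and start fresh.
            finished.append(open_trajs.pop(env_id))
    return finished


def pad(trajs, pad_value=0):
    """Right-pad trajectories to equal length and return a validity mask."""
    max_len = max(len(t) for t in trajs)
    padded = [t + [pad_value] * (max_len - len(t)) for t in trajs]
    mask = [[True] * len(t) + [False] * (max_len - len(t)) for t in trajs]
    return padded, mask


# Two environments streaming transitions in interleaved order.
stream = [
    (0, 1, False),
    (1, 10, False),
    (0, 2, False),
    (1, 11, True),
    (0, 3, True),
]
trajs = assemble(stream)
padded, mask = pad(trajs)
```

Trajectories come out in completion order, so environment 1 (which finishes first) precedes environment 0 here.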

SGLang Backend

Full SGLang support for LLM inference, mirroring the existing vLLM integration:

  • Base infrastructure (#3428)
  • AsyncSGLang server-based inference service (#3429)
  • SGLangWrapper policy module (#3430)
  • NCCL weight synchronization (#3431)
  • Module structure integration (#3432)
  • SGLang backend support in GRPO

Diffusion Policies

  • DDPMModule diffusion actor for denoising diffusion probabilistic models (#3596) @theap06
  • DiffusionBCLoss for diffusion-based behavioral cloning (#3604) @theap06

Evaluator

  • Evaluator class for sync/async evaluation (#3594)
  • Process backend, lazy init, and pending property (#3611)
  • Collector-based stepping backend (#3624)
  • Enable loggers to run as Ray actors (#3623)
  • Weight sync via WeightSyncScheme + multi-model support (#3627)
  • Isaac Lab Evaluator tests + init_fn plumbing for process backend (#3663)
  • RayEvalWorker for distributed async evaluation (#3474)
  • Named actors and from_name for RayEvalWorker (#3488)

Async PPO

  • Async PPO infrastructure for overlapping collection and optimization (#3661)

Config-based Trainers

New trainers with an integrated configuration system. The highlights above name five: DQN, DDPG, IQL, CQL, and TD3.

Replay Buffer

  • StoreStorage for Redis/Dragonfly-backed replay buffers (#3516)
  • set_at_, set_, update_ methods on ReplayBuffer (#3590) @jashshah999
  • Support trajs_per_batch with replay_buffer on multi-process and distributed collectors (#3618)

LLM / GRPO

  • Token-in, token-out LLM wrapper mode (#3407)
  • GRPO improvements: new envs, vLLM V1 compat, log-prob fixes, training stability (#3580)
  • Namespace GRPO wandb metrics for auto-grouping (#3585)
  • Remove placement-group xfails and fix vLLM tokenizer compat (#3586)

Environments

Transforms

Logging

  • log_metrics method for efficient batch logging (#3452)
  • TensorDict support in log_metrics (#3455)

Specs

Algorithms

Collectors

  • Lazy-init RandomPolicy action_spec from env in collectors (#3664)

Other

  • __getattr__ in _dispatch_caller_parallel for transparent attribute access (#3389) @ParamThakkar123
  • scalar_output_mode for loss modules with reduction='none' (#3426)
  • ObsDecoder: out_channels parameter for grayscale decoding (#3472)
  • Ergonomic scalar assignment for loss buffers (#3612)
  • New memmap value for the CKPT_BACKEND environment variable ([#3619](https:...

TorchRL v0.11.1

05 Feb 08:47

Highlights

This patch release includes several important bug fixes and performance improvements:

  • Fixed Composite.encode() to correctly set the batch size of the output TensorDict
  • Fixed StepCounter to properly track nested truncated and done states in multi-agent environments
  • Fixed shared memory weight updater to work correctly with collectors using multiple policies
  • Fixed _repr_html_ dispatch in parallel environments that was causing doc CI failures
  • Added scalar_output_mode to loss modules for proper handling of reduction='none'
  • Fixed torch.compile configuration for Dreamer
  • Performance: GPU Image Transforms for Dreamer (~5.5x faster sampling)
  • Performance: SliceSampler GPU acceleration for faster trajectory computation
  • Performance: Always enable prefetch for replay buffer

Breaking Changes

No breaking changes in this release.

Bug Fixes

  • Fixed batch size in Composite.encode: The Composite.encode() method now correctly sets the batch_size of the output TensorDict to match the shape of the tensor spec, rather than returning an empty batch size. (#3411) - @tobiabir

    Previously, calling Composite.encode(raw_vals) would return a TensorDict with batch_size=torch.Size([]) regardless of the spec's shape. This is now fixed to return the correct batch size matching the spec shape.

  • Fixed StepCounter nested done/truncated tracking in multi-agent environments: StepCounter now properly updates nested truncated and done keys for multi-agent environments. (#3405) - @vmoens

    When using StepCounter with multi-agent environments (e.g., PettingZoo), the transform now correctly propagates truncated/done signals to agent-specific keys (e.g., ("agent", "truncated")) in addition to the root-level keys.

  • Fixed shared memory weight updater with multiple policies: The shared memory weight updater now correctly handles collectors that use multiple policies. (#3442) - @vmoens

  • Fixed _repr_html_ dispatch in parallel environments: Parallel environments no longer incorrectly dispatch private/special attribute access (like _repr_html_) to worker processes. (#3441) - @vmoens

  • Added scalar_output_mode to loss modules: Loss modules (SAC, IQL, CQL, CrossQ, REDQ, DecisionTransformer) now support a scalar_output_mode parameter for proper handling of reduction='none'. (#3426) - @vmoens

  • Fixed torch.compile configuration for Dreamer: Fixed compilation settings for Dreamer world model training. - @vmoens
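
The `_repr_html_` fix above addresses a general pitfall of attribute-forwarding proxies: tools such as notebook front ends probe objects for private or special attributes, and a proxy that forwards everything sends those probes to its workers. A minimal sketch of the guard, using a hypothetical `WorkerProxy` class rather than TorchRL's parallel environments:

```python
class WorkerProxy:
    """Forward public attribute access to a worker; keep private names local."""

    def __init__(self, worker):
        self._worker = worker
        self.forwarded = []  # names that were actually dispatched

    def __getattr__(self, name):
        # __getattr__ runs only after normal lookup fails, so reaching this
        # point with an underscore-prefixed name means "not defined here".
        if name.startswith("_"):
            raise AttributeError(name)
        self.forwarded.append(name)
        return getattr(self._worker, name)


class Worker:
    action_dim = 4

    def _repr_html_(self):
        return "<b>worker</b>"


proxy = WorkerProxy(Worker())
public_value = proxy.action_dim                   # forwarded to the worker
has_html_repr = hasattr(proxy, "_repr_html_")     # probe stays local
```

Raising `AttributeError` for underscore-prefixed names also prevents infinite recursion if `_worker` is looked up before `__init__` has run, for example during unpickling.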

Performance Improvements

  • GPU Image Transforms for Dreamer: ~5.5x faster sampling with GPU-accelerated image transforms. - @vmoens
  • SliceSampler GPU acceleration: Faster trajectory computation using GPU. - @vmoens
  • Always enable prefetch for replay buffer: Improved data loading performance. - @vmoens

Cleanup

  • Removed pin_memory from replay buffer: Simplified replay buffer configuration. - @vmoens

Internal / CI Improvements

  • Added PyTorch version check instructions to release prompt (#3443) - @vmoens
  • Added tutorials CI workflow for testing sphinx tutorials (#3441) - @vmoens
  • Upgraded meshgrid usage to address PyTorch deprecation warning (#3412) - @vmoens
  • Added flaky test tracking system for improved CI reliability (#3408) - @vmoens
  • Added file-based auto-labeling for PR components (#3402) - @vmoens
  • Improved LLM prompt for release workflow (#3399) - @vmoens

Contributors

Thanks to all contributors to this release:

Installation

```bash
pip install torchrl==0.11.1
```

Or with conda:

```bash
conda install -c pytorch torchrl=0.11.1
```

TorchRL v0.11.0

28 Jan 08:30
f9ca748

TorchRL v0.11.0 Release Notes

Highlights

  • Dreamer overhaul - Comprehensive improvements to Dreamer world model training: async collectors with profiling, RSSM fixes (scan mode, noise injection, explicit dimensions), torch.compile compatibility for value functions and TDLambda estimator, optimized DreamerEnv to avoid CUDA syncs, and updated sota-implementation with better configs. @vmoens
  • Weight synchronization schemes - New modular weight sync infrastructure (torchrl.weight_update) with SharedMem, MultiProcess, and vLLM-specific (NCCL, double-buffer) schemes. Collectors now integrate seamlessly with weight sync schemes for distributed training. @vmoens
  • Major collector refactor - The collector codebase has been completely restructured. The monolithic collectors.py is now split into focused modules (_single.py, _multi_base.py, _multi_sync.py, _multi_async.py, _runner.py, base.py), with cleaner separation of concerns. (#3233) @vmoens
  • LLM objectives: DAPO & CISPO - New DAPO (Decoupled Clip and Dynamic Sampling Policy Optimization) and CISPO (Clipped Importance Sampling Policy Optimization) algorithms for LLM training. @vmoens
  • Trainer infrastructure - New SAC Trainer, configuration system for algorithms, timing utilities, and async collection support within trainers. @vmoens
  • Tool services - New tool service infrastructure for LLM agents with Python executor, MCP tools, and web search capabilities. @vmoens
  • Deprecated APIs removed - All deprecation warnings from v0.10 have been promoted to hard errors for v0.11. (#3369) @vmoens
  • New environment backends - Added Procgen environments support with a new ProcgenEnv wrapper. (#3331) @ParamThakkar123
  • Multi-env execution - GymEnv, BraxEnv, and DMControlEnv now support a num_envs/num_workers parameter to run multiple environments in a single call via ParallelEnv. (#3343, #3370, #3337) @ParamThakkar123

Installation

pip install torchrl==0.11.0

Breaking Changes

  • [v0.11] Remove deprecated features and replace warnings with errors (#3369) @vmoens

    • Removes deprecated KLRewardTransform from transforms/llm.py (use torchrl.envs.llm.KLRewardTransform)
    • Removes LogReward and Recorder classes from trainers (use LogScalar and LogValidationReward)
    • Removes unbatched_*_spec properties from VmasWrapper/VmasEnv (use full_*_spec_unbatched)
    • Deletes deprecated rlhf.py modules (data/rlhf.py, envs/transforms/rlhf.py, modules/models/rlhf.py)
    • Removes replay_buffer_chunk parameter from MultiCollector
    • Replaces minimum/maximum deprecation warnings with TypeError in Bounded spec
    • Replaces critic_coef/entropy_coef deprecation warnings with TypeError in PPO and A2C losses
  • [Major] Major refactoring of collectors (#3233) @vmoens

    • Splits the 5000+ line collectors.py into focused modules for single/multi sync/async collectors
    • Creates new _constants.py, _runner.py, base.py modules
    • Introduces cleaner weight synchronization scheme integration
    • Improves test coverage for multi-device and shared-device weight updates
    • Some internal APIs have changed; external API remains compatible

Dreamer World Model Improvements

These changes significantly improve Dreamer training performance, torch.compile compatibility, and usability. @vmoens

  • [Feature] Refactor Dreamer training with async collectors, profiling, and improved config (cc917ba)

    • Major overhaul of the Dreamer sota-implementation with async data collection
    • Adds profiling support for performance analysis
    • Improved configuration with better defaults and documentation
    • Updated README with detailed usage instructions
  • [Refactor] Dreamer implementation updates (3ab4b30)

    • Refactors Dreamer training script for better maintainability
    • Updates config.yaml with improved hyperparameters
    • Enhances dreamer_utils.py with additional helper functions
  • [Feature] Add noise argument and scan mode to RSSMRollout (8653b6e)

    • Adds noise argument to control stochastic sampling during rollout
    • Implements scan mode for efficient sequential processing
    • 109 lines added to model_based.py for improved RSSM flexibility
  • [Feature] Add explicit dimensions and device support to RSSM modules (f350fe0)

    • Adds explicit dimension handling for batch, time, and feature dims
    • Improves device placement for RSSM components
  • [Feature] Logging & RSSM fixes (d082979)

    • Fixes RSSM module behavior and adds logging improvements
    • Updates batched_envs.py and wandb logger
  • [Refactor] Use compile-aware helpers in Dreamer objectives (d8c3887)

    • Updates Dreamer objectives to use torch.compile-compatible helpers
    • Improves performance when using torch.compile
  • [BugFix] Optimize DreamerEnv to avoid CUDA sync in done checks (d57fdec)

    • Eliminates unnecessary CUDA synchronizations in done flag checking
    • Significant performance improvement for GPU-based Dreamer training
  • [BugFix] Fix ModelBasedEnvBase for torch.compile compatibility (7ff663d)

    • Makes ModelBasedEnvBase compatible with torch.compile
  • [Feature] Add allow_done_after_reset parameter to ModelBasedEnvBase (24ae042)

    • Adds flexibility for environments that may signal done immediately after reset
  • [BugFix] Final polish for Dreamer utils and Collector tests (5aefdd5)

    • Final cleanup and polish for Dreamer implementation

Weight Synchronization Schemes

New modular infrastructure for weight synchronization between training and inference workers. @vmoens

  • [Feature] Weight Synchronization Schemes - Core Infrastructure (e8f6fa5)

    • New torchrl.weight_update module with 2400+ lines of weight sync infrastructure
    • SharedMemWeightSyncScheme: Uses shared memory for fast intra-node sync
    • MultiProcessWeightSyncScheme: Uses multiprocessing queues for cross-process sync
    • Comprehensive documentation and examples in docs/source/reference/collectors.rst
  • [Feature] vLLM Weight Synchronization Schemes (d0c8b7e)

    • VllmNCCLWeightSyncScheme: NCCL-based weight sync for vLLM distributed inference
    • VllmDoubleBufferWeightSyncScheme: Double-buffered async weight updates
    • 1876 lines of vLLM-specific weight sync code
  • [Feature] Collectors - Weight Sync Scheme Integration (a7707ca)

    • Integrates weight sync schemes into collector infrastructure
    • Updates GRPO and expert-iteration implementations to use new schemes
    • Adds examples for multi-weight update patterns
  • [Refactor] Weight sync schemes refactor (ae0ae06)

    • Refines weight sync API and adds additional schemes
    • Improves test coverage with 342+ new test lines

LLM Training: DAPO & CISPO

New policy optimization algorithms for LLM fine-tuning. @vmoens

  • [Feature] DAPO (9d5c276)

    • Implements Decoupled Clip and Dynamic Sampling Policy Optimization (DAPO) for LLM training
    • Adds DAPO-specific loss computation to torchrl/objectives/llm/grpo.py
  • [Feature] CISPO (ed0d8dc)

    • Implements Clipped Importance Sampling Policy Optimization
    • Alternative to PPO/GRPO with different clipping strategy
  • [Refactor] Refactor GRPO as a separate class (2bc3cb7)

    • Separates GRPO implementation for better modularity

Trainer Infrastructure

New trainer algorithms, configuration system, and utilities. @vmoens

  • [Trainers] SAC Trainer and algorithms (02d4bfd)

    • New SAC Trainer implementation with 786 lines of code
    • Complete sota-implementation in sota-implementations/sac_trainer/
    • Trainer configuration via YAML files
  • [Feature] Trainer Algorithms - Configuration System (6bc201a)

    • New configuration system in torchrl/trainers/algorithms/configs/
    • Configs for collectors, data, modules, objectives, transforms, weight sync schemes
    • Enables hydra-style configuration composition
  • [Feature] Trainer Infrastructure - Timing and Utilities (dc21523)

    • Adds timing utilities to trainer infrastructure
    • 263 lines of enhanced trainer functionality
  • [Feature] Async collection within trainers (5f1eb2c)

    • Enables asynchronous data collection during training
    • Improves training throughput
  • [Feature] PPO Trainer Updates (129f3d5)

    • Updates to PPO trainer with new features

Tool Services for LLM Agents

New infrastructure for tool-augmented LLM agents. @vmoens

  • [Feature] Tool services (9ca0e40)

    • New torchrl/services/ module for tool execution
    • Python executor service for safe code execution
    • MCP (Model Context Protocol) tool integration
    • Web search tool example
    • 609 lines of documentation in docs/source/reference/services.rst
    • Comprehensive test coverage in test/test_services.py
  • [Feature] Transform Module - ModuleTransform and Ray Service Refactor (7b85c71)

    • New ModuleTransform for applying nn.Modules as transforms
    • Refactored Ray service integration
    • Moves ray_service.py to torchrl/envs/transforms/

torch.compile Compatibility

Fixes to enable torch.compile with various TorchRL components. @vmoens

  • [BugFix] Fix value functions for torch.compile compatibility (3bdc7b1)

    • 295 lines of new tests in test/compile/test_value.py
    • Fixes value function implementations for compile compatibility
  • [BugFix] Fix TDLambdaEstimator for torch.compile compatibility (11e22ee)

    • Updates TDLambdaEstimator to w...

TorchRL 0.10.1: Fixes and named dimensions in composite specs

03 Nov 09:14

Release Notes - v0.10.1

This patch release includes bug fixes, type annotation improvements, and CI enhancements cherry-picked from main.

Bug Fixes

  • #3168 - @vmoens - [BugFix] AttributeError in accept_remote_rref_udf_invocation
    • Fixed AttributeError in RPC utilities when decorating classes with remote RRef invocation by handling None values in getattr calls

Features

  • #3174 - @vmoens - [Feature] Named dims in Composite

    • Added support for named dimensions in Composite specs, enabling better integration with PyTorch's named tensors
  • #3214 - @louisfaury - [Feature] Composite specs can create named tensors with 'zero' and 'rand'

    • Extended Composite specs to properly propagate names when creating tensors using zero() and rand() methods

Type Annotations & Documentation

  • @vmoens - [Typing] Edit wrongfully set str type annotations

    • Fixed incorrect string type annotations across 19 files
  • #3175 - @vmoens - [Versioning] Fix doc versioning

    • Fixed documentation versioning issues

CI/Build Improvements

  • #3200 - @vmoens - [CI] Use pip install

    • Updated CI workflows to use pip install across 41 files
  • @vmoens - [CI] Fix missing librhash0 in doc CI

    • Added missing librhash0 dependency in documentation CI
  • @vmoens - [CI] Fix benchmarks for LLMs

    • Fixed LLM benchmark CI configurations
  • #3222 - @vmoens - [CI] Upgrade doc python version

    • Upgraded Python version in documentation build workflows and added vLLM plugin entry point for FP32 overrides

TorchRL 0.10.0: async LLM inference

16 Sep 13:48

TorchRL 0.10.0 Release Notes

What's New in 0.10.0

TorchRL 0.10.0 introduces significant advancements in Large Language Model (LLM) support, new algorithms, enhanced environment integrations, and numerous performance improvements and bug fixes.

Major Features

LLM Support and RLHF

  • vLLM Integration Revamp: Complete overhaul of vLLM support with improved batching and performance (#3158) @vmoens
  • GRPO (Group Relative Policy Optimization): New algorithm implementation with both sync and async variants (#2970, #2997, #3006) @vmoens
  • Expert Iteration and SFT: Implementation of expert iteration algorithms and supervised fine-tuning (#3017) @vmoens
  • PPOTrainer: New high-level trainer class for PPO training (#3117) @vmoens
  • LLM Tooling: Comprehensive tooling support for LLM environments and transformations (#2966) @vmoens
  • Remote LLM Wrappers: Support for remote LLM inference with improved batching (#3116) @vmoens
  • Common LLM Generation Interface: Unified kwargs for generation across vLLM and Transformers (#3107) @vmoens
  • LLM Transforms:
    • AddThinkingPrompt transform for reasoning prompts (#3027) @vmoens
    • MCPToolTransform for tool integration (#2993) @vmoens
    • PythonInterpreter transform for code execution (#2988) @vmoens
    • LLMMaskedCategorical for masked categorical distributions (#3041) @vmoens
  • Content Management: ContentBase system for structured content handling (#2985) @vmoens
  • History Tracking: New history system for conversation management (#2965) @vmoens

New Algorithms and Training

  • Async SAC: Asynchronous implementation of Soft Actor-Critic (#2946) @vmoens
  • Discrete Offline CQL: SOTA implementation for discrete action spaces (#3098) @Ibinarriaga
  • Multi-node Ray Support: Enhanced distributed training for GRPO (#3040) @albertbou92

Environment Support

  • NPU Support: Added NPU device support for SyncDataCollector (#3155) @lowdy1
  • IsaacLab Wrapper: Integration with IsaacLab simulation framework (#2937) @vmoens
  • Complete PettingZoo State Support: Enhanced multi-agent environment support (#2953) @JGuzzi
  • Minari Integration: Support for loading datasets from local Minari cache (#3068) @Ibinarriaga

Storage and Replay Buffers

  • Compressed Storage GPU: GPU acceleration for compressed replay buffers (#3062) @aorenstein68
  • Packing: New data packing functionality for efficient storage (#3060) @vmoens
  • Ray Replay Buffer: Enhanced distributed replay buffer support (#2949) @vmoens

🔧 Improvements and Enhancements

Performance Optimizations

  • Bounded Specs Memory: Single copy optimization for bounded specifications (#2977) @vmoens
  • Log-prob Computation: Avoid unnecessary log-prob calculations when retrieving distributions (#3081) @vmoens
  • LLM Wrapper Queuing: Performance fixes in LLM wrapper queuing (#3125) @vmoens
  • vmap Deactivation: Selective vmap deactivation in objectives for better performance (#2957) @vmoens

API Improvements

  • Public SAC Methods: Exposed public methods for SAC algorithm (#3085) @vmoens
  • Composite Entropy: Fixed entropy computation for nested keys (#3101) @juandelos
  • Multi-head Entropy: Per-head entropy coefficients for PPO (#2972) @Felixs
  • ClippedPPOLoss: Support for composite value networks (#3031) @louisfaury
  • LineariseRewards: Support for negative weights (#3064) @YoannPoupart
  • GAE Typing: Improved typing with optional value networks (#3029) @louisfaury
  • Explained Variance: Optional explained variance logging (#3010) @OswaldZink
  • Frame Control: Worker-level control over frames_per_batch (#3020) @alexghh
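As a rough illustration of the per-head entropy coefficients mentioned in this list, the sketch below computes an entropy bonus for a policy with several categorical action heads. It assumes the bonus is a weighted sum of per-head entropies; the function names, head names, and coefficient values are hypothetical and do not reflect the TorchRL PPO loss signature.

```python
import math


def categorical_entropy(probs):
    """Shannon entropy (in nats) of a categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_bonus(head_probs, coeffs):
    """Weighted sum of per-head entropies.

    head_probs maps a head name to its action probabilities;
    coeffs maps the same head name to its entropy coefficient.
    Both mappings are illustrative assumptions.
    """
    return sum(
        coeffs[name] * categorical_entropy(probs)
        for name, probs in head_probs.items()
    )


heads = {"move": [0.5, 0.5], "jump": [1.0, 0.0]}
coeffs = {"move": 0.01, "jump": 0.1}
bonus = entropy_bonus(heads, coeffs)
```

The deterministic `jump` head contributes nothing regardless of its coefficient, while the uniform `move` head contributes its coefficient times ln 2.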

Developer Experience

  • Colored Logger: Enhanced logging with colored output (#2967) @vmoens
  • Better Error Handling: Improved error catching in env.rollout and rb.add (#3102) @vmoens
  • Warning Management: Better warning control for various components (#3099, #3115) @vmoens
  • Faster Tests: Optimized test suite performance (#3162) @vmoens

Bug Fixes

Core Functionality

Environment and Wrapper Fixes

  • TransformedEnv: Fixed in-place modification of specs (#3076) @vmoens
  • Parallel Environments: Fixed partial and nested done states (#2959) @vmoens
  • Gym Actions: Fixed single action passing when action key is not "action" (#2942) @vmoens
  • Brax Memory: Fixed memory leak in Brax environments (#3052) @vmoens
  • Atari Patching: Fixed patching for NonTensorData observations (#3091) @marcosGR

Collector and Replay Buffer Fixes

  • LLMCollector: Fixed trajectory collection when multiple trajectories complete (#3018) @albertbou92
  • Postprocessing: Consistent postprocessing when using replay buffers in collectors (#3144) @vmoens
  • Weight Updates: Fixed original weights retrieval in collectors (#2951) @vmoens
  • Transform Handling: Fixed transform application and metadata preservation (#3047, #3050) @vmoens

Compatibility and Infrastructure

  • PyTorch 2.1.1: Fixed compatibility issues (#3157) @vmoens
  • NPU Attribute: Fixed missing NPU attribute (#3159) @vmoens
  • CUDA Graph: Fixed update_policy_weights_ with CUDA graphs (#3003) @vmoens
  • Stream Capturing: Robust CUDA stream capturing calls (#2950) @vmoens

Documentation and Tutorials

  • DQN with RNN Tutorial: Upgraded tutorial with latest best practices (#3152) @vmoens
  • LLM API Documentation: Comprehensive documentation for LLM environments and transforms (#2991) @vmoens
  • Multi-head Entropy: Better documentation for multi-head entropy usage (#3109) @vmoens
  • LSTM Module: Fixed import examples in documentation (#3138) @arvindcr4
  • A2C Documentation: Updated AcceptedKeys documentation (#2987) @simeet-n
  • History API: Added missing docstrings for History functionality (#3083) @vmoens
  • Multi-agent PPO: Fixed tutorial issues (#2940) @matteobettini
  • WeightUpdater: Updated documentation after renaming (#3007) @albertbou92

Infrastructure and CI

  • Pre-commit Updates: Updated formatting and linting tools (#3108) @vmoens
  • Benchmark CI: Fixed benchmark runs and added missing dependencies (#3092, #3163) @vmoens
  • Windows CI: Fixed Windows continuous integration (#3028) @vmoens
  • Old Dependencies: Fixed CI for older dependency versions (#3165) @vmoens
  • C++ Linting: Fixed C++ code linting issues (#3129) @vmoens
  • Build System: Improved pyproject.toml usage and versioning (#3089, #3166) @vmoens

🏆 Contributors

Special thanks to all contributors who made this release possible:

  • @albertbou92 (Albert Bou) - GRPO multi-node support and LLM improvements
  • @Ibinarriaga - CQL offline algorithm and Minari integration
  • @aorenstein68 (Adrian Orenstein) - Compressed storage GPU support
  • @louisfaury (Louis Faury) - Categorical spec and PPO improvements
  • @LucaCarminati (Luca Carminati) - Binary tensor fixes
  • @JGuzzi (Jérôme Guzzi) - PettingZoo state support
  • @lowdy1 - NPU device support
  • @Felixs (Felix Sittenauer) - Multi-head entropy coefficients
  • @YoannPoupart (Yoann Poupart) - LineariseRewards improvements
  • @OswaldZink (Oswald Zink) - Explained variance logging
  • @alexghh (Alexandre Ghelfi) - Frame control improvements
  • @marcosGR (Marcos Galletero Romero) - Atari patching fixes
  • @matteobettini (Matteo Bettini) - Tutorial fixes
  • @simeet-n (Simeet Nayan) - Documentation improvements
  • @arvindcr4 - Documentation fixes
  • @felixy12 (Felix Yu) - State dict reference fixes
  • @SendhilPanchadsaram (Sendhil Panchadsaram) - Documentation typo fixes
  • @abhishekunique (Abhishek) - WandB logger and value estimation improvements
  • @骑马小猫 - DQN module typo fix
  • @ZainRizvi (Zain Rizvi) - CI improvements and meta-pytorch migration
  • @mikayla-gawarecki (Mikayla Gawarecki) - Usage tracking and ConditionalPolicySwitch

🔗 Compatibility

  • PyTorch: Compatible with PyTorch 2.1.1+; >=2.8.0,<2.9.0 is recommended for full compatibility
  • TensorDict: Updated to work with TensorDict 0.10+
  • Python: Supports Python 3.9+
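To check whether an installed PyTorch version falls in the recommended range listed above, a minimal sketch such as the following may help. The helpers `parse_release` and `in_recommended_range` are hypothetical, use only the standard library, and deliberately ignore pre-release ordering; in a real environment the version string would typically come from `torch.__version__`.

```python
def parse_release(version):
    """Parse the leading 'X.Y.Z' of a version string into a tuple of ints.

    Local suffixes such as '+cu121' are dropped. Pre-release tags are not
    ordered properly; this is a simplification for illustration.
    """
    core = version.split("+", 1)[0]
    parts = []
    for piece in core.split(".")[:3]:
        digits = ""
        for ch in piece:
            if ch.isdigit():
                digits += ch
            else:
                break
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def in_recommended_range(version, lower=(2, 8, 0), upper=(2, 9, 0)):
    """Return True if lower <= version < upper (the range quoted above)."""
    return lower <= parse_release(version) < upper


checks = {v: in_recommended_range(v) for v in ("2.1.1", "2.8.0", "2.8.1+cu121", "2.9.0")}
```

Versions below 2.8.0 may still be compatible per the notes above; the check only covers the recommended range.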

📦 Installation

pip install torchrl==0.10.0

For the latest features:

pip install git+https://github.com/pytorch/rl.git@release/0.10.0

v0.9.2: Bug fixes and perf improvements

17 Jul 17:11

TorchRL 0.9.2 Release Notes

This release focuses on bug fixes, performance improvements, and code quality enhancements.

🚀 New Features

  • LineariseRewards: Now supports negative weights for more flexible reward shaping (#3064)
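To make the effect of negative weights concrete, here is a minimal sketch of linear reward scalarisation, in which a multi-objective reward is reduced to a weighted sum. The function `linearise` is a hypothetical stand-in written in plain Python; it is not the LineariseRewards transform or its signature.

```python
def linearise(rewards, weights):
    """Combine per-objective rewards into one scalar as a weighted sum.

    A negative weight turns the corresponding objective into a penalty.
    """
    if len(rewards) != len(weights):
        raise ValueError("rewards and weights must have the same length")
    return sum(w * r for w, r in zip(weights, rewards))


# Objective 0 is a task reward, objective 1 is an energy cost to penalise.
combined = linearise(rewards=[2.0, 0.5], weights=[1.0, -2.0])
```

With these illustrative numbers, the second objective subtracts 1.0 from the task reward of 2.0.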

🐛 Bug Fixes

  • Fixed policy reference handling in state dictionaries (#3043)
  • Improved unbatched data handling in LLM wrappers (#3070)
  • Fixed cross-entropy log-probability computation for batched inputs (#3080)
  • Fixed Binary clone() operations (#3077)
  • Fixed in-place spec modifications in TransformedEnv (#3076)

⚡ Performance Improvements

  • Optimized distribution sampling by avoiding unnecessary log-probability computations (#3081)

🔧 Code Quality

  • Standardized coefficient naming in A2C and PPO algorithms (#3079)

📦 Installation

pip install torchrl==0.9.2

Thanks to all contributors: @felixy12, @Xmaster6y, @louisfaury and @LCarmi

v0.9.1: fix for history-based vLLM and Transformers wrappers

11 Jul 15:48

Fixes a critical issue with vLLMWrapper and TransformersWrapper, where a stack of History objects is sent to stack again, resulting in a bug.