Releases: pytorch/rl
TorchRL v0.13.2
TorchRL 0.13.2 is a patch release focused on regression fixes, release reliability, and CI stability. It intentionally avoids new feature backports.
Highlights
- Fixed Isaac Lab reset regressions, including Direct-env autoreset opt-in behavior, observation-key normalization, and `reset_to` edge cases.
- Fixed `SliceSampler` and `EnvBase.step` regressions affecting compile compatibility and batch-locked environments.
- Fixed `MultiSyncCollector` worker-output gathering and improved preemption throughput without busy-waiting.
- Made the vLLM FP32 override opt-in so importing TorchRL no longer changes host vLLM behavior.
- Restored and hardened nightly/release publishing checks, including PyPI wheel filtering and release-build symbol stripping.
Backported fixes and maintenance
- #3869 Fix IsaacLab reset regressions.
- #3868 Make the vLLM FP32 plugin opt-in.
- #3861 Fix FP32 override registration in the vLLM plugin.
- #3858 Fix `_skip_tensordict` handling in `EnvBase.step`.
- 4679d0a Fix `SliceSampler` compile compatibility.
- #3856 Fix flaky Pendulum spec and collector preemption-ordering tests.
- #3843, #3841 Fix and optimize `MultiSyncCollector` preemption.
- #3846, #3845, #3844 Harden nightly version checks and uploads.
- #3859, #3855, #3842, #3825, 12093ae Improve release, benchmark, auto-tag, and docs CI reliability.
- #3852 Fix tutorial code links.
- #3849, #3848, #3847 Refresh SOTA dependency pins.
No new public API exports are intended in this patch release.
TorchRL v0.13.1
TorchRL v0.13.1 is a maintenance release for the 0.13 line. It carries post-0.13.0 RNN backend fixes and performance improvements, compile-friendliness fixes, SOTA example dependency refreshes, and documentation improvements.
Merged PR inventory
Recurrent modules and RNN backends
- #3818 improves Triton RNN recurrent-matmul robustness with large-hidden tiling, 64-bit offsets, and faster autotune behavior.
- #3752 adds recompute backward support and narrow RNN canonicalization to reduce learner memory pressure when multiple recurrent modules share a batch.
Compile and conversion stability
- #3819 avoids a `to_module` FutureWarning graph break under `torch.compile` while preserving the previous state-preserving conversion behavior.
SOTA implementation dependency refreshes
- #3708 updates the GRPO SOTA implementation to `vllm` 0.20.0.
- #3601 updates the expert-iteration SOTA implementation to `transformers` 5.0.0rc3.
Documentation
- #3821 fixes non-resolving API cross-references across docs and tutorials.
- #3822 migrates the docs to `pytorch_sphinx_theme2` and fixes tutorial Colab, Notebook, and GitHub links.
- #3745 adds a memory-efficient RL training tutorial and cross-references for layout and recurrent-training guidance.
Newly exported public symbols
Utilities
- `torchrl.cuda_memory_profile` (code, docs): context manager/decorator for scoped CUDA memory profiling.
- `torchrl.cuda_memory_stats` (code, docs): helper for reading current and peak CUDA allocation/reservation statistics.
- `torchrl.reset_cuda_peak_stats` (code, docs): helper for resetting CUDA peak memory counters.
Modules
- `torchrl.modules.tensordict_module.canonicalize_rnn_subset` (also re-exported as `torchrl.modules.canonicalize_rnn_subset`; specific export, package export, docs): canonicalizes only the recurrent keys used by selected RNN modules.
Highlights
- More robust Triton RNN recurrent matrix multiplication for large hidden sizes and backend autotuning.
- Lower-memory recurrent learner updates through recompute backward and subset canonicalization.
- Cleaner `torch.compile` behavior for state-preserving module conversion.
- Updated docs theme and repaired generated API and tutorial links.
- New memory-efficient RL training tutorial.
- Refreshed dependency pins for GRPO and expert-iteration SOTA examples.
Installation
pip install torchrl==0.13.1
For CUDA wheel variants, follow the install index documented in the TorchRL README for the desired CUDA runtime.
TorchRL v0.13.0
TorchRL 0.13.0 is a broad release focused on recurrent RL throughput, MuJoCo-native environments, macro-control workflows, multi-agent training utilities, and release-aligned cleanup of previously warned deprecations. It also introduces optional Linux CUDA wheels for users who want CUDA-based prioritized replay-buffer kernels, while keeping the standard PyPI wheel as the default for CPU prioritized replay buffers and users who do not need prioritized replay. The release refreshes compatibility with optional and older dependency stacks used by TorchRL's wider environment coverage.
Merged PRs included in v0.13.0
This release includes the following merged PRs in the v0.12.0..v0.13.0 first-parent range:
Release, documentation, and CI
- #3817 — Documented optional CUDA TorchRL wheels for CUDA-based prioritized replay-buffer kernels while keeping the standard PyPI wheel as the default install.
- #3814 — Fixed release-CI dependency checks, including stable TensorDict selection, TorchCodec CPU wheels, and mixed-device spec validation.
- #3813 — Refreshed the README for the 0.13 release.
- #3805 — Enacted the v0.13 deprecations and behavior changes.
- #3796 — Added collector internals documentation.
- #3803 — Bumped TorchRL version metadata to 0.13.0.
- #3787 — Added uv local development setup.
- #3747 — Split large test files into per-concept files.
- #3741 — Added automatic insertion of new release versions into gh-pages versions.html.
- #3739 — Fixed olddeps / opt-deps / gym smoke tests broken by #3738 and #3704.
- #3710 — Bumped the olddeps TensorDict default to 0.12.
- #3705 — Fixed benchmark workflows.
- #3706 — Moved stdlib and TorchRL local imports in tests to module top.
- #3696 — Added NestedKey and Literal contribution guidance.
- #3686 — Added repository contribution rules for AI agents.
Recurrent RL, value estimation, and world models
- #3816 — Added RNN reset rollout benchmarks for TorchRL `LSTMModule`/`GRUModule`, covering intermediate resets across cuDNN, scan, and Triton backends with eager and compiled runs.
- #3815 — Kept GRU scan split sizes concrete for compiled recurrent rollouts while preserving old/optional dependency compatibility.
- #3780 — Added a dynamic value-estimator registry across loss modules.
- #3807 — Gated scan RNN backward support for compatible environments.
- #3792 — Added the recurrent state lifecycle guide.
- #3793 — Added recurrent integration coverage.
- #3797 — Added chunked TensorDict support for value estimators.
- #3785 — Fixed the Triton RNN kernel.
- #3784 — Simplified the shifted value-estimator budget.
- #3782 — Added the compact-drop shifted value backend.
- #3744 — Sanitized NaN next observations in value-estimator forwards.
- #3738 — Added the Triton backend for GRU/LSTM with intermediate resets.
- #3712 — Fixed LSTMModule padding.
- #3707 — Stabilized the RSSMPosteriorV3 gradient test.
- #3695 — Improved sequence-RL composability.
- #3621 — Implemented DreamerV3 world-model objectives and components.
Environments, transforms, and examples
- #3781 — Surfaced Isaac Lab per-index reset and reset_to via IsaacLabWrapper.
- #3811 — Fixed MuJoCo macro shapes and Gym Atari setup.
- #3779 — Fixed a hidden multiprocessing import in the coding PPO tutorial.
- #3806 — Added the MuJoCo macro tutorial environment.
- #3802 — Added satellite MuJoCo SAC examples.
- #3800 — Added IsaacLab headless rendered eval.
- #3700 — Added MuJoCo custom environments with selectable physics backends.
- #3801 — Fixed unused frame_skip in PPO tutorial.
- #3777 — Added the NextObservationDelta environment transform.
- #3766 — Added the FlattenAction transform.
- #3765 — Added the ActionScaling transform.
- #3727 — Fixed KeyError in PettingZoo action mask with ParallelEnv and done_on_any=False.
- #3743 — Added the NextStateReconstructor replay-buffer transform.
- #3742 — Added compact_obs support to DataCollector.
- #3698 — Fixed GymEnv reward and action shapes across num-env configurations.
- #3689 — Added Safety-Gymnasium environment wrappers.
- #3682 — Fixed PettingZoo state handling and added an encoding regression test.
- #3676 — Added the ExpandAs transform.
Collectors, replay buffers, and performance
- #3810 — Made collector weight synchronization idempotent.
- #3734 — Added HERReplayBuffer and HindsightStrategy to torchrl.data.
- #3749 — Dispatched RandomSampler and SliceSampler to without-replacement variants via replacement=False.
- #3729 — Auto-created inner SharedMem schemes for Ray/RPC and policy_factory.
- #3728 — Made per-worker SharedMem schemes opt-in for policy_factory tests.
- #3714 — Added async prioritized replay-buffer writes.
- #3680 — Gated the profiling decorator on TORCHRL_PROFILING.
- #3685 — Improved CUDA prioritized replay-buffer ergonomics.
- #3672 — Added data-collector hooks.
- #3677 — Added CUDA support for prioritized replay sampling, available through the optional CUDA wheel builds.
Objectives, trainers, and multi-agent learning
- #3773 — Added CrossGroupCritic.
- #3748 — Added MAPPOLoss, IPPOLoss, MultiAgentGAE, and ValueNorm.
- #3750 — Fixed the cross-entropy reduction parameter in discrete objectives.
- #3694 — Added QMix, VDN, and IQL support to the DQN trainer.
- #3699 — Improved Hydra config parity for environments, losses, and loggers.
- #3692 — Added an early-stopping trainer hook.
- #3693 — Audited TrainerConfig/Trainer parity and added auto_log_optim_steps plumbing.
- #3683 — Made DDPG, PPO, and SAC trainers multi-agent-friendly.
- #3691 — Added trainer hook configs.
- #3639 — Added ACTModel and ACTLoss for robot learning.
- #3667 — Added behavior-cloning loss.
- #3679 — Fixed PPOTrainer gamma/lambda defaults, removed dead code, and removed a wildcard import.
Compatibility, bug fixes, and cleanup
- #3809 — Fixed GRPO runtime issues for vLLM and SGLang backends.
- #3812 — Fixed Robohive, Gym, PettingZoo, and setup-test CI failures.
- #3808 — Fixed LLM assistant action masks.
- #3799 — Fixed old- and optional-dependency workflows.
- #3709 — Updated the REINFORCE value-net test for new torch autograd error wording.
- #3704 — Forwarded generator kwargs through ProbabilisticActor.
- #3688 — Added setup and shutdown hook points.
- #3684 — Split environment transforms into per-category modules.
Highlights
Faster recurre...
TorchRL v0.12.0
TorchRL v0.12.0 Release Notes
Highlights
- New algorithms. Five new config-based trainers (DQN, DDPG, IQL, CQL, and TD3) are built on a new configuration system for reproducible algorithm setups (@vmoens, @bsprenger). PILCO (Probabilistic Inference for Learning Control) is now available as a built-in algorithm (@PSXBRosa, @vmoens). For diffusion-based behavioral cloning, a new `DDPMModule` diffusion actor and a `DiffusionBCLoss` are included (@theap06). Async PPO infrastructure overlaps data collection and optimization (@vmoens).
- Collector and data-flow improvements. A new high-throughput auto-batching inference server automatically batches requests from multiple environments, with pluggable transport backends (threading, multiprocessing, Ray, Monarch) and built-in weight-sync integration. Paired with the new `AsyncBatchedCollector`, it enables asynchronous data collection with automatic batching for maximum GPU utilization (@vmoens). The new `TrajectoryBatcher` and `AsyncTrajectoryBatcher` assemble trajectories efficiently from streaming environment transitions, including variable-length trajectories and padding (@theap06). On the parallel environment side, shared-memory done flags replace `mp.Event` for lower-latency step synchronization, and a fast-path device-transfer optimization reduces overhead in `step_and_maybe_reset` (@vmoens).
- Inference backends. This release adds full SGLang integration alongside vLLM, with an `SGLangWrapper` policy module, an `AsyncSGLang` server-based inference path, NCCL weight synchronization, and GRPO support (@vmoens).
- Replay buffer. `StoreStorage` is a new Redis/Dragonfly-backed storage backend that lets replay buffers share experience across processes and nodes (@vmoens).
- Evaluation. A new `Evaluator` class provides a unified API for synchronous and asynchronous policy evaluation during training, with a process backend, collector-based stepping, weight sync via `WeightSyncScheme`, multi-model support, and a `RayEvalWorker` for distributed evaluation (@vmoens).
- Environments and platform support. A new `GenesisEnv` wrapper integrates the Genesis physics simulator (@ParamThakkar123). Dreamer now supports pre-vectorized environments and ships with an IsaacLab environment factory, training script, and integration guide (@vmoens). MPS support improves through float64-to-float32 downcasting in `ParallelEnv`, `SerialEnv`, and collectors, fixing previously broken Apple Silicon GPU workflows (@bsprenger).
Installation
pip install torchrl==0.12.0
Requires PyTorch >= 2.1 and TensorDict >= 0.12.0.
Breaking Changes
- Remove v0.12 deprecated APIs (#3670) @vmoens
  - The `local_init_rb` parameter has been removed from `Collector` and `MultiCollector`. Storage-level initialization is now the only behavior.
  - `TransformedEnv(env=...)` now raises `TypeError`. Use `TransformedEnv(base_env=...)` instead.
New Features
Auto-batching Inference Server
A new inference server that automatically batches requests from multiple environments for efficient GPU inference. This is a key building block for scaling RL training with many parallel environments.
- Core server and transport protocol (#3492)
- Threading transport (#3493)
- Multiprocessing transport (#3494)
- Ray transport (#3495)
- Monarch transport (#3496)
- Weight sync integration (#3497)
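The batching idea behind the inference server can be sketched with the standard library alone. This is a minimal illustration of the threading-transport case under stated assumptions: `BatchingServer`, `submit`, and `policy` are hypothetical names, and none of this reflects the actual TorchRL classes or their signatures.

```python
import queue
import threading
from concurrent.futures import Future


class BatchingServer:
    """Collect single requests from many clients and serve them in batches."""

    def __init__(self, model_fn, max_batch_size=8, poll_interval=0.01):
        self.model_fn = model_fn              # maps a list of inputs to a list of outputs
        self.max_batch_size = max_batch_size
        self.poll_interval = poll_interval    # how long to block while the queue is empty
        self.requests = queue.Queue()
        self.batch_sizes = []                 # recorded for inspection only
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._serve, daemon=True)
        self._thread.start()

    def submit(self, obs):
        """Called by an environment worker; returns a Future holding the action."""
        fut = Future()
        self.requests.put((obs, fut))
        return fut

    def _serve(self):
        while not self._stop.is_set():
            try:
                batch = [self.requests.get(timeout=self.poll_interval)]
            except queue.Empty:
                continue
            # Drain whatever else is already queued, up to the batch limit.
            while len(batch) < self.max_batch_size:
                try:
                    batch.append(self.requests.get_nowait())
                except queue.Empty:
                    break
            outputs = self.model_fn([obs for obs, _ in batch])
            self.batch_sizes.append(len(batch))
            for (_, fut), out in zip(batch, outputs):
                fut.set_result(out)

    def shutdown(self):
        self._stop.set()
        self._thread.join()


def policy(batch_of_obs):
    # Stand-in for one batched forward pass on an accelerator.
    return [2 * obs for obs in batch_of_obs]


server = BatchingServer(policy, max_batch_size=4)
futures = [server.submit(i) for i in range(10)]
actions = [f.result(timeout=1.0) for f in futures]
server.shutdown()
print(actions)
```

Each caller only sees a future, so the same client code works whether the server happens to serve one request or a full batch per forward pass.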
AsyncBatchedCollector
A new collector that combines async environments with the auto-batching inference server for maximum throughput.
- Async envs + auto-batching inference (#3498)
- Coordinator loop and direct submission mode (#3499)
- Backend params and performance optimizations (#3511)
Trajectory Batcher
- `TrajectoryBatcher` for assembling trajectories from streaming transitions (#3584) @theap06
- `AsyncTrajectoryBatcher` for asynchronous trajectory assembly (#3592) @theap06
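Assembling trajectories from a stream of interleaved transitions, with padding for variable lengths, can be sketched as follows. `SimpleTrajectoryBatcher` and its `add` method are hypothetical and only illustrate the general technique; they are not the TorchRL `TrajectoryBatcher` API.

```python
from collections import defaultdict


class SimpleTrajectoryBatcher:
    """Group interleaved transitions by environment id and emit finished trajectories."""

    def __init__(self, batch_size, pad_value=0.0):
        self.batch_size = batch_size
        self.pad_value = pad_value
        self._open = defaultdict(list)   # env_id -> observations of the running episode
        self._done = []                  # completed trajectories waiting to be batched

    def add(self, env_id, obs, done):
        """Append one transition; return a padded batch once enough episodes finished."""
        self._open[env_id].append(obs)
        if done:
            self._done.append(self._open.pop(env_id))
        if len(self._done) >= self.batch_size:
            trajs = self._done[: self.batch_size]
            self._done = self._done[self.batch_size:]
            return self._pad(trajs)
        return None

    def _pad(self, trajs):
        # Right-pad every trajectory to the longest one and keep a validity mask.
        max_len = max(len(t) for t in trajs)
        padded = [t + [self.pad_value] * (max_len - len(t)) for t in trajs]
        mask = [[True] * len(t) + [False] * (max_len - len(t)) for t in trajs]
        return padded, mask


batcher = SimpleTrajectoryBatcher(batch_size=2)
stream = [  # (env_id, observation, done) arriving interleaved from two environments
    (0, 1.0, False), (1, 5.0, False), (0, 2.0, False),
    (1, 6.0, True), (0, 3.0, True),
]
batch = None
for env_id, obs, done in stream:
    out = batcher.add(env_id, obs, done)
    if out is not None:
        batch = out

padded, mask = batch
print(padded)
print(mask)
```

The mask lets downstream losses ignore padded steps, which is the usual companion to padding when trajectory lengths differ.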
SGLang Backend
Full SGLang support for LLM inference, mirroring the existing vLLM integration:
- Base infrastructure (#3428)
- `AsyncSGLang` server-based inference service (#3429)
- `SGLangWrapper` policy module (#3430)
- NCCL weight synchronization (#3431)
- Module structure integration (#3432)
- SGLang backend support in GRPO
Diffusion Policies
- `DDPMModule` diffusion actor for denoising diffusion probabilistic models (#3596) @theap06
- `DiffusionBCLoss` for diffusion-based behavioral cloning (#3604) @theap06
Evaluator
- `Evaluator` class for sync/async evaluation (#3594)
- Process backend, lazy init, and pending property (#3611)
- Collector-based stepping backend (#3624)
- Enable loggers to run as Ray actors (#3623)
- Weight sync via `WeightSyncScheme` + multi-model support (#3627)
- Isaac Lab `Evaluator` tests + `init_fn` plumbing for process backend (#3663)
- `RayEvalWorker` for distributed async evaluation (#3474)
- Named actors and `from_name` for `RayEvalWorker` (#3488)
Async PPO
- Async PPO infrastructure for overlapping collection and optimization (#3661)
Config-based Trainers
New trainers with integrated configuration system:
- DQN Trainer (#3526)
- DDPG Trainer (#3527)
- IQL Trainer (#3528)
- CQL Trainer (#3529)
- TD3 Trainer (#3557) @bsprenger
- Hook point to log average optimization losses in trainers (#3666)
Replay Buffer
- `StoreStorage` for Redis/Dragonfly-backed replay buffers (#3516)
- `set_at_`, `set_`, `update_` methods on `ReplayBuffer` (#3590) @jashshah999
- Support `trajs_per_batch` with `replay_buffer` on multi-process and distributed collectors (#3618)
LLM / GRPO
- Token-in, token-out LLM wrapper mode (#3407)
- GRPO improvements: new envs, vLLM V1 compat, log-prob fixes, training stability (#3580)
- Namespace GRPO wandb metrics for auto-grouping (#3585)
- Remove placement-group xfails and fix vLLM tokenizer compat (#3586)
Environments
- `GenesisEnv`: wrapper for the Genesis physics simulator (#3536) @ParamThakkar123
- `FinancialRegimeEnv`: a vectorized financial environment (#3384) @aneesh223
- `num_workers` parameter for `HabitatEnv` (#3383) @ParamThakkar123
- Dreamer: support pre-vectorized environments (#3483)
- Dreamer: add IsaacLab environment factory (#3484)
Transforms
- Inverse for `VecNorm` and `VecNormV2` transforms (#3416) @ParamThakkar123
- `prevent_leaking_rng` utility (#3401) @ParamThakkar123
Logging
Specs
- `index_select` support for `TensorSpec` (#3406) @ParamThakkar123
- `strict_shape` parameter for `QValueModule` action shape enforcement (#3593) @Lidang-Jiang
Algorithms
Collectors
- Lazy-init `RandomPolicy` `action_spec` from env in collectors (#3664)
Other
- `__getattr__` in `_dispatch_caller_parallel` for transparent attribute access (#3389) @ParamThakkar123
- `scalar_output_mode` for loss modules with `reduction='none'` (#3426)
- `ObsDecoder`: `out_channels` parameter for grayscale decoding (#3472)
- Ergonomic scalar assignment for loss buffers (#3612)
- New `memmap` value for the `CKPT_BACKEND` environment variable ([#3619](https:...
TorchRL v0.11.1
Highlights
This patch release includes several important bug fixes and performance improvements:
- Fixed `Composite.encode()` to correctly set the batch size of the output TensorDict
- Fixed `StepCounter` to properly track nested truncated and done states in multi-agent environments
- Fixed shared memory weight updater to work correctly with collectors using multiple policies
- Fixed `_repr_html_` dispatch in parallel environments that was causing doc CI failures
- Added `scalar_output_mode` to loss modules for proper handling of `reduction='none'`
- Fixed `torch.compile` configuration for Dreamer
- Performance: GPU Image Transforms for Dreamer (~5.5x faster sampling)
- Performance: SliceSampler GPU acceleration for faster trajectory computation
- Performance: Always enable prefetch for replay buffer
Breaking Changes
No breaking changes in this release.
Bug Fixes
- Fixed batch size in `Composite.encode`: The `Composite.encode()` method now correctly sets the `batch_size` of the output `TensorDict` to match the shape of the tensor spec, rather than returning an empty batch size. (#3411) - @tobiabir
  Previously, calling `Composite.encode(raw_vals)` would return a TensorDict with `batch_size=torch.Size([])` regardless of the spec's shape. This is now fixed to return the correct batch size matching the spec shape.
- Fixed `StepCounter` nested done/truncated tracking in multi-agent environments: `StepCounter` now properly updates nested truncated and done keys for multi-agent environments. (#3405) - @vmoens
  When using `StepCounter` with multi-agent environments (e.g., PettingZoo), the transform now correctly propagates truncated/done signals to agent-specific keys (e.g., `("agent", "truncated")`) in addition to the root-level keys.
- Fixed shared memory weight updater with multiple policies: The shared memory weight updater now correctly handles collectors that use multiple policies. (#3442) - @vmoens
- Fixed `_repr_html_` dispatch in parallel environments: Parallel environments no longer incorrectly dispatch private/special attribute access (like `_repr_html_`) to worker processes. (#3441) - @vmoens
- Added `scalar_output_mode` to loss modules: Loss modules (SAC, IQL, CQL, CrossQ, REDQ, DecisionTransformer) now support the `scalar_output_mode` parameter for proper handling of `reduction='none'`. (#3426) - @vmoens
- Fixed torch.compile configuration for Dreamer: Fixed compilation settings for Dreamer world model training. - @vmoens
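The `_repr_html_` fix concerns attribute forwarding: a wrapper that relays unknown attributes to its workers should not relay private or special names that tools such as notebook front ends probe for. A minimal sketch of that guard follows. `WorkerPool` and `Worker` are hypothetical toy classes, and the leading-underscore rule is an assumption about one reasonable policy, not a description of the TorchRL code.

```python
class WorkerPool:
    """Toy stand-in for a parallel environment that forwards attribute reads to workers."""

    def __init__(self, workers):
        self._workers = workers

    def __getattr__(self, name):
        # __getattr__ only runs when normal lookup fails. Names with a leading
        # underscore (including _repr_html_) are answered locally with an
        # AttributeError instead of being sent to the workers.
        if name.startswith("_"):
            raise AttributeError(name)
        return [getattr(worker, name) for worker in self._workers]


class Worker:
    def __init__(self, seed):
        self.seed = seed


pool = WorkerPool([Worker(0), Worker(1)])
print(pool.seed)                      # forwarded to every worker
print(hasattr(pool, "_repr_html_"))   # not forwarded
```

Raising `AttributeError` for underscore-prefixed names also avoids infinite recursion if `_workers` is looked up before `__init__` has run, for example during unpickling.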
Performance Improvements
- GPU Image Transforms for Dreamer: ~5.5x faster sampling with GPU-accelerated image transforms. - @vmoens
- SliceSampler GPU acceleration: Faster trajectory computation using GPU. - @vmoens
- Always enable prefetch for replay buffer: Improved data loading performance. - @vmoens
Cleanup
- Removed pin_memory from replay buffer: Simplified replay buffer configuration. - @vmoens
Internal / CI Improvements
- Added PyTorch version check instructions to release prompt (#3443) - @vmoens
- Added tutorials CI workflow for testing sphinx tutorials (#3441) - @vmoens
- Upgraded `meshgrid` usage to address PyTorch deprecation warning (#3412) - @vmoens
- Added flaky test tracking system for improved CI reliability (#3408) - @vmoens
- Added file-based auto-labeling for PR components (#3402) - @vmoens
- Improved LLM prompt for release workflow (#3399) - @vmoens
Contributors
Thanks to all contributors to this release:
Installation
```bash
pip install torchrl==0.11.1
```
Or with conda:
```bash
conda install -c pytorch torchrl=0.11.1
```
TorchRL v0.11.0
TorchRL v0.11.0 Release Notes
Highlights
- Dreamer overhaul - Comprehensive improvements to Dreamer world model training: async collectors with profiling, RSSM fixes (scan mode, noise injection, explicit dimensions), torch.compile compatibility for value functions and TDLambda estimator, optimized DreamerEnv to avoid CUDA syncs, and updated sota-implementation with better configs. @vmoens
- Weight synchronization schemes - New modular weight sync infrastructure (`torchrl.weight_update`) with SharedMem, MultiProcess, and vLLM-specific (NCCL, double-buffer) schemes. Collectors now integrate seamlessly with weight sync schemes for distributed training. @vmoens
- Major collector refactor - The collector codebase has been completely restructured. The monolithic `collectors.py` is now split into focused modules (`_single.py`, `_multi_base.py`, `_multi_sync.py`, `_multi_async.py`, `_runner.py`, `base.py`), with cleaner separation of concerns. (#3233) @vmoens
- LLM objectives: DAPO & CISPO - New DAPO (Direct Advantage Policy Optimization) and CISPO (Clipped Importance Sampling Policy Optimization) algorithms for LLM training. @vmoens
- Trainer infrastructure - New SAC Trainer, configuration system for algorithms, timing utilities, and async collection support within trainers. @vmoens
- Tool services - New tool service infrastructure for LLM agents with Python executor, MCP tools, and web search capabilities. @vmoens
- Deprecated APIs removed - All deprecation warnings from v0.10 have been promoted to hard errors for v0.11. (#3369) @vmoens
- New environment backends - Added Procgen environments support with a new `ProcgenEnv` wrapper. (#3331) @ParamThakkar123
- Multi-env execution - GymEnv, BraxEnv, and DMControlEnv now support a `num_envs`/`num_workers` parameter to run multiple environments in a single call via `ParallelEnv`. (#3343, #3370, #3337) @ParamThakkar123
Installation
pip install torchrl==0.11.0
Breaking Changes
- [v0.11] Remove deprecated features and replace warnings with errors (#3369) @vmoens
  - Removes deprecated `KLRewardTransform` from `transforms/llm.py` (use `torchrl.envs.llm.KLRewardTransform`)
  - Removes `LogReward` and `Recorder` classes from trainers (use `LogScalar` and `LogValidationReward`)
  - Removes `unbatched_*_spec` properties from VmasWrapper/VmasEnv (use `full_*_spec_unbatched`)
  - Deletes deprecated `rlhf.py` modules (`data/rlhf.py`, `envs/transforms/rlhf.py`, `modules/models/rlhf.py`)
  - Removes `replay_buffer_chunk` parameter from MultiCollector
  - Replaces `minimum`/`maximum` deprecation warnings with `TypeError` in Bounded spec
  - Replaces `critic_coef`/`entropy_coef` deprecation warnings with `TypeError` in PPO and A2C losses
- [Major] Major refactoring of collectors (#3233) @vmoens
  - Splits the 5000+ line `collectors.py` into focused modules for single/multi sync/async collectors
  - Creates new `_constants.py`, `_runner.py`, `base.py` modules
  - Introduces cleaner weight synchronization scheme integration
  - Improves test coverage for multi-device and shared-device weight updates
  - Some internal APIs have changed; external API remains compatible
Dreamer World Model Improvements
These changes significantly improve Dreamer training performance, torch.compile compatibility, and usability. @vmoens
- [Feature] Refactor Dreamer training with async collectors, profiling, and improved config (cc917ba)
  - Major overhaul of the Dreamer sota-implementation with async data collection
  - Adds profiling support for performance analysis
  - Improved configuration with better defaults and documentation
  - Updated README with detailed usage instructions
- [Refactor] Dreamer implementation updates (3ab4b30)
  - Refactors Dreamer training script for better maintainability
  - Updates config.yaml with improved hyperparameters
  - Enhances dreamer_utils.py with additional helper functions
- [Feature] Add noise argument and scan mode to RSSMRollout (8653b6e)
  - Adds `noise` argument to control stochastic sampling during rollout
  - Implements scan mode for efficient sequential processing
  - 109 lines added to model_based.py for improved RSSM flexibility
- [Feature] Add explicit dimensions and device support to RSSM modules (f350fe0)
  - Adds explicit dimension handling for batch, time, and feature dims
  - Improves device placement for RSSM components
- [Feature] Logging & RSSM fixes (d082979)
  - Fixes RSSM module behavior and adds logging improvements
  - Updates batched_envs.py and wandb logger
- [Refactor] Use compile-aware helpers in Dreamer objectives (d8c3887)
  - Updates Dreamer objectives to use torch.compile-compatible helpers
  - Improves performance when using torch.compile
- [BugFix] Optimize DreamerEnv to avoid CUDA sync in done checks (d57fdec)
  - Eliminates unnecessary CUDA synchronizations in done flag checking
  - Significant performance improvement for GPU-based Dreamer training
- [BugFix] Fix ModelBasedEnvBase for torch.compile compatibility (7ff663d)
  - Makes ModelBasedEnvBase compatible with torch.compile
- [Feature] Add allow_done_after_reset parameter to ModelBasedEnvBase (24ae042)
  - Adds flexibility for environments that may signal done immediately after reset
- [BugFix] Final polish for Dreamer utils and Collector tests (5aefdd5)
  - Final cleanup and polish for Dreamer implementation
Weight Synchronization Schemes
New modular infrastructure for weight synchronization between training and inference workers. @vmoens
- [Feature] Weight Synchronization Schemes - Core Infrastructure (e8f6fa5)
  - New `torchrl.weight_update` module with 2400+ lines of weight sync infrastructure
  - `SharedMemWeightSyncScheme`: Uses shared memory for fast intra-node sync
  - `MultiProcessWeightSyncScheme`: Uses multiprocessing queues for cross-process sync
  - Comprehensive documentation and examples in `docs/source/reference/collectors.rst`
- [Feature] vLLM Weight Synchronization Schemes (d0c8b7e)
  - `VllmNCCLWeightSyncScheme`: NCCL-based weight sync for vLLM distributed inference
  - `VllmDoubleBufferWeightSyncScheme`: Double-buffered async weight updates
  - 1876 lines of vLLM-specific weight sync code
- [Feature] Collectors - Weight Sync Scheme Integration (a7707ca)
  - Integrates weight sync schemes into collector infrastructure
  - Updates GRPO and expert-iteration implementations to use new schemes
  - Adds examples for multi-weight update patterns
- [Refactor] Weight sync schemes refactor (ae0ae06)
  - Refines weight sync API and adds additional schemes
  - Improves test coverage with 342+ new test lines
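The double-buffering pattern named in the vLLM scheme above can be illustrated generically: the writer fills a back slot while readers keep using the front slot, and publishing is a single index swap. The sketch below assumes a single writer. The `DoubleBuffer` class is hypothetical and says nothing about how `VllmDoubleBufferWeightSyncScheme` is implemented.

```python
import threading


class DoubleBuffer:
    """Two weight slots: readers use the front slot while the writer fills the back slot."""

    def __init__(self, initial):
        self._slots = [dict(initial), dict(initial)]
        self._front = 0                 # index of the slot readers should use
        self._lock = threading.Lock()   # protects only the index swap

    def write(self, new_weights):
        """Fill the back slot, then publish it with a single index swap (single writer assumed)."""
        back = 1 - self._front
        self._slots[back] = dict(new_weights)
        with self._lock:
            self._front = back

    def read(self):
        """Return the most recently published weights."""
        with self._lock:
            return self._slots[self._front]


buf = DoubleBuffer({"w": 0.0})
before = buf.read()
buf.write({"w": 1.0})   # trainer publishes an update
after = buf.read()      # inference worker picks it up on its next read
print(before, after)
```

Because readers never see a half-written slot, the expensive copy happens outside the lock and only the swap is serialized.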
LLM Training: DAPO & CISPO
New policy optimization algorithms for LLM fine-tuning. @vmoens
- [Feature] DAPO (9d5c276)
  - Implements Direct Advantage Policy Optimization for LLM training
  - Adds DAPO-specific loss computation to `torchrl/objectives/llm/grpo.py`
- [Feature] CISPO (ed0d8dc)
  - Implements Clipped Importance Sampling Policy Optimization
  - Alternative to PPO/GRPO with different clipping strategy
- [Refactor] Refactor GRPO as a separate class (2bc3cb7)
  - Separates GRPO implementation for better modularity
Trainer Infrastructure
New trainer algorithms, configuration system, and utilities. @vmoens
- [Trainers] SAC Trainer and algorithms (02d4bfd)
  - New SAC Trainer implementation with 786 lines of code
  - Complete sota-implementation in `sota-implementations/sac_trainer/`
  - Trainer configuration via YAML files
- [Feature] Trainer Algorithms - Configuration System (6bc201a)
  - New configuration system in `torchrl/trainers/algorithms/configs/`
  - Configs for collectors, data, modules, objectives, transforms, weight sync schemes
  - Enables hydra-style configuration composition
- [Feature] Trainer Infrastructure - Timing and Utilities (dc21523)
  - Adds timing utilities to trainer infrastructure
  - 263 lines of enhanced trainer functionality
- [Feature] Async collection within trainers (5f1eb2c)
  - Enables asynchronous data collection during training
  - Improves training throughput
- [Feature] PPO Trainer Updates (129f3d5)
  - Updates to PPO trainer with new features
Tool Services for LLM Agents
New infrastructure for tool-augmented LLM agents. @vmoens
- [Feature] Tool services (9ca0e40)
  - New `torchrl/services/` module for tool execution
  - Python executor service for safe code execution
  - MCP (Model Context Protocol) tool integration
  - Web search tool example
  - 609 lines of documentation in `docs/source/reference/services.rst`
  - Comprehensive test coverage in `test/test_services.py`
- [Feature] Transform Module - ModuleTransform and Ray Service Refactor (7b85c71)
  - New `ModuleTransform` for applying nn.Modules as transforms
  - Refactored Ray service integration
  - Moves `ray_service.py` to `torchrl/envs/transforms/`
torch.compile Compatibility
Fixes to enable torch.compile with various TorchRL components. @vmoens
TorchRL 0.10.1: Fixes and named dimensions in composite specs
Release Notes - v0.10.1
This patch release includes bug fixes, type annotation improvements, and CI enhancements cherry-picked from main.
Bug Fixes
- #3168 - @vmoens - [BugFix] AttributeError in accept_remote_rref_udf_invocation
  - Fixed AttributeError in RPC utilities when decorating classes with remote RRef invocation by handling None values in getattr calls
Features
- #3174 - @vmoens - [Feature] Named dims in Composite
  - Added support for named dimensions in Composite specs, enabling better integration with PyTorch's named tensors
- #3214 - @louisfaury - [Feature] Composite specs can create named tensors with 'zero' and 'rand'
  - Extended Composite specs to properly propagate names when creating tensors using `zero()` and `rand()` methods
Type Annotations & Documentation
- @vmoens - [Typing] Edit wrongfully set str type annotations
  - Fixed incorrect string type annotations across 19 files
- #3175 - @vmoens - [Versioning] Fix doc versioning
  - Fixed documentation versioning issues
CI/Build Improvements
- #3200 - @vmoens - [CI] Use pip install
  - Updated CI workflows to use pip install across 41 files
- @vmoens - [CI] Fix missing librhash0 in doc CI
  - Added missing librhash0 dependency in documentation CI
- @vmoens - [CI] Fix benchmarks for LLMs
  - Fixed LLM benchmark CI configurations
- #3222 - @vmoens - [CI] Upgrade doc python version
  - Upgraded Python version in documentation build workflows and added vLLM plugin entry point for FP32 overrides
TorchRL 0.10.0: async LLM inference
TorchRL 0.10.0 Release Notes
What's New in 0.10.0
TorchRL 0.10.0 introduces significant advancements in Large Language Model (LLM) support, new algorithms, enhanced environment integrations, and numerous performance improvements and bug fixes.
Major Features
LLM Support and RLHF
- vLLM Integration Revamp: Complete overhaul of vLLM support with improved batching and performance (#3158) @vmoens
- GRPO (Group Relative Policy Optimization): New algorithm implementation with both sync and async variants (#2970, #2997, #3006) @vmoens
- Expert Iteration and SFT: Implementation of expert iteration algorithms and supervised fine-tuning (#3017) @vmoens
- PPOTrainer: New high-level trainer class for PPO training (#3117) @vmoens
- LLM Tooling: Comprehensive tooling support for LLM environments and transformations (#2966) @vmoens
- Remote LLM Wrappers: Support for remote LLM inference with improved batching (#3116) @vmoens
- Common LLM Generation Interface: Unified kwargs for generation across vLLM and Transformers (#3107) @vmoens
- LLM Transforms:
- Content Management: `ContentBase` system for structured content handling (#2985) @vmoens
- History Tracking: New history system for conversation management (#2965) @vmoens
New Algorithms and Training
- Async SAC: Asynchronous implementation of Soft Actor-Critic (#2946) @vmoens
- Discrete Offline CQL: SOTA implementation for discrete action spaces (#3098) @Ibinarriaga
- Multi-node Ray Support: Enhanced distributed training for GRPO (#3040) @albertbou92
Environment Support
- NPU Support: Added NPU device support for SyncDataCollector (#3155) @lowdy1
- IsaacLab Wrapper: Integration with IsaacLab simulation framework (#2937) @vmoens
- Complete PettingZoo State Support: Enhanced multi-agent environment support (#2953) @JGuzzi
- Minari Integration: Support for loading datasets from local Minari cache (#3068) @Ibinarriaga
Storage and Replay Buffers
- Compressed Storage GPU: GPU acceleration for compressed replay buffers (#3062) @aorenstein68
- Packing: New data packing functionality for efficient storage (#3060) @vmoens
- Ray Replay Buffer: Enhanced distributed replay buffer support (#2949) @vmoens
🔧 Improvements and Enhancements
Performance Optimizations
- Bounded Specs Memory: Single copy optimization for bounded specifications (#2977) @vmoens
- Log-prob Computation: Avoid unnecessary log-prob calculations when retrieving distributions (#3081) @vmoens
- LLM Wrapper Queuing: Performance fixes in LLM wrapper queuing (#3125) @vmoens
- vmap Deactivation: Selective vmap deactivation in objectives for better performance (#2957) @vmoens
API Improvements
- Public SAC Methods: Exposed public methods for SAC algorithm (#3085) @vmoens
- Composite Entropy: Fixed entropy computation for nested keys (#3101) @juandelos
- Multi-head Entropy: Per-head entropy coefficients for PPO (#2972) @Felixs
- ClippedPPOLoss: Support for composite value networks (#3031) @louisfaury
- LineariseRewards: Support for negative weights (#3064) @YoannPoupart
- GAE Typing: Improved typing with optional value networks (#3029) @louisfaury
- Explained Variance: Optional explained variance logging (#3010) @OswaldZink
- Frame Control: Worker-level control over frames_per_batch (#3020) @alexghh
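The "Explained Variance" entry refers to a common diagnostic for value functions. As a minimal sketch, assuming the usual definition 1 - Var(returns - values) / Var(returns), it can be computed as follows. The function name `explained_variance` is hypothetical and this is not the TorchRL implementation.

```python
from statistics import pvariance


def explained_variance(values, returns):
    """Fraction of the variance in `returns` captured by `values`.

    Hypothetical helper: 1.0 means perfect predictions, 0.0 means no better
    than predicting a constant, and negative values mean worse than that.
    """
    var_returns = pvariance(returns)
    if var_returns == 0:
        # Undefined when the returns do not vary at all.
        return float("nan")
    residuals = [r - v for r, v in zip(returns, values)]
    return 1.0 - pvariance(residuals) / var_returns


returns = [1.0, 2.0, 3.0, 4.0]
perfect = explained_variance(returns, returns)        # predictions match returns
constant = explained_variance([2.5] * 4, returns)     # constant prediction
```

A value close to 1 suggests the critic tracks the returns well, while a value near or below 0 suggests it carries little useful signal.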
Developer Experience
- Colored Logger: Enhanced logging with colored output (#2967) @vmoens
- Better Error Handling: Improved error catching in env.rollout and rb.add (#3102) @vmoens
- Warning Management: Better warning control for various components (#3099, #3115) @vmoens
- Faster Tests: Optimized test suite performance (#3162) @vmoens
Bug Fixes
Core Functionality
- PRB Serialization: Fixed Prioritized Replay Buffer serialization and loading (#3151, #2963) @vmoens
- Binary Operations: Fixed Binary tensor reshaping and clone operations (#3084, #3077) @LucaCarminati @vmoens
- Categorical Spec: Fixed dtype sampling and masking issues (#2980, #2981) @louisfaury
- ActionMask: Compatibility with composite action specifications (#3022) @louisfaury
- GAE with LSTM: Fixed shifted value computation with LSTM networks (#2941) @vmoens
- Cross-entropy: Fixed log-prob computation for batched input (#3080) @vmoens
Environment and Wrapper Fixes
- TransformedEnv: Fixed in-place modification of specs (#3076) @vmoens
- Parallel Environments: Fixed partial and nested done states (#2959) @vmoens
- Gym Actions: Fixed single action passing when action key is not "action" (#2942) @vmoens
- Brax Memory: Fixed memory leak in Brax environments (#3052) @vmoens
- Atari Patching: Fixed patching for NonTensorData observations (#3091) @marcosGR
Collector and Replay Buffer Fixes
- LLMCollector: Fixed trajectory collection when multiple trajectories complete (#3018) @albertbou92
- Postprocessing: Consistent postprocessing when using replay buffers in collectors (#3144) @vmoens
- Weight Updates: Fixed original weights retrieval in collectors (#2951) @vmoens
- Transform Handling: Fixed transform application and metadata preservation (#3047, #3050) @vmoens
Compatibility and Infrastructure
- PyTorch 2.1.1: Fixed compatibility issues (#3157) @vmoens
- NPU Attribute: Fixed missing NPU attribute (#3159) @vmoens
- CUDA Graph: Fixed update_policy_weights_ with CUDA graphs (#3003) @vmoens
- Stream Capturing: Robust CUDA stream capturing calls (#2950) @vmoens
Documentation and Tutorials
- DQN with RNN Tutorial: Upgraded tutorial with latest best practices (#3152) @vmoens
- LLM API Documentation: Comprehensive documentation for LLM environments and transforms (#2991) @vmoens
- Multi-head Entropy: Better documentation for multi-head entropy usage (#3109) @vmoens
- LSTM Module: Fixed import examples in documentation (#3138) @arvindcr4
- A2C Documentation: Updated AcceptedKeys documentation (#2987) @simeet-n
- History API: Added missing docstrings for History functionality (#3083) @vmoens
- Multi-agent PPO: Fixed tutorial issues (#2940) @matteobettini
- WeightUpdater: Updated documentation after renaming (#3007) @albertbou92
Infrastructure and CI
- Pre-commit Updates: Updated formatting and linting tools (#3108) @vmoens
- Benchmark CI: Fixed benchmark runs and added missing dependencies (#3092, #3163) @vmoens
- Windows CI: Fixed Windows continuous integration (#3028) @vmoens
- Old Dependencies: Fixed CI for older dependency versions (#3165) @vmoens
- C++ Linting: Fixed C++ code linting issues (#3129) @vmoens
- Build System: Improved pyproject.toml usage and versioning (#3089, #3166) @vmoens
🏆 Contributors
Special thanks to all contributors who made this release possible:
- @albertbou92 (Albert Bou) - GRPO multi-node support and LLM improvements
- @Ibinarriaga - CQL offline algorithm and Minari integration
- @aorenstein68 (Adrian Orenstein) - Compressed storage GPU support
- @louisfaury (Louis Faury) - Categorical spec and PPO improvements
- @LucaCarminati (Luca Carminati) - Binary tensor fixes
- @JGuzzi (Jérôme Guzzi) - PettingZoo state support
- @lowdy1 - NPU device support
- @Felixs (Felix Sittenauer) - Multi-head entropy coefficients
- @YoannPoupart (Yoann Poupart) - LineariseRewards improvements
- @OswaldZink (Oswald Zink) - Explained variance logging
- @alexghh (Alexandre Ghelfi) - Frame control improvements
- @marcosGR (Marcos Galletero Romero) - Atari patching fixes
- @matteobettini (Matteo Bettini) - Tutorial fixes
- @simeet-n (Simeet Nayan) - Documentation improvements
- @arvindcr4 - Documentation fixes
- @felixy12 (Felix Yu) - State dict reference fixes
- @SendhilPanchadsaram (Sendhil Panchadsaram) - Documentation typo fixes
- @abhishekunique (Abhishek) - WandB logger and value estimation improvements
- @骑马小猫 - DQN module typo fix
- @ZainRizvi (Zain Rizvi) - CI improvements and meta-pytorch migration
- @mikayla-gawarecki (Mikayla Gawarecki) - Usage tracking and ConditionalPolicySwitch
🔗 Compatibility
- PyTorch: Compatible with PyTorch 2.1.1+ -- recommended >=2.8.0,<2.9.0 for full compatibility
- TensorDict: Updated to work with TensorDict 0.10+
- Python: Supports Python 3.9+
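The PyTorch bounds above can be checked programmatically. The following is a rough sketch that classifies a version string against the stated ranges (2.1.1 or later supported, >=2.8.0,<2.9.0 recommended). The helper names `parse_version` and `classify_torch_version` are assumptions for illustration and are not provided by TorchRL.

```python
import re

MIN_SUPPORTED = (2, 1, 1)
RECOMMENDED_LOW = (2, 8, 0)
RECOMMENDED_HIGH = (2, 9, 0)


def parse_version(version):
    """Extract (major, minor, patch) and ignore suffixes such as '+cu128'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"Unrecognised version string: {version!r}")
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def classify_torch_version(version):
    """Return 'unsupported', 'recommended', or 'compatible' (hypothetical labels)."""
    parsed = parse_version(version)
    if parsed < MIN_SUPPORTED:
        return "unsupported"
    if RECOMMENDED_LOW <= parsed < RECOMMENDED_HIGH:
        return "recommended"
    return "compatible"
```

In practice the string passed in would be `torch.__version__`, for example `classify_torch_version(torch.__version__)`.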
📦 Installation
pip install torchrl==0.10.0
For the latest features:
pip install git+https://github.com/pytorch/rl.git@release/0.10.0
v0.9.2: Bug fixes and perf improvements
TorchRL 0.9.2 Release Notes
This release focuses on bug fixes, performance improvements, and code quality enhancements.
🚀 New Features
- LineariseRewards: Now supports negative weights for more flexible reward shaping (#3064)
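To illustrate why negative weights are useful, here is a minimal sketch of linearising a multi-objective reward into a scalar by a weighted sum. It is plain Python rather than the TorchRL transform itself, and the function name `linearise_rewards` and the example objectives are assumptions made for illustration.

```python
def linearise_rewards(reward_vector, weights):
    """Collapse a multi-objective reward into one scalar via a weighted sum.

    Hypothetical helper: a negative weight turns the matching objective
    into a penalty instead of a bonus.
    """
    if len(reward_vector) != len(weights):
        raise ValueError("reward_vector and weights must have the same length")
    return sum(w * r for w, r in zip(weights, reward_vector))


# Two objectives: forward progress (to maximise) and energy use (to penalise).
reward = [2.0, 0.5]
weights = [1.0, -0.5]
scalar = linearise_rewards(reward, weights)  # 2.0 * 1.0 + 0.5 * (-0.5)
```

With these weights, higher energy use lowers the scalar reward, which is the kind of reward shaping that negative weights make possible.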
🐛 Bug Fixes
- Fixed policy reference handling in state dictionaries (#3043)
- Improved unbatched data handling in LLM wrappers (#3070)
- Fixed cross-entropy log-probability computation for batched inputs (#3080)
- Fixed Binary clone() operations (#3077)
- Fixed in-place spec modifications in TransformedEnv (#3076)
⚡ Performance Improvements
- Optimized distribution sampling by avoiding unnecessary log-probability computations (#3081)
🔧 Code Quality
- Standardized coefficient naming in A2C and PPO algorithms (#3079)
📦 Installation
pip install torchrl==0.9.2
Thanks to all contributors: @felixy12, @Xmaster6y, @louisfaury and @LCarmi
v0.9.1: fix for history-based vLLM and Transformers wrappers
Fixes a critical issue with vLLMWrapper and TransformersWrapper, where a stack of History objects was passed to stack a second time, resulting in a bug.