A curated list of 🔥 Distributed Reinforcement Learning papers and systems, covering scalable algorithms, system frameworks, multi-agent setups, large models, communication strategies, and RLHF. It collects 140+ papers, repos, and websites and is maintained by Dong Liu and Yanxuan Yu.
🔥🔥🔥 07/2025 MTBench: Benchmarking Massively Parallelized Multi-Task RL for Robotics Tasks
🔥🔥🔥 05/2025 LLM-Explorer: RL Policy Exploration Enhancement Driven by Large Language Models
🔥🔥🔥 05/2025 Place Cells as Position Embeddings for Multi-Time Random Walk Path Planning
🔥🔥🔥 05/2025 Latent Adaptive Planner for Dynamic Manipulation
🔥🔥🔥 05/2025 Seek in the Dark: Test-Time Instance-Level Policy Gradient in Latent Space
🔥🔥🔥 02/2025 Scalable Language Models with Posterior Inference of Latent Thought Vectors
🔥🔥🔥 02/2025 FedHPD: Heterogeneous Federated RL via Policy Distillation
🔥🔥🔥 02/2025 MuJoCo Playground: Rapid Sim-to-Real Robot Learning on GPU (MJX)
🔥🔥🔥 02/2025 Large Language Models for Multi-Robot Systems: A Survey
🔥🔥🔥 12/2024 FedRLHF: Convergence-Guaranteed Federated Framework for Privacy-Preserving RLHF
🔥🔥🔥 07/2024 Privileged Reinforcement Learning for Multi-Robot Exploration
🔥🔥🔥 05/2024 Massively Parallelizing Episodic RL (Orbit/Isaac stack)
🔥🔥🔥 04/2024 ORBIT-Surgical: Open-Simulation Framework for Robot-Assisted Surgery
🔥🔥🔥 01/2024 SRL: Scaling Distributed RL to 10,000+ Cores
🔥🔥🔥 01/2024 DistRL: Async Distributed RL Framework for On-Device Control Agents
- 📌 Related Surveys and Overviews
- 🚀 Algorithms & Theoretical Advances
- 📈 Policy Gradient Methods
- 🎯 Value-Based Methods
- 🎭 Actor-Critic Methods
- ⚡ Asynchronous & Parallel Methods
- 🔄 Gradient Aggregation & Optimization
- 🧮 Theoretical Foundations
- 🎲 Exploration & Sampling
- 🧠 Latent Thought Language Models (LTM) for RL
- 🛰️ Federated Reinforcement Learning (FedRL)
- 🤖 Robotics and Control
- 🧱 System Frameworks
- 📡 Communication Efficiency
- 👥 Multi-Agent Distributed RL
- 🦾 RLHF & Distributed Human Feedback
- 🧠 Large-Scale Models in RL
- Motion Planning in 3D
- 💻 Codebases & Benchmarks
- 📚 Resources & Tutorials
- 🏁 Contributing
- 📜 License
- 📎 Citation
- Survey on Distributed RL
- Scaling RL
- A Survey on Federated RL
- RLHF: Challenges & Opportunities
- Distributed Deep Reinforcement Learning: A Survey and A Multi-Player Multi-Agent Learning Toolbox
- Acceleration for Deep Reinforcement Learning using Parallel and Distributed Computing: A Survey
- Distributed Training for Reinforcement Learning
- A Survey on Multi-Agent Reinforcement Learning and Its Application
| Title | Year | Link |
|---|---|---|
| IMPALA: Scalable Distributed Deep-RL | 2018 | https://arxiv.org/abs/1802.01561 |
| Distributed Proximal Policy Optimization | 2017 | https://arxiv.org/abs/1707.02286 |
| Asynchronous Methods for Deep Reinforcement Learning (A3C) | 2016 | https://arxiv.org/abs/1602.01783 |
| Asynchronous Federated RL with PG Updates | 2024 | https://arxiv.org/abs/2404.08003 |
| Distributed Policy Gradient with Variance Reduction | 2023 | https://arxiv.org/abs/2305.07180 |
| Federated PG for Multi-Agent RL | 2023 | https://arxiv.org/abs/2304.12151 |
| Scalable Distributed Policy Optimization via Consensus | 2024 | https://arxiv.org/abs/2401.09876 |
| Distributed Reinforcement Learning for Decentralized Linear Quadratic Control: A Derivative-Free Policy Optimization Approach | 2019 | https://arxiv.org/abs/1912.09135 |
| Title | Year | Link |
|---|---|---|
| Ape-X: Distributed Prioritized Replay | 2018 | https://arxiv.org/abs/1803.00933 |
| Distributed Deep Q-Learning | 2015 | https://arxiv.org/abs/1508.04186 |
| Rainbow: Combining Improvements in Deep Reinforcement Learning | 2018 | https://ojs.aaai.org/index.php/AAAI/article/view/11796 |
| Federated Q-Learning: Linear Regret Speedup with Low Communication Cost | 2023 | https://arxiv.org/abs/2312.15023 |
| Distributed Value Function Approximation for Collaborative Multi-Agent Reinforcement Learning | 2020 | https://arxiv.org/abs/2006.10443 |
| Title | Year | Link |
|---|---|---|
| SEED RL: Scalable and Efficient Deep-RL with Accelerated Central Inference | 2019 | https://arxiv.org/abs/1910.06591 |
| Muesli | 2021 | https://arxiv.org/abs/2104.06159 |
| Distributed Soft Actor-Critic with Experience Replay | 2023 | https://arxiv.org/abs/2308.14567 |
| Federated Actor-Critic for Continuous Control | 2023 | https://arxiv.org/abs/2311.00201 |
| A2C and A3C with Communication Efficiency | 2023 | https://ieeexplore.ieee.org/abstract/document/9289269 |
| Title | Year | Link |
|---|---|---|
| Loss- and Reward-Weighting for Efficient DRL | 2024 | https://arxiv.org/abs/2311.01354 |
| Accelerated Methods for Distributed RL | 2022 | https://arxiv.org/abs/2203.09511 |
| Parallel A2C with Shared Experience | 2023 | https://arxiv.org/abs/2307.19876 |
| Distributed Async Temporal Difference Learning | 2024 | https://arxiv.org/abs/2402.17890 |
| DistRL: An Asynchronous Distributed Reinforcement Learning Framework for On-Device Control Agents | 2024 | https://arxiv.org/abs/2410.14803 |
| SRL: Scaling Distributed Reinforcement Learning to Over Ten Thousand Cores | 2023 | https://arxiv.org/abs/2306.16688 |
| MSRL: Distributed Reinforcement Learning with Dataflow Fragments | 2022 | https://arxiv.org/abs/2210.00882 |
| Cleanba: A Reproducible and Efficient Distributed Reinforcement Learning Platform | 2023 | https://arxiv.org/abs/2310.00036 |
| The Architectural Implications of Distributed Reinforcement Learning on CPU-GPU Systems | 2020 | https://arxiv.org/abs/2012.04210 |
| Title | Year | Link |
|---|---|---|
| Gradient Compression for Distributed DRL | 2023 | https://arxiv.org/abs/2305.14321 |
| Byzantine-Robust Distributed RL | 2024 | https://arxiv.org/abs/2403.08765 |
| Variance-Reduced Distributed PG | 2023 | https://arxiv.org/abs/2309.18765 |
| Momentum-Based Distributed RL | 2024 | https://arxiv.org/abs/2401.23456 |
| Title | Year | Link |
|---|---|---|
| Convergence Analysis of Distributed RL | 2023 | https://arxiv.org/abs/2304.09876 |
| Sample Complexity of Federated RL | 2024 | https://arxiv.org/abs/2402.12345 |
| Communication Complexity in Distributed RL | 2023 | https://arxiv.org/abs/2308.56789 |
| Regret Bounds for Distributed Bandits | 2024 | https://arxiv.org/abs/2405.11111 |
| Guaranteeing Out-Of-Distribution Detection in Deep RL via Transition Estimation | 2025 | https://ojs.aaai.org/index.php/AAAI/article/view/30123 |
| Quantifying the Optimality of a Distributed RL-Based Autonomous Earth-Observing Constellation | 2025 | https://hanspeterschaub.info/Papers/Stephenson2025.pdf |
| Title | Year | Link |
|---|---|---|
| Exploration with Multi-Sample Target Values for Distributional Reinforcement Learning | 2022 | https://arxiv.org/abs/2202.02693 |
| Provable and Practical: Efficient Exploration in Reinforcement Learning via Langevin Monte Carlo | 2023 | https://arxiv.org/abs/2305.18246 |
| More Efficient Randomized Exploration for Reinforcement Learning | 2024 | https://arxiv.org/abs/2406.12241 |
| LLM-Explorer: A Plug-in Reinforcement Learning Policy Exploration Enhancement Driven by Large Language Models | 2025 | https://arxiv.org/abs/2505.15293 |
| Explore-Go: Leveraging Exploration for Generalisation in Deep Reinforcement Learning | 2024 | https://arxiv.org/abs/2406.08069 |
| Title | Year | Link |
|---|---|---|
| Place Cells as Position Embeddings of Multi-Time Random Walk Transition Kernels for Path Planning | 2025 | https://arxiv.org/abs/2505.14806 |
| Latent Adaptive Planner for Dynamic Manipulation | 2025 | https://arxiv.org/abs/2505.03077 |
| Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space | 2025 | https://arxiv.org/abs/2505.13308 |
| Scalable Language Models with Posterior Inference of Latent Thought Vectors | 2025 | https://arxiv.org/abs/2502.01567 |
| Training Chain-of-Thought via Latent-Variable Inference | 2024 | https://openreview.net/forum?id=a147pIS2Co |
| Title | Year | Link |
|---|---|---|
| Federated Deep Reinforcement Learning | 2019 | https://arxiv.org/abs/1901.08277 |
| Federated Reinforcement Learning: Techniques, Applications, and Open Challenges | 2021 | https://arxiv.org/abs/2108.11887 |
| Federated Reinforcement Learning with Constraint Heterogeneity | 2024 | https://arxiv.org/abs/2405.03236 |
| Federated Reinforcement Learning for Robot Motion Planning with Zero-Shot Generalization | 2024 | https://arxiv.org/abs/2403.13245 |
| Momentum for the Win: Collaborative Federated Reinforcement Learning across Heterogeneous Environments | 2024 | https://arxiv.org/abs/2405.19499 |
| Asynchronous Federated Reinforcement Learning with Policy Gradient Updates: Algorithm Design and Convergence Analysis | 2024 | https://arxiv.org/abs/2404.08003 |
| FedHPD: Heterogeneous Federated Reinforcement Learning via Policy Distillation | 2025 | https://arxiv.org/abs/2502.00870 |
| Federated Reinforcement Learning with Environment Heterogeneity | 2022 | https://arxiv.org/abs/2204.02634 |
| Federated Ensemble-Directed Offline Reinforcement Learning | 2023 | https://arxiv.org/abs/2305.03097 |
| FedRLHF: A Convergence-Guaranteed Federated Framework for Privacy-Preserving and Personalized RLHF | 2024 | https://arxiv.org/abs/2412.15538 |
| Title | Year | Link |
|---|---|---|
| QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation | 2018 | https://arxiv.org/abs/1806.10293 |
| Distributed Reinforcement Learning for Decentralized Linear Quadratic Control: A Derivative-Free Policy Optimization Approach | 2019 | https://arxiv.org/abs/1912.09135 |
| Distributed Reinforcement Learning for Robot Teams: A Review | 2022 | https://arxiv.org/abs/2204.03516 |
| Open X-Embodiment: Robotic Learning Datasets and RT-X Models | 2023 | https://arxiv.org/abs/2310.08864 |
| Federated Reinforcement Learning for Robot Motion Planning with Zero-Shot Generalization | 2024 | https://dl.acm.org/doi/10.1016/j.automatica.2024.111709 |
| Privileged Reinforcement and Communication Learning for Distributed, Bandwidth-Limited Multi-Robot Exploration | 2024 | https://arxiv.org/abs/2407.20203 |
| Reinforcement Learning of Adaptive Multi-Robot Cooperative Transport (TIHDP) | 2024 | https://arxiv.org/abs/2404.02362 |
| Massively Parallelizing Episodic Reinforcement Learning (Orbit/Isaac stack) | 2024 | https://arxiv.org/abs/2405.11512 |
| ORBIT-Surgical: An Open-Simulation Framework for Robot-Assisted Surgery | 2024 | https://autolab.berkeley.edu/assets/publications/media/2024-ICRA-ORBIT-Surgical.pdf |
| MuJoCo Playground: Rapid Sim-to-Real Robot Learning on GPU (MJX) | 2025 | https://arxiv.org/abs/2502.08844 |
| Benchmarking Massively Parallelized Multi-Task RL for Robotics Tasks (MTBench) | 2025 | https://arxiv.org/abs/2507.23172 |
| Large Language Models for Multi-Robot Systems: A Survey | 2025 | https://arxiv.org/abs/2502.03814 |
This section summarizes foundational algorithms and widely used libraries for motion planning in three-dimensional spaces. Links are to stable DOIs, archival PDFs, or official project pages.
| Title | Year | Link |
|---|---|---|
| Probabilistic Roadmaps for Path Planning in High-Dimensional Configuration Spaces (PRM) | 1996 | https://doi.org/10.1109/70.508439 |
| Rapidly-Exploring Random Trees (RRT) | 1998 | http://msl.cs.illinois.edu/~lavalle/papers/Lav98c.pdf |
| Sampling-based Algorithms for Optimal Motion Planning (RRT*) | 2011 | https://arxiv.org/abs/1105.1186 |
| Fast Marching Tree (FMT*) | 2013 | https://arxiv.org/abs/1306.3532 |
| CHOMP: Gradient-Based Trajectory Optimization for Motion Planning | 2009 | https://personalrobotics.cs.washington.edu/publications/ratliff2009chomp.pdf |
| Project | Link |
|---|---|
| OMPL (Open Motion Planning Library) | https://ompl.kavrakilab.org/ |
| MoveIt | https://moveit.picknik.ai/ |
| OctoMap | https://octomap.github.io/ |
| Title | Year | Link |
|---|---|---|
| Design and Evaluation of Motion Planners for Quadrotors in Environments with Varying Complexities | 2024 | https://arxiv.org/abs/2309.13720 |
| Diffusion-based Generation, Optimization, and Planning in 3D Scenes | 2023 | https://arxiv.org/abs/2301.06015 |
| A Novel Vector-Field-Based Motion Planning Algorithm for 3D Nonholonomic Robots | 2023 | https://arxiv.org/abs/2302.11110 |
| Motion Planning for Robotics: A Review for Sampling-based Planners | 2024 | https://arxiv.org/abs/2410.19414 |
| Real-time Motion Planning for Autonomous Vehicles in Dynamic Environments | 2024 | https://arxiv.org/abs/2406.02916 |
| Simultaneous Multi-Robot Motion Planning with Projected Diffusion Models | 2025 | https://arxiv.org/abs/2502.03607 |
| Multi-layer Motion Planning with Kinodynamic and Spatio-Temporal Constraints | 2025 | https://arxiv.org/abs/2503.07762 |
| Safe Flow Matching: Robot Motion Planning with Control Barrier Functions | 2025 | https://arxiv.org/abs/2504.08661 |
| SPADE: Scalable Path Planning Architecture on Multi-Domain 3D Scene Graphs | 2025 | https://arxiv.org/abs/2505.19098 |
| Rapid Motion Planning of Manipulator in Three-Dimensional Space Under Multiple Scenes | 2024 | https://doi.org/10.1177/09544062241271744 |
| Task and Motion Planning in Hierarchical 3D Scene Graphs | 2024 | https://arxiv.org/abs/2403.08094 |
| Title | Year | Link |
|---|---|---|
| Ray RLlib | 2021 | https://docs.ray.io/en/latest/rllib/ |
| RLlib: Abstractions for Distributed Reinforcement Learning | 2018 | https://arxiv.org/abs/1712.09381 |
| Acme | 2020 | https://github.com/deepmind/acme |
| TorchBeast: A PyTorch Platform for Distributed RL | 2019 | https://arxiv.org/abs/1910.03552 |
| TorchRL | 2023 | https://pytorch.org/rl/ |
| CleanRL + SLURM | 2022 | https://github.com/vwxyzjn/cleanrl |
| Cleanba | 2023 | https://github.com/vwxyzjn/cleanba |
| FedHQL | 2023 | https://arxiv.org/abs/2301.11135 |
| DistRL Framework | 2024 | https://arxiv.org/abs/2410.14803 |
| AReaL | 2025 | https://github.com/inclusionAI/AReaL |
| SandGraphX | 2025 | https://github.com/NoakLiu/SandGraphX |
| Title | Year | Link |
|---|---|---|
| Deep Gradient Compression | 2017 | https://arxiv.org/abs/1712.01887 |
| Gradient Surgery | 2020 | https://arxiv.org/abs/2001.06782 |
| DD-PPO | 2020 | https://arxiv.org/abs/1911.00357 |
| Gradient Dropping | 2018 | https://arxiv.org/abs/1806.08768 |
| Bandwidth-Aware RL | 2023 | https://arxiv.org/abs/2303.08127 |
| QDDP: Quantized DRL | 2021 | https://arxiv.org/abs/2102.09352 |
| Communication-Aware Distributed RL | 2024 | https://arxiv.org/abs/2402.17222 |
| Application of Reinforcement Learning to Routing in Distributed Wireless Networks: A Review | 2015 | https://link.springer.com/article/10.1007/s10462-012-9383-6 |
| Title | Year | Link |
|---|---|---|
| MADDPG | 2017 | https://arxiv.org/abs/1706.02275 |
| MAVEN | 2019 | https://arxiv.org/abs/1910.07483 |
| R-MADDPG | 2022 | https://arxiv.org/abs/2202.03428 |
| Tesseract | 2022 | https://arxiv.org/abs/2211.03537 |
| FMRL-LA | 2023 | https://arxiv.org/abs/2310.11572 |
| CAESAR | 2024 | https://arxiv.org/abs/2402.07426 |
| FAgents: Federated MARL | 2023 | https://arxiv.org/abs/2312.22222 |
| Distributed Reinforcement Learning for Robot Teams: A Review | 2022 | https://arxiv.org/abs/2204.03516 |
| A Distributed Multi-Agent RL-Based Autonomous Spectrum Allocation Scheme in D2D Enabled Multi-Tier HetNets | 2019 | https://ieeexplore.ieee.org/document/8598795 |
| Title | Year | Link |
|---|---|---|
| InstructGPT | 2022 | https://arxiv.org/abs/2203.02155 |
| Self-Instruct | 2022 | https://arxiv.org/abs/2212.10560 |
| DPO | 2023 | https://arxiv.org/abs/2305.18290 |
| RAFT | 2024 | https://arxiv.org/abs/2402.03620 |
| Distributed PPO (HF-TRL) | 2023 | https://huggingface.co/docs/trl/main/en/ |
| Decentralized RLHF | 2023 | https://arxiv.org/abs/2310.11883 |
| FedRLHF | 2024 | https://arxiv.org/abs/2412.15538 |
| Title | Year | Link |
|---|---|---|
| Gato | 2022 | https://arxiv.org/abs/2205.06175 |
| PaLM-E | 2023 | https://arxiv.org/abs/2303.03378 |
| Decision Transformer | 2021 | https://arxiv.org/abs/2106.01345 |
| V-D4RL | 2022 | https://arxiv.org/abs/2202.02349 |
| TAMER-GPT | 2023 | https://arxiv.org/abs/2305.11521 |
| Gorilla | 2023 | https://arxiv.org/abs/2305.01569 |
| Open X-Embodiment | 2023 | https://arxiv.org/abs/2310.08864 |
We welcome contributions! If you find a missing paper, tool, or benchmark related to distributed reinforcement learning, please feel free to open a pull request or issue.
MIT License. See LICENSE for details.
@online{liu2025awesome,
author = {Dong Liu and Yanxuan Yu and Ying Nian Wu and {contributors}},
title = {Awesome Distributed Reinforcement Learning},
year = {2025},
url = {https://github.com/NoakLiu/Awesome-Distributed-RL},
note = {GitHub Repository}
}