Update README with new research papers

tAnGjIa520 · web-flow · commit ff55dc8a578a · 2025-11-19T16:56:31.000+08:00
Added new research papers on Monte Carlo Tree Search and Reinforcement Learning, with their authors, keywords, experiment environments, and code links.
diff --git a/README.md b/README.md
@@ -495,6 +495,32 @@ Here is a collection of research papers about **Monte Carlo Tree Search**.
   - Key: chemical retrosynthetic planning, neural-based A*-like algorithm, ANDOR tree
   - ExpEnv: USPTO datasets
   - [Code](https://github.com/binghong-ml/retro_star)
+
+
+
+- [Monte Carlo Tree Diffusion for System 2 Planning](https://proceedings.mlr.press/v267/yoon25a.html) 2025
+  - Jaesik Yoon, Hyeonseo Cho, Doojin Baek, Yoshua Bengio, Sungjin Ahn
+  - Key: Diffusion Models, MCTS, System 2 Planning, Trajectory Optimization
+  - ExpEnv: Maze2D, Kitchen, Block stacking
+  - [Code](https://github.com/ahn-ml/mctd)
+- [Monte-Carlo Tree Search with Uncertainty Propagation via Optimal Transport](https://openreview.net/forum?id=DUGFTH9W8B) 2025
+  - Tuan Quang Dam, Pascal Stenger, Lukas Schneider, Joni Pajarinen, Carlo D’Eramo, Odalric-Ambrym Maillard
+  - Key: Optimal Transport, Wasserstein Distance, Uncertainty Propagation, MCTS
+  - ExpEnv: FrozenLake, NChain, RiverSwim, SixArms, Taxi, Rocksample
+- [Online Robust Reinforcement Learning Through Monte-Carlo Planning](https://openreview.net/forum?id=m25ma7O7Ec) 2025
+  - Tuan Quang Dam, Kishan Panaganti, Brahim Driss, Adam Wierman
+  - Key: Robust RL, MCTS, Distributionally Robust Optimization, Sim-to-Real
+  - ExpEnv: Gambler’s Problem, Frozen Lake, American Option Pricing
+  - [Code](https://github.com/brahimdriss/RobustMCTS)
+- [Power Mean Estimation in Stochastic Continuous Monte-Carlo Tree Search](https://icml.cc/virtual/2025/poster/45596) 2025
+  - Tuan Quang Dam
+  - Key: Continuous MCTS, Polynomial Exploration, Stochastic Environments, Power Mean
+  - ExpEnv: Continuous Cartpole, Inverted Pendulum
+- [Provably Efficient Long-Horizon Exploration in Monte Carlo Tree Search through State Occupancy Regularization](https://arxiv.org/abs/2407.05511) 2024
+  - Liam Schramm, Abdeslam Boularias
+  - Key: Exploration, State Occupancy, Long-horizon planning, Volume-MCTS
+  - ExpEnv: Robot Navigation, 2D Maze
+  - [Code](https://github.com/schrammlb2/Volume-MCTS-ICML)
 #### ICLR
 - [OptionZero: Planning with Learned Options](https://openreview.net/forum?id=3IFRygQKGL) 2025  
   - Po-Wei Huang, Pei-Chiun Peng, Hung Guei, Ti-Rong Wu  
@@ -567,6 +593,24 @@ Here is a collection of research papers about **Monte Carlo Tree Search**.
   - Binghong Chen, Bo Dai, Qinjie Lin, Guo Ye, Han Liu, Le Song
   - Key: meta path planning algorithm, exploits a novel neural architecture which can learn promising search directions from problem structures.
   - ExpEnv: a 2d workspace with a 2 DoF (degrees of freedom) point robot, a 3 DoF stick robot and a 5 DoF snake robot
+- [Epistemic Monte Carlo Tree Search](https://openreview.net/forum?id=Tb8RiXOc3N) 2025
+  - Wendelin Boehmer, Zheng Shen, Haoran Duan, Chengzhi Mao, Rosario Scalise
+  - Key: MCTS, Epistemic Uncertainty, Exploration, Sparse Reward, Model-based RL
+  - ExpEnv: Deep Sea, SUBLEQ (Assembly language)
+- [DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search](https://openreview.net/forum?id=I4YAIwrsXa) 2025
+  - DeepSeek Prover Team
+  - Key: Automated Theorem Proving, LLM, MCTS, RL from Proof Assistant Feedback (RLPAF), RMaxTS
+  - ExpEnv: Lean 4, miniF2F, ProofNet
+  - [Code](https://github.com/deepseek-ai/DeepSeek-Prover-V1.5)
+- [Bayes Adaptive Monte Carlo Tree Search for Offline Model-based Reinforcement Learning](https://openreview.net/forum?id=RGjqr1jBJy) 2025
+  - Lucas Niu Janson, et al.
+  - Key: Offline RL, Model-based RL, Bayes-Adaptive MDP, Uncertainty Propagation
+  - ExpEnv: D4RL
+- [PromptAgent: Strategic Planning with Large Language Models Enables Expert-Level Prompt Optimization](https://openreview.net/forum?id=22pyNMuIoa) 2024
+  - Zhutian Yang, et al.
+  - Key: Prompt Optimization, Strategic Planning, MCTS, LLM Agent
+  - ExpEnv: BIG-Bench Hard (BBH), MMLU, HellaSwag
+  - [Code](https://github.com/zhutianyang/PromptAgent)
 #### NeurIPS
 - [LightZero: A Unified Benchmark for Monte Carlo Tree Search in General Sequential Decision Scenarios](https://openreview.net/pdf?id=oIUXpBnyjv) 2023
   - Yazhe Niu, Yuan Pu, Zhenjie Yang, Xueyan Li, Tong Zhou, Jiyuan Ren, Shuai Hu, Hongsheng Li, Yu Liu
@@ -633,7 +677,28 @@ Here is a collection of research papers about **Monte Carlo Tree Search**.
   - Matthew Faw, Rajat Sen, Karthikeyan Shanmugam, Constantine Caramanis, Sanjay Shakkottai
   - Key: covariate shift problem, Mix&Match combines stochastic gradient descent (SGD) with optimistic tree search and model re-use (evolving partially trained models with samples from different mixture distributions)
   - [Code](https://github.com/matthewfaw/mixnmatch)
-
+- [Feedback-Aware MCTS for Goal-Oriented Information Seeking](https://openreview.net/pdf?id=ustF8MMZDJ) 2025
+  - Harmanpreet Chopra, Chirag Shah
+  - Key: Conversational AI, Goal-Oriented Information Seeking, MCTS, LLM
+  - ExpEnv: 20 Questions, GuessWhat?, MutualFriends
+- [MCTS-Transfer: Monte Carlo Tree Search based Space Transfer for Black-box Optimization](https://openreview.net/forum?id=T5UfIfmDbq) 2024
+  - Shukuan Wang, Ke Xue, Lei Song, Xiaobin Huang, Chao Qian
+  - Key: Black-box Optimization, Transfer Learning, MCTS, Search Space Transfer
+  - ExpEnv: Synthetic functions (Ackley, etc.), Design-Bench, Hyper-parameter optimization
+  - [Code](https://github.com/lamda-bbo/mcts-transfer)
+- [Speculative Monte-Carlo Tree Search](https://proceedings.neurips.cc/paper_files/paper/2024/file/a19940b01b77b6acd41ff8b32b334e7c-Paper-Conference.pdf) 2024
+  - Jungwoo Park, David Wu, Kellin Pelrine, Jimmy Wei, Thomas Anthony, Julian Schrittwieser, Junwhan Ahn
+  - Key: Efficiency, Speculative Execution, Parallelism, AlphaZero
+  - ExpEnv: Go (9x9, 19x19)
+- [Generating Code World Models with Large Language Models Guided by Monte Carlo Tree Search](https://proceedings.neurips.cc/paper_files/paper/2024/file/6f479ea488e0908ac8b1b37b27fd134c-Paper-Conference.pdf) 2024
+  - Nicola Dainese, Matteo Merler, Minttu Alakuijala, Pekka Marttinen
+  - Key: Code Generation, World Models, MCTS, Model-based Planning
+  - ExpEnv: CWMB (Code World Models Benchmark), Crafter
+- [ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search](https://openreview.net/forum?id=8rcFOqEud5) 2024
+  - Dan Zhang, Sining Zhoubian, Ziniu Hu, Yisong Yue, Yuxiao Dong, Jie Tang
+  - Key: LLM Self-training, Process Reward, Reasoning, CoT
+  - ExpEnv: GSM8K, MATH
+  - [Code](https://github.com/THUDM/ReST-MCTS)
 #### Other Conference or Journal
 - [Learning to Stop: Dynamic Simulation Monte-Carlo Tree Search](https://arxiv.org/pdf/2012.07910.pdf) AAAI 2021.
 - [On Monte Carlo Tree Search and Reinforcement Learning](https://www.jair.org/index.php/jair/article/download/11099/26289/20632) Journal of Artificial Intelligence Research 2017.