I am Dengyun Peng, a first-year Master's student at HIT and a member of SCIR LA. I am currently supervised by Professor Wanxiang Che, Professor Libo Qin, and Ph.D. candidate Qiguang Chen. My current research interests are RL4LLM and LLM reasoning. I also have research experience in Safe RL and Offline RL.
- iFLYTEK (Hefei)
  - Research Intern, September 2025 – Present
- Du Xiaoman Financial (Beijing)
  - Research Intern, January 2025 – February 2025
- Westlake University (Hangzhou)
  - Research Intern, December 2023 – September 2024
(EMNLP 2025 Findings, Co-first author) DLPO: Towards a Robust, Efficient, and Generalizable Prompt Optimization Framework from a Deep-Learning Perspective (https://arxiv.org/abs/2503.13413)
(NeurIPS 2025, Co-first author) Boundary-to-Region Supervision for Offline Safe Reinforcement Learning (https://nips.cc/virtual/2025/poster/115428)
(ICML 2024, Second author) Reinformer: Max-Return Sequence Modeling for Offline RL (https://proceedings.mlr.press/v235/zhuang24b.html)
(SCIENCE CHINA Information Sciences, Fourth author) Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models (https://arxiv.org/abs/2503.09567)
(Preprint, Fourth author) ECM: A Unified Electronic Circuit Model for Explaining the Emergence of In-Context Learning and Chain-of-Thought in Large Language Model (https://arxiv.org/abs/2502.03325)
https://scholar.google.com.hk/citations?user=XtG_SxwAAAAJ&hl=zh-CN

