AnswerOfTime sfasfaffa

Hi there 👋

I am Dengyun Peng, a first-year Master's student at HIT and a member of SCIR LA. I am supervised by Professor Wanxiang Che, Professor Libo Qin, and Ph.D. candidate Qiguang Chen. My current research focuses on RL4LLM and LLM reasoning, and I have prior research experience in Safe RL and Offline RL.

Internships:

iFLYTEK (Hefei)
- Research Intern, September 2025 – Present
Du Xiaoman Financial (Beijing)
- Research Intern, January 2025 – February 2025
Westlake University (Hangzhou)
- Research Intern, December 2023 – September 2024

Publications:

(EMNLP 2025 Findings, Co-first author) DLPO: Towards a Robust, Efficient, and Generalizable Prompt Optimization Framework from a Deep-Learning Perspective (https://arxiv.org/abs/2503.13413)

(NeurIPS 2025, Co-first author) Boundary-to-Region Supervision for Offline Safe Reinforcement Learning (https://nips.cc/virtual/2025/poster/115428)

(ICML 2024, Second author) Reinformer: Max-Return Sequence Modeling for Offline RL (https://proceedings.mlr.press/v235/zhuang24b.html)

(SCIENCE CHINA Information Sciences, Fourth author) Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models (https://arxiv.org/abs/2503.09567)

(Preprint, Fourth author) ECM: A Unified Electronic Circuit Model for Explaining the Emergence of In-Context Learning and Chain-of-Thought in Large Language Model (https://arxiv.org/abs/2502.03325)

Email:

[email protected]

Google Scholar:

https://scholar.google.com.hk/citations?user=XtG_SxwAAAAJ&hl=zh-CN
