  • Harbin, China

sfasfaffa/README.md

Hi there 👋

I am Dengyun Peng, a first-year Master's student at HIT and a member of SCIR LA. I am supervised by Professor Wanxiang Che, Professor Libo Qin, and Ph.D. candidate Qiguang Chen. My current research interests are RL4LLM and LLM reasoning, and I have prior research experience in Safe RL and Offline RL.

Internships:

  • iFLYTEK (Hefei)

    • Research Intern, September 2025 – Present
  • Du Xiaoman Financial (Beijing)

    • Research Intern, January 2025 – February 2025
  • Westlake University (Hangzhou)

    • Research Intern, December 2023 – September 2024

Publications:

(EMNLP 2025 Findings, co-first author) DLPO: Towards a Robust, Efficient, and Generalizable Prompt Optimization Framework from a Deep-Learning Perspective (https://arxiv.org/abs/2503.13413)

(NeurIPS 2025, co-first author) Boundary-to-Region Supervision for Offline Safe Reinforcement Learning (https://nips.cc/virtual/2025/poster/115428)

(ICML 2024, second author) Reinformer: Max-Return Sequence Modeling for Offline RL (https://proceedings.mlr.press/v235/zhuang24b.html)

(SCIENCE CHINA Information Sciences, fourth author) Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models (https://arxiv.org/abs/2503.09567)

(Preprint, fourth author) ECM: A Unified Electronic Circuit Model for Explaining the Emergence of In-Context Learning and Chain-of-Thought in Large Language Model (https://arxiv.org/abs/2502.03325)

Email:

[email protected]

[email protected]

Google Scholar:

https://scholar.google.com.hk/citations?user=XtG_SxwAAAAJ&hl=zh-CN

Popular repositories

  1. DLPO (Public)

    Official code for EMNLP 2025 Findings: {DLPO: Towards a Robust, Efficient, and Generalizable Prompt Optimization Framework from a Deep-Learning Perspective}

    Python 10

  2. DR_SAF (Public)

    Official code for {Aware First, Think Less: Dynamic Boundary Self-Awareness Drives Extreme Reasoning Efficiency in Large Language Models}

    Python 9

  3. SystemAnalysisAndDesign (Public)

    Java 1 1

  4. National_Mathematical_Modeling_Competition_2023_fall (Public)

    Python 1

  5. sfasfaffa (Public)

    1

  6. freshman_project (Public)

    none

    Java