[AAAI26] DialogXpert: Driving Intelligent and Emotion-Aware Conversations Through Online Value-Based Reinforcement Learning with LLM Priors
Official Paper (AAAI26 Proceedings) | Extended Paper (Appendix)
NOTE: The extended paper is a self-hosted extended version, including appendices and additional experiments. The official AAAI paper should be cited for publication purposes.
Proactive dialogue systems require efficient action selection under large action spaces and evolving conversational dynamics. Hence, this repository implements a proactive dialogue planning framework that integrates:
- LLM Priors: to generate a rational set of action candidates.
- Emotion Trajectory Modeling: to track conversational dynamics.
- Q-learning: to select the optimal action from the subset of action candidates.
- Action space reduction: Leverages LLM priors to select top-k candidate actions, reducing the effective action space and improving sample efficiency without requiring fine-tuning.
- Emotion-aware planning: Integrates emotion trajectories into goal-driven dialogue, enabling context-aware and emotionally informed action selection.
- Deep Q-learning: Unlike existing state-of-the-art (SOTA) approaches that rely on MCTS or diffusion-based planning, our method uses Deep Q-learning over a reduced action space. This significantly improves efficiency, reducing LLM calls from 30 to 4 while maintaining competitive performance.
- Multi-dataset evaluation: Evaluated across multiple dialogue domains, including emotional support (ESConv, ExTES), negotiation (CB), tutoring (CIMA), and persuasion (P4G). See the dataset section for details.
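The core planning loop behind these components can be sketched as follows. This is a minimal illustration with hypothetical names, not the repo's actual code: the LLM prior proposes a small set of candidate actions, a Q-network scores them (plain floats stand in for Q-network outputs here), and the highest-scoring candidate is selected.

```python
import numpy as np

def select_action(candidate_actions, q_values):
    """Pick the candidate action with the highest Q-value.

    candidate_actions: top-k actions proposed by the LLM prior
    q_values: Q-network scores aligned with candidate_actions
    """
    return candidate_actions[int(np.argmax(q_values))]

# Hypothetical example: the LLM prior has narrowed a large action space to k=4.
candidates = ["question", "reflection", "self-disclosure", "suggestion"]
scores = [0.12, 0.80, 0.33, 0.45]  # stand-ins for Q-network outputs
print(select_action(candidates, scores))  # → reflection
```

Because the argmax runs over only k candidates rather than the full action space, each decision needs just a handful of LLM calls (to propose candidates) instead of a full tree search.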
- LLM Priors: env.py (function get_prior_actions_llm)
- Emotion trajectory: env.py (storage done on line 429)
- Action selection: train_model.py (lines 180-196)
- Q-learning:
  - Buffer storage: train_model.py (done for every conversation turn)
  - Training: train_model.py (done every epoch)
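The buffer-storage and training steps can be sketched as a minimal replay buffer. This is a hypothetical illustration, not the repo's exact implementation: a transition is pushed every conversation turn, and a batch is sampled for the Q-learning update each epoch.

```python
import random
from collections import deque

class ReplayBuffer:
    """Minimal replay buffer for turn-level Q-learning.

    Transitions are stored every conversation turn and sampled
    in batches for the Q-update each epoch.
    """

    def __init__(self, capacity: int = 10_000):
        # deque with maxlen evicts the oldest transition when full
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size: int):
        # Sample without replacement; cap at the current buffer size
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

    def __len__(self):
        return len(self.buffer)
```

A bounded deque is a common choice here: old conversations age out automatically, keeping the training distribution close to the current policy.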
The following datasets have been utilized:
- ESConv: Emotional support conversation.
- CIMA: Tutoring dialogue dataset.
- CraigslistBargain (CB): Negotiation dialogues.
- ExTES: Emotional support conversation (similar to ESConv).
- P4G: Persuasion dialogues.
All datasets (training, validation, and test splits) are included in this repository under the data/ directory.
To train the model on a specific dataset, run python train_model.py --data_name <dataset_name>
NOTE: Dataset-specific prompt configuration is required before training (see below).
Download the LLM model weights locally:
- Set the desired model name in download_llm_weights.py (lines 4-5): https://github.com/declare-lab/dialogxpert/blob/master/download_llm_weights.py#L4-5
- Run python download_llm_weights.py
NOTES:
- Update the repo_id in download_llm_weights.py to select the desired model.
- Ensure that you are logged into Hugging Face with the appropriate access token.
Due to differences across dialogue domains, prompt and roleplay functions must be configured manually for each dataset.
For a selected dataset <dataset_name>, update the following components:
- Policy Prompt: Replace with {dataset_name}_prompt in: https://github.com/declare-lab/dialogxpert/blob/master/env.py#L196
- Roleplay Functions: Replace with {dataset_name}_roleplay:
  - System roleplay: https://github.com/declare-lab/dialogxpert/blob/master/env.py#L418
  - User roleplay: https://github.com/declare-lab/dialogxpert/blob/master/env.py#L435
- Critic: https://github.com/declare-lab/dialogxpert/blob/master/env.py#L326
- Example: For the ExTES dataset, replace all relevant functions with the extes_* variants.
- All available prompt functions can be found in: https://github.com/declare-lab/dialogxpert/blob/master/qwen_prompts.py
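The manual swap above amounts to picking a dataset-specific function by name. As a hedged sketch of what a name-based dispatch could look like (the function bodies below are hypothetical stand-ins, not the actual prompts in qwen_prompts.py):

```python
# Hypothetical stand-ins for the dataset-specific functions in qwen_prompts.py
def extes_prompt(state):
    return f"[ExTES policy prompt] {state}"

def esconv_prompt(state):
    return f"[ESConv policy prompt] {state}"

# Dispatch table keyed by --data_name, instead of editing env.py by hand
PROMPTS = {"ExTES": extes_prompt, "ESConv": esconv_prompt}

def get_policy_prompt(data_name, state):
    return PROMPTS[data_name](state)

print(get_policy_prompt("ExTES", "user feels anxious"))
# → [ExTES policy prompt] user feels anxious
```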
After completing the configuration, run python train_model.py --data_name <dataset_name>
Example:
python train_model.py --data_name ExTES
Evaluation is performed automatically during training:
- Per-epoch metrics: Average turns, success rate (https://github.com/declare-lab/dialogxpert/blob/master/train_model.py#L24)
- Self-play evaluation (turn-level): https://github.com/declare-lab/dialogxpert/blob/master/env.py#L443
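As an illustration of the per-epoch metrics, here is a minimal sketch of computing average turns and success rate from self-play episodes. The episode record format is an assumption for this example, not the repo's actual schema.

```python
def summarize_episodes(episodes):
    """Compute per-epoch metrics from self-play episodes.

    episodes: list of dicts with 'turns' (int) and 'success' (bool) --
    a hypothetical record format for illustration.
    """
    n = len(episodes)
    avg_turns = sum(e["turns"] for e in episodes) / n
    success_rate = sum(e["success"] for e in episodes) / n
    return avg_turns, success_rate

eps = [{"turns": 4, "success": True}, {"turns": 8, "success": False}]
print(summarize_episodes(eps))  # → (6.0, 0.5)
```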
Results may vary slightly due to stochastic training, Q-learning, LLM sampling, and hardware differences. We recommend fixing random seeds and using consistent environments for reproducibility.
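Seeds can be fixed at the start of a run along these lines. This is a minimal sketch: since train_model.py uses PyTorch, the torch generators should be seeded as well (shown in a comment to keep this snippet self-contained).

```python
import random

import numpy as np

def set_seed(seed: int = 42) -> None:
    """Fix Python and NumPy RNG seeds for reproducibility."""
    random.seed(seed)
    np.random.seed(seed)
    # If using PyTorch (as the training script does), also seed it:
    # torch.manual_seed(seed)
    # torch.cuda.manual_seed_all(seed)
```

Note that LLM sampling and non-deterministic GPU kernels can still introduce run-to-run variance even with fixed seeds.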
The current implementation uses manual prompt configuration to support diverse dialogue domains. While this requires minor setup per dataset, it keeps the framework easy to adapt. Future updates will include automated dataset-specific configuration.
If you find our work useful, please cite the official AAAI paper:
@inproceedings{rakib2026dialogxpert,
title={DialogXpert: Driving Intelligent and Emotion-Aware Conversations Through Online Value-Based Reinforcement Learning with LLM Priors},
author={Abdur Rakib, Tazeek Bin and Mehrish, Ambuj and Soon, Lay-Ki and Lim, Wern Han and Poria, Soujanya},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
year={2026},
doi={10.1609/aaai.v40i36.40244},
url={https://ojs.aaai.org/index.php/AAAI/article/view/40244}
}
We credit the following repositories for their open-source code:
- PPDPP: https://github.com/dengyang17/PPDPP/tree/main
- DPDP: https://github.com/cs-holder/DPDP/tree/main
- RL-LLM: https://github.com/yanxue7/RL-LLM-Prior/tree/main
