[AAAI26] DialogXpert: Driving Intelligent and Emotion-Aware Conversations Through Online Value-Based Reinforcement Learning with LLM Priors
Official Paper (AAAI26 Proceedings) | Extended Paper (Appendix)
NOTE: The extended paper is a self-hosted extended version, including appendices and additional experiments. The official AAAI paper should be cited for publication purposes.
Proactive dialogue systems require efficient action selection under large action spaces and evolving conversational dynamics. Hence, this repository implements a proactive dialogue planning framework that integrates:
- LLM Priors: to generate a rational set of action candidates.
- Emotion Trajectory Modeling: to track conversational dynamics.
- Q-learning: to select the optimal action from the subset of action candidates.
- Action space reduction: Leverages LLM priors to select top-k candidate actions, reducing the effective action space and improving sample efficiency without requiring fine-tuning.
- Emotion-aware planning: Integrates emotion trajectories into goal-driven dialogue, enabling context-aware and emotionally informed action selection.
- Deep Q-learning: Unlike existing state-of-the-art (SOTA) approaches that rely on MCTS or diffusion-based planning, our method uses Deep Q-learning over a reduced action space. This significantly improves efficiency, reducing LLM calls from 30 to 4 while maintaining competitive performance.
- Multi-dataset evaluation: Evaluated across multiple dialogue domains, including emotional support (ESConv, ExTES), negotiation (CB), tutoring (CIMA), and persuasion (P4G). See the dataset section for details.
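The core planning loop behind these components can be sketched as follows. This is a minimal illustration with hypothetical names, not the repo's actual code: the LLM prior proposes a small set of candidate actions, a Q-network scores them (plain floats stand in for Q-network outputs here), and the highest-scoring candidate is selected.

```python
import numpy as np

def select_action(candidate_actions, q_values):
    """Pick the candidate action with the highest Q-value.

    candidate_actions: top-k actions proposed by the LLM prior
    q_values: Q-network scores aligned with candidate_actions
    """
    return candidate_actions[int(np.argmax(q_values))]

# Hypothetical example: the LLM prior has narrowed a large action space to k=4.
candidates = ["question", "reflection", "self-disclosure", "suggestion"]
scores = [0.12, 0.80, 0.33, 0.45]  # stand-ins for Q-network outputs
print(select_action(candidates, scores))  # → reflection
```

Because the argmax runs over only k candidates rather than the full action space, each decision needs just a handful of LLM calls (to propose candidates) instead of a full tree search.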
- LLM Priors: env.py (function get_prior_actions_llm)
- Emotion trajectory: env.py (storage done on line 429)
- Action selection: train_model.py (lines 180-196)
- Q-learning:
  - Buffer storage: train_model.py (done for every conversation turn)
  - Training: train_model.py (done every epoch)
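The buffer-storage and training steps can be sketched as a minimal replay buffer. This is a hypothetical illustration, not the repo's exact implementation: a transition is pushed every conversation turn, and a batch is sampled for the Q-learning update each epoch.

```python
import random
from collections import deque

class ReplayBuffer:
    """Minimal replay buffer for turn-level Q-learning.

    Transitions are stored every conversation turn and sampled
    in batches for the Q-update each epoch.
    """

    def __init__(self, capacity: int = 10_000):
        # deque with maxlen evicts the oldest transition when full
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size: int):
        # Sample without replacement; cap at the current buffer size
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

    def __len__(self):
        return len(self.buffer)
```

A bounded deque is a common choice here: old conversations age out automatically, keeping the training distribution close to the current policy.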
The following datasets have been utilized:
- ESConv: Emotional support conversation.
- CIMA: Tutoring dialogue dataset.
- CraigslistBargain (CB): Negotiation dialogues.
- ExTES: Emotional support conversation (similar to ESConv).
- P4G: Persuasion dialogues.
All datasets (training, validation, and test splits) are included in this repository under the data/ directory.
To train the model on a specific dataset, run python train_model.py --data_name <dataset_name>
NOTE: Dataset-specific prompt configuration is required before training (see below).
Download the LLM model weights locally:
- Set the desired model name in download_llm_weights.py (lines 4-5): https://github.com/declare-lab/dialogxpert/blob/master/download_llm_weights.py#L4-5
- Run python download_llm_weights.py
NOTES:
- Update the repo_id in download_llm_weights.py to select the desired model.
- Ensure that you are logged into Hugging Face with the appropriate access token.
Due to differences across dialogue domains, prompt and roleplay functions must be configured manually for each dataset.
For a selected dataset <dataset_name>, update the following components:
- Policy Prompt: Replace with {dataset_name}_prompt in: https://github.com/declare-lab/dialogxpert/blob/master/env.py#L196
- Roleplay Functions: Replace with {dataset_name}_roleplay:
  - System roleplay: https://github.com/declare-lab/dialogxpert/blob/master/env.py#L418
  - User roleplay: https://github.com/declare-lab/dialogxpert/blob/master/env.py#L435
- Critic: https://github.com/declare-lab/dialogxpert/blob/master/env.py#L326
- Example: For the ExTES dataset, replace all relevant functions with the extes_* variants.
- All available prompt functions can be found in: https://github.com/declare-lab/dialogxpert/blob/master/qwen_prompts.py
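The manual swap above amounts to picking a dataset-specific function by name. As a hedged sketch of what a name-based dispatch could look like (the function bodies below are hypothetical stand-ins, not the actual prompts in qwen_prompts.py):

```python
# Hypothetical stand-ins for the dataset-specific functions in qwen_prompts.py
def extes_prompt(state):
    return f"[ExTES policy prompt] {state}"

def esconv_prompt(state):
    return f"[ESConv policy prompt] {state}"

# Dispatch table keyed by --data_name, instead of editing env.py by hand
PROMPTS = {"ExTES": extes_prompt, "ESConv": esconv_prompt}

def get_policy_prompt(data_name, state):
    return PROMPTS[data_name](state)

print(get_policy_prompt("ExTES", "user feels anxious"))
# → [ExTES policy prompt] user feels anxious
```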
After completing the configuration, run python train_model.py --data_name <dataset_name>
Example:
python train_model.py --data_name ExTES
Evaluation is performed automatically during training:
- Per-epoch metrics: Average turns, success rate (https://github.com/declare-lab/dialogxpert/blob/master/train_model.py#L24)
- Self-play evaluation (turn-level): https://github.com/declare-lab/dialogxpert/blob/master/env.py#L443
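As an illustration of the per-epoch metrics, here is a minimal sketch of computing average turns and success rate from self-play episodes. The episode record format is an assumption for this example, not the repo's actual schema.

```python
def summarize_episodes(episodes):
    """Compute per-epoch metrics from self-play episodes.

    episodes: list of dicts with 'turns' (int) and 'success' (bool) --
    a hypothetical record format for illustration.
    """
    n = len(episodes)
    avg_turns = sum(e["turns"] for e in episodes) / n
    success_rate = sum(e["success"] for e in episodes) / n
    return avg_turns, success_rate

eps = [{"turns": 4, "success": True}, {"turns": 8, "success": False}]
print(summarize_episodes(eps))  # → (6.0, 0.5)
```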
Results may vary slightly due to stochastic training, Q-learning, LLM sampling, and hardware differences. We recommend fixing random seeds and using consistent environments for reproducibility.
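Seeds can be fixed at the start of a run along these lines. This is a minimal sketch: since train_model.py uses PyTorch, the torch generators should be seeded as well (shown in a comment to keep this snippet self-contained).

```python
import random

import numpy as np

def set_seed(seed: int = 42) -> None:
    """Fix Python and NumPy RNG seeds for reproducibility."""
    random.seed(seed)
    np.random.seed(seed)
    # If using PyTorch (as the training script does), also seed it:
    # torch.manual_seed(seed)
    # torch.cuda.manual_seed_all(seed)
```

Note that LLM sampling and non-deterministic GPU kernels can still introduce run-to-run variance even with fixed seeds.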
The current implementation uses manual prompt configuration to support diverse dialogue domains. While this requires minor setup per dataset, it keeps the framework easy to adapt. Future updates will include automated dataset-specific configuration.
If you find our work useful, please cite the official AAAI paper:
@inproceedings{rakib2026dialogxpert,
title={DialogXpert: Driving Intelligent and Emotion-Aware Conversations Through Online Value-Based Reinforcement Learning with LLM Priors},
author={Abdur Rakib, Tazeek Bin and Mehrish, Ambuj and Soon, Lay-Ki and Lim, Wern Han and Poria, Soujanya},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
year={2026},
doi={10.1609/aaai.v40i36.40244},
url={https://ojs.aaai.org/index.php/AAAI/article/view/40244}
}
We credit the following repositories for their open-source code:
- PPDPP: https://github.com/dengyang17/PPDPP/tree/main
- DPDP: https://github.com/cs-holder/DPDP/tree/main
- RL-LLM: https://github.com/yanxue7/RL-LLM-Prior/tree/main
