This repository is based on our paper: Large Language Model-based Human-Agent Collaboration for Complex Task Solving. It contains the human-agent collaboration dataset we generated, as well as demo code for our fine-tuned human-agent collaboration policy model.
- The code for each dataset is in `hotpotqa/`, `strategyqa/`, and `intercode/`. Within each dataset directory:
  - start training with `scripts/run.sh`
  - the local test environment is in `test_data/`
- The human-agent collaboration dataset is in `dataset/`
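Put together, the top-level layout looks roughly like the sketch below (assembled only from the paths mentioned in this README; any subdirectory not listed here is not confirmed):

```
ReHAC/
├── dataset/              # human-agent collaboration dataset (e.g. dataset/gpt4/)
├── hotpotqa/
│   ├── data/             # processed training data
│   ├── scripts/run.sh    # training entry point
│   ├── results/          # original evaluation outputs
│   └── test_data/        # local test environment
├── strategyqa/           # same layout as hotpotqa/
└── intercode/            # same layout as hotpotqa/
```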
You can use the following commands to install the required Python packages with pip:
```bash
git clone https://github.com/XueyangFeng/ReHAC.git
cd ReHAC
pip install -r requirements.txt
```
Here, we give an example where we set the sampling parameter to 0.08:

```bash
python data_preprocess.py ./dataset/gpt4/hotpotqa.jsonl 0.08 ./hotpotqa/data/advantage_sample_count_0.08.jsonl
```
The processed training data is then saved to `./hotpotqa/data/advantage_sample_count_0.08.jsonl`, and you should set `TRAIN_DATA_DIR` in `run.sh` to this path.
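Concretely, the relevant line in `run.sh` would then look something like this (a minimal sketch; the variable name is taken from the instructions above, and the rest of the script is unchanged):

```bash
# Point training at the preprocessed data produced above.
TRAIN_DATA_DIR=./hotpotqa/data/advantage_sample_count_0.08.jsonl
```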
You can also find the training data we have already processed in the `hotpotqa/data/advantage`, `strategyqa/data`, and `intercode/data/sql` folders.
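Before training, you can sanity-check a processed file by counting its records and pretty-printing the first one. This quick sketch assumes only the standard one-JSON-object-per-line `.jsonl` layout; the record schema itself is repository-specific:

```bash
# Each line of the .jsonl file is one training sample.
wc -l ./hotpotqa/data/advantage_sample_count_0.08.jsonl
# Pretty-print the first record to inspect its fields.
head -n 1 ./hotpotqa/data/advantage_sample_count_0.08.jsonl | python -m json.tool
```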
```bash
cd hotpotqa/scripts
sh run.sh
```
We randomly sample 100 questions from each dataset for testing. The evaluation results on the HotpotQA dataset are shown in the figure below:
(a) Human-agent collaboration evaluation. (b) GPT-4-agent collaboration evaluation. Bars below the zero axis represent the human intervention cost.
We provide the original evaluation outputs of ReHAC under `hotpotqa/results`, `strategyqa/results`, and `intercode/results`.
If you find ReHAC helpful for your work, please cite the following paper:
```bibtex
@article{feng2024large,
  title={Large Language Model-based Human-Agent Collaboration for Complex Task Solving},
  author={Feng, Xueyang and Chen, Zhi-Yuan and Qin, Yujia and Lin, Yankai and Chen, Xu and Liu, Zhiyuan and Wen, Ji-Rong},
  journal={arXiv preprint arXiv:2402.12914},
  year={2024}
}
```