Haiyang Mei
Difei Gao
Xiaopeng Wei
Xin Yang
Mike Zheng Shou✉️
Show Lab, National University of Singapore
We introduce TrustScorer, which evaluates the trustworthiness of GUI agent actions so that a human can selectively intervene when an action's trust score is low, combining human precision with AI efficiency.
TrustScorer takes as input the user query q, subtask description d, action sequence s, and state observation o, and outputs a trustworthiness label l indicating the likelihood that the action sequence can accomplish the specified subtask.
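To make this interface concrete, here is a minimal Python sketch under explicit assumptions: all four inputs are plain strings (the action sequence is a list of code lines), and the label l is obtained by thresholding a scalar score in [0, 1] at a hypothetical value of 0.5. `TrustInput`, `trust_label`, and `needs_human` are illustrative names, not the repository's API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TrustInput:
    """Hypothetical container for TrustScorer's inputs (q, d, s, o)."""
    query: str          # q: the user's overall query
    subtask: str        # d: description of the current subtask
    actions: List[str]  # s: action sequence, assumed to be lines of code
    observation: str    # o: assumed textual summary of the GUI state


def trust_label(score: float, threshold: float = 0.5) -> str:
    """Map a scalar trust score to a label l (assumed binary in this sketch)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score is assumed to lie in [0, 1]")
    return "trustworthy" if score >= threshold else "untrustworthy"


def needs_human(score: float, threshold: float = 0.5) -> bool:
    """Selective intervention: defer to a human only when trust is low."""
    return trust_label(score, threshold) == "untrustworthy"


example = TrustInput(
    query="Make the title on the first slide bold",
    subtask="Select the title text box",
    actions=["click(x=320, y=120)", "hotkey('ctrl', 'a')"],
    observation="PowerPoint is open on slide 1",
)
print(trust_label(0.82), needs_human(0.31))  # prints: trustworthy True
```

The threshold is the only knob in this sketch: raising it sends more actions to a human, lowering it lets the agent act alone more often.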
TrustBench includes 106 specific tasks from 9 commonly used applications, together with 718 agent action sequences and their ground-truth annotations.
One TrustBench example on PPT:
The annotation pipeline:
Dataset link: [ OneDrive ] [ BaiduDisk ].
conda env create -f score_env.yml
pip install torch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
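To confirm that the pinned PyTorch wheels actually landed in the active environment, a small stdlib-only check can be run after installation. This is a hypothetical helper (`check_pins` is not part of the repository); it only compares the version strings reported by `importlib.metadata` against the pins in the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional

# Versions pinned by the pip command above (CUDA 11.8 wheels).
PINNED = {"torch": "2.3.0", "torchvision": "0.18.0", "torchaudio": "2.3.0"}


def check_pins(pins: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Return {package: installed_version} for every package whose installed
    version (ignoring a local suffix such as '+cu118') differs from the pin.
    A value of None means the package is not installed at all."""
    mismatches: Dict[str, Optional[str]] = {}
    for name, wanted in pins.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            mismatches[name] = None
            continue
        # CUDA wheels report e.g. "2.3.0+cu118"; compare the public part only.
        if found.split("+")[0] != wanted:
            mismatches[name] = found
    return mismatches


if __name__ == "__main__":
    problems = check_pins(PINNED)
    print("environment OK" if not problems else f"mismatched pins: {problems}")
```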
cd client/method/trustmodel
sh train.sh
The checkpoints are saved under the working directory configured by Hydra in client/method/trustmodel/conf/config.yaml. The language model is an encoder-only DeBERTa model that outputs a score for a pair consisting of a query (task) and an action (Python code); the implementation is based on the Cross-Encoders of Sentence Transformers.
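A cross-encoder reads the task and the candidate action jointly as one input pair and returns a single score per pair, rather than embedding them separately. With the sentence-transformers library this corresponds to `CrossEncoder(model_path).predict(pairs)`; in the minimal sketch below, a plain callable `score_fn` stands in for that call so the example runs without a checkpoint. The pair layout, the 0.5 threshold, and the helpers `score_actions` and `toy_scorer` are assumptions for illustration, not the repository's implementation.

```python
from typing import Callable, List, Sequence, Tuple

# A scorer maps a batch of (task, action_code) text pairs to one float each.
# In practice this could be CrossEncoder(path).predict from sentence-transformers;
# here it is any callable so the sketch stays self-contained.
Scorer = Callable[[List[Tuple[str, str]]], Sequence[float]]


def score_actions(task: str, actions: List[str], score_fn: Scorer,
                  threshold: float = 0.5) -> List[Tuple[str, float, bool]]:
    """Score each candidate action jointly with the task.

    Returns (action, score, trusted) triples sorted by score, highest first.
    """
    pairs = [(task, code) for code in actions]  # one joint input per action
    scores = [float(s) for s in score_fn(pairs)]
    ranked = sorted(zip(actions, scores), key=lambda t: t[1], reverse=True)
    return [(code, s, s >= threshold) for code, s in ranked]


def toy_scorer(pairs: List[Tuple[str, str]]) -> List[float]:
    """Hypothetical stand-in model: fraction of task words found in the code."""
    out = []
    for task, code in pairs:
        words = task.lower().split()
        hits = sum(w in code.lower() for w in words)
        out.append(hits / max(len(words), 1))
    return out


result = score_actions("insert slide", ["insert_slide()", "save_file()"], toy_scorer)
for code, s, ok in result:
    print(f"{s:.2f} {'trusted' if ok else 'flag for human'}  {code}")
```

Swapping `toy_scorer` for a real cross-encoder's `predict` leaves `score_actions` unchanged, which is the point of passing the model in as a callable.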
Model link: [ OneDrive ] [ BaiduDisk ].
- Prompting Method
cd client/method
python prompting_gpt4.py
python trustmodel/utils/format.py
- Training Method
cd client/method/trustmodel
python infer.py
python utils/format.py
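Whatever `format.py` does in detail, a prompting-based scorer generally needs a step that turns GPT-4's free-text reply into a number comparable with the trained model's score. A hypothetical sketch of such a parser, assuming the prompt asks the model to state a line like `Score: 0.8` (the repository's actual prompt and output format may differ):

```python
import re
from typing import Optional

# Hypothetical reply convention: "Score: <number>" or "score = <number>".
_SCORE_RE = re.compile(r"score\s*[:=]\s*(\d+(?:\.\d+)?)", re.IGNORECASE)


def parse_llm_score(reply: str) -> Optional[float]:
    """Return the last stated score if it lies in [0, 1], else None."""
    matches = _SCORE_RE.findall(reply)
    if not matches:
        return None  # unparseable reply: the caller may re-query or skip it
    value = float(matches[-1])  # the last score wins if the model revises itself
    return value if 0.0 <= value <= 1.0 else None


print(parse_llm_score("The clicks target the right menu.\nScore: 0.8"))
```

Returning `None` instead of a default score keeps unparseable replies visible, so they are not silently counted as trustworthy or untrustworthy.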
- generate ace_actions.json
cd ../../utils
python 0_sum_ace_actions.py
- generate TF.json and ACC.txt
python 1_cal_score_accuracy.py
- plot histogram.pdf
python 2_plot_histogram.py
- generate {method_name}_actions.json
python 3_refine_actions.py
- show {method_name}_actions.pdf
python 4_show_actions.py
python 4_show_ace_actions.py
- generate curve_cost_sr.pdf
python 5_draw_curve_cost_sr.py
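As a rough guide to what these scripts measure, the sketch below computes two quantities under stated assumptions: label accuracy of thresholded trust scores, and a cost versus success-rate (SR) trade-off for selective human intervention. The definitions here are assumptions, not the scripts' exact ones: an action is handed to a human when its score falls below the threshold, cost is the fraction of actions handed over, a human always completes a handed-over action correctly, and an action kept by the agent succeeds only if its ground-truth label is True.

```python
from typing import List, Sequence, Tuple


def accuracy(scores: Sequence[float], labels: Sequence[bool],
             threshold: float = 0.5) -> float:
    """Fraction of actions whose thresholded score matches the ground truth."""
    if len(scores) != len(labels) or not scores:
        raise ValueError("need equally sized, non-empty inputs")
    hits = sum((s >= threshold) == y for s, y in zip(scores, labels))
    return hits / len(scores)


def cost_sr_curve(scores: Sequence[float], labels: Sequence[bool],
                  thresholds: Sequence[float]) -> List[Tuple[float, float, float]]:
    """Return (threshold, cost, success_rate) triples, one per threshold.

    Assumes human intervention always succeeds; kept actions succeed only
    when their ground-truth label is True.
    """
    n = len(scores)
    curve = []
    for t in thresholds:
        handed_over = [s < t for s in scores]  # low trust -> ask a human
        cost = sum(handed_over) / n
        ok = sum(h or y for h, y in zip(handed_over, labels))
        curve.append((t, cost, ok / n))
    return curve


# Toy data (not TrustBench results): five actions with scores and labels.
scores = [0.9, 0.8, 0.3, 0.6, 0.1]
labels = [True, True, False, False, True]
print(accuracy(scores, labels))  # 0.6
for t, cost, sr in cost_sr_curve(scores, labels, [0.0, 0.5, 0.7, 1.01]):
    print(f"t={t:.2f}  cost={cost:.2f}  SR={sr:.2f}")
```

Sweeping the threshold from 0 to just above 1 moves from a fully autonomous agent (zero cost, the agent's own success rate) to full human takeover (cost 1, SR 1); a good trust scorer reaches high SR at low cost.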
Result link: [ OneDrive ] [ BaiduDisk ].
- on server
jupyter notebook --port 4337
- on local (Windows System)
1. Use MobaXterm to create a tunnel from localhost:6005 to 127.0.0.1:4337 on the server.
   Use MobaXterm to create a tunnel from localhost:6006 to 127.0.0.1:4338 on the server.
2. Open a browser (Edge or Chrome) and go to:
   http://localhost:6005/tree?
3. Run (Shift+Enter) the server code (http://localhost:6005/notebooks/server/server-trustworthy.ipynb).
4. In PyCharm on Windows, run (Shift+Enter) the client code (client/client.ipynb).
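On Linux or macOS, the MobaXterm tunnels correspond to OpenSSH local port forwarding, e.g. `ssh -L 6005:127.0.0.1:4337 -L 6006:127.0.0.1:4338 user@server` (where `user@server` is a placeholder). Before opening the browser, a quick stdlib check can confirm that the forwarded local ports accept connections; `port_open` is a hypothetical helper, not part of the repository.

```python
import socket


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


if __name__ == "__main__":
    # Local ends of the two tunnels described above.
    for port in (6005, 6006):
        state = "reachable" if port_open("localhost", port) else "not reachable"
        print(f"localhost:{port} is {state}")
```

A port that is reachable only shows that something is listening locally; if the notebook still does not load, the server-side Jupyter process on port 4337 is the next thing to check.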
Our work builds upon AssistGUI.
If you use TrustScorer/TrustBench in your research, please use the following BibTeX entry.
@InProceedings{Mei_2025_MM_TrustScorer,
author = {Mei, Haiyang and Gao, Difei and Wei, Xiaopeng and Yang, Xin and Shou, Mike Zheng},
title = {Can I Trust You? Advancing GUI Task Automation with Action Trust Score},
booktitle = {Proceedings of the 33rd ACM International Conference on Multimedia (ACM MM)},
year = {2025},
}
Please see LICENSE.
E-Mail: Haiyang Mei (haiyang.mei@outlook.com)



