Can I Trust You?
Advancing GUI Task Automation with Action Trust Score

Haiyang Mei, Difei Gao, Xiaopeng Wei, Xin Yang, Mike Zheng Shou✉️
Show Lab, National University of Singapore


1. TrustScorer

We introduce TrustScorer, which evaluates the trustworthiness of GUI agent actions so that a human can selectively intervene when an action's trust score is low, combining human precision with AI efficiency.

TrustScorer takes as input the user query q, subtask description d, action sequence s, and state observation o, and outputs a trustworthiness label l indicating the likelihood that the action sequence can accomplish the specified subtask.
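To make this input/output contract concrete, here is a minimal sketch of how a trust score might gate human intervention. Everything in it is hypothetical: the `TrustInput` container, the `score_fn` callable standing in for TrustScorer, and the default threshold of 0.5 are illustrative choices. The sketch also assumes the label is expressed as a probability-like score in [0, 1].

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TrustInput:
    """Hypothetical container for the four TrustScorer inputs."""
    query: str        # user query q
    subtask: str      # subtask description d
    actions: str      # action sequence s (e.g. generated Python code)
    observation: str  # state observation o (e.g. a textual GUI description)


def needs_human(sample: TrustInput,
                score_fn: Callable[[TrustInput], float],
                threshold: float = 0.5) -> bool:
    """Return True when the action sequence should be handed to a human.

    score_fn is a stand-in for TrustScorer and must return a value in [0, 1];
    scores strictly below the threshold are treated as untrustworthy.
    """
    score = score_fn(sample)
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"expected a score in [0, 1], got {score}")
    return score < threshold


if __name__ == "__main__":
    sample = TrustInput(
        query="Make the title on slide 2 bold",
        subtask="Select the title text box on slide 2",
        actions="click(120, 80)",
        observation="Slide 2 is open; title text box visible",
    )
    # Constant placeholder scorer; a real one would be a trained model or an LLM prompt.
    print(needs_human(sample, lambda _: 0.3))  # True, since 0.3 < 0.5
```

Raising the threshold routes more actions to a human, trading extra effort for fewer unnoticed failures.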

2. TrustBench

TrustBench contains 106 specific tasks from 9 commonly used applications, together with 718 agent action sequences and their ground-truth annotations.

Figure: One TrustBench example on PPT.

Figure: The annotation pipeline.

Dataset link: [ OneDrive ] [ BaiduDisk ].

3. Implementation

3.1 Setup Environment

conda env create -f score_env.yml
pip install torch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt

3.2 Training

cd client/method/trustmodel
sh train.sh

Checkpoints are saved to the working directory configured via Hydra in client/method/trustmodel/conf/config.yaml. The language model is an encoder-only DeBERTa model that outputs a score for a (query, action) pair, where the query is the task and the action is Python code. The implementation is based on the Cross-Encoders in SentenceTransformers.
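To make the cross-encoder setup concrete, the sketch below shows how (task, action code) pairs might be assembled and how a single output logit could be mapped to a trust score. It deliberately avoids calling SentenceTransformers so that it runs without downloading a model. The sample field names (`query`, `code`, `success`) are hypothetical. The sigmoid mapping assumes one output logit trained against a binary target, which is a common cross-encoder configuration but is not confirmed by this README.

```python
import math
from typing import Iterable, List, Tuple

Pair = Tuple[str, str]  # (task query, action code)


def build_pairs(samples: Iterable[dict]) -> Tuple[List[Pair], List[float]]:
    """Turn annotated samples into cross-encoder text pairs and binary targets.

    Each sample is hypothetically {"query": str, "code": str, "success": bool}.
    A cross-encoder reads the query and the code together as one input,
    unlike a bi-encoder that would embed them separately.
    """
    pairs: List[Pair] = []
    labels: List[float] = []
    for s in samples:
        pairs.append((s["query"], s["code"]))
        labels.append(1.0 if s["success"] else 0.0)
    return pairs, labels


def logit_to_score(logit: float) -> float:
    """Numerically stable sigmoid mapping one raw logit to a score in [0, 1]."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)
    return z / (1.0 + z)


pairs, labels = build_pairs([
    {"query": "Insert a new slide", "code": "hotkey('ctrl', 'm')", "success": True},
    {"query": "Bold the title", "code": "click(10, 10)", "success": False},
])
```

In the `sentence-transformers` library, `CrossEncoder.predict` accepts a list of such two-element text pairs; the repository's `train.sh` and Hydra config remain the authoritative training setup.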

Model link: [ OneDrive ] [ BaiduDisk ].

3.3 Inference

Prompting Method

cd client/method
python prompting_gpt4.py
python trustmodel/utils/format.py

Training Method

cd client/method/trustmodel
python infer.py
python utils/format.py

3.4 Visualization

generate ace_actions.json

cd ../../utils
python 0_sum_ace_actions.py

generate TF.json and ACC.txt

python 1_cal_score_accuracy.py
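As a rough guide to what this step measures, the sketch below compares predicted trust scores against ground-truth labels. The record layout, the 0.5 threshold, and the reading of `TF.json` as one true/false correctness flag per action sequence are assumptions for illustration; `1_cal_score_accuracy.py` holds the actual definitions.

```python
from typing import Dict, List, Tuple


def score_accuracy(records: List[Dict], threshold: float = 0.5) -> Tuple[Dict[str, bool], float]:
    """Return a per-sample correctness map (True/False) and the overall accuracy.

    Each record is hypothetically {"id": str, "score": float, "label": bool},
    where "label" is the ground-truth success of the action sequence and ids
    are unique. A prediction is correct when (score >= threshold) matches the label.
    """
    if not records:
        raise ValueError("no records to evaluate")
    tf = {r["id"]: (r["score"] >= threshold) == r["label"] for r in records}
    return tf, sum(tf.values()) / len(tf)


records = [
    {"id": "ppt_001_a", "score": 0.91, "label": True},
    {"id": "ppt_001_b", "score": 0.35, "label": True},
    {"id": "ppt_002_a", "score": 0.12, "label": False},
]
tf, acc = score_accuracy(records)
print(tf, f"accuracy = {acc:.3f}")  # second sample is misjudged, so accuracy = 0.667
```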

plot histogram.pdf

python 2_plot_histogram.py
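One plausible reading of the histogram, which this README does not confirm, is that trust scores are binned separately for successful and failed action sequences so that the separation between the two groups is visible. The counting behind such a plot can be sketched without a plotting library. The function names below are hypothetical.

```python
from typing import List, Sequence, Tuple


def bin_counts(scores: Sequence[float], n_bins: int = 10) -> List[int]:
    """Count scores in [0, 1] into n_bins equal-width bins (1.0 goes in the last bin)."""
    counts = [0] * n_bins
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"score out of range: {s}")
        counts[min(int(s * n_bins), n_bins - 1)] += 1
    return counts


def split_histograms(scores: Sequence[float], labels: Sequence[bool],
                     n_bins: int = 10) -> Tuple[List[int], List[int]]:
    """Separate bin counts for successful (label True) and failed sequences."""
    pos = [s for s, ok in zip(scores, labels) if ok]
    neg = [s for s, ok in zip(scores, labels) if not ok]
    return bin_counts(pos, n_bins), bin_counts(neg, n_bins)
```

The two count lists could then be drawn, for example, with matplotlib's `bar`; `2_plot_histogram.py` defines the actual figure.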

generate {method_name}_actions.json

python 3_refine_actions.py

show {method_name}_actions.pdf

python 4_show_actions.py
python 4_show_ace_actions.py

generate curve_cost_sr.pdf

python 5_draw_curve_cost_sr.py
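The cost/success-rate curve captures the trade-off behind selective human intervention. A higher trust threshold sends more action sequences to a human (higher cost) but lets fewer faulty ones through. The sketch below sweeps the threshold under simplified, hypothetical assumptions that are not taken from `5_draw_curve_cost_sr.py`. Cost is the fraction of sequences flagged for review. A flagged sequence is assumed to succeed because a human fixes it, while an unflagged one succeeds only if its ground-truth label says so.

```python
from typing import List, Sequence, Tuple


def cost_sr_curve(scores: Sequence[float], labels: Sequence[bool],
                  thresholds: Sequence[float]) -> List[Tuple[float, float, float]]:
    """Return (threshold, cost, success_rate) for each threshold.

    Hypothetical model: a sequence with score < threshold is sent to a human
    and counted as a success (the human fixes it); any other sequence
    succeeds only if its ground-truth label is True.
    """
    n = len(scores)
    if n == 0 or n != len(labels):
        raise ValueError("scores and labels must be non-empty and aligned")
    curve = []
    for t in thresholds:
        flagged = [s < t for s in scores]
        cost = sum(flagged) / n
        success = sum(1 for f, ok in zip(flagged, labels) if f or ok) / n
        curve.append((t, cost, success))
    return curve


for t, cost, sr in cost_sr_curve([0.9, 0.2, 0.6, 0.4], [True, False, True, False], [0.0, 0.5, 1.0]):
    print(f"threshold={t:.1f}  cost={cost:.2f}  success_rate={sr:.2f}")
```

In this toy run the scorer flags exactly the two failing sequences at threshold 0.5. Full success is therefore reached while only half of the sequences are reviewed. With cost on the x-axis and success rate on the y-axis, a better scorer pushes the curve toward the top-left corner.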

Result link: [ OneDrive ] [ BaiduDisk ].

3.5 Execution

On the server:

jupyter notebook --port 4337

On the local machine (Windows):

1. Use MobaXterm to create a tunnel from localhost:6005 to 127.0.0.1:4337 on the server, and another from localhost:6006 to 127.0.0.1:4338 on the server.
2. Open a browser (Edge or Chrome) and go to http://localhost:6005/tree?
3. Run the server notebook (http://localhost:6005/notebooks/server/server-trustworthy.ipynb) cell by cell with Shift+Enter.
4. In PyCharm on Windows, run the client notebook (client/client.ipynb) cell by cell with Shift+Enter.

4. Acknowledgements

Our work builds upon AssistGUI.

5. Citation

If you use TrustScorer/TrustBench in your research, please use the following BibTeX entry.

@InProceedings{Mei_2025_MM_TrustScorer,
    author    = {Mei, Haiyang and Gao, Difei and Wei, Xiaopeng and Yang, Xin and Shou, Mike Zheng},
    title     = {Can I Trust You? Advancing GUI Task Automation with Action Trust Score},
    booktitle = {Proceedings of the 33rd ACM International Conference on Multimedia (ACM MM)},
    year      = {2025},
}

6. License

Please see LICENSE.

7. Contact

E-Mail: Haiyang Mei (haiyang.mei@outlook.com)
