Can I Trust You?
Advancing GUI Task Automation with Action Trust Score

ACM MM 2025

Haiyang Mei    Difei Gao    Xiaopeng Wei    Xin Yang    Mike Zheng Shou✉️
Show Lab, National University of Singapore


1. TrustScorer

We introduce TrustScorer, which evaluates the trustworthiness of GUI agent actions and triggers selective human intervention when the action trust score is low, blending human precision with AI efficiency.

TrustScorer takes as input the user query q, subtask description d, action sequence s, and state observation o, and outputs a trustworthiness label l indicating the likelihood that the action sequence can accomplish the specified subtask.
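The scoring-then-routing logic described above can be sketched as follows. The function name and the 0.5 threshold are illustrative, not the released API or the paper's tuned value:

```python
def decide(trust_score: float, threshold: float = 0.5) -> str:
    """Route an action sequence based on its trust score.

    A score at or above the threshold lets the agent execute
    autonomously; otherwise the action is deferred to a human.
    The 0.5 threshold is illustrative, not the paper's tuned value.
    """
    return "execute" if trust_score >= threshold else "ask_human"

# Hypothetical scorer call: score = trust_scorer(q, d, s, o)
print(decide(0.92))  # high-confidence action runs autonomously
print(decide(0.18))  # low-confidence action is routed to a human
```

In practice the threshold controls the trade-off between human effort and task success rate, which is exactly what the cost/success-rate analysis in Section 3.4 visualizes.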

2. TrustBench

TrustBench includes 106 tasks from 9 commonly used applications, together with 718 agent action sequences and their corresponding ground-truth annotations.
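Conceptually, each benchmark entry pairs a task with a candidate action sequence and a correctness label. The record below is a hypothetical illustration of that shape; the field names are assumptions, so consult the released dataset for the actual schema:

```python
# Hypothetical shape of one TrustBench record; field names are
# illustrative -- see the released dataset for the real schema.
example = {
    "application": "PowerPoint",
    "query": "Add a title slide that says 'Q3 Review'",
    "subtask": "Insert a new slide with the 'Title Slide' layout",
    "action_sequence": "slide = prs.slides.add_slide(prs.slide_layouts[0])",
    "label": 1,  # 1 = action sequence accomplishes the subtask, 0 = it does not
}
print(example["label"])
```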

One TrustBench example on PPT:

The annotation pipeline:

Dataset link: [ OneDrive ] [ BaiduDisk ].

3. Implementation

3.1 Setup Environment

conda env create -f score_env.yml
pip install torch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt

3.2 Training

cd client/method/trustmodel
sh train.sh

The checkpoints will be saved under the working directory configured by Hydra in client/method/trustmodel/conf/config.yaml. The language model we use is an encoder-only DeBERTa model that outputs a score for a pair of query (task) and action (Python code); the implementation is based on Sentence-Transformers' Cross-Encoders.
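A Cross-Encoder is trained on (text pair, label) samples, so the data preparation step amounts to pairing each task description with its action code. The snippet below is a schematic of that pairing, not the repository's actual loader; the record fields and the choice to concatenate query and subtask are assumptions:

```python
# Schematic of assembling Cross-Encoder training samples: each sample
# pairs task text with action code and carries a 0/1 label.
# Field names are illustrative; see the repository's loader for the
# real format.
def make_pairs(records):
    samples = []
    for r in records:
        text_a = f"{r['query']} {r['subtask']}"  # task side of the pair
        text_b = r["action_sequence"]            # action (code) side
        samples.append(((text_a, text_b), float(r["label"])))
    return samples

records = [
    {"query": "Save the file", "subtask": "Press Ctrl+S",
     "action_sequence": "pyautogui.hotkey('ctrl', 's')", "label": 1},
]
pairs = make_pairs(records)
print(len(pairs))
```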

Model link: [ OneDrive ] [ BaiduDisk ].

3.3 Inference

  • Prompting Method
cd client/method
python prompting_gpt4.py
python trustmodel/utils/format.py
  • Training Method
cd client/method/trustmodel
python infer.py
python utils/format.py
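Both inference paths finish with a format.py step, which conceptually normalizes raw per-action scores into a common record layout for the visualization scripts. A minimal sketch of such a step, with key names and the 0.5 cut-off as assumptions rather than the repository's actual schema:

```python
import json

def format_scores(raw):
    """Convert {action_id: score} into a list of records that a
    downstream visualization step could consume. Key names and the
    0.5 label cut-off are illustrative, not the repository's schema."""
    return [
        {"action_id": k, "trust_score": round(v, 4),
         "predicted_label": int(v >= 0.5)}
        for k, v in sorted(raw.items())
    ]

records = format_scores({"a1": 0.91, "a2": 0.12})
print(json.dumps(records, indent=2))
```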

3.4 Visualization

  1. generate ace_actions.json
cd ../../utils
python 0_sum_ace_actions.py
  2. generate TF.json and ACC.txt
python 1_cal_score_accuracy.py
  3. plot histogram.pdf
python 2_plot_histogram.py
  4. generate {method_name}_actions.json
python 3_refine_actions.py
  5. show {method_name}_actions.pdf
python 4_show_actions.py
python 4_show_ace_actions.py
  6. generate curve_cost_sr.pdf
python 5_draw_curve_cost_sr.py
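The final step plots human cost against success rate. Conceptually, that curve comes from sweeping the trust threshold: actions scored below it are deferred to a human (counted as cost and assumed to succeed), while the rest run autonomously and succeed only if the agent's action was actually correct. The sketch below is a simplified model of that computation, not the repository's exact metric:

```python
def cost_sr_curve(scores, correct, thresholds):
    """For each threshold, return (threshold, human cost, success rate).

    Simplified model: deferred actions (score < threshold) count toward
    human cost and are assumed to succeed; autonomous actions succeed
    only when the agent's action was correct.
    """
    curve = []
    n = len(scores)
    for t in thresholds:
        deferred = [s < t for s in scores]
        cost = sum(deferred) / n
        successes = sum(1 if d else c for d, c in zip(deferred, correct))
        curve.append((t, cost, successes / n))
    return curve

scores = [0.9, 0.8, 0.3, 0.1]
correct = [True, False, False, True]
for t, cost, sr in cost_sr_curve(scores, correct, [0.0, 0.5, 1.0]):
    print(f"t={t:.1f} cost={cost:.2f} sr={sr:.2f}")
```

Raising the threshold defers more actions to the human, driving both cost and success rate toward 1.0; the plotted curve makes that trade-off explicit.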

Result link: [ OneDrive ] [ BaiduDisk ].

3.5 Execution

  • on server
jupyter notebook --port 4337
  • on local (Windows system)
  1. use MobaXterm to create a tunnel from localhost:6005 to 127.0.0.1:4337 on the server
     use MobaXterm to create a tunnel from localhost:6006 to 127.0.0.1:4338 on the server
  2. open a browser (Edge or Chrome) and go to http://localhost:6005/tree?
  3. run (Shift+Enter) the server code (http://localhost:6005/notebooks/server/server-trustworthy.ipynb)
  4. in Windows PyCharm, run (Shift+Enter) the client code (client/client.ipynb)
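The MobaXterm tunnels above are standard SSH local port forwards. On any machine with an OpenSSH client, the equivalent commands would look like the following non-executable sketch; the `<user>@<server>` placeholder is an assumption, as the repository does not specify a host:

```shell
# Equivalent SSH local port forwards (replace <user>@<server>):
# forward localhost:6005 -> 127.0.0.1:4337 on the server (Jupyter)
ssh -N -L 6005:127.0.0.1:4337 <user>@<server> &
# forward localhost:6006 -> 127.0.0.1:4338 on the server
ssh -N -L 6006:127.0.0.1:4338 <user>@<server> &
```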

4. Acknowledgements

Our work builds upon AssistGUI.

5. Citation

If you use TrustScorer/TrustBench in your research, please use the following BibTeX entry.

@InProceedings{Mei_2025_MM_TrustScorer,
    author    = {Mei, Haiyang and Gao, Difei and Wei, Xiaopeng and Yang, Xin and Shou, Mike Zheng},
    title     = {Can I Trust You? Advancing GUI Task Automation with Action Trust Score},
    booktitle = {Proceedings of the 33rd ACM International Conference on Multimedia (ACM MM)},
    year      = {2025},
}

6. License

Please see LICENSE.

7. Contact

E-Mail: Haiyang Mei (haiyang.mei@outlook.com)
