Official implementation of ASCD (Attention-Steerable Contrastive Decoding).
- [2025-08-05] Initial release
We propose Attention-Steerable Contrastive Decoding (ASCD), which combines (i) positive steering, amplifying automatically mined text-centric heads (stable within a model and robust across domains), with (ii) negative steering, dampening critical visual tokens identified on the fly. The method incurs negligible runtime/memory overhead and requires no additional training. Across five MLLM backbones and three decoding schemes, ASCD reduces hallucination on POPE, CHAIR, and MMHal-Bench by up to 38.2% while improving accuracy on standard VQA benchmarks, including MMMU, MM-VET, ScienceQA, TextVQA, and GQA. These results position attention steering as a simple, model-agnostic, and principled route to safer, more faithful multimodal generation.
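For intuition, here is a minimal sketch of what steering a post-softmax attention tensor could look like. It is **not** the repository's implementation: the function `steer_attention`, the multiplicative scaling rule, and the `alpha`/`beta` strengths are illustrative assumptions.

```python
import torch

def steer_attention(attn, text_heads, visual_tokens, alpha=0.5, beta=0.5):
    """Illustrative sketch of positive/negative attention steering.

    attn:          (batch, num_heads, q_len, k_len) post-softmax attention.
    text_heads:    indices of mined text-centric heads (stable per model).
    visual_tokens: key positions of critical visual tokens found on the fly.
    alpha, beta:   hypothetical steering strengths (not ASCD's actual values).
    """
    steered = attn.clone()
    # Positive steering: amplify the mined text-centric heads.
    steered[:, text_heads] = steered[:, text_heads] * (1.0 + alpha)
    # Negative steering: dampen attention flowing to critical visual tokens.
    steered[:, :, :, visual_tokens] = steered[:, :, :, visual_tokens] * (1.0 - beta)
    # Renormalize so each query's attention again sums to one.
    return steered / steered.sum(dim=-1, keepdim=True)

# Toy usage: 1 sample, 8 heads, 4 queries, 16 keys.
attn = torch.softmax(torch.randn(1, 8, 4, 16), dim=-1)
out = steer_attention(attn, text_heads=[2, 5], visual_tokens=[0, 1, 3])
print(out.sum(-1))  # each row sums to 1 again
```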
Two separate environments are required:
- `ascd`: for all models except Qwen
- `ascd-qwen`: for Qwen models only
```bash
conda create -n ascd python=3.10 -y && conda activate ascd
pip install -e ".[llava_series]"
```

```bash
conda create -n ascd-qwen python=3.10 -y && conda activate ascd-qwen
pip install -e ".[qwen]"
```

For POPE, MM-VET, MMMU, ScienceQA, TextVQA, and GQA, follow these instructions for download:
👉 https://tinyllava-factory.readthedocs.io/en/latest/Evaluation.html
For CHAIR, follow the official repository to download and prepare the data:
👉 https://github.com/Maxlinn/CHAIR-metric-standalone
For MMHal-Bench, download from the official link:
👉 https://huggingface.co/datasets/Shengcao1006/MMHal-Bench/tree/main
Place it under your data root (see data layout below).
The expected minimal layout is shown below. You can choose any data root (here we use `data/`).
```text
data/
├── chair/
│   ├── annotations/
│   │   ├── captions_train2014.json
│   │   ├── instances_train2014.json
│   │   ├── captions_val2014.json
│   │   └── instances_val2014.json
│   ├── answers/
│   └── val2014/
│
├── mmhal-bench/
│   ├── images/
│   ├── answers/
│   ├── response_template.json
│   └── results/
│
├── pope/       # check in the above link
├── mm-vet/     # check in the above link
├── mmmu/       # check in the above link
├── scienceqa/  # check in the above link
├── textvqa/    # check in the above link
└── gqa/        # check in the above link
```
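Before running experiments, a small sanity check like the one below can confirm the layout is in place. This helper is hypothetical and not part of the repo; the file list mirrors the tree above.

```python
from pathlib import Path

# Hypothetical helper (not part of this repo): verify the minimal data layout.
EXPECTED = [
    "chair/annotations/captions_train2014.json",
    "chair/annotations/instances_train2014.json",
    "chair/annotations/captions_val2014.json",
    "chair/annotations/instances_val2014.json",
    "chair/val2014",
    "mmhal-bench/images",
    "mmhal-bench/response_template.json",
]

def check_layout(root="data"):
    missing = [p for p in EXPECTED if not (Path(root) / p).exists()]
    for p in missing:
        print(f"missing: {p}")
    return not missing

if __name__ == "__main__":
    print("layout ok" if check_layout() else "layout incomplete")
```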
All runnable scripts live under `experiments_*/scripts`.
- Set your data root by modifying `EVAL_DIR` to point to your dataset root (see the data layout above).
- Run a script, e.g.:

```bash
sh experiments_v3/scripts/pope-llava_series.sh
```
The list of text-centric head score maps is available here. If you need to determine the text-centric head distribution yourself, adjust paths/arguments as needed, then run in order:

```bash
sh experiments_v3/scripts/chair_attn.sh
sh experiments_v3/scripts/analyse_attn.sh
```
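For readers curious what such an analysis might compute, below is a rough sketch of one plausible way to rank heads by how much attention mass they place on text versus image keys. It is **not** the scoring implemented by `analyse_attn.sh`; the function name, the ratio-based score, and the toy index split are all assumptions.

```python
import torch

def text_centric_scores(attn_maps, text_idx, image_idx):
    """Hypothetical score: fraction of attention mass on text keys per head.

    attn_maps: (num_layers, num_heads, q_len, k_len) averaged attention probs.
    Returns a (num_layers, num_heads) map; higher = more text-centric.
    """
    text_mass = attn_maps[..., text_idx].sum(-1).mean(-1)    # (layers, heads)
    image_mass = attn_maps[..., image_idx].sum(-1).mean(-1)  # (layers, heads)
    return text_mass / (text_mass + image_mass + 1e-8)

# Toy usage: 4 layers, 8 heads, 6 queries, 20 keys; keys 0-9 image, 10-19 text.
maps = torch.softmax(torch.randn(4, 8, 6, 20), dim=-1)
scores = text_centric_scores(maps, list(range(10, 20)), list(range(10)))
top = torch.topk(scores.flatten(), k=5).indices
print([(int(i) // 8, int(i) % 8) for i in top])  # top (layer, head) pairs
```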
If you find our code or models useful in your work, please cite our paper:

```bibtex
@misc{wang2025ascdattentionsteerablecontrastivedecoding,
      title={ASCD: Attention-Steerable Contrastive Decoding for Reducing Hallucination in MLLM},
      author={Yujun Wang and Jinhe Bi and Yunpu Ma and Soeren Pirk},
      year={2025},
      eprint={2506.14766},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2506.14766},
}
```