
🌋 ASCD: Attention-Steerable Contrastive Decoding for Reducing Hallucination in MLLM



Official implementation for ASCD.

Latest Updates

  • [2025-08-05] Initial release

Abstract

We propose Attention-Steerable Contrastive Decoding (ASCD). ASCD combines (i) positive steering, which amplifies automatically mined text-centric heads—stable within a model and robust across domains—with (ii) negative steering, which dampens critical visual tokens identified on the fly. The method incurs negligible runtime/memory overhead and requires no additional training. Across five MLLM backbones and three decoding schemes, ASCD reduces hallucination on POPE, CHAIR, and MMHal-Bench by up to 38.2% while improving accuracy on standard VQA benchmarks, including MMMU, MM-VET, ScienceQA, TextVQA, and GQA. These results position attention steering as a simple, model-agnostic, and principled route to safer, more faithful multimodal generation.
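To make the decoding step concrete, below is a minimal sketch of how logits from a positively steered pass and a negatively steered pass could be combined. The helper name contrastive_logits, the (1+alpha)/alpha weighting, and the plausibility cutoff are illustrative assumptions, not the repository's exact implementation.

import torch

def contrastive_logits(logits_pos, logits_neg, alpha=1.0, beta=0.1):
    # logits_pos: next-token logits from the positively steered pass
    #             (text-centric heads amplified)
    # logits_neg: next-token logits from the negatively steered pass
    #             (critical visual tokens dampened)
    # Contrastive combination: boost what the positive branch prefers,
    # subtract what the hallucination-prone negative branch prefers.
    combined = (1 + alpha) * logits_pos - alpha * logits_neg

    # Plausibility constraint (assumed): keep only tokens whose
    # positive-branch probability is within a beta-fraction of the top token.
    probs_pos = logits_pos.softmax(dim=-1)
    cutoff = beta * probs_pos.max(dim=-1, keepdim=True).values
    return combined.masked_fill(probs_pos < cutoff, float("-inf"))

# Toy usage with random logits over a 32-token vocabulary.
pos = torch.randn(1, 32)
neg = torch.randn(1, 32)
next_token = contrastive_logits(pos, neg).argmax(dim=-1)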

Table of Contents

  • Env Setup
  • Data Preparation
  • Get Started
  • Citation

Env Setup

Two separate environments are required:

  • ascd: for all models except Qwen
  • ascd-qwen: for Qwen models only

1) Non-Qwen (ascd)

conda create -n ascd python=3.10 -y && conda activate ascd
pip install -e ".[llava_series]"

2) Qwen (ascd-qwen)

conda create -n ascd-qwen python=3.10 -y && conda activate ascd-qwen
pip install -e ".[qwen]"

Data Preparation

POPE / MM-VET / MMMU / ScienceQA / TextVQA / GQA

Follow these instructions to download and prepare the datasets:
👉 https://tinyllava-factory.readthedocs.io/en/latest/Evaluation.html

CHAIR

Follow the official repository to download and prepare:
👉 https://github.com/Maxlinn/CHAIR-metric-standalone

MMHal-Bench

Download from the official link:
👉 https://huggingface.co/datasets/Shengcao1006/MMHal-Bench/tree/main

Place it under your data root (see data layout below).

Final Directory Tree

The expected minimal layout is shown below. You can choose any data root (here we use data/).

data/
├── chair/
│   ├── annotations/
│   │   ├── captions_train2014.json
│   │   ├── instances_train2014.json
│   │   ├── captions_val2014.json
│   │   └── instances_val2014.json
│   ├── answers/
│   └── val2014/
│
├── mmhal-bench/
│   ├── images/
│   ├── answers/
│   ├── response_template.json
│   └── results/
│
├── pope/           # see the download instructions linked above
├── mm-vet/         # see the download instructions linked above
├── mmmu/           # see the download instructions linked above
├── scienceqa/      # see the download instructions linked above
├── textvqa/        # see the download instructions linked above
└── gqa/            # see the download instructions linked above

Get Started

Evaluation scripts

All runnable scripts live under the folders experiments_*/scripts.

  1. Set EVAL_DIR in the script to point to your data root (see the Final Directory Tree above).

  2. Run a script, e.g.,

    sh experiments_v3/scripts/pope-llava_series.sh

Text-centric head identification

The list of text-centric head score maps is available here. If you need to determine the text-centric head distribution yourself, adjust the paths/arguments as needed, then run the following scripts in order:

sh experiments_v3/scripts/chair_attn.sh
sh experiments_v3/scripts/analyse_attn.sh
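As a rough illustration of what "text-centric" means here, the sketch below ranks heads by the attention mass they place on text (non-visual) key positions. The function name, tensor shapes, and scoring rule are assumptions for illustration and may differ from what chair_attn.sh and analyse_attn.sh actually compute.

import torch

def text_centric_scores(attn, text_mask):
    # attn: (layers, heads, queries, keys) attention weights from one forward pass
    # text_mask: (keys,) boolean mask marking text (non-visual) key positions
    # Score each head by the average attention mass it puts on text keys.
    mass_on_text = attn[..., text_mask].sum(dim=-1)  # (layers, heads, queries)
    return mass_on_text.mean(dim=-1)                 # (layers, heads)

# Toy example: 2 layers, 4 heads, 6 queries, 10 keys; first 7 keys are visual tokens.
attn = torch.rand(2, 4, 6, 10).softmax(dim=-1)
text_mask = torch.tensor([False] * 7 + [True] * 3)
scores = text_centric_scores(attn, text_mask)
top = torch.topk(scores.flatten(), k=3).indices      # candidate text-centric heads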

Citation

If you find our code or models useful in your work, please cite our paper:

@misc{wang2025ascdattentionsteerablecontrastivedecoding,
      title={ASCD: Attention-Steerable Contrastive Decoding for Reducing Hallucination in MLLM}, 
      author={Yujun Wang and Jinhe Bi and Yunpu Ma and Soeren Pirk},
      year={2025},
      eprint={2506.14766},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2506.14766}, 
}
