
🌋 ASCD: Attention-Steerable Contrastive Decoding for Reducing Hallucination in MLLM



Official implementation for ASCD.

Latest Updates

  • [2025-08-05] Initial release

Abstract

We propose Attention-Steerable Contrastive Decoding (ASCD). ASCD combines (i) positive steering, which amplifies automatically mined text-centric heads—stable within a model and robust across domains—with (ii) negative steering, which dampens critical visual tokens identified on the fly. The method incurs negligible runtime/memory overhead and requires no additional training. Across five MLLM backbones and three decoding schemes, ASCD reduces hallucination on POPE, CHAIR, and MMHal-Bench by up to 38.2% while improving accuracy on standard VQA benchmarks, including MMMU, MM-VET, ScienceQA, TextVQA, and GQA. These results position attention steering as a simple, model-agnostic, and principled route to safer, more faithful multimodal generation.
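To make the decoding step concrete, below is a minimal sketch of how logits from a positively steered pass and a negatively steered pass could be combined. The helper name contrastive_logits, the (1+alpha)/alpha weighting, and the plausibility cutoff are illustrative assumptions, not the repository's exact implementation.

import torch

def contrastive_logits(logits_pos, logits_neg, alpha=1.0, beta=0.1):
    # logits_pos: next-token logits from the positively steered pass
    #             (text-centric heads amplified)
    # logits_neg: next-token logits from the negatively steered pass
    #             (critical visual tokens dampened)
    # Contrastive combination: boost what the positive branch prefers,
    # subtract what the hallucination-prone negative branch prefers.
    combined = (1 + alpha) * logits_pos - alpha * logits_neg

    # Plausibility constraint (assumed): keep only tokens whose
    # positive-branch probability is within a beta-fraction of the top token.
    probs_pos = logits_pos.softmax(dim=-1)
    cutoff = beta * probs_pos.max(dim=-1, keepdim=True).values
    return combined.masked_fill(probs_pos < cutoff, float("-inf"))

# Toy usage with random logits over a 32-token vocabulary.
pos = torch.randn(1, 32)
neg = torch.randn(1, 32)
next_token = contrastive_logits(pos, neg).argmax(dim=-1)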

Table of Contents

  • Env Setup
  • Data Preparation
  • Get Started
  • Citation

Env Setup

Two separate environments are required:

  • ascd: for all models except Qwen
  • ascd-qwen: for Qwen models only

1) Non-Qwen (ascd)

conda create -n ascd python=3.10 -y && conda activate ascd
pip install -e ".[llava_series]"

2) Qwen (ascd-qwen)

conda create -n ascd-qwen python=3.10 -y && conda activate ascd-qwen
pip install -e ".[qwen]"

Data Preparation

POPE / MM-VET / MMMU / ScienceQA / TextVQA / GQA

Follow these instructions to download and prepare the datasets:
👉 https://tinyllava-factory.readthedocs.io/en/latest/Evaluation.html

CHAIR

Follow the official repository to download and prepare:
👉 https://github.com/Maxlinn/CHAIR-metric-standalone

MMHal-Bench

Download from the official link:
👉 https://huggingface.co/datasets/Shengcao1006/MMHal-Bench/tree/main

Place it under your data root (see data layout below).

Final Directory Tree

The expected minimal layout is shown below. You can choose any data root (here we use data/).

data/
├── chair/
│   ├── annotations/
│   │   ├── captions_train2014.json
│   │   ├── instances_train2014.json
│   │   ├── captions_val2014.json
│   │   └── instances_val2014.json
│   ├── answers/
│   └── val2014/
│
├── mmhal-bench/
│   ├── images/
│   ├── answers/
│   ├── response_template.json
│   └── results/
│
├── pope/           # see the download instructions linked above
├── mm-vet/         # see the download instructions linked above
├── mmmu/           # see the download instructions linked above
├── scienceqa/      # see the download instructions linked above
├── textvqa/        # see the download instructions linked above
└── gqa/            # see the download instructions linked above

Get Started

Evaluation scripts

All runnable scripts live under the folders experiments_*/scripts.

  1. Set EVAL_DIR in the script to point to your data root (see the Final Directory Tree above).

  2. Run a script, e.g.,

    sh experiments_v3/scripts/pope-llava_series.sh

Text-centric head identification

The list of text-centric head score maps is available here. If you need to determine the text-centric head distribution yourself, adjust the paths/arguments as needed, then run the following scripts in order:

sh experiments_v3/scripts/chair_attn.sh
sh experiments_v3/scripts/analyse_attn.sh
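As a rough illustration of what "text-centric" means here, the sketch below ranks heads by the attention mass they place on text (non-visual) key positions. The function name, tensor shapes, and scoring rule are assumptions for illustration and may differ from what chair_attn.sh and analyse_attn.sh actually compute.

import torch

def text_centric_scores(attn, text_mask):
    # attn: (layers, heads, queries, keys) attention weights from one forward pass
    # text_mask: (keys,) boolean mask marking text (non-visual) key positions
    # Score each head by the average attention mass it puts on text keys.
    mass_on_text = attn[..., text_mask].sum(dim=-1)  # (layers, heads, queries)
    return mass_on_text.mean(dim=-1)                 # (layers, heads)

# Toy example: 2 layers, 4 heads, 6 queries, 10 keys; first 7 keys are visual tokens.
attn = torch.rand(2, 4, 6, 10).softmax(dim=-1)
text_mask = torch.tensor([False] * 7 + [True] * 3)
scores = text_centric_scores(attn, text_mask)
top = torch.topk(scores.flatten(), k=3).indices      # candidate text-centric heads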

Citation

If you find our code or models useful in your work, please cite our paper:

@misc{wang2025ascdattentionsteerablecontrastivedecoding,
      title={ASCD: Attention-Steerable Contrastive Decoding for Reducing Hallucination in MLLM}, 
      author={Yujun Wang and Jinhe Bi and Yunpu Ma and Soeren Pirk},
      year={2025},
      eprint={2506.14766},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2506.14766}, 
}
