
[ICCV 2025] D³QE: Learning Discrete Distribution Discrepancy-aware Quantization Error for Autoregressive-Generated Image Detection

Dataset ARForensics is available at: [🤗 HuggingFace] | [🤖 ModelScope]

Created by Yanran Zhang, Bingyao Yu, Yu Zheng, Wenzhao Zheng, Yueqi Duan, Lei Chen, Jie Zhou, Jiwen Lu


Introduction

D³QE is a detection method designed to identify images generated by visual autoregressive (AR) models. The core idea is to exploit discrete distribution discrepancies and quantization error patterns produced by tokenized autoregressive generation. Key highlights:

  • Integrates dynamic codebook frequency statistics into a transformer attention module.
  • Fuses semantic image features with latent representations of the quantization error.
  • Demonstrates strong detection accuracy, cross-model generalization, and robustness to common real-world perturbations.
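To make this concrete, here is a minimal PyTorch sketch of the two signals the method builds on: the per-token quantization error of a VQ codebook, and dynamic codebook usage frequencies. Function and variable names are illustrative, not the repository's actual API.

```python
import torch
import torch.nn.functional as F

def d3qe_signals(z, codebook):
    """z: (B, N, D) continuous encoder features for N image patches;
    codebook: (K, D) VQ-VAE codeword embeddings (e.g. from vq_ds16_c2i.pt).
    Returns quantization-error features and codebook usage frequencies."""
    # Standard VQ step: assign each feature to its nearest codeword.
    dists = torch.cdist(z, codebook.unsqueeze(0).expand(z.size(0), -1, -1))  # (B, N, K)
    idx = dists.argmin(dim=-1)   # (B, N) discrete token ids
    z_q = codebook[idx]          # (B, N, D) quantized features
    quant_error = z - z_q        # quantization-error cue
    # Dynamic codebook frequency statistics: empirical usage of each code,
    # which the paper integrates into a transformer attention module.
    code_freq = F.one_hot(idx, num_classes=codebook.size(0)).float().mean(dim=1)  # (B, K)
    return quant_error, code_freq
```

Real and AR-generated images induce different code-usage distributions and error statistics; that discrepancy is what the detector learns to separate.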

This repo contains the code, dataset, and scripts used in the paper to facilitate reproducible experiments.

News 🔥

  • 🆕 2025-10-09 — Our code is released.
  • 🆕 2025-10-08 — arXiv preprint released.
  • 🆕 2025-07-23 — Accepted to ICCV 2025!🔥

Quick Start

  1. Clone the repository:

     ```bash
     git clone https://github.com/Zhangyr2022/D3QE
     cd D3QE
     ```

  2. Create the environment and install dependencies:

     ```bash
     conda create -n D3QE python=3.11 -y
     conda activate D3QE
     pip install -r requirements.txt
     # If you have GPU(s), ensure CUDA and PyTorch are installed correctly for your environment.
     ```

  3. Download the dataset (see Dataset below) and place it under ./data/ARForensics (or a path you prefer). Also download the pretrained LlamaGen VQ-VAE model vq_ds16_c2i.pt from LlamaGen and place it under ./pretrained (a quick load check follows this list).

  4. Train a model:

     ```bash
     bash train.sh
     ```

  5. Evaluate:

     ```bash
     bash eval.sh
     ```
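As a quick sanity check after step 3, you can confirm the tokenizer checkpoint loads. The snippet below only inspects the file (at the path suggested above) and is not part of the repo's pipeline:

```python
import torch

# Load on CPU just to verify the file is intact and peek at its contents.
ckpt = torch.load("./pretrained/vq_ds16_c2i.pt", map_location="cpu")
print(list(ckpt.keys()) if isinstance(ckpt, dict) else type(ckpt))
```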

Dataset

We provide the ARForensics benchmark, the first large-scale dataset built specifically for detecting images from visual autoregressive models. It covers 7 autoregressive models with diverse token/scale architectures: LlamaGen, VAR, Infinity, Janus-Pro, RAR, Switti, and Open-MAGVIT2.

Splits:

  • Training: 100k LlamaGen images + 100k ImageNet images
  • Validation: 10k LlamaGen images + 10k ImageNet images
  • Test: balanced test set with 6k samples per model

Download: the ARForensics dataset is available at: 🤗 HuggingFace | 🤖 ModelScope.

Folder structure (expected):

```
ARForensics/
├─ train/
│  ├─ 0_real/
│  └─ 1_fake/
├─ val/
│  ├─ 0_real/
│  └─ 1_fake/
└─ test/
   ├─ Infinity/
   │  ├─ 0_real/
   │  └─ 1_fake/
   ├─ Janus_Pro/
   │  ├─ ..
   ├─ RAR/
   ├─ Switti/
   ├─ VAR/
   ├─ LlamaGen/
   └─ Open_MAGVIT2/
```
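A short script like the one below (DATA_ROOT is an assumption; point it at wherever you placed the dataset) can verify that your copy matches this layout and the split sizes listed above:

```python
from pathlib import Path

DATA_ROOT = Path("./data/ARForensics")  # adjust to your download location
TEST_MODELS = ["Infinity", "Janus_Pro", "RAR", "Switti",
               "VAR", "LlamaGen", "Open_MAGVIT2"]

def count_files(folder: Path) -> int:
    return sum(1 for p in folder.glob("*") if p.is_file())

# Train/val splits: real vs. fake image counts.
for split in ("train", "val"):
    for label in ("0_real", "1_fake"):
        print(f"{split}/{label}: {count_files(DATA_ROOT / split / label)} images")

# Per-model test splits (6k samples per model, balanced real/fake).
for model in TEST_MODELS:
    for label in ("0_real", "1_fake"):
        print(f"test/{model}/{label}: {count_files(DATA_ROOT / 'test' / model / label)} images")
```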

Training

The provided training script train.sh wraps the typical training pipeline. You can tweak hyper-parameters directly in the script or by editing the training config file used by the codebase. By default we train on a single GPU; 24 GB of GPU memory is recommended.

Example:

```bash
bash train.sh
# or run the training entrypoint directly, e.g.
python train.py \
    --name D3QE_rerun \
    --dataroot /path/to/your/dataset \
    --detect_method D3QE \
    --blur_prob 0.1 \
    --blur_sig 0.0,3.0 \
    --jpg_prob 0.1 \
    --jpg_method cv2,pil \
    --jpg_qual 30,100
```
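The --blur_* and --jpg_* flags above control the random blur and JPEG-recompression augmentation common in generated-image detectors. The sketch below shows what such flags typically do; it is a PIL-only approximation (covering the pil option of --jpg_method), not the repository's exact implementation:

```python
import io
import random
from PIL import Image, ImageFilter

def augment(img: Image.Image, blur_prob=0.1, blur_sig=(0.0, 3.0),
            jpg_prob=0.1, jpg_qual=(30, 100)) -> Image.Image:
    """Randomly blur and/or JPEG-recompress a training image."""
    img = img.convert("RGB")                           # JPEG requires RGB
    if random.random() < blur_prob:                    # --blur_prob 0.1
        sigma = random.uniform(*blur_sig)              # --blur_sig 0.0,3.0
        img = img.filter(ImageFilter.GaussianBlur(sigma))
    if random.random() < jpg_prob:                     # --jpg_prob 0.1
        quality = random.randint(*jpg_qual)            # --jpg_qual 30,100
        buf = io.BytesIO()
        img.save(buf, format="JPEG", quality=quality)  # --jpg_method pil
        buf.seek(0)
        img = Image.open(buf).convert("RGB")
    return img
```

Training with such perturbations is the usual way detectors gain the robustness that eval.py's --robust_all flag measures.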

Evaluation

eval.py exposes many options to evaluate detection performance and robustness.

```
usage: eval.py [-h] [--rz_interp RZ_INTERP] [--batch_size BATCH_SIZE]
               [--loadSize LOADSIZE] [--CropSize CROPSIZE] [--no_crop]
               [--no_resize] [--no_flip] [--robust_all]
               [--detect_method DETECT_METHOD] [--dataroot DATAROOT]
               [--sub_dir SUB_DIR] [--model_path MODEL_PATH]
```

Key flags:

  • --batch_size (default: 64)
  • --loadSize / --CropSize for image preprocessing (defaults: 256 / 224)
  • --robust_all to evaluate robustness under a range of perturbations and attacks
  • --sub_dir list of subfolders in the test set (defaults to the 7 AR models)
  • --model_path path to your trained model checkpoint

Example (evaluate D³QE):

An eval.sh script with default settings is provided; adapt it as needed.

```bash
bash eval.sh
# or run evaluation directly
python eval.py \
    --model_path /your/model/path \
    --detect_method D3QE \
    --batch_size 1 \
    --dataroot /path/to/your/testset \
    --sub_dir '["Infinity","Janus_Pro","RAR","Switti","VAR","LlamaGen","Open_MAGVIT2"]'
```
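eval.py computes its own metrics; for reference, the standard accuracy/average-precision definitions reported in generated-image detection papers look like the following (a sketch assuming scores are fake-class probabilities, not the repository's code):

```python
import numpy as np
from sklearn.metrics import accuracy_score, average_precision_score

def detection_metrics(y_true, y_score, threshold=0.5):
    """y_true: 0 = real, 1 = fake; y_score: predicted probability of fake."""
    y_pred = (np.asarray(y_score) >= threshold).astype(int)
    acc = accuracy_score(y_true, y_pred)            # accuracy at a fixed threshold
    ap = average_precision_score(y_true, y_score)   # threshold-free average precision
    return acc, ap

# Example: detection_metrics([0, 0, 1, 1], [0.1, 0.4, 0.8, 0.9]) -> (1.0, 1.0)
```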

Pretrained Models

Pretrained model checkpoints are available on 🤗 Hugging Face.

Acknowledgments

This codebase builds on and borrows design patterns from prior open-source projects, including LlamaGen, whose pretrained VQ-VAE tokenizer we use. Thanks to the authors of those projects for making their code and models available.

Citation

If you use this repository or dataset in your research, please cite our paper:

```bibtex
@inproceedings{zhang2025d3qe,
  title={D3QE: Learning Discrete Distribution Discrepancy-aware Quantization Error for Autoregressive-Generated Image Detection},
  author={Zhang, Yanran and Yu, Bingyao and Zheng, Yu and Zheng, Wenzhao and Duan, Yueqi and Chen, Lei and Zhou, Jie and Lu, Jiwen},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={16292--16301},
  year={2025}
}
```

Contact

For questions, issues, or reproducibility requests, please open an issue on this repository (issues and PRs are welcome) or contact zhangyr21@mails.tsinghua.edu.cn.
