# [ICCV 2025] D³QE: Learning Discrete Distribution Discrepancy-aware Quantization Error for Autoregressive-Generated Image Detection
Created by Yanran Zhang, Bingyao Yu, Yu Zheng, Wenzhao Zheng, Yueqi Duan, Lei Chen, Jie Zhou, Jiwen Lu
- Introduction
- News 🔥
- Quick Start
- Setup 🔧
- Dataset
- Training
- Evaluation
- Pretrained Models
- Acknowledgments
- Citation
- Contact
## Introduction

D³QE is a detection method designed to identify images generated by visual autoregressive (AR) models. The core idea is to exploit the discrete distribution discrepancies and quantization error patterns produced by tokenized autoregressive generation. Key highlights:
- Integrates dynamic codebook frequency statistics into a transformer attention module.
- Fuses semantic image features with latent representations of the quantization error.
- Demonstrates strong detection accuracy, cross-model generalization, and robustness to common real-world perturbations.
This repo contains the code, dataset, and scripts used in the paper to facilitate reproducible experiments.
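For intuition, below is a minimal, purely illustrative sketch (not the repository's code) of the quantization-error signal: encode features, snap them to the nearest VQ codebook entry, and keep the residual.

```python
# Illustrative sketch only: quantization error w.r.t. a VQ codebook.
# Real images land "between" codebook entries, while AR-generated images
# were decoded from the codebook itself, so their residuals look different.
import torch

def vq_quantization_error(latents: torch.Tensor, codebook: torch.Tensor):
    """latents: (N, d) encoder features; codebook: (K, d) VQ entries."""
    dists = torch.cdist(latents, codebook)   # (N, K) pairwise L2 distances
    idx = dists.argmin(dim=1)                # nearest codebook index per feature
    quantized = codebook[idx]                # (N, d) quantized features
    error = latents - quantized              # residual: the quantization error
    return idx, error

# Toy usage with random tensors
idx, err = vq_quantization_error(torch.randn(16, 8), torch.randn(32, 8))
print(idx.shape, err.shape)  # torch.Size([16]) torch.Size([16, 8])
```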
## News 🔥

- 🆕 2025-10-09 — Our code is released.
- 🆕 2025-10-08 — arXiv preprint released.
- 🆕 2025-07-23 — Accepted to ICCV 2025! 🔥
## Quick Start

### Setup 🔧

- Clone the repository:

```bash
git clone https://github.com/Zhangyr2022/D3QE
cd D3QE
```

- Create the environment and install dependencies:

```bash
conda create -n D3QE python=3.11 -y
conda activate D3QE
pip install -r requirements.txt
# If you have GPU(s), ensure CUDA and PyTorch are installed correctly for your
# environment (see the quick check at the end of this section).
```

- Download the dataset (see Dataset below) and place it under `./data/ARForensics` (or a path you prefer). Download the pretrained LlamaGen VQ-VAE model `vq_ds16_c2i.pt` from LlamaGen and place it under `./pretrained`.

- Train a model:

```bash
bash train.sh
```

- Evaluate:

```bash
bash eval.sh
```
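Before training, you can confirm that PyTorch sees your GPU with standard calls (nothing repo-specific):

```python
import torch

print(torch.__version__)             # installed PyTorch version
print(torch.cuda.is_available())     # True if a CUDA device is usable
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```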
## Dataset

We provide the ARForensics benchmark — the first large-scale dataset specifically for visual autoregressive model detection. It covers 7 autoregressive models with diverse token/scale architectures: LlamaGen, VAR, Infinity, Janus-Pro, RAR, Switti, and Open-MAGVIT2.
Splits:
- Training: 100k LlamaGen images + 100k ImageNet images
- Validation: 10k LlamaGen images + 10k ImageNet images
- Test: balanced test set with 6k samples per model
Download: The ARForensics dataset is available at: 🤗 HuggingFace | 🤖 ModelScope.
Folder structure (expected):

```
ARForensics/
├─ train/
│  ├─ 0_real/
│  └─ 1_fake/
├─ val/
│  ├─ 0_real/
│  └─ 1_fake/
└─ test/
   ├─ Infinity/
   │  ├─ 0_real/
   │  └─ 1_fake/
   ├─ Janus_Pro/
   │  ├─ ..
   ├─ RAR/
   ├─ Switti/
   ├─ VAR/
   ├─ LlamaGen/
   └─ Open_MAGVIT2/
```
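Once the data is in place, a quick sanity check of the layout above (a sketch; assumes the default `./data/ARForensics` root):

```python
# Count files per split/class to confirm the download matches the layout above.
from pathlib import Path

root = Path("./data/ARForensics")
for split in ("train", "val"):
    for cls in ("0_real", "1_fake"):
        n = sum(1 for _ in (root / split / cls).iterdir())
        print(f"{split}/{cls}: {n} files")
for model_dir in sorted(p for p in (root / "test").iterdir() if p.is_dir()):
    for cls in ("0_real", "1_fake"):
        n = sum(1 for _ in (model_dir / cls).iterdir())
        print(f"test/{model_dir.name}/{cls}: {n} files")
```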
## Training

A provided training script `train.sh` wraps the typical training pipeline. You can tweak the hyper-parameters directly in the script or by editing the training config file used by the codebase. We train the model on a single GPU by default (24 GB of GPU memory is recommended).
Example:

```bash
bash train.sh
# or run the training entrypoint directly, e.g.
python train.py \
    --name D3QE_rerun \
    --dataroot /path/to/your/dataset \
    --detect_method D3QE \
    --blur_prob 0.1 \
    --blur_sig 0.0,3.0 \
    --jpg_prob 0.1 \
    --jpg_method cv2,pil \
    --jpg_qual 30,100
```
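The `--blur_*` and `--jpg_*` flags control train-time robustness augmentations. As a rough sketch of what such flags typically do (an assumed CNNDetection-style recipe, not this repository's exact code): blur and JPEG-recompress images with some probability.

```python
# Hedged sketch: probabilistic Gaussian blur + JPEG recompression, the common
# robustness recipe the flags above suggest (assumption, not the repo's code).
import io
import random
from PIL import Image, ImageFilter

def augment(img: Image.Image,
            blur_prob=0.1, blur_sig=(0.0, 3.0),   # cf. --blur_prob / --blur_sig
            jpg_prob=0.1, jpg_qual=(30, 100)):    # cf. --jpg_prob / --jpg_qual
    if random.random() < blur_prob:
        # Gaussian blur with sigma sampled uniformly from the given range
        img = img.filter(ImageFilter.GaussianBlur(radius=random.uniform(*blur_sig)))
    if random.random() < jpg_prob:
        # Re-encode as JPEG at a random quality, then decode back
        buf = io.BytesIO()
        img.convert("RGB").save(buf, format="JPEG", quality=random.randint(*jpg_qual))
        img = Image.open(io.BytesIO(buf.getvalue())).convert("RGB")
    return img
```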
## Evaluation

`eval.py` exposes many options to evaluate detection performance and robustness.

```
usage: eval.py [-h] [--rz_interp RZ_INTERP] [--batch_size BATCH_SIZE]
               [--loadSize LOADSIZE] [--CropSize CROPSIZE] [--no_crop]
               [--no_resize] [--no_flip] [--robust_all]
               [--detect_method DETECT_METHOD] [--dataroot DATAROOT]
               [--sub_dir SUB_DIR] [--model_path MODEL_PATH]
```
Key flags:

- `--batch_size` (default: 64)
- `--loadSize` / `--CropSize` for image preprocessing (defaults: 256 / 224)
- `--robust_all` to evaluate model robustness across different noises/attacks
- `--sub_dir` list of subfolders in the test set (defaults to the 7 AR models)
- `--model_path` path to your trained model checkpoint
Example (evaluate D³QE): there's an `eval.sh` with default settings you can adapt.

```bash
bash eval.sh
# or run evaluation directly
python eval.py \
    --model_path /your/model/path \
    --detect_method D3QE \
    --batch_size 1 \
    --dataroot /path/to/your/testset \
    --sub_dir '["Infinity","Janus_Pro","RAR","Switti","VAR","LlamaGen","Open_MAGVIT2"]'
```

## Pretrained Models

Pretrained model checkpoints are uploaded at: 🤗 Hugging Face
## Acknowledgments

This codebase builds on and borrows design patterns from several open-source projects. Thanks to the authors of those projects for making their code and models available.
## Citation

If you use this repository or dataset in your research, please cite our paper:
```bibtex
@inproceedings{zhang2025d3qe,
  title={D3QE: Learning Discrete Distribution Discrepancy-aware Quantization Error for Autoregressive-Generated Image Detection},
  author={Zhang, Yanran and Yu, Bingyao and Zheng, Yu and Zheng, Wenzhao and Duan, Yueqi and Chen, Lei and Zhou, Jie and Lu, Jiwen},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={16292--16301},
  year={2025}
}
```

## Contact

For questions, issues, or reproducibility requests, please open an issue on this repository (PRs and issues are welcome) or contact the authors at zhangyr21@mails.tsinghua.edu.cn.


