PyTorch implementation for Invertible Guided Consistency Training (iGCT) (https://arxiv.org/abs/2502.05391).
We propose invertible Guided Consistency Training (iGCT), a diffusion-independent approach that integrates guidance and inversion into Consistency Models. iGCT is a fully data-driven algorithm that enables fast guided generation, fast inversion, and fast editing.
To set up the environment, follow these steps:
Using venv:

```shell
python -m venv igct
source igct/bin/activate
```

Or using conda:

```shell
conda create -n igct python=3.9 -y
conda activate igct
```

Install the dependencies:

```shell
pip install click requests pillow numpy scipy psutil tqdm imageio scikit-image imageio-ffmpeg pyspng
pip install torch==2.3.0+cu121 -f https://download.pytorch.org/whl/torch_stable.html
pip install lpips transformers
```

Ensure a GPU is available for dataset preparation. Follow the instructions below to download and prepare the datasets.
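Before moving on, it can help to confirm that the installed packages resolve. The snippet below is a small, optional sanity check (not part of the repository); the package list mirrors the pip installs above, and the pillow/PIL name mapping is the only pip-name vs. import-name mismatch handled.

```python
# Optional sanity check: verify the core dependencies installed above resolve.
import importlib.util

# pip package name -> import name (they differ only for pillow/PIL)
packages = {
    "torch": "torch",
    "lpips": "lpips",
    "transformers": "transformers",
    "numpy": "numpy",
    "scipy": "scipy",
    "pillow": "PIL",
    "tqdm": "tqdm",
}

results = []
for pip_name, import_name in packages.items():
    # find_spec returns None when the module cannot be located
    status = "ok" if importlib.util.find_spec(import_name) else "MISSING"
    results.append((pip_name, status))
    print(f"{pip_name}: {status}")
```

Any `MISSING` entry means the corresponding `pip install` step above did not complete.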
- Download the CIFAR-10 python version.
- Convert the dataset to a ZIP archive:
```shell
mkdir downloads && cd downloads
mkdir cifar10 && cd cifar10
curl -O https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
cd ../..
python dataset_tool.py --source=downloads/cifar10/cifar-10-python.tar.gz --dest=datasets/cifar10-32x32.zip
python dataset_tool.py --source=downloads/cifar10/cifar-10-python.tar.gz --dest=datasets/cifar10-32x32-test.zip --testset=true
```
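After conversion, it can be useful to confirm the archive looks sane before training. The sketch below assumes an EDM-style layout (image files plus a `dataset.json` holding labels), which is an assumption to verify against your actual archive; the demo runs on a tiny in-memory ZIP standing in for `datasets/cifar10-32x32.zip`.

```python
# Sketch of a sanity check for a dataset ZIP. Assumes an EDM-style layout:
# image files plus a dataset.json with a "labels" list (verify against the
# actual output of dataset_tool.py).
import io
import json
import zipfile

def summarize_dataset_zip(path_or_file):
    """Return (num_images, num_labels) for an EDM-style dataset ZIP."""
    with zipfile.ZipFile(path_or_file) as zf:
        names = zf.namelist()
        images = [n for n in names if n.endswith(".png")]
        labels = []
        if "dataset.json" in names:
            labels = json.loads(zf.read("dataset.json")).get("labels") or []
        return len(images), len(labels)

# Demo: build a tiny in-memory archive with the assumed layout.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("00000.png", b"fake-png-bytes")
    zf.writestr("00001.png", b"fake-png-bytes")
    zf.writestr("dataset.json", json.dumps({"labels": [["00000.png", 3], ["00001.png", 7]]}))
buf.seek(0)
counts = summarize_dataset_zip(buf)
print(counts)  # (2, 2)
```

On the real archive, the image count should match the CIFAR-10 split size (50,000 train / 10,000 test).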
- Compute FID statistics:
```shell
python fid.py ref --data=datasets/cifar10-32x32.zip --dest=fid-refs/cifar10-32x32.npz
```
- Download the ImageNet Object Localization Challenge.
- Convert the dataset to a ZIP archive at 64x64 resolution:
```shell
python dataset_tool.py --source=downloads/imagenet/ILSVRC/Data/CLS-LOC/train \
    --dest=datasets/imagenet-64x64.zip --resolution=64x64 --transform=center-crop
```
- Organize the ImageNet validation set directory:
```shell
python organize_imagenet_dataset.py --annote_dir downloads/imagenet/ILSVRC/Annotations/CLS-LOC/val --images_dir downloads/imagenet/ILSVRC/Data/CLS-LOC/val
```
- Convert the validation set to a ZIP archive:
```shell
python dataset_tool.py --source=downloads/imagenet/ILSVRC/Data/CLS-LOC/val --dest=datasets/imagenet-64x64-val.zip --resolution=64x64 --transform=center-crop
```
- Compute FID statistics:
```shell
python fid.py ref --data=datasets/imagenet-64x64.zip --dest=fid-refs/imagenet-64x64.npz
```
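The reference file can be inspected before launching evaluation. The sketch below assumes `fid.py` stores Inception statistics under the keys `mu` (2048-d mean) and `sigma` (2048x2048 covariance), as in EDM-style codebases; the key names are an assumption to verify. The demo runs on a synthetic stand-in for `fid-refs/imagenet-64x64.npz`.

```python
# Sketch: inspect an FID reference .npz. Key names ("mu", "sigma") follow
# EDM-style fid.py and are an assumption to check against this repository.
import numpy as np

def check_fid_ref(path):
    """Return (mu.shape, sigma.shape) after basic consistency checks."""
    with np.load(path) as ref:
        mu, sigma = ref["mu"], ref["sigma"]
    # Covariance must be square and match the mean's dimensionality.
    assert mu.ndim == 1 and sigma.shape == (mu.size, mu.size)
    return mu.shape, sigma.shape

# Demo on a synthetic stand-in file with Inception-v3 feature dimensions.
demo_path = "demo-fid-ref.npz"
np.savez(demo_path, mu=np.zeros(2048), sigma=np.eye(2048))
shapes = check_fid_ref(demo_path)
print(shapes)  # ((2048,), (2048, 2048))
```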
- Create ImageNet subgroups for image editing:
```shell
python create_imagenet_editing_subgroups.py --dataset_path datasets/imagenet-64x64-val.zip --save_dir datasets/imagenet-64x64-editing-subgroups
```
Example scripts for training and evaluation can be found in the ./scripts directory.
Pre-trained model checkpoints are available for download. The checkpoints are organized as follows:
```
model_weights/
├── cifar10/
│   ├── baselines/
│   │   ├── cfg-edm/    # Checkpoints for CFG-EDM baseline on CIFAR-10
│   │   └── guided-cd/  # Checkpoints for Guided-CD baseline on CIFAR-10
│   └── igct/
│       └── igct/       # Checkpoints for iGCT on CIFAR-10
│
└── imagenet/
    ├── baselines/
    │   └── cfg-edm/    # Checkpoints for CFG-EDM baseline on ImageNet
    └── igct/
        └── igct/       # Checkpoints for iGCT on ImageNet
```
Download the checkpoints from the model_weights Google Drive link.
- CUDA Release Version: 12.1
- NVCC Compiler Version: V12.1.105
- Distribution: Red Hat Enterprise Linux (RHEL)
- Release: 9.2
- Codename: Plow
- CIFAR-10: cluster of A100 (40GB) GPUs, ~84 GPU-days.
- ImageNet-64: cluster of A100 (80GB) GPUs, ~190 GPU-days.
For questions, feedback, or collaboration opportunities, feel free to reach out via email at [email protected] or [email protected].
If you find this work useful, please consider citing our paper:
```bibtex
@misc{hsu2025freediffusioninvertibleguided,
  title={Beyond and Free from Diffusion: Invertible Guided Consistency Training},
  author={Chia-Hong Hsu and Shiu-hong Kao and Randall Balestriero},
  year={2025},
  eprint={2502.05391},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2502.05391},
}
```