Skip to content

LuizScarlet/StableCodec

Folders and files

NameName
Last commit message
Last commit date

Latest commit

ย 

History

27 Commits
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 

Repository files navigation

StableCodec: Taming One-Step Diffusion for Extreme Image Compression

icon icon python pytorch visitors

Tianyu Zhang, Xin Luo, Li Li, Dong Liu

University of Science and Technology of China

โญ If StableCodec is helpful to you, please star this repo. Thanks! ๐Ÿค—

โŒ› Updates

[2026/03/11] ๐Ÿ˜ฎ We propose Asymmetric Extreme Image Compression (AEIC), an improved version of StableCodec exploring the feasibility of applying shallow encoders for ultra-low bitrate image compression. Check it out!
[2025/12/30] Release all source code. Leave it in 2025!
[2025/12/29] Release additional checkpoints for training and inference.
[2025/08/21] Training logs and reported results are now available, see results/.

โŒ› TODO

  • Repo release
  • Update paper link
  • Demo
  • Pretrained models
  • Inference
  • Training

๐Ÿ“ Abstract

Diffusion-based image compression has shown remarkable potential for achieving ultra-low bitrate coding (less than 0.05 bits per pixel) with high realism. However, current approaches: (1) Require a large number of denoising steps at the decoder to generate realistic results under extreme bitrate constraints. (2) Sacrifice reconstruction fidelity, as diffusion models typically fail to guarantee pixel-level consistency. To address these challenges, we introduce StableCodec, which enables one-step diffusion for high-fidelity and high-realism extreme image compression with improved coding efficiency. To achieve ultra-low bitrates, we first develop an efficient Deep Compression Latent Codec to transmit a noisy latent representation for a single-step denoising process. We then propose a Dual-Branch Coding Structure, consisting of a pair of auxiliary encoder and decoder, to enhance reconstruction fidelity. Furthermore, we adopt end-to-end optimization with joint bitrate and pixel-level constraints. StableCodec outperforms existing methods in terms of FID, KID and DISTS by a significant margin, even at bitrates as low as 0.005 bits per pixel, while maintaining (1) strong fidelity and (2) inference speeds comparable to mainstream transform coding schemes.

๐Ÿ˜ Main Results

Rate-distortion-perception comparison on benchmarks:

Compressing high-resolution images for more than 1000 times:

โš™ Installation

conda create -n stablecodec python=3.10
conda activate stablecodec
pip install -r requirements.txt

โšก Inference

Step 1: Prepare your datasets for inference

<PATH_TO_DATASET>/*.png

In our paper, we adopt the following test datasets:

Step 2: Download pretrained models

  1. Download SD-Turbo.
  2. Download checkpoints for StableCodec and Auxiliary Encoder (ELIC):
--- List ---
stablecodec_base.pkl		# A base model for Stage 2 finetuning
stablecodec_ft2.pkl			# ~ 0.035bpp on Kodak
stablecodec_ft3.pkl			# ~ 0.029bpp on Kodak
stablecodec_ft4.pkl			# ~ 0.025bpp on Kodak
stablecodec_ft6.pkl			# ~ 0.020bpp on Kodak
stablecodec_ft8.pkl			# ~ 0.017bpp on Kodak
stablecodec_ft12.pkl		# ~ 0.013bpp on Kodak
stablecodec_ft16.pkl		# ~ 0.010bpp on Kodak
stablecodec_ft24.pkl		# ~ 0.008bpp on Kodak
stablecodec_ft32.pkl		# ~ 0.005bpp on Kodak
elic_official.pth			# Pretrained ELIC model for Auxiliary Encoder

Step 3: Inference for StableCodec

Please modify the paths in compress.sh:

python src/compress.py \
    --sd_path="<PATH_TO_SD_TURBO>/sd-turbo" \
    --elic_path="<PATH_TO_ELIC>/elic_official.pth" \
    --img_path="<PATH_TO_DATASET>/" \
    --rec_path="<PATH_TO_SAVE_OUTPUTs>/rec/" \
    --bin_path="<PATH_TO_SAVE_OUTPUTs>/bin/" \
    --codec_path="<PATH_TO_STABLECODEC>/stablecodec_ft2.pkl" \
    # --color_fix

Note: Color fix is recommended when inferring high-resolution images with tiling (e.g., DIV2K, CLIC 2020).

Then run:

bash compress.sh

You may find your bitstreams in specified bin_path and reconstructions in rec_path ๐Ÿค—.

๐Ÿญ Evaluation (Optional)

Run the evaluation script to compute reconstruction metrics with src/evaluate.py

bash eval_folders.sh

Please make sure recon_dir and gt_dir are specified.

๐Ÿ”ฅ Training

Preparations

We perform lightweight training on 2x RTX 3090 (24G) GPUs. Consider adjusting train_batch_size and gradient_accumulation_steps in src/my_utils/training_utils.py for faster or better training performance.

Our training data includes:

  • Flickr2K: Contains 2560 2K-resolution images.
  • DIV2K Training Set: Contains 800 2K-resolution images.
  • CLIC: Contains 585 (CLIC 2020 Training) + 41 (CLIC 2020 Validation) + 60 (CLIC 2021 Test) 2K-resolution images.

We use h5py to organize training data. To construct a .hdf5 training file, please refer to src/my_utils/build_h5.py.

Note: An empirical finding suggests adding additional training data in Stage 1 improves stability. We adopt the first 10K images from LSDIR.

Stage 1: Train a base model with relaxed bitrates

Note: You may skip Stage 1 with our pretrained stablecodec_base.pkl.

Please modify the paths in train.sh:

accelerate launch --num_processes=2 --gpu_ids="0,1," --main_process_port 29300 src/train.py \
    --sd_path="<PATH_TO_SD_TURBO>/sd-turbo" \
    --elic_path="<PATH_TO_ELIC>/elic_official.pth" \
    --train_dataset="<PATH_TO_DATASET>/dataset.hdf5" \
    --test_dataset="<PATH_TO_DATASET>/Kodak/" \
    --output_dir="<PATH_TO_SAVE_OUTPUTS>/" \
    --max_train_steps 120000 \
    --lambda_rate 0.5

Then run:

bash train.sh

Stage 2: Finetune the base model with GAN and target extreme bitrates

Please modify the paths in finetune.sh:

accelerate launch --num_processes=2 --gpu_ids="0,1," --main_process_port 29300 src/finetune.py \
    --sd_path="<PATH_TO_SD_TURBO>/sd-turbo" \
    --elic_path="<PATH_TO_ELIC>/elic_official.pth" \
    --codec_path="<PATH_TO_STABLECODEC>/stablecodec_base.pkl" \
    --train_dataset="<PATH_TO_DATASET>/dataset.hdf5" \
    --test_dataset="<PATH_TO_DATASET>/Kodak/" \
    --output_dir="<PATH_TO_SAVE_OUTPUTS>/" \
    --max_train_steps 21000 \
    --lambda_rate 2 # [2, 3, 4, 6, 8, 12, 16, 24, 32]

Then run:

bash finetune.sh

๐Ÿ“– Citation

If you find our work inspiring, please consider citing:

@InProceedings{Zhang_2025_ICCV,
    author    = {Zhang, Tianyu and Luo, Xin and Li, Li and Liu, Dong},
    title     = {StableCodec: Taming One-Step Diffusion for Extreme Image Compression},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2025},
    pages     = {17379-17389}
}

๐Ÿ““ License

This work is licensed under MIT license.

๐Ÿฅฐ Acknowledgement

This work is implemented based on CompressAI, ELIC-Unofficial, StableSR, StableDiffusion and DCVC. Thanks for their awesome work!

โœ‰๏ธ Contact

If you have any questions, please feel free to drop me an email:

  • zhangtianyu[at]mail.ustc.edu.cn

About

[ICCV 2025] StableCodec: Taming One-Step Diffusion for Extreme Image Compression

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages