DicFace: Dirichlet-Constrained Variational Codebook Learning for Temporally Coherent Video Face Restoration

Yan Chen¹* Hanlin Shang¹* Ce Liu¹ Yuxuan Chen¹ Hui Li¹ Weihao Yuan²

Hao Zhu³ Zilong Dong² Siyu Zhu¹✉️

¹Fudan University ²Alibaba Group ³Nanjing University

ICCV 2025 Highlight

supplementary-Compressed.mp4

🖼️ Showcase

Blind Face Restoration

bfr-1.mp4

bfr-2.mp4

Face Inpainting

inpainting-1.mp4

inpainting-2.mp4

Face Colorization

color-1.mp4

color-2.mp4

🐾 Wild Data Examples

01_segment1.mp4

02_segment1.mp4

6.17.mp4

resotred_face.mp4

05_seg.mp4

📰 News

2025/07/25: 🎉🎉🎉 Our paper has been accepted to ICCV 2025 and selected as a highlight.
2025/06/26: 🎉🎉🎉 Our paper has been accepted to ICCV 2025.
2025/06/25: Released our test data on the Hugging Face repo.
2025/06/23: Released our pretrained model on the Hugging Face repo.
2025/06/17: Paper submitted to arXiv: https://arxiv.org/abs/2506.13355
2025/06/16: 🎉🎉🎉 Released the inference scripts.

📅️ Roadmap

Status	Milestone	ETA
✅	Inference Code release	2025-6-16
✅	Model weight release (Baidu link)	2025-6-16
✅	Paper submitted to arXiv	2025-6-17
✅	Test data release	2025-6-25
✅	Training Code release	2025-6-26

⚙️ Installation

System requirements: Python 3.10, PyTorch >= 2.4.1
Tested on: A800 GPU, Python 3.10, PyTorch 2.4.1, CUDA 12.1

Download the code:

  git clone https://github.com/fudan-generative-vision/DicFace
  cd DicFace

Create conda environment:

  conda create -n DicFace python=3.10
  conda activate DicFace

Install PyTorch:

  conda install pytorch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 pytorch-cuda=12.1 -c pytorch -c nvidia

Install the remaining dependencies (pip for the Python packages, conda for dlib):

  pip install -r requirements.txt
  python basicsr/setup.py develop
  conda install -c conda-forge dlib
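
After installation, a quick check along the following lines can confirm that the environment matches the stated requirements (Python 3.10, PyTorch >= 2.4.1). This is a minimal sketch and not part of the repository: the helper names `parse_version`, `installed_version`, and `check_environment` are hypothetical, and the check reads package metadata only, so it does not verify CUDA or GPU availability.

```python
import re
import sys
from importlib import metadata


def parse_version(text):
    """Turn a version string such as '2.4.1+cu121' into the tuple (2, 4, 1)."""
    core = text.split("+", 1)[0]  # drop a local build tag like '+cu121'
    numbers = []
    for piece in core.split("."):
        match = re.match(r"\d+", piece)  # keep only the leading digits
        if match is None:
            break
        numbers.append(int(match.group()))
    return tuple(numbers)


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_environment():
    """Compare the running interpreter and PyTorch against the README requirements."""
    report = {}
    if sys.version_info[:2] == (3, 10):
        report["python"] = "ok"
    else:
        report["python"] = "expected 3.10, found %d.%d" % sys.version_info[:2]

    torch_version = installed_version("torch")
    if torch_version is None:
        report["torch"] = "not installed"
    elif parse_version(torch_version) >= (2, 4, 1):
        report["torch"] = "ok"
    else:
        report["torch"] = "expected >= 2.4.1, found " + torch_version
    return report


if __name__ == "__main__":
    for name, status in check_environment().items():
        print(f"{name}: {status}")
```

Run it inside the activated DicFace conda environment; any entry other than "ok" points at the requirement that is not met.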

📥 Download Pretrained Models

The pretrained weights are hosted on Baidu Netdisk.

All pretrained models required for inference are also available from our Hugging Face repo.

File Structure of Pretrained Models

The downloaded .ckpts directory contains the following pretrained models:

.ckpts
|-- CodeFormer                  # CodeFormer-related models
|   |-- bfr_100k.pth            # Blind Face Restoration model 
|   |-- color_100k.pth          # Color Restoration model 
|   |-- codeformer.pth          # codeformer model
|   |-- vqgan_discriminator.pth # vqgan_discriminator model
|   `-- inpainting_100k.pth     # Image Inpainting model
|-- dlib                        # dlib face-related models
|   |-- mmod_human_face_detector.dat  # Human face detector
|   `-- shape_predictor_5_face_landmarks.dat  # 5-point face landmark predictor
|-- facelib                     # Face processing library models
|   |-- detection_Resnet50_Final.pth  # ResNet50 face detector 
|   |-- detection_mobilenet0.25_Final.pth  # MobileNet0.25 face detector 
|   |-- parsing_parsenet.pth    # Face parsing model
|   |-- yolov5l-face.pth        # YOLOv5l face detection model
|   `-- yolov5n-face.pth        # YOLOv5n face detection model
|-- realesrgan                  # Real-ESRGAN super-resolution model
|   `-- RealESRGAN_x2plus.pth   # 2x super-resolution enhancement model
`-- vgg                         # VGG feature extraction model
    `-- vgg.pth                 # VGG network pre-trained weights
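
Before running inference, it can help to confirm that the download is complete. The sketch below checks a local directory against the file tree listed above. The function name `missing_checkpoints` is hypothetical, and the default location `.ckpts` (relative to the repository root) is an assumption; adjust it to wherever the weights were saved.

```python
from pathlib import Path

# Relative paths copied from the file tree listed in this README.
EXPECTED_FILES = [
    "CodeFormer/bfr_100k.pth",
    "CodeFormer/color_100k.pth",
    "CodeFormer/codeformer.pth",
    "CodeFormer/vqgan_discriminator.pth",
    "CodeFormer/inpainting_100k.pth",
    "dlib/mmod_human_face_detector.dat",
    "dlib/shape_predictor_5_face_landmarks.dat",
    "facelib/detection_Resnet50_Final.pth",
    "facelib/detection_mobilenet0.25_Final.pth",
    "facelib/parsing_parsenet.pth",
    "facelib/yolov5l-face.pth",
    "facelib/yolov5n-face.pth",
    "realesrgan/RealESRGAN_x2plus.pth",
    "vgg/vgg.pth",
]


def missing_checkpoints(root=".ckpts"):
    """Return the expected files that are not present under the given directory."""
    root = Path(root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing files:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All expected checkpoint files are present.")
```

The check looks at file presence only; it does not validate file contents or sizes.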

🎮 Run Inference

For blind face restoration

python scripts/inference.py \
		-i /path/to/video \
		-o /path/to/output_folder \
		--max_length 10 \
		--save_video_fps 24 \
		--ckpt_path /bfr/bfr_weight.pth \
		--bg_upsampler realesrgan \
		--save_video 

# or, if your video has already been aligned
python scripts/inference.py \
		-i /path/to/video \
		-o /path/to/output_folder \
		--max_length 10 \
		--save_video_fps 24 \
		--ckpt_path /bfr/bfr_weight.pth \
		--save_video \
		--has_aligned
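
To restore several videos in one go, the command above can be wrapped in a small driver. The following is a sketch under stated assumptions, not part of the repository: `build_inference_command` and `restore_folder` are hypothetical helpers, the flags are limited to those shown in the commands above, and the inputs are assumed to be `.mp4` files collected in a single folder.

```python
import subprocess
from pathlib import Path


def build_inference_command(video, output_dir, ckpt_path, has_aligned=False,
                            bg_upsampler=None, max_length=10, fps=24,
                            script="scripts/inference.py"):
    """Assemble the argument list for one run, using only flags shown in this README."""
    command = [
        "python", script,
        "-i", str(video),
        "-o", str(output_dir),
        "--max_length", str(max_length),
        "--save_video_fps", str(fps),
        "--ckpt_path", str(ckpt_path),
        "--save_video",
    ]
    if bg_upsampler:
        command += ["--bg_upsampler", bg_upsampler]
    if has_aligned:
        command.append("--has_aligned")
    return command


def restore_folder(video_dir, output_root, ckpt_path, **options):
    """Run the script once per .mp4 file, writing each result to its own subfolder."""
    for video in sorted(Path(video_dir).glob("*.mp4")):
        command = build_inference_command(
            video, Path(output_root) / video.stem, ckpt_path, **options
        )
        subprocess.run(command, check=True)  # stop at the first failing video
```

For example, `restore_folder("/path/to/videos", "/path/to/output_folder", "/bfr/bfr_weight.pth", bg_upsampler="realesrgan")` would issue one command per video, equivalent to the first command above. Passing `script="scripts/inference_color_and_inpainting.py"` together with `has_aligned=True` targets the colorization and inpainting script in the same way.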

For colorization and inpainting

The colorization and inpainting tasks currently support only aligned faces as input. Non-aligned input may lead to unsatisfactory results.

# for colorization task
python scripts/inference_color_and_inpainting.py \
		-i /path/to/video_warped \
		-o /path/to/output_folder \
		--max_length 10 \
		--save_video_fps 24 \
		--ckpt_path /colorization/colorization_weight.pth \
		--bg_upsampler realesrgan \
		--save_video \
		--has_aligned

# for inpainting task
python scripts/inference_color_and_inpainting.py \
		-i /path/to/video_warped \
		-o /path/to/output_folder \
		--max_length 10 \
		--save_video_fps 24 \
		--ckpt_path /inpainting/inpainting_weight.pth \
		--bg_upsampler realesrgan \
		--save_video \
		--has_aligned

Test Data

Our test data can be accessed via the following links:

Baidu Netdisk: https://pan.baidu.com/s/1zMp3fnf6LvlRT9CAoL1OUw (Password: drhh)
Hugging Face Dataset: https://huggingface.co/datasets/fudan-generative-ai/DicFace-test_dataset

Directory Structure

The downloaded test_data_set directory contains the following folders:

./test_data
├── LR_Blind                  # Blind face restoration test image folders
│   ├── Clip+_HebIzK_LP4+P2+C1+F16589-16715
│   ├── ...                   # Additional test image folders
│   └── Clip+y5OFsRIRkwc+P0+C0+F9797-9938
│
├── TEST_DATA                 # Ground-truth (GT) image folders
│   ├── Clip+_HebIzK_LP4+P2+C1+F16589-16715
│   ├── ...
│   └── Clip+y5OFsRIRkwc+P0+C0+F9797-9938
│
├── vfhq_test_color_input     # Colorization test image folders
│   ├── Clip+_HebIzK_LP4+P2+C1+F16589-16715
│   ├── ...
│   └── Clip+y5OFsRIRkwc+P0+C0+F9797-9938
│
├── vfhq_test_inpaint_input_512  # Inpainting test image folders (512x512)
│   ├── Clip+_HebIzK_LP4+P2+C1+F16589-16715
│   ├── ...
│   └── Clip+y5OFsRIRkwc+P0+C0+F9797-9938
│
└── vfhq_test_landmarks       # Facial landmark files for warping operations
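
When evaluating results, each degraded clip has to be matched with its ground-truth counterpart. The sketch below pairs clip folders by name, which assumes (as the tree above suggests) that every input folder and `TEST_DATA` use identical clip folder names. The function name `pair_clips` is hypothetical.

```python
from pathlib import Path


def pair_clips(root, input_dir="LR_Blind", gt_dir="TEST_DATA"):
    """Match each clip folder under input_dir with the same-named folder under gt_dir.

    Returns (pairs, unmatched): a list of (input_clip, gt_clip) paths, and the
    names of input clips that have no ground-truth folder.
    """
    root = Path(root)
    pairs, unmatched = [], []
    clips = sorted(p for p in (root / input_dir).iterdir() if p.is_dir())
    for clip in clips:
        gt_clip = root / gt_dir / clip.name
        if gt_clip.is_dir():
            pairs.append((clip, gt_clip))
        else:
            unmatched.append(clip.name)
    return pairs, unmatched
```

Calling `pair_clips("./test_data", input_dir="vfhq_test_color_input")` would do the same for the colorization inputs.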

Usage

To warp (align) the test data using the provided landmark files, use the warp_images.py script:

python scripts/warp_images.py \
    -i input_test_data_folder \
    -o vfhq_test_inpaint_input_512_warped \
    -l /path/to/test_data_folder/vfhq_test_landmarks

After warping the test data, you can use the inference scripts to generate results for the test dataset.

Training

Training Data

We utilize the VFHQ dataset for both training and testing. The test data is specifically sourced from VFHQ-Test. For more details, please refer to the official project page: VFHQ.

Prerequisites for Training

Before initiating the training process, ensure that you have completed the following steps:

- Image size requirement: all input images must be resized to 512 x 512 pixels.
- Download necessary files: obtain the metadata files and facial landmark information from our Hugging Face repository. [TBD (not ready)]
- Configure YAML files: edit the configuration file located at options/xxx.yaml to specify your training parameters and dataset paths.
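
The image size requirement above can be checked before training starts. The sketch below lists frames whose dimensions differ from 512 x 512 by reading the PNG header directly, so it needs only the standard library. It assumes the training frames are stored as PNG files, which this README does not state; the function names `png_size` and `wrong_size_frames` are hypothetical.

```python
import struct
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(path):
    """Read (width, height) from the IHDR chunk at the start of a PNG file."""
    with open(path, "rb") as handle:
        header = handle.read(24)
    # Layout: 8-byte signature, 4-byte chunk length, 4-byte type 'IHDR',
    # then width and height as big-endian unsigned 32-bit integers.
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError(f"{path} is not a valid PNG file")
    width, height = struct.unpack(">II", header[16:24])
    return width, height


def wrong_size_frames(folder, expected=(512, 512)):
    """Return every PNG under the folder whose size differs from the expected one."""
    return [
        path for path in sorted(Path(folder).rglob("*.png"))
        if png_size(path) != expected
    ]
```

An empty list from `wrong_size_frames("/path/to/training_frames")` means every PNG found already has the required size; the path here is a placeholder.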

Initiate Training

Once the prerequisites are met, start the training process by executing the following command:

bash train.sh

This script will initiate the training procedure using the settings defined in your YAML configuration file.

🤗 Acknowledgements

This project is released under the NTU S-Lab License 1.0. Redistribution and use must follow this license. The code framework is mainly adapted from CodeFormer; please refer to the original repository for further usage details and documentation.

📝 Citation

If you find our work useful for your research, please consider citing the paper:

@misc{chen2025dicfacedirichletconstrainedvariationalcodebook,
      title={DicFace: Dirichlet-Constrained Variational Codebook Learning for Temporally Coherent Video Face Restoration}, 
      author={Yan Chen and Hanlin Shang and Ce Liu and Yuxuan Chen and Hui Li and Weihao Yuan and Hao Zhu and Zilong Dong and Siyu Zhu},
      year={2025},
      eprint={2506.13355},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2506.13355}, 
}
