langzizhixin/wav2lip384x384


🔥 wav2lip384x384 is our LangXin_V1

This is a project about talking faces. We use 384x384 facial images for training, which can generate 720p, 1080p, 2K, and 4K digital human videos. We have done the following work:

  1. Added video-cutting code.
  2. Added filelist-generation code.
  3. Trained on 1000 speakers, 50 hours of footage, and over 50000 clips of data.
  4. Open-sourced discriminator checkpoints at 150000, 700000, and 1000000 steps, with val_loss of 0.36, 0.33, and 0.28 respectively.
  5. Open-sourced generator checkpoints from 300000 to 800000 steps, with val_loss values from 0.35 to 0.29. They perform very well and are recommended for use; they can also be loaded for further training.
  6. Generators trained for over 500000 steps surpass all open-source projects on the market in direct inference quality and have reached a basic commercial level.
  7. We released our best discriminator checkpoint; load the pre-trained weights to make subsequent training easier. Many people have loaded our color_checkpoints and final_checkpionts for training and achieved good results, especially for profile and occlusion problems: it is only necessary to load a relevant dataset and continue training.
  8. The wav2lip high-definition algorithm series cannot achieve high fidelity of faces and teeth, and training is relatively difficult, so it does not adapt well to current commercial needs. We have therefore moved our commercial digital humans to new algorithms such as diffusion.
  9. Friends who want to train the wav2lip high-definition series, please think carefully before starting.
  10. If you want better inference results, refer to our demo videos for guidance on shooting footage.
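Items 1-2 above mention the added video-cutting and filelist-generation code. As a minimal sketch of what filelist generation can look like for a Wav2Lip-style preprocessed dataset (the `speaker/clip` directory layout, function name, and split ratio are assumptions for illustration, not this repo's actual code):

```python
import os
import random

def write_filelists(data_root, out_dir, val_frac=0.05, seed=0):
    """Collect speaker/clip entries and write train.txt / val.txt splits."""
    clips = sorted(
        f"{spk}/{clip}"
        for spk in os.listdir(data_root)
        if os.path.isdir(os.path.join(data_root, spk))
        for clip in os.listdir(os.path.join(data_root, spk))
    )
    random.Random(seed).shuffle(clips)  # deterministic shuffle for reproducible splits
    n_val = max(1, int(len(clips) * val_frac))
    os.makedirs(out_dir, exist_ok=True)
    for name, subset in (("val", clips[:n_val]), ("train", clips[n_val:])):
        with open(os.path.join(out_dir, f"{name}.txt"), "w") as f:
            f.write("\n".join(subset) + "\n")
    return len(clips) - n_val, n_val
```

Each line in the resulting `train.txt`/`val.txt` is a `speaker/clip` path, which the training data loader can resolve against the preprocessed frames and audio.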

🏗️ wav2lip-384x384 Project status

Video | Project Page | Code

Checkpoints for wav2lip384x384: https://pan.baidu.com/s/1NiSEdrlRVZM_6SD4Igdtlg?pwd=lzzx
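If you load these checkpoints to continue training (item 7 above), note that weights saved from a model wrapped in `nn.DataParallel` carry a `module.` prefix on every parameter name; the original Rudrabha/Wav2Lip code strips this prefix when loading. A minimal sketch of that step (the checkpoint path, key name, and model class in the usage comment are assumptions, not verified against this repo):

```python
def strip_module_prefix(state_dict):
    """Remove the 'module.' prefix that nn.DataParallel adds to parameter names."""
    return {k.replace("module.", "", 1): v for k, v in state_dict.items()}

# Usage sketch (requires torch and this repo's model code; names are hypothetical):
# import torch
# from models import Wav2Lip
# model = Wav2Lip()
# ckpt = torch.load("checkpoints/wav2lip384_gen.pth", map_location="cpu")
# model.load_state_dict(strip_module_prefix(ckpt["state_dict"]))
```

Without this normalization, `load_state_dict` raises key-mismatch errors when a DataParallel-saved checkpoint is loaded into a bare model (or vice versa).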

📊 The following pictures compare results from the generator trained for 500000 steps.

🎬 Demo

| Original video | Lip-synced video |
| --- | --- |
| input-001.mp4 | output-001.mp4 |
| input-002.mp4 | output-002.mp4 |
| input-003.mp4 | output-003.mp4 |
| input-004.mp4 | output-004.mp4 |

📑 Open-source Plan

For the wav2lip series, we will continue to train and release higher-definition weights in the future. The plan is as follows: pre-training checkpoints for wav2lip_288x288 will be released in January 2025; pre-training checkpoints for wav2lip_384x384 will be released in February 2025; pre-training checkpoints for wav2lip_576x576 or 512x512 will be released after June 2025.

  • color_checkpoints
  • final_checkpionts
  • Dataset processing pipeline
  • Training method
  • Advanced inference
  • Real-time inference
  • Higher-definition commercial checkpoints

🙏 Citing

Thank you to the authors of the following three projects for their wonderful work:

https://github.com/primepake/wav2lip_288x288

https://github.com/nghiakvnvsd/wav2lip384

https://github.com/Rudrabha/Wav2Lip

📖 Disclaimers

This repository was made by langzizhixin of Langzizhixin Technology, Chengdu, China, on 2025.1.30. The above code and weights may only be used for personal/research/non-commercial purposes. In particular, for the digital human video models in this repository, commercial use requires authorization from the people depicted in the models. If you need a higher-definition model, please contact us by email at [email protected], [email protected], or [email protected], or add our WeChat for communication: langzizhixinkeji
