SyncNet

Fork Notice: This is a fork of the original SyncNet repository by Joon Son Chung, maintained by Colossyan.

This repository contains the demo for the audio-to-video synchronisation network (SyncNet). This network can be used for audio-visual synchronisation tasks including:

  1. Removing temporal lags between the audio and visual streams in a video;
  2. Determining who is speaking amongst multiple faces in a video.

Please cite the paper below if you make use of the software.

Installation

Option 1: Install from GitHub (Recommended)

pip install git+https://github.com/colossyan/syncnet-python.git

Option 2: Install from source

git clone https://github.com/colossyan/syncnet-python.git
cd syncnet-python
pip install -e .

Dependencies

pip install -r requirements.txt

In addition, ffmpeg must be installed and available on your PATH.
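
If you want to fail fast when ffmpeg is missing, a minimal check using only the standard library (the error message is illustrative):

import shutil

# Abort early if the ffmpeg binary is not on the PATH.
if shutil.which('ffmpeg') is None:
    raise RuntimeError('ffmpeg not found on PATH; please install it first')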

Usage

As a Python Package

from syncnet_python import SyncNetInstance

# Initialize SyncNet
syncnet = SyncNetInstance()

# Load the pre-trained model (fetched by download_model.sh)
syncnet.loadParameters('path/to/pretrained_model.pth')

# Options mirroring the command-line flags of demo_syncnet.py
class Args:
    def __init__(self):
        self.tmp_dir = '/tmp/syncnet'  # directory for intermediate files
        self.reference = 'test_video'  # name of the temp subdirectory for this video
        self.batch_size = 20           # inference batch size
        self.vshift = 10               # maximum audio-video shift to search, in frames

# Evaluate a video
opt = Args()
offset, conf, dists = syncnet.evaluate(opt, 'path/to/video.mp4')
print(f"Audio-Video offset: {offset} frames")
print(f"Confidence: {conf}")

Command Line Usage

SyncNet demo:

python demo_syncnet.py --videofile data/example.avi --tmp_dir /path/to/temp/directory

Check that this script returns:

AV offset:      3 
Min dist:       5.353
Confidence:     10.021

Full pipeline:

sh download_model.sh
python run_pipeline.py --videofile /path/to/video.mp4 --reference name_of_video --data_dir /path/to/output
python run_syncnet.py --videofile /path/to/video.mp4 --reference name_of_video --data_dir /path/to/output
python run_visualise.py --videofile /path/to/video.mp4 --reference name_of_video --data_dir /path/to/output
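
The same three stages can also be driven from Python; a minimal sketch using subprocess, assuming it is run from the repository root with placeholder paths substituted for your own:

import subprocess

videofile = '/path/to/video.mp4'
reference = 'name_of_video'
data_dir = '/path/to/output'

# Run face tracking/cropping, sync evaluation, and visualisation in order.
for script in ('run_pipeline.py', 'run_syncnet.py', 'run_visualise.py'):
    subprocess.run([
        'python', script,
        '--videofile', videofile,
        '--reference', reference,
        '--data_dir', data_dir,
    ], check=True)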

Outputs, where $DATA_DIR and $REFERENCE stand for the --data_dir and --reference values used above:

$DATA_DIR/pycrop/$REFERENCE/*.avi - cropped face tracks
$DATA_DIR/pywork/$REFERENCE/offsets.txt - audio-video offset values
$DATA_DIR/pyavi/$REFERENCE/video_out.avi - output video
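
To pick up the computed offsets programmatically, a sketch that only assumes offsets.txt is plain text (inspect it once to learn the exact per-line format):

import os

data_dir = '/path/to/output'   # same values as passed to the scripts above
reference = 'name_of_video'

# Print the per-track audio-video offsets written by run_syncnet.py.
with open(os.path.join(data_dir, 'pywork', reference, 'offsets.txt')) as f:
    print(f.read())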

Publications

@InProceedings{Chung16a,
  author       = "Chung, J.~S. and Zisserman, A.",
  title        = "Out of time: automated lip sync in the wild",
  booktitle    = "Workshop on Multi-view Lip-reading, ACCV",
  year         = "2016",
}
