Project Instructions

Installation

Environments Used (with Important Libraries)

DL Environment

Python 3.10.14

deep_audio_features 0.2.18
librosa 0.10.2.post1
pyAudioAnalysis 0.3.14
tensorflow 2.10.1
pandas 2.2.2
numpy 1.26.4
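
To confirm that an environment matches the pinned versions, a small check along these lines may help. This is a minimal sketch, not part of the repository: the helper names `installed_version` and `find_mismatches` are hypothetical, and the distribution names are taken verbatim from the list above, which may differ slightly from the names used on the package index.

```python
from importlib import metadata

# Pins copied from the DL environment list above.
DL_PINS = {
    "deep_audio_features": "0.2.18",
    "librosa": "0.10.2.post1",
    "pyAudioAnalysis": "0.3.14",
    "tensorflow": "2.10.1",
    "pandas": "2.2.2",
    "numpy": "1.26.4",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Map each package whose installed version differs from its pin
    to a (pinned, installed) tuple."""
    mismatches = {}
    for name, pinned in pins.items():
        found = lookup(name)
        if found != pinned:
            mismatches[name] = (pinned, found)
    return mismatches


if __name__ == "__main__":
    for name, (pinned, found) in find_mismatches(DL_PINS).items():
        print(f"{name}: expected {pinned}, found {found or 'not installed'}")
```

The same function can be reused for the other environments by passing a different dictionary of pins.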

Transformers Environment

Python 3.10.16

accelerate 1.2.1
pytorchvideo 0.1.5
scikit-image 0.24.0
torchaudio 2.5.1
torchvision 0.20.1
transformers 4.48.0
datasets 2.14.5
fsspec 2023.6.0
torch 2.5.1
evaluate 0.4.3
pandas 2.2.3
numpy 1.26.4

CV2 Environment

Python 3.7.12
Help resources:

Dependencies:

cmake 3.30.3
dlib 19.12.0
opencv-contrib-python 4.10.0.84
numpy 1.21.6

Pyannote Environment

Python 3.6.13
References:

Dependencies:

cmake 3.28.4
mtcnn 0.1.1
pyannote.video 1.6.3
ffmpeg 2.7
numpy 1.19.2
pandas 1.1.5

Project Structure

To run the project end-to-end, follow the order below.
Detailed installation instructions for each component are provided later.

1. Dataset Creation

Use the gsoc18_RedHenLab folder to split the raw videos into small, usable segments.
Before running the pipeline, make sure the following file is configured correctly:

gsoc18_RedHenLab/video_processing_pipeline/stages.py

Once configured, the pipeline should run without issues.

For additional help: GSoC18 RedHenLab Video Processing Pipeline

2. Script Execution Order

All scripts are located in the Scripts directory.
Follow this order for execution:

1. `video_segmentation.py`

Outputs video segments (~10 seconds).
Example:

python 1.video_segmentation.py -c sober -o Scripts/test
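
To illustrate what cutting a video into ~10-second segments involves, here is a minimal sketch of the boundary computation. It is an assumption for illustration only: the function name `segment_bounds` is hypothetical, and the README does not say how the script handles a trailing segment shorter than 10 seconds (this sketch keeps it).

```python
def segment_bounds(duration, segment_len=10.0):
    """Split [0, duration] into consecutive (start, end) windows in seconds.

    The last window is shorter when duration is not a multiple of segment_len.
    """
    duration = float(duration)
    if duration <= 0 or segment_len <= 0:
        return []
    bounds = []
    start = 0.0
    while start < duration:
        end = min(start + segment_len, duration)
        bounds.append((start, end))
        start = end
    return bounds
```

For a 25-second clip this yields three windows: 0 to 10, 10 to 20, and 20 to 25 seconds.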

2. `fetch_audio_to_video.py`

Joins audio and video files.

Example:

python 2.fetch_audio_to_video.py \
  -Vsrc "..\VIDEO_DATA\Segment_Output\drunk" \
  -Ssrc "..\gsoc18_RedHenLab\video_processing_pipeline\4_face_cropping\audio_drunk_output" \
  -Sdest "Scripts/test"

3. `audio_segmentation.py`

Outputs audio segments (~10 seconds). Example:

python 3.audio_segmentation.py -c drunk

4. `Train_Val_Split_Frames_Visualizer.ipynb`

Use the training split block to create a 70% train / 30% test data split.
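
For reference, a 70/30 split can be sketched with the standard library as follows. This is not the notebook's code: the function name `split_70_30` and the fixed seed are assumptions, and the notebook may split differently (for example by grouping segments that come from the same source video).

```python
import random


def split_70_30(items, train_fraction=0.7, seed=0):
    """Shuffle a copy of items and split it into (train, test) lists."""
    shuffled = list(items)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

If segments from one video should never appear on both sides of the split, apply the same function to the list of video identifiers rather than to individual segments.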

5. `video_to_sound_splits.py`

Reorders audio segments based on the train-test split CSV. Example:

python 5.video_to_sound_splits.py \
  -csv "Thesis/CSVs/Sound_Video_Train_Tests/video_sound_train_val_split_26_processed_v3.csv" \
  -d "Thesis/Sound_For_Training" \
  --drunk_dir "Thesis/Segment_Output_Audio_Test/drunk" \
  --sober_dir "Thesis/Segment_Output_Audio_Test/sober"
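
Conceptually, this step reads the split CSV and places each audio segment under the matching split. The sketch below shows one way to do that; it is not the script's implementation. The column names `filename`, `label`, and `split`, the function name `copy_audio_by_split`, and the output layout `<dest>/<split>/<label>/` are all hypothetical, since the README does not describe the CSV schema.

```python
import csv
import shutil
from pathlib import Path


def copy_audio_by_split(csv_path, class_dirs, dest_dir):
    """Copy each audio file listed in the CSV to dest_dir/<split>/<label>/.

    Assumed (hypothetical) CSV columns: filename, label, split.
    class_dirs maps a label such as "drunk" to its source directory.
    Returns the list of files that were listed but not found.
    """
    missing = []
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            source = Path(class_dirs[row["label"]]) / row["filename"]
            if not source.is_file():
                missing.append(source)
                continue
            target_dir = Path(dest_dir) / row["split"] / row["label"]
            target_dir.mkdir(parents=True, exist_ok=True)
            shutil.copy2(source, target_dir / source.name)
    return missing
```

Returning the missing files instead of raising makes it easier to spot segments that exist in the video split but have no audio counterpart.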

The remaining scripts in the directory are helper modules used for training or utility functions.

3. Notebook Execution

All notebooks are located in the Notebooks directory:

LSTM_Model.ipynb: Trains an LSTM model using VGG-extracted features. Run the extractor first (dl environment):
```
Scripts/VGG_feature_extractor.py
```

Ts_Audio.ipynb: Extracts CNN and MFCC features to train CNN and DNN-MFCC models using TensorFlow-Keras. (dl environment)

Video_Transformer.ipynb & Audio_Transformer.ipynb: Contain the full code to train the video and audio transformers. Can be run locally or on Google Colab. (transformers environment)

Predictions.ipynb & Late_Fusion.ipynb: Generate validation predictions and combine the multimodal results. (transformers environment)
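
As an illustration of what combining multimodal results can look like, the sketch below averages the class probabilities of a video model and an audio model. This is a generic weighted-average late fusion, not necessarily the rule used in Late_Fusion.ipynb; the function names, the default weight, and the label order `("sober", "drunk")` are assumptions.

```python
def late_fusion(video_probs, audio_probs, video_weight=0.5):
    """Weighted average of per-class probabilities from two models."""
    if len(video_probs) != len(audio_probs):
        raise ValueError("both models must score the same classes")
    audio_weight = 1.0 - video_weight
    return [video_weight * v + audio_weight * a
            for v, a in zip(video_probs, audio_probs)]


def predict_label(probs, labels=("sober", "drunk")):
    """Return the label with the highest fused probability."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best]


# The video model is undecided, the audio model leans towards "drunk".
fused = late_fusion([0.5, 0.5], [0.25, 0.75])
print(fused, predict_label(fused))
```

The weight can be tuned on the validation predictions if one modality turns out to be more reliable than the other.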

4. Required Environments per Script

| Script / Notebook | Environment |
| --- | --- |
| `video_segmentation.py` | pyannote |
| `fetch_audio_to_video.py` | pyannote |
| `audio_segmentation.py` | pyannote |
| `Train_Val_Split_Frames_Visualizer.ipynb` | dl |
| `video_to_sound_splits.py` | pyannote |
| `VGG_feature_extractor.py` | dl |
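
The mapping above can also be kept in code so that each script is launched in the right environment. The sketch below assumes the environments are conda environments named `pyannote` and `dl`, which the README does not state; `command_for` is a hypothetical helper. Script names follow the table, while the example commands earlier use numbered file names such as `3.audio_segmentation.py`, so adjust the names to match the files on disk.

```python
# Mapping copied from the table above.
ENV_FOR_SCRIPT = {
    "video_segmentation.py": "pyannote",
    "fetch_audio_to_video.py": "pyannote",
    "audio_segmentation.py": "pyannote",
    "Train_Val_Split_Frames_Visualizer.ipynb": "dl",
    "video_to_sound_splits.py": "pyannote",
    "VGG_feature_extractor.py": "dl",
}


def command_for(script, *args):
    """Build a `conda run` command for a .py script in its listed environment."""
    env = ENV_FOR_SCRIPT[script]  # KeyError for scripts not in the table
    if not script.endswith(".py"):
        raise ValueError(f"{script} is a notebook; open it in the {env} environment")
    return ["conda", "run", "-n", env, "python", script, *args]
```

The returned list can be passed to `subprocess.run` from the Scripts directory.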

Dataset Creation

To create the dataset, run the scripts in the gsoc18_RedHenLab folder. Follow the detailed instructions here: GSoC18 RedHenLab Pipeline Guide

⚠️ Note: The original codebase is relatively old, so following its instructions exactly may lead to dependency conflicts. To resolve these issues, create two separate environments:

One with Python 3.6

Another with Python 3.7 (required for OpenCV installation, especially on Windows)

Summary

Use pyannote for segmentation tasks.
Use dl for deep learning (VGG, LSTM, CNN).
Use transformers for model training and inference.
Ensure dataset generation is completed before running notebooks.

