giorgosrap/Thesis-Multimodal-Detection-of-Intoxicated-Individuals-from-Alcohol-Use

Project Instructions

Installation

Environments Used (with Important Libraries)

DL Environment

Python 3.10.14

  • deep_audio_features 0.2.18
  • librosa 0.10.2.post1
  • pyAudioAnalysis 0.3.14
  • tensorflow 2.10.1
  • pandas 2.2.2
  • numpy 1.26.4

Transformers Environment

Python 3.10.16

  • accelerate 1.2.1
  • pytorchvideo 0.1.5
  • scikit-image 0.24.0
  • torchaudio 2.5.1
  • torchvision 0.20.1
  • transformers 4.48.0
  • datasets 2.14.5
  • fsspec 2023.6.0
  • torch 2.5.1
  • evaluate 0.4.3
  • pandas 2.2.3
  • numpy 1.26.4

CV2 Environment

Python 3.7.12
Help resources:

Dependencies:

  • cmake 3.30.3
  • dlib 19.12.0
  • opencv-contrib-python 4.10.0.84
  • numpy 1.21.6

Pyannote Environment

Python 3.6.13
References:

Dependencies:

  • cmake 3.28.4
  • mtcnn 0.1.1
  • pyannote.video 1.6.3
  • ffmpeg 2.7
  • numpy 1.19.2
  • pandas 1.1.5

Project Structure

To run the project end-to-end, follow the order below.
Detailed installation instructions for each component are provided later.


1. Dataset Creation

Use the gsoc18_RedHenLab folder to turn raw videos into short, usable segments.
Before running the pipeline, configure:


gsoc18_RedHenLab/video_processing_pipeline/stages.py

Once configured, the pipeline should run without issues.

For additional help: GSoC18 RedHenLab Video Processing Pipeline


2. Script Execution Order

All scripts are located in the Scripts directory.
Follow this order for execution:

1. video_segmentation.py

Outputs video segments (~10 seconds).
Example:

python 1.video_segmentation.py -c sober -o Scripts/test
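
As a rough illustration of what cutting a recording into ~10-second segments involves, the sketch below computes the segment boundaries for a clip of a given length. It is not taken from `video_segmentation.py`; the function name `segment_windows` and the `min_last_s` threshold for dropping a very short trailing fragment are assumptions made for this example.

```python
def segment_windows(duration_s, segment_s=10.0, min_last_s=1.0):
    """Return (start, end) pairs, in seconds, covering a clip of duration_s.

    Hypothetical helper: min_last_s is an assumed threshold below which a
    trailing fragment is discarded instead of becoming its own segment.
    """
    if duration_s <= 0 or segment_s <= 0:
        return []
    windows = []
    start = 0.0
    while start < duration_s:
        end = min(start + segment_s, duration_s)
        # Keep the window if it is long enough, or if it is the only one
        # (a clip shorter than min_last_s still yields a single segment).
        if end - start >= min_last_s or not windows:
            windows.append((start, end))
        start += segment_s
    return windows


if __name__ == "__main__":
    for start, end in segment_windows(25.0):
        print(f"{start:.1f}s -> {end:.1f}s")
```

The resulting pairs could then be passed to whatever cutting tool the pipeline uses.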

2. fetch_audio_to_video.py

Joins audio and video files.

Example:

python 2.fetch_audio_to_video.py \
  -Vsrc "..\VIDEO_DATA\Segment_Output\drunk" \
  -Ssrc "..\gsoc18_RedHenLab\video_processing_pipeline\4_face_cropping\audio_drunk_output" \
  -Sdest "Scripts/test"
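
One common way to join a video stream with a separate audio file is to call ffmpeg. The sketch below builds such a command from Python. It assumes ffmpeg is installed and on the PATH, and it is not necessarily how `fetch_audio_to_video.py` works; the function names and the choice of AAC audio are assumptions for illustration.

```python
import subprocess


def build_mux_command(video_path, audio_path, out_path):
    """Build an ffmpeg command that pairs one video stream with one audio stream."""
    return [
        "ffmpeg", "-y",           # overwrite the output file if it exists
        "-i", str(video_path),    # input 0: video
        "-i", str(audio_path),    # input 1: audio
        "-map", "0:v:0",          # take the first video stream of input 0
        "-map", "1:a:0",          # take the first audio stream of input 1
        "-c:v", "copy",           # copy video without re-encoding
        "-c:a", "aac",            # encode audio as AAC (assumed choice)
        "-shortest",              # stop at the end of the shorter input
        str(out_path),
    ]


def mux(video_path, audio_path, out_path):
    """Run the command; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_mux_command(video_path, audio_path, out_path), check=True)
```

Building the argument list separately from running it makes the command easy to inspect or log before anything is written to disk.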

3. audio_segmentation.py

Outputs audio segments (~10 seconds). Example:

python 3.audio_segmentation.py -c drunk

4. Train_Val_Split_Frames_Visualizer.ipynb

Use the training split block to create a 70% train / 30% test data split.
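
For readers who want to see the idea outside the notebook, here is a minimal sketch of a 70/30 split that keeps the drunk/sober ratio the same in both parts. It is an assumption-based illustration, not the notebook's code: the function name, the `(segment_id, label)` input format, and the fixed seed are all hypothetical.

```python
import random


def stratified_split(items, train_fraction=0.7, seed=42):
    """Split (segment_id, label) pairs into train and test lists per label."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    by_label = {}
    for segment_id, label in items:
        by_label.setdefault(label, []).append(segment_id)

    train, test = [], []
    for label in sorted(by_label):
        ids = sorted(by_label[label])  # sort first so shuffling is deterministic
        rng.shuffle(ids)
        cut = round(len(ids) * train_fraction)
        train += [(i, label) for i in ids[:cut]]
        test += [(i, label) for i in ids[cut:]]
    return train, test


if __name__ == "__main__":
    data = [(f"drunk_{i}", "drunk") for i in range(10)]
    data += [(f"sober_{i}", "sober") for i in range(10)]
    train, test = stratified_split(data)
    print(len(train), len(test))
```

Note that this splits individual segments. If several segments come from the same source video, splitting by video instead would avoid the same recording appearing in both parts.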

5. video_to_sound_splits.py

Reorders audio segments based on the train-test split CSV. Example:

python 5.video_to_sound_splits.py \
  -csv "Thesis/CSVs/Sound_Video_Train_Tests/video_sound_train_val_split_26_processed_v3.csv" \
  -d "Thesis/Sound_For_Training" \
  --drunk_dir "Thesis/Segment_Output_Audio_Test/drunk" \
  --sober_dir "Thesis/Segment_Output_Audio_Test/sober"
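
The reordering step can be pictured as copying each audio segment into a folder named after its split and class. The sketch below shows that idea under stated assumptions: the CSV column names `filename`, `label`, and `split` are hypothetical (the real CSV layout is not documented here), and the function is not the script's actual implementation.

```python
import csv
import shutil
from pathlib import Path


def copy_audio_by_split(csv_path, class_dirs, dest_root):
    """Copy audio segments into dest_root/<split>/<label>/.

    class_dirs maps a label ("drunk", "sober") to its source directory.
    Assumed CSV columns: filename, label, split.
    """
    copied = []
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            src = Path(class_dirs[row["label"]]) / row["filename"]
            if not src.exists():
                continue  # skip rows whose audio segment is missing
            dest_dir = Path(dest_root) / row["split"] / row["label"]
            dest_dir.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest_dir / src.name)
            copied.append(dest_dir / src.name)
    return copied
```

Copying rather than moving leaves the original segment folders untouched, so the step can be rerun with a different split CSV.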

The remaining scripts in the directory are helper modules used for training or utility functions.


3. Notebook Execution

All notebooks are located in the Notebooks directory:

  • LSTM_Model.ipynb Trains an LSTM model using VGG-extracted features. Run the extractor first:

    Scripts/VGG_feature_extractor.py  (dl environment)
    
  • Ts_Audio.ipynb Extracts CNN and MFCC features to train CNN and DNN-MFCC models using TensorFlow-Keras. (dl environment)

  • Video_Transformer.ipynb & Audio_Transformer.ipynb Contain the full code to train video and audio transformers. Can be run locally or on Google Colab. (transformers environment)

  • Predictions.ipynb & Late_Fusion.ipynb Generate validation predictions and combine multimodal results. (transformers environment)
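
Late fusion generally means combining the per-class probabilities that each modality's model produces on its own. The sketch below uses a weighted average, which is one common variant; the fusion rule actually used in `Late_Fusion.ipynb` may differ. The function name, the `video_weight` parameter, and the class order (index 0 = sober, index 1 = drunk) are assumptions for this example.

```python
def late_fusion(video_probs, audio_probs, video_weight=0.5):
    """Weighted average of two probability vectors; returns (fused, predicted_index)."""
    if len(video_probs) != len(audio_probs):
        raise ValueError("both models must output the same number of classes")
    if not 0.0 <= video_weight <= 1.0:
        raise ValueError("video_weight must be between 0 and 1")

    audio_weight = 1.0 - video_weight
    fused = [
        video_weight * v + audio_weight * a
        for v, a in zip(video_probs, audio_probs)
    ]
    # The predicted class is the index with the highest fused probability.
    predicted = max(range(len(fused)), key=fused.__getitem__)
    return fused, predicted


if __name__ == "__main__":
    # Assumed class order: index 0 = sober, index 1 = drunk.
    fused, predicted = late_fusion([0.2, 0.8], [0.6, 0.4])
    print(fused, predicted)
```

The weight could be tuned on the validation predictions so that the more reliable modality contributes more.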


4. Required Environments per Script

| Script / Notebook | Environment |
| --- | --- |
| video_segmentation.py | pyannote |
| fetch_audio_to_video.py | pyannote |
| audio_segmentation.py | pyannote |
| Train_Val_Split_Frames_Visualizer.ipynb | dl |
| video_to_sound_splits.py | pyannote |
| VGG_feature_extractor.py | dl |

Dataset Creation

To create the dataset, run the scripts in the gsoc18_RedHenLab folder. Follow the detailed instructions here: GSoC18 RedHenLab Pipeline Guide

⚠️ Note: The original codebase is relatively old, and following its instructions exactly may lead to dependency conflicts. To avoid them, use two separate environments:

  • One with Python 3.6 (matching the Pyannote environment listed above)
  • One with Python 3.7 (matching the CV2 environment listed above; required for OpenCV installation, especially on Windows)

Summary

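  • Use pyannote for the segmentation and audio/video preparation scripts.
  • Use dl for the TensorFlow-based models (VGG feature extraction, LSTM, CNN, DNN-MFCC).
  • Use transformers for training the video and audio transformers, generating predictions, and late fusion.
  • Complete dataset generation before running any notebook.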
