This repository contains the code and resources for a speaker diarization project. Speaker diarization is the task of segmenting an audio recording into homogeneous regions based on speaker identity. This project aims to develop a speaker diarization system for two speakers using unsupervised deep learning, combining a neural network with clustering.
The repository is organized as follows:

- `data/`: Audio data used for training and evaluation.
- `src/`: Source code for the speaker diarization system. It includes the following classes and their functionalities:
  - `Diarization`: Applies speaker diarization to the data, with several options.
  - `Visualization`: Generates a diarization plot and an .mp4 animation clip of the speakers over time, synchronized with the audio.
  - `noise_clean`: Noise-cleaning filters to be applied to the audio.
  - `help_func`: Utility functions for diarization.
- `pretrained_models/`: Pretrained models or checkpoints that can be used for inference or fine-tuning. It is recommended to download them from Hugging Face.
- `utils/`: Utility scripts and helper functions used throughout the project.
- `LICENSE`: The license file for this project.
- `README.md`: This file, providing an overview of the project and instructions for getting started.
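As an illustration of the kind of noise-cleaning filter the `noise_clean` module applies, here is a minimal high-pass filter sketch. The function name, cutoff value, and use of `scipy` are assumptions for illustration, not the repository's actual API.

```python
# Hypothetical noise-cleaning filter sketch; names and parameters are
# illustrative assumptions, not the repository's actual API.
import numpy as np
from scipy.signal import butter, filtfilt

def highpass_clean(audio: np.ndarray, sample_rate: int,
                   cutoff_hz: float = 100.0) -> np.ndarray:
    """Attenuate low-frequency noise (hum, rumble) below cutoff_hz."""
    nyquist = sample_rate / 2.0
    b, a = butter(N=4, Wn=cutoff_hz / nyquist, btype="highpass")
    # filtfilt runs the filter forward and backward for zero phase shift
    return filtfilt(b, a, audio)
```

A filter like this would typically be applied to the waveform before diarization, so that low-frequency background noise does not leak into the speaker embeddings.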
To get started with this project, follow the steps below:

1. Clone the repository:

   ```shell
   git clone https://github.com/n242/HBL.git
   ```

2. Make sure you have Python 3.8+ installed.

3. Install the required dependencies. Please refer to the `requirements.txt` file for the list of dependencies. You can install them using the following command:

   ```shell
   pip install -r requirements.txt
   ```

4. Download or prepare the audio data for diarization and evaluation. Place the data in the `data/` directory following the instructions provided in the `data/README.md` file.

5. Train a model or download one from Hugging Face (see instructions below).

6. Add your input path to `main.py`, run the speaker diarization code, and visualize the data.
To use the pretrained models from Hugging Face:

1. Visit hf.co/pyannote/speaker-diarization and hf.co/pyannote/segmentation and accept the user conditions (only if requested).
2. Visit hf.co/settings/tokens to create an access token (only if you had to go through step 1).
3. Insert your auth_token into `main.py`.
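The steps above might be wired up in `main.py` along these lines. The function and variable names are placeholders, but the `Pipeline.from_pretrained` and `itertracks` calls follow the standard pyannote.audio 2.x API.

```python
# Hedged sketch of wiring an auth token into main.py; function and
# variable names are illustrative placeholders.

def diarize(audio_path: str, auth_token: str):
    """Run the pretrained pyannote pipeline and return (start, end, speaker) turns."""
    # Imported lazily: requires pyannote.audio installed and the model
    # access granted in the steps above.
    from pyannote.audio import Pipeline
    pipeline = Pipeline.from_pretrained(
        "pyannote/speaker-diarization", use_auth_token=auth_token
    )
    diarization = pipeline(audio_path)
    return [
        (turn.start, turn.end, speaker)
        for turn, _, speaker in diarization.itertracks(yield_label=True)
    ]

def turns_to_csv_rows(turns):
    """Format diarization turns as 'start,end,speaker' CSV rows."""
    return ["%.3f,%.3f,%s" % (s, e, spk) for s, e, spk in turns]
```

For example, `turns_to_csv_rows(diarize("data/interview.wav", token))` would yield rows ready to write to the output .csv.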
You can view an example output of an interview between two speakers, and the full video at: example interview.

When generating the .csv file, we assume the interviewer opens the interview.
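The interviewer-speaks-first convention can be sketched as follows: whichever cluster label owns the earliest segment is mapped to the interviewer, and the other to the interviewee. The function and label names are assumptions for illustration.

```python
# Illustration of the stated convention that the interviewer opens the
# interview; names are illustrative assumptions.

def label_speakers(segments):
    """segments: list of (start, end, cluster_label) tuples.

    Returns the same segments with cluster labels replaced by
    'Interviewer' / 'Interviewee', where the cluster owning the
    earliest segment is taken to be the interviewer.
    """
    first = min(segments, key=lambda seg: seg[0])
    interviewer = first[2]
    return [
        (start, end, "Interviewer" if label == interviewer else "Interviewee")
        for start, end, label in segments
    ]
```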
Contributions to this project are welcome. If you encounter any issues or have suggestions for improvements, please feel free to submit an issue or a pull request.
This project is licensed under the MIT License.
We relied upon the following:
Neta Oren and Faisal Omari. This project was part of the Computational Research of Human Behavior course, as part of their M.Sc. and B.Sc. in Computer Science at the University of Haifa. The project was performed under the supervision of Prof. Hagit Hel-Or.
