The input file is received in .mp3 format
The file is converted to .wav format
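The conversion step can be sketched with ffmpeg (assuming ffmpeg is available on the system; the file names and the 16 kHz mono output settings are illustrative choices, not taken from the original):

```python
import os
import subprocess

def ffmpeg_cmd(mp3_path, wav_path):
    # Build an ffmpeg command that converts the mp3 to a
    # 16 kHz mono wav, a common input format for speech models.
    return ["ffmpeg", "-y", "-i", mp3_path, "-ar", "16000", "-ac", "1", wav_path]

if __name__ == "__main__":
    # Hypothetical file names; only runs if the input actually exists.
    if os.path.exists("meeting.mp3"):
        subprocess.run(ffmpeg_cmd("meeting.mp3", "meeting.wav"), check=True)
```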
An .rttm file containing the timestamps of each speaker's turns is generated using the 'pyannote/[email protected]' pipeline from Hugging Face
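A minimal sketch of the diarization step, assuming pyannote.audio is installed and a Hugging Face access token is available (the token placeholder and file names are assumptions, not from the original):

```python
import os

def turns_to_rows(turns):
    # Flatten (segment, speaker) pairs into (start, end, speaker) rows,
    # rounding the timestamps to milliseconds.
    return [(round(seg.start, 3), round(seg.end, 3), spk) for seg, spk in turns]

if __name__ == "__main__":
    try:
        from pyannote.audio import Pipeline
    except ImportError:
        Pipeline = None
    if Pipeline is not None and os.path.exists("meeting.wav"):
        pipeline = Pipeline.from_pretrained(
            "pyannote/[email protected]", use_auth_token="YOUR_HF_TOKEN"
        )
        diarization = pipeline("meeting.wav")
        # write_rttm serialises the annotation in standard RTTM format
        with open("meeting.rttm", "w") as f:
            diarization.write_rttm(f)
        rows = turns_to_rows(
            (seg, spk) for seg, _, spk in diarization.itertracks(yield_label=True)
        )
```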
This is then converted to CSV format
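The RTTM-to-CSV conversion is plain text parsing; a sketch (column names are illustrative, since the original does not specify the CSV layout):

```python
import csv

def rttm_to_rows(rttm_lines):
    # Each RTTM SPEAKER line has the form:
    # SPEAKER <file> <chan> <start> <duration> <NA> <NA> <speaker> <NA> <NA>
    rows = []
    for line in rttm_lines:
        fields = line.split()
        if not fields or fields[0] != "SPEAKER":
            continue
        start = float(fields[3])
        duration = float(fields[4])
        rows.append({"start": start, "end": start + duration, "speaker": fields[7]})
    return rows

def write_csv(rows, csv_path):
    # Persist the parsed rows with a header for the next step.
    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["start", "end", "speaker"])
        writer.writeheader()
        writer.writerows(rows)
```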
Separate audio files are then generated for those timestamps
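Slicing the audio at those timestamps can be sketched with pydub (an assumed library choice; the original does not say how the clips are cut, and the file-naming scheme here is hypothetical):

```python
def to_ms(start_s, end_s):
    # pydub slices AudioSegment objects by millisecond index.
    return int(start_s * 1000), int(end_s * 1000)

if __name__ == "__main__":
    import os
    if os.path.exists("meeting.wav"):
        from pydub import AudioSegment
        audio = AudioSegment.from_wav("meeting.wav")
        # Placeholder rows, as produced from the CSV in the previous step.
        rows = [{"start": 0.5, "end": 2.75, "speaker": "SPEAKER_00"}]
        for i, row in enumerate(rows):
            a, b = to_ms(row["start"], row["end"])
            audio[a:b].export(f"{row['speaker']}_{i}.wav", format="wav")
```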
The text is extracted from each audio file, speaker-wise
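The original does not name the speech-to-text model, so Whisper below is purely an illustrative stand-in; the segment file pattern is also hypothetical. The speaker-wise grouping itself is simple bookkeeping:

```python
def merge_by_speaker(segment_texts):
    # Combine (speaker, text) pairs into one transcript per speaker,
    # preserving the order in which segments were transcribed.
    merged = {}
    for speaker, text in segment_texts:
        merged.setdefault(speaker, []).append(text.strip())
    return {spk: " ".join(parts) for spk, parts in merged.items()}

if __name__ == "__main__":
    import glob, os
    try:
        import whisper  # assumption: openai-whisper; the original names no ASR
    except ImportError:
        whisper = None
    if whisper is not None and glob.glob("SPEAKER_*.wav"):
        model = whisper.load_model("base")
        pairs = []
        for path in sorted(glob.glob("SPEAKER_*.wav")):
            speaker = os.path.basename(path).rsplit("_", 1)[0]
            pairs.append((speaker, model.transcribe(path)["text"]))
        by_speaker = merge_by_speaker(pairs)
```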
The overall summary of the file is then obtained using Hugging Face's "knkarthick/MEETING_SUMMARY" model
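The summarization step can be sketched with the transformers pipeline API; the word-based chunking below is an assumed workaround for the model's input-length limit, and the transcript file name is a placeholder:

```python
def chunk_words(text, max_words=400):
    # BART-based summarizers accept a limited input length, so long
    # transcripts are split into fixed-size word chunks first.
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

if __name__ == "__main__":
    import os
    try:
        from transformers import pipeline
    except ImportError:
        pipeline = None
    if pipeline is not None and os.path.exists("transcript.txt"):
        summarizer = pipeline("summarization", model="knkarthick/MEETING_SUMMARY")
        with open("transcript.txt") as f:
            transcript = f.read()
        parts = [summarizer(c)[0]["summary_text"] for c in chunk_words(transcript)]
        summary = " ".join(parts)
```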
The code is easy to run in Google Colab: simply upload the .mp3 file and run all the cells
On CPU, it took 45 minutes to execute the entire pipeline for the uploaded audio
On GPU, it took less than a minute