A tool for transcribing and summarizing meetings from video files or YouTube URLs. This application uses state-of-the-art AI models to perform speaker diarization (identifying who is speaking) and speech-to-text transcription, followed by automatic summarization of key points.
- Process video files or YouTube URLs
- Extract audio automatically
- Identify different speakers (diarization)
- Transcribe speech to text with Whisper models
- Generate meeting summaries with key points
- Support Spanish and English
- Export transcriptions in SubViewer format (.sub)
- Python 3.10+
- CUDA-compatible GPU (recommended for faster processing)
- Hugging Face account with API token
- Clone this repository:

      git clone https://github.com/matiaszanolli/AI-Meeting-Summary-SPANISH.git
      cd AI-Meeting-Summary-SPANISH

- Create a virtual environment and install dependencies:

      python -m venv venv
      source venv/bin/activate  # On Windows: venv\Scripts\activate
      pip install -r requirements.txt

- Create a .env file in the project root with your Hugging Face token:

      HUGGINGFACE_AUTH_TOKEN=your_token_here
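To check that the token is picked up before launching the app, a minimal sketch like the one below can help. It reads `HUGGINGFACE_AUTH_TOKEN` from the environment or from `.env` using only the standard library. The helper names `load_env_file` and `get_hf_token` are hypothetical and are not part of this project, which may load the file differently.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file into a dict."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def get_hf_token(path=".env"):
    """Return the token from the environment, falling back to the .env file."""
    token = os.environ.get("HUGGINGFACE_AUTH_TOKEN")
    if not token:
        token = load_env_file(path).get("HUGGINGFACE_AUTH_TOKEN")
    # The placeholder from the example above is treated as "not configured".
    if not token or token == "your_token_here":
        raise RuntimeError("Set HUGGINGFACE_AUTH_TOKEN in .env before starting.")
    return token
```

Calling `get_hf_token()` from the project root raises a `RuntimeError` if the placeholder value was left in place, which is easier to diagnose than a failed model download later on.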
- Start the web interface:

      python web-ui.py

- Open your browser at http://localhost:7860
- Upload a video file or enter a YouTube URL
- Configure the parameters:
  - Select language (Spanish, English, or auto-detect)
  - Choose Whisper model size (larger models are more accurate but slower)
  - Adjust collar value for speaker diarization
  - Enable/disable summary generation
- Click "Iniciar" (Start) to begin processing
- View the transcription and summary in their respective tabs
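The collar setting is easier to reason about with an example. The sketch below assumes the collar is a gap tolerance in seconds: consecutive turns by the same speaker are merged when the silence between them is no longer than the collar. This is similar in spirit to the `collar` argument of `Annotation.support` in pyannote.core, but how this project applies the value is an assumption, and `Turn` and `merge_turns` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str
    start: float  # seconds
    end: float    # seconds


def merge_turns(turns, collar=0.5):
    """Merge consecutive same-speaker turns whose gap is at most `collar` seconds."""
    merged = []
    for turn in sorted(turns, key=lambda t: t.start):
        last = merged[-1] if merged else None
        if last and last.speaker == turn.speaker and turn.start - last.end <= collar:
            # Same speaker resumes within the collar: extend the previous turn.
            last.end = max(last.end, turn.end)
        else:
            # New speaker or a long pause: start a new turn (copy to avoid
            # mutating the caller's objects).
            merged.append(Turn(turn.speaker, turn.start, turn.end))
    return merged
```

Under this assumption, a larger collar yields fewer, longer speaker turns, while a smaller one keeps short pauses as separate turns.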
- Speaker Diarization: pyannote/speaker-diarization-3.0
- Speech Recognition: OpenAI Whisper (various sizes)
- Summarization: Custom extractive summarization algorithm
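The custom summarization algorithm is not described here. As a rough illustration of what extractive summarization means, the sketch below scores each sentence by the average frequency of its non-stopword terms and keeps the top-scoring sentences in their original order. It is a generic word-frequency approach, not the project's implementation; `summarize` and the tiny `STOPWORDS` set are assumptions made for the example.

```python
import re
from collections import Counter

# Deliberately tiny English/Spanish stopword set, for illustration only.
STOPWORDS = {
    "the", "a", "an", "and", "of", "to", "is", "was", "we", "in",
    "el", "la", "los", "las", "de", "y", "que", "en", "un", "una",
}


def _tokens(text):
    """Lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"\w+", text.lower()) if w not in STOPWORDS]


def summarize(text, max_sentences=3):
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(_tokens(text))

    def score(sentence):
        tokens = _tokens(sentence)
        # Average term frequency, so long sentences are not favored by length alone.
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:max_sentences])]
```

Because the summary is built from sentences taken verbatim from the transcript, its quality depends directly on transcription accuracy.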
- output.sub: Transcription in SubViewer format
- output_summary.txt: Meeting summary with key points
- output-tracks/: Directory containing audio segments for each speaker turn
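For readers unfamiliar with SubViewer files, the sketch below renders cues in the common SubViewer 2.0 layout: a `HH:MM:SS.cc,HH:MM:SS.cc` timing line, the text, then a blank line. The exact layout of `output.sub`, including whether speaker labels appear in the text, is an assumption here; `format_timestamp`, `to_subviewer`, and the `SPEAKER_00` label are illustrative only.

```python
def format_timestamp(seconds):
    """Format seconds as HH:MM:SS.cc, where cc is centiseconds."""
    total_cs = int(round(seconds * 100))
    hours, rem = divmod(total_cs, 360000)
    minutes, rem = divmod(rem, 6000)
    secs, cs = divmod(rem, 100)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{cs:02d}"


def to_subviewer(cues):
    """Render (start, end, speaker, text) tuples as SubViewer-style cues."""
    blocks = []
    for start, end, speaker, text in cues:
        timing = f"{format_timestamp(start)},{format_timestamp(end)}"
        blocks.append(f"{timing}\n{speaker}: {text}\n")
    # Joining with a newline leaves one blank line between cues.
    return "\n".join(blocks)


if __name__ == "__main__":
    print(to_subviewer([(0.0, 2.5, "SPEAKER_00", "Hola a todos.")]))
```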
This project is licensed under the MIT License - see the LICENSE file for details.
- OpenAI Whisper (https://github.com/openai/whisper)
- Pyannote Audio (https://github.com/pyannote/pyannote-audio)
- Gradio (https://www.gradio.app/)
- PyTube (https://github.com/pytube/pytube)