bniladridas/speech-model

Hear the world. Transcribe it instantly.

Transform live audio into text with fine-tuned Whisper and Wav2Vec2 models. Built for real-time speech recognition, this project delivers low-latency transcription for conversations, streaming, and other dynamic audio environments.

Powerful. Simple. Yours.

  • Capture live 16kHz mono audio with ease.
  • Fine-tune models on live audio in real time.
  • Switch seamlessly between Whisper and Wav2Vec2.
  • Save and load models with timestamped directory names.
  • English transcription, tuned for clarity.

Get Started

Clone the repository and set up in minutes:

git clone https://github.com/bniladridas/speech-model.git
cd speech-model
pip install -r requirements.txt

For Linux, install audio dependencies:

sudo apt-get install libsndfile1

Bring Your Voice to Life

Fine-Tune Models
Train on live audio with a single command:

python main.py --model whisper
# or
python main.py --model wav2vec2

Press Ctrl+C to save your fine-tuned model automatically.
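
Under the hood, saving on Ctrl+C amounts to catching KeyboardInterrupt around the training loop. Here is a minimal sketch of that pattern; the train_step stub is a placeholder for illustration, not the actual internals of main.py:

import time
from transformers import WhisperForConditionalGeneration

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

def train_step(model):
    # Placeholder for one fine-tuning step: record a clip, compute loss, update weights.
    time.sleep(1)

try:
    while True:
        train_step(model)
except KeyboardInterrupt:
    # Timestamped path mirrors the save layout described under "Save Your Work"
    path = f"models/speech_recognition_ai_fine_tune_whisper_{time.strftime('%Y%m%d_%H%M%S')}"
    model.save_pretrained(path)
    print(f"Saved fine-tuned model to {path}")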

Transcribe Audio
Test your model with real-time transcription:

python test_transcription.py --model whisper
# or
python test_transcription.py --model wav2vec2

Records 5 seconds of audio (the duration is customizable) and prints the transcription instantly.
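
For reference, the record-and-transcribe loop can be reproduced in a few lines with sounddevice and transformers. A minimal sketch, shown against the stock openai/whisper-small checkpoint rather than your fine-tuned model:

import sounddevice as sd
import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor

SAMPLE_RATE = 16000  # the models expect 16kHz mono input
DURATION = 5         # seconds; adjust to taste

processor = WhisperProcessor.from_pretrained("openai/whisper-small")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

# Record DURATION seconds of mono audio and block until the buffer is full
audio = sd.rec(int(DURATION * SAMPLE_RATE), samplerate=SAMPLE_RATE, channels=1, dtype="float32")
sd.wait()

# Convert the waveform to log-mel features, then decode the generated tokens
inputs = processor(audio.squeeze(), sampling_rate=SAMPLE_RATE, return_tensors="pt")
with torch.no_grad():
    ids = model.generate(inputs.input_features)
print(processor.batch_decode(ids, skip_special_tokens=True)[0])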

Save Your Work
Models are saved to:
models/speech_recognition_ai_fine_tune_[model]_[timestamp]
Customize the path:

export MODEL_SAVE_PATH="/your/path"
python main.py --model [whisper|wav2vec2]
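
Honoring that variable is a plain environment lookup with a fallback. A sketch of the idea; how main.py actually combines the base path with the timestamped directory name may differ:

import os
import time

# Fall back to the default models/ layout when MODEL_SAVE_PATH is unset
base = os.environ.get("MODEL_SAVE_PATH", "models")
save_path = os.path.join(base, f"speech_recognition_ai_fine_tune_whisper_{time.strftime('%Y%m%d_%H%M%S')}")
print(save_path)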

What You Need

  • Python 3.8+
  • PyTorch (2.0.1 recommended)
  • Transformers (4.35.0 recommended)
  • Sounddevice (0.4.6)
  • Torchaudio (2.0.1)

A GPU accelerates fine-tuning. Full requirements are listed in requirements.txt.
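
The pins above translate to an excerpt like the following; treat requirements.txt in the repository as the authoritative list:

torch==2.0.1
torchaudio==2.0.1
transformers==4.35.0
sounddevice==0.4.6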

Behind the Magic

  • Task: Automatic Speech Recognition (ASR)
  • Models:
    • Whisper (openai/whisper-small)
    • Wav2Vec2 (facebook/wav2vec2-base-960h)
  • Fine-Tuning: Performed on live 16kHz mono audio with the Adam optimizer (learning rate 1e-5).
  • Input: 16kHz mono audio
  • Output: Precise English transcription
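
A single fine-tuning step under those settings looks roughly like the sketch below. It uses a second of silence and a dummy transcript in place of a live recording; batching and labeling in the real pipeline will differ:

import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor

processor = WhisperProcessor.from_pretrained("openai/whisper-small")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)  # learning rate from above

# One second of 16kHz silence stands in for a live mono recording
audio = torch.zeros(16000)
inputs = processor(audio.numpy(), sampling_rate=16000, return_tensors="pt")
labels = processor.tokenizer("hello world", return_tensors="pt").input_ids

# Standard supervised step: a forward pass with labels yields the loss
outputs = model(input_features=inputs.input_features, labels=labels)
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()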

Load models from Hugging Face:

from transformers import WhisperForConditionalGeneration, WhisperProcessor
model = WhisperForConditionalGeneration.from_pretrained("bniladridas/speech-recognition-ai-fine-tune")
processor = WhisperProcessor.from_pretrained("bniladridas/speech-recognition-ai-fine-tune")
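
The Wav2Vec2 path differs: it is a CTC model, so decoding takes an argmax over per-frame logits instead of autoregressive generation. A sketch against the stock facebook/wav2vec2-base-960h checkpoint named above:

import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")

# Any 16kHz mono float waveform works; one second of silence as a stand-in
waveform = torch.zeros(16000)
inputs = processor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    logits = model(inputs.input_values).logits

# CTC decoding: pick the likeliest token per frame; batch_decode collapses repeats and blanks
ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(ids)[0])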

Project Structure

  • dataset.py: Audio recording and preprocessing
  • train.py: Training pipeline
  • test_transcription.py: Real-time transcription
  • main.py: Core fine-tuning script
  • requirements.txt: Dependencies
  • README.md: This guide

Troubleshooting

Git Push Issues
For non-fast-forward errors:

git pull origin main --rebase
git push origin main

Stuck Processes
Find and stop processes:

ps aux | grep python
kill -9 <PID>

Large Files
Exclude models from Git:

echo "models/" >> .gitignore
git add .gitignore
git rm -r --cached models/
git commit -m "Update .gitignore"

Your Data, Your Models

No pre-existing dataset needed. Your live audio recordings fuel the fine-tuning process, making every model uniquely yours.

Join the Journey

Licensed under the MIT License. Contribute, refine, and share on GitHub.
