🎧 StreamTranscriber

A real-time audio transcription proof-of-concept using Faster-Whisper and WebSockets

🚀 Overview

Conventional transcription systems send long recordings (sometimes 90 s or more) for processing and make users wait tens of seconds for results.
This proof of concept shows how to stream audio continuously to a backend and receive incremental transcriptions in near real time.

It was created as a small project for a German AI startup exploring the move from slow batch uploads to a fully streamed pipeline.

🧠 What It Does

- Streams raw PCM `float32` audio chunks from the client to a FastAPI backend over WebSockets
- Transcribes each chunk on the fly with Faster-Whisper
- Sends the partial transcription back to the client after each chunk
- Gives the user the feel of live captioning instead of waiting for one large batch result

🏗️ Tech Stack

| Layer | Technology |
|---|---|
| Backend framework | FastAPI + WebSockets |
| Speech-to-text model | Faster-Whisper (`base`) |
| Language | Python 3.10+ |
| Client | Python WebSockets sender (replaceable with microphone input) |
| Audio format | Mono 16 kHz `float32` PCM |
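Each WebSocket message carries raw samples, so its size follows directly from the audio format: 2 s × 16 000 samples/s × 4 bytes = 128 000 bytes per chunk. A minimal sketch of both ends of that encoding; the helper names and the explicit little-endian dtype `'<f4'` are assumptions, not necessarily what the repository uses:

```python
import numpy as np

SAMPLE_RATE = 16_000   # Hz, mono
BYTES_PER_SAMPLE = 4   # float32
WIRE_DTYPE = "<f4"     # assumption: little-endian float32 on the wire


def encode_chunk(samples: np.ndarray) -> bytes:
    """Client side: turn a 1-D float array into raw little-endian float32 bytes."""
    return np.asarray(samples, dtype=WIRE_DTYPE).tobytes()


def decode_chunk(payload: bytes) -> np.ndarray:
    """Server side: interpret raw bytes as float32 samples (returns a writable copy)."""
    if len(payload) % BYTES_PER_SAMPLE:
        raise ValueError("payload length is not a multiple of 4 bytes")
    return np.frombuffer(payload, dtype=WIRE_DTYPE).astype(np.float32)


def payload_size(seconds: float, sample_rate: int = SAMPLE_RATE) -> int:
    """Bytes in one chunk of `seconds` of mono float32 audio."""
    return int(seconds * sample_rate) * BYTES_PER_SAMPLE
```

Pinning the byte order keeps client and server in agreement even if they run on machines with different native endianness.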

⚙️ Setup Instructions

1. Clone the Repository

```bash
git clone https://github.com/rohan9024/StreamTranscriber.git
cd StreamTranscriber
```

2. Create a Virtual Environment

```bash
python -m venv venv
# Activate it
# Windows:
venv\Scripts\activate
# macOS / Linux:
source venv/bin/activate
```

3. Install Requirements

```bash
pip install -r requirements.txt
```

4. Run the Server

```bash
uvicorn server:app --host 127.0.0.1 --port 8001 --reload
```

5. Send Audio from the Client

```bash
python client.py
```
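If you want to adapt the client (for example to microphone input), the chunk-and-send loop can be sketched as below. This is not the repository's `client.py`: the function names and the `/ws` endpoint path are hypothetical, and the loop assumes the server sends exactly one text reply per chunk.

```python
import numpy as np


def iter_chunks(audio: np.ndarray, sample_rate: int = 16_000, seconds: float = 2.0):
    """Yield consecutive, non-overlapping chunks of about `seconds` of mono audio.

    The final chunk is shorter when the audio length is not a multiple of the
    chunk size.
    """
    if audio.ndim != 1:
        raise ValueError("expected mono (1-D) audio")
    step = int(sample_rate * seconds)  # 2 s at 16 kHz -> 32000 samples
    for start in range(0, len(audio), step):
        yield audio[start:start + step]


async def stream_audio(audio: np.ndarray, url: str = "ws://127.0.0.1:8001/ws") -> None:
    """Send each chunk as a binary frame and print the server's reply.

    The "/ws" path is hypothetical; use whatever route the server defines.
    """
    import websockets  # imported here so iter_chunks works without this package

    async with websockets.connect(url) as ws:
        for chunk in iter_chunks(audio):
            await ws.send(chunk.astype(np.float32).tobytes())  # binary frame
            print(f"📦 Sent {len(chunk)} samples")
            print(f"📝 Partial transcription: {await ws.recv()}")
```

Run it with `asyncio.run(stream_audio(audio))` once `audio` holds mono 16 kHz samples. Waiting for each reply before sending the next chunk keeps the example simple; a live microphone client would more likely send and receive concurrently.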

Note: the client expects `sample_16k.wav` to be sampled at 16 kHz. Mono is preferred, but stereo files are converted to mono automatically.
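One common way to do that preparation is to average the channels and refuse any sample rate other than 16 kHz. The sketch below assumes that approach (the repository's client may downmix differently and is not shown here), and `prepare_audio` is a hypothetical name:

```python
import numpy as np

EXPECTED_RATE = 16_000


def prepare_audio(data: np.ndarray, sample_rate: int) -> np.ndarray:
    """Return mono float32 audio at 16 kHz, or raise if the rate is wrong.

    `data` is shaped (frames,) for mono or (frames, channels) for multichannel,
    which is the layout soundfile.read() returns.
    """
    if sample_rate != EXPECTED_RATE:
        # This sketch does not resample; convert the file beforehand.
        raise ValueError(f"expected {EXPECTED_RATE} Hz, got {sample_rate} Hz")
    if data.ndim == 2:
        data = data.mean(axis=1)  # downmix by averaging the channels
    return data.astype(np.float32)
```

Typical use: `data, sr = soundfile.read("sample_16k.wav", dtype="float32")` followed by `audio = prepare_audio(data, sr)`.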

📊 How It Works

1. The client loads a `.wav` file, splits it into ~2 s chunks (32 000 samples at 16 kHz), and sends each chunk over the WebSocket.
2. The server receives each chunk, converts the bytes to a NumPy array, and transcribes it with Faster-Whisper.
3. The server immediately returns the partial text to the client.
4. This repeats until the whole audio stream has been processed.
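The server's per-chunk work (steps 2 and 3) can be sketched as a small function. Faster-Whisper's `WhisperModel.transcribe()` accepts a NumPy array of 16 kHz `float32` samples and returns `(segments, info)`, where `segments` is a lazy generator of objects with a `.text` attribute. Passing the model in as an argument is a choice made here so the function can be exercised with a stand-in; `transcribe_chunk` is a hypothetical name, not the repository's code.

```python
from typing import Optional

import numpy as np


def transcribe_chunk(model, payload: bytes, language: Optional[str] = None) -> str:
    """Decode one float32 PCM chunk and return its transcription as one string.

    `model` is expected to behave like faster_whisper.WhisperModel, whose
    transcribe() returns (segments, info). Because segments is a lazy
    generator, joining it below is what actually runs the decoding.
    """
    audio = np.frombuffer(payload, dtype=np.float32).copy()  # writable copy of the bytes
    segments, _info = model.transcribe(audio, language=language)
    return " ".join(seg.text.strip() for seg in segments)
```

In a FastAPI WebSocket route this would be called roughly as `text = transcribe_chunk(model, await websocket.receive_bytes())` followed by `await websocket.send_text(text)`, with the model created once at startup, e.g. `WhisperModel("base", device="cpu", compute_type="int8")` (the device and compute type are assumptions). Since `transcribe()` blocks, running it through `await asyncio.to_thread(...)` keeps the event loop free to receive the next chunk.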

🧩 Example Output

Console output while the client is running:

```text
📦 Sent 32000 samples
📝 Partial transcription: Hello everyone and welcome to the meeting...
📦 Sent 32000 samples
📝 Partial transcription: Today we'll discuss progress on the new release...
```

🚀 Results

| Approach | Delay before text | User experience |
|---|---|---|
| 90 s file upload | ≈ 45 s | Feels batch-processed |
| Streamed PCM chunks | 1–2 s | Feels live and conversational |

Because the audio is streamed in small chunks, text starts appearing within a second or two instead of only after the full recording has been uploaded and processed.
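A back-of-the-envelope reading of the table, assuming processing time grows roughly linearly with audio length (an assumption, not a measurement). Let $r$ be the processing time per second of audio and $c$ the chunk length; the batch row fixes $r$, and the same $r$ applied to one 2 s chunk lands inside the streamed range:

```latex
r \approx \frac{45\ \text{s}}{90\ \text{s}} = 0.5,
\qquad
T_{\text{stream}} \approx r \cdot c = 0.5 \times 2\ \text{s} = 1\ \text{s}
```

Under this model the wait after each chunk depends only on $c$, not on the total recording length, so the delay stays flat as recordings get longer.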

🛠️ Possible Extensions

- Real-time microphone input (continuous streaming)
- Better buffering / overlap for smoother context
- Frontend dashboard that displays text live (React / JS)
- Integration with summarization or meeting-notes generators
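For the buffering / overlap idea, one common approach (not something the project currently implements) is to slide a fixed window forward by less than its length, so a word cut at one boundary appears whole in the next window. A minimal sketch; the function name and the 0.5 s default overlap are illustrative assumptions:

```python
import numpy as np


def iter_overlapping_chunks(audio: np.ndarray, sample_rate: int = 16_000,
                            seconds: float = 2.0, overlap_seconds: float = 0.5):
    """Yield fixed-size windows where each one repeats the tail of the previous one.

    A word cut at one window's boundary then appears whole in the next window.
    The caller still has to de-duplicate the text produced by the overlap.
    """
    size = int(sample_rate * seconds)
    overlap = int(sample_rate * overlap_seconds)
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and shorter than the window")
    hop = size - overlap  # how far the window advances each step
    start = 0
    while start < len(audio):
        yield audio[start:start + size]
        if start + size >= len(audio):
            break  # this window already reached the end of the audio
        start += hop
```

The trade-off is extra compute (the overlapped audio is transcribed twice) and a merge step that removes words repeated across consecutive partial results.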

💡 Motivation

This PoC came from a common startup bottleneck:

"Recording 90 s of audio and waiting another 45 s for a transcript isn't practical for live use."

StreamTranscriber shows that even lightweight infrastructure built on open-source tools can deliver near-real-time AI transcription.

📦 Requirements

If `requirements.txt` is missing, create it with the following contents:

```text
fastapi==0.109.0
uvicorn==0.25.0
numpy==1.26.0
soundfile==0.12.1
websockets==12.0
faster-whisper==0.9.0
```

🤝 Acknowledgements

- Faster-Whisper for efficient ASR inference
- FastAPI for its excellent async and WebSocket support
- The open-source community for continued innovation in AI infrastructure

🧑‍💻 Author

Rohan R. Wandre
ML Engineer

Feel free to fork, modify, or use this project as a reference for building real-time transcription pipelines.
