A production-ready, scalable real-time audio transcription system designed for enterprise and SaaS applications. This system powers secure, private, and highly accurate speech-to-text conversion, serving as the backbone for Vexa.ai, an alternative to Otter.ai, Fireflies.ai, and Tactiq.io, enabling enterprises to build custom AI-powered conversation processing solutions. With self-hosted and on-premise deployment options, it offers data sovereignty, compliance, and privacy for mission-critical use cases.
🔒 Enterprise-Grade Security & Compliance
Unlike typical cloud-based transcription services, this system can be deployed fully on-premise, ensuring all transcription data stays within your infrastructure. Perfect for air-gapped, HIPAA, GDPR, and other high-security environments.
⭐ If you find this project useful, please star it on GitHub to show your support!
✅ Multiuser Production-Ready – Built for enterprise-scale deployment
✅ Data Sovereignty – Your audio and text never leave your network
✅ GDPR & HIPAA Compliance – Meet strict data privacy regulations
✅ Custom Security Policies – Integrate with your existing authentication & access controls
✅ Air-Gapped Deployment – Works in offline environments
✅ Enterprise Scalability – Designed for large workloads, supporting thousands of concurrent users
✅ Real-time transcription with advanced speaker detection
✅ Multi-platform support – Google Meet Chrome Extension, with future integrations for Zoom, Microsoft Teams, Slack, etc.
✅ Bring Your Own Data Source – Flexible API allows integration with custom platforms
✅ 5–10 second latency for live captions
✅ Redis-backed storage for fast retrieval & webhook-based integrations
✅ Whisper v3 (optimized) for high-accuracy speech-to-text
✅ GPU acceleration for ultra-fast processing
✅ Self-Hosted & On-Premise – Perfect for enterprise security and data sovereignty
Keywords: AI transcription, real-time speech-to-text, open-source, self-hosted, on-premise, meeting notes, voice recognition, Otter.ai alternative, enterprise compliance, data privacy, audio processing
Ideal for enterprises and SaaS teams seeking more privacy, control, and customization than commercial platforms like Otter.ai, Fireflies.ai, and Tactiq.io offer.
🔹 Enterprise Meeting Transcription – Automate meeting notes with speaker attribution
🔹 Customer Support & Call Centers – Real-time call transcription & agent assistance
🔹 Education & Accessibility – Create searchable lecture transcripts & captions
🔹 Content Creation – Transcribe podcasts, generate subtitles, and repurpose audio content
🔹 Medical & Healthcare – Secure, HIPAA-compliant transcription for patient records
🔹 Sales & CRM – Capture and analyze sales calls for insights and training
🔹 Internal Management Meetings – Automate documentation of leadership discussions
🔹 Legal & Compliance – Generate accurate transcripts for legal proceedings and documentation
The pipeline consists of scalable microservices designed for high-volume real-time transcription and enterprise workloads:
- Ensures delivery of audio chunks every 3 seconds
- Supports multiple concurrent sessions
- Calls Whisper Service for speech-to-text processing
- Stores transcriptions with speaker metadata in Redis
- Supports webhook integrations for real-time updates
- Runs Whisper large-v3 model for best accuracy
- Deployed with Ray Serve for scalability & efficiency
- Enables quick retrieval & session-based storage
- Serves as a message broker between services
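To make the 3-second delivery cadence above concrete, here is a minimal Python sketch that splits a raw PCM buffer into fixed-size chunks. The 16 kHz mono, 16-bit format is an assumption for illustration only, not part of the service contract:

```python
# Sketch: split a mono PCM16 stream into 3-second chunks for delivery.
# The 16 kHz / 16-bit format is an assumption, not the service's spec.
SAMPLE_RATE = 16000      # samples per second (assumed)
BYTES_PER_SAMPLE = 2     # 16-bit PCM
CHUNK_SECONDS = 3        # matches the pipeline's delivery interval
CHUNK_BYTES = SAMPLE_RATE * BYTES_PER_SAMPLE * CHUNK_SECONDS

def chunk_pcm(stream: bytes):
    """Yield successive 3-second chunks; the final chunk may be shorter."""
    for offset in range(0, len(stream), CHUNK_BYTES):
        yield stream[offset:offset + CHUNK_BYTES]

# Ten seconds of silence yields three full chunks plus a 1-second remainder.
chunks = list(chunk_pcm(b"\x00" * (SAMPLE_RATE * BYTES_PER_SAMPLE * 10)))
```

Each chunk would then be posted to the audio streaming endpoint on its own 3-second tick.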
```bash
# Clone the audio processing service
git clone https://github.com/Vexa-ai/vexa-transcription-service

# Clone the whisper service
git clone https://github.com/Vexa-ai/whisper_service
```
```bash
# Set up environment variables for the audio service
cd audio
cp .env.example .env

# Set up environment variables for the whisper service
cd ../whisper_service
cp .env.example .env
```
Edit the `.env` files to configure API tokens, GPU settings, and webhooks.
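As a hypothetical sketch of what such a `.env` file might contain – the actual variable names live in each repo's `.env.example`; everything below is an illustrative placeholder:

```shell
# Illustrative placeholders only – consult .env.example for the real keys.
API_TOKEN=your_api_token          # auth token for the streaming API
CUDA_VISIBLE_DEVICES=0            # GPU selection for the whisper service
WEBHOOK_URL=http://your-server:9000/api/transcriptions  # webhook target
```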
```bash
# Start the whisper service
cd whisper_service
docker-compose up -d

# Start the audio service
cd ../audio
docker-compose up -d
```
```javascript
async function sendAudioChunk(audioBlob, sessionId) {
  await fetch('http://your-server:8000/api/audio/stream', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer your_api_token',
      'Content-Type': 'application/octet-stream',
      'X-Session-ID': sessionId
    },
    body: audioBlob
  });
}
```
```javascript
async function updateSpeaker(sessionId, speakerName) {
  await fetch('http://your-server:8000/api/speaker', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer your_api_token',
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ session_id: sessionId, speaker_name: speakerName })
  });
}
```
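The same endpoints can also be called from server-side code. Below is a minimal Python sketch that builds the speaker-update request with only the standard library; the host, path, and token are the same placeholders used in the JavaScript examples:

```python
import json
import urllib.request

API_BASE = "http://your-server:8000"   # placeholder host, as in the JS examples
API_TOKEN = "your_api_token"           # placeholder token

def build_speaker_request(session_id: str, speaker_name: str) -> urllib.request.Request:
    """Build the POST /api/speaker request; send it with urllib.request.urlopen."""
    body = json.dumps({"session_id": session_id, "speaker_name": speaker_name}).encode()
    return urllib.request.Request(
        f"{API_BASE}/api/speaker",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {API_TOKEN}",
            "Content-Type": "application/json",
        },
    )

req = build_speaker_request("session-123", "Alice")
# urllib.request.urlopen(req)  # uncomment to actually send the request
```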
```python
from typing import Dict, List, Optional

from fastapi import Body, FastAPI
from pydantic import BaseModel

app = FastAPI()

class TranscriptionSegment(BaseModel):
    content: str
    start_timestamp: str
    end_timestamp: str
    speaker: Optional[str] = None   # None when the speaker is unknown
    words: List[Dict] = []          # optional word-level timing details

@app.post("/api/transcriptions/{session_id}")
async def receive_transcription(session_id: str, segments: List[TranscriptionSegment] = Body(...)):
    print("Transcription received:", segments)
    return {"status": "success"}
```
We welcome contributions from enterprises and developers! If your company is migrating from Otter.ai, Fireflies, or Tactiq, we'd love to hear your use case.
✅ Optimize latency & scaling for enterprise workloads
✅ Enhance security & compliance features
✅ Integrate with CRM, customer support tools, and knowledge bases
✅ Integrate with Zoom and Microsoft Teams for seamless transcription
✅ Improve language support & accuracy (multi-lingual models)
🚀 Join the open-source effort today! Submit a Pull Request, open a GitHub Issue, or contact us.
This project is a core component of Vexa – an AI-powered meeting intelligence platform that extends transcription into business knowledge extraction.
🔹 Try Vexa for real-time transcription & AI-driven insights: vexa.ai
🔹 Follow us: @vexa.ai
🔹 Join our developer community: Vexa Discord
If you find this project helpful, please give us a ⭐ to support our community-driven development!