Real-Time Transcription WebSockets in SIPREC Server

This document provides an overview of the WebSocket-based real-time transcription feature in the SIPREC server.

Overview

The SIPREC server supports real-time streaming of transcriptions over WebSockets. Clients receive each transcription update as soon as the speech-to-text (STT) provider generates it.

Key Features

Real-time streaming: Both interim and final transcriptions are streamed as they become available
Call-specific subscriptions: Clients can subscribe to transcriptions for specific calls by UUID
Metadata enrichment: Transcriptions include metadata like confidence scores, provider info, and word counts
Simple client interface: An HTML/JavaScript client is provided for easy testing and integration
Publish-subscribe architecture: Modular design with a transcription service and WebSocket hub

Architecture

The real-time transcription system consists of the following components:

TranscriptionService: Central service that manages transcription events and notifies listeners
TranscriptionListener: Interface for components that want to receive transcription updates
WebSocketTranscriptionBridge: Bridge between the transcription service and WebSocket hub
TranscriptionHub: Manages WebSocket connections and broadcasts messages to clients
WebSocketHandler: HTTP handler for WebSocket connections

The flow is as follows:

STT providers generate transcriptions as they process audio
Providers publish transcriptions to the TranscriptionService
The TranscriptionService notifies all registered listeners, including the WebSocketTranscriptionBridge
The WebSocketTranscriptionBridge forwards messages to the TranscriptionHub
The TranscriptionHub broadcasts messages to connected WebSocket clients
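
The publish-subscribe flow above can be sketched in a few lines of Go. This is a simplified illustration, not the server's actual code: the listener method `OnTranscription`, the `AddListener` method, the `recordingListener` type, and the parameter types are all assumptions made for the example. Only the component names and the `PublishTranscription` call come from this document.

```go
package main

import (
	"fmt"
	"sync"
)

// TranscriptionListener mirrors the listener role described above.
// The method name and signature are assumptions for illustration.
type TranscriptionListener interface {
	OnTranscription(callUUID, text string, isFinal bool, metadata map[string]interface{})
}

// TranscriptionService is a simplified stand-in for the real service.
type TranscriptionService struct {
	mu        sync.RWMutex
	listeners []TranscriptionListener
}

// AddListener registers a listener (hypothetical method name).
func (s *TranscriptionService) AddListener(l TranscriptionListener) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.listeners = append(s.listeners, l)
}

// PublishTranscription fans one event out to every registered listener.
func (s *TranscriptionService) PublishTranscription(callUUID, text string, isFinal bool, metadata map[string]interface{}) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	for _, l := range s.listeners {
		l.OnTranscription(callUUID, text, isFinal, metadata)
	}
}

// recordingListener plays the part of the WebSocketTranscriptionBridge;
// it only records what it receives instead of forwarding to a hub.
type recordingListener struct {
	received []string
}

func (r *recordingListener) OnTranscription(callUUID, text string, isFinal bool, _ map[string]interface{}) {
	r.received = append(r.received, fmt.Sprintf("%s|%s|%t", callUUID, text, isFinal))
}

func main() {
	svc := &TranscriptionService{}
	bridge := &recordingListener{}
	svc.AddListener(bridge)

	// A provider publishes an interim result, then the final one.
	svc.PublishTranscription("call-1", "hello", false, nil)
	svc.PublishTranscription("call-1", "hello world", true, nil)
	fmt.Println(bridge.received)
}
```

Because providers only see the service, a new consumer (for example, a logger) could be added as another listener without changing any provider.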

API

WebSocket Endpoint

ws://<server-host>:<server-port>/ws/transcriptions?call_uuid=<optional-call-uuid>

call_uuid: Optional query parameter. When set, the connection receives transcriptions for that call only; when omitted, it receives transcriptions for all calls.
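
As a small illustration, a Go client might build the endpoint URL as follows. The helper name `transcriptionURL` is hypothetical, and `localhost:9090` is borrowed from the JavaScript example later in this document. Building the query with `net/url` keeps the parameter correctly escaped.

```go
package main

import (
	"fmt"
	"net/url"
)

// transcriptionURL builds the WebSocket URL for the endpoint above.
// host is "host:port"; an empty callUUID leaves the parameter off,
// which is assumed here to mean "all calls".
func transcriptionURL(host, callUUID string) string {
	u := url.URL{Scheme: "ws", Host: host, Path: "/ws/transcriptions"}
	if callUUID != "" {
		q := url.Values{}
		q.Set("call_uuid", callUUID)
		u.RawQuery = q.Encode()
	}
	return u.String()
}

func main() {
	fmt.Println(transcriptionURL("localhost:9090", ""))
	fmt.Println(transcriptionURL("localhost:9090", "a1b2c3d4-e5f6-7890-abcd-ef1234567890"))
}
```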

Message Format

{
  "call_uuid": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "transcription": "This is the transcription text",
  "is_final": true,
  "timestamp": "2023-06-01T12:34:56.789Z",
  "metadata": {
    "provider": "google",
    "confidence": 0.95,
    "word_count": 5
  }
}

call_uuid: Unique identifier for the call
transcription: The transcribed text
is_final: Whether this is a final (true) or interim (false) transcription
timestamp: ISO 8601 timestamp indicating when the transcription was generated
metadata: Additional information about the transcription
- provider: The name of the STT provider that generated the transcription
- confidence: Confidence score (0-1) for the transcription (final transcriptions only)
- word_count: Number of words in the transcription (final transcriptions only)
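
A Go consumer could decode this message with `encoding/json` along the following lines. The struct and function names are assumptions for the example; only the JSON field names come from the format above. `confidence` and `word_count` are modelled as pointers because they are documented for final transcriptions only, so an interim message may omit them.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Metadata holds the optional details attached to a transcription.
type Metadata struct {
	Provider   string   `json:"provider"`
	Confidence *float64 `json:"confidence,omitempty"` // nil when absent
	WordCount  *int     `json:"word_count,omitempty"` // nil when absent
}

// TranscriptionMessage matches the JSON message format shown above.
type TranscriptionMessage struct {
	CallUUID      string    `json:"call_uuid"`
	Transcription string    `json:"transcription"`
	IsFinal       bool      `json:"is_final"`
	Timestamp     time.Time `json:"timestamp"`
	Metadata      Metadata  `json:"metadata"`
}

// parseMessage decodes one raw WebSocket text frame.
func parseMessage(raw []byte) (TranscriptionMessage, error) {
	var msg TranscriptionMessage
	err := json.Unmarshal(raw, &msg)
	return msg, err
}

func main() {
	raw := []byte(`{
	  "call_uuid": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
	  "transcription": "This is the transcription text",
	  "is_final": true,
	  "timestamp": "2023-06-01T12:34:56.789Z",
	  "metadata": {"provider": "google", "confidence": 0.95, "word_count": 5}
	}`)
	msg, err := parseMessage(raw)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(msg.CallUUID, msg.IsFinal, msg.Metadata.Provider)
}
```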

WebSocket Client

A simple HTML client is provided for testing the WebSocket functionality:

http://<server-host>:<server-port>/websocket-client

This client allows you to:

Connect to the WebSocket endpoint
Subscribe to transcriptions for a specific call or all calls
See both interim and final transcriptions in real time
View metadata for each transcription

Testing with Mock Provider

For testing purposes, a mock STT provider is included that generates random transcriptions. To use it:

Start the SIPREC server
Run the WebSocket test script: go run test_websocket.go
Open the WebSocket client in your browser
Use one of the call UUIDs displayed by the test script to subscribe to a specific call

Integration

Integrating with Custom STT Providers

To make a custom STT provider work with the real-time transcription system:

Add a transcription service field to your provider struct:

type MyProvider struct {
    // ... existing fields
    transcriptionSvc *TranscriptionService
}

Implement a method to set the transcription service:

func (p *MyProvider) SetTranscriptionService(svc *TranscriptionService) {
    p.transcriptionSvc = svc
}

Publish transcriptions as they become available:

// For interim results
p.transcriptionSvc.PublishTranscription(callUUID, interim, false, metadata)

// For final results
p.transcriptionSvc.PublishTranscription(callUUID, transcription, true, metadata)
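
The `metadata` argument is not specified in detail here. One possible way to assemble it, assuming it is a `map[string]interface{}` keyed like the message format and that words are counted by splitting on whitespace, is sketched below. The helper names `finalMetadata` and `interimMetadata` are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// finalMetadata assembles the metadata keys shown in the message format.
// The map type and the whitespace-based word count are assumptions.
func finalMetadata(provider, text string, confidence float64) map[string]interface{} {
	return map[string]interface{}{
		"provider":   provider,
		"confidence": confidence,
		"word_count": len(strings.Fields(text)),
	}
}

// interimMetadata carries only the provider, since confidence and
// word_count are documented for final transcriptions only.
func interimMetadata(provider string) map[string]interface{} {
	return map[string]interface{}{"provider": provider}
}

func main() {
	fmt.Println(finalMetadata("google", "This is the transcription text", 0.95))
	fmt.Println(interimMetadata("google"))
}
```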

Integrating with Custom Clients

To create a custom client that consumes the WebSocket stream:

Establish a WebSocket connection to the endpoint
Handle incoming JSON messages
Parse and process the transcription data as needed

Example JavaScript code:

const socket = new WebSocket('ws://localhost:9090/ws/transcriptions');

socket.addEventListener('message', function(event) {
    const data = JSON.parse(event.data);
    console.log('Transcription:', data.transcription);
    console.log('Final:', data.is_final);
    console.log('Metadata:', data.metadata);
});
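
Clients that display a running transcript usually need to combine interim and final messages. The Go sketch below shows one way to do that. It assumes that each interim message supersedes the previous interim text for the same call and that a final message replaces the pending interim text; this document does not state those semantics, so verify them against your provider. The `Aggregator` type and its methods are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// callTranscript tracks committed final segments and the latest
// interim text for one call.
type callTranscript struct {
	finals  []string
	interim string
}

// Aggregator groups incoming transcriptions by call UUID.
type Aggregator struct {
	calls map[string]*callTranscript
}

// NewAggregator returns an empty aggregator.
func NewAggregator() *Aggregator {
	return &Aggregator{calls: make(map[string]*callTranscript)}
}

// Apply folds one message into the per-call state.
func (a *Aggregator) Apply(callUUID, text string, isFinal bool) {
	ct, ok := a.calls[callUUID]
	if !ok {
		ct = &callTranscript{}
		a.calls[callUUID] = ct
	}
	if isFinal {
		// A final result commits the segment and clears the pending interim.
		ct.finals = append(ct.finals, text)
		ct.interim = ""
		return
	}
	// An interim result overwrites the previous interim text.
	ct.interim = text
}

// Text returns the committed finals followed by any pending interim text.
func (a *Aggregator) Text(callUUID string) string {
	ct, ok := a.calls[callUUID]
	if !ok {
		return ""
	}
	parts := append([]string{}, ct.finals...)
	if ct.interim != "" {
		parts = append(parts, ct.interim)
	}
	return strings.Join(parts, " ")
}

func main() {
	agg := NewAggregator()
	agg.Apply("call-1", "hello wor", false)
	agg.Apply("call-1", "hello world", true)
	agg.Apply("call-1", "how are", false)
	fmt.Println(agg.Text("call-1"))
}
```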

Performance Considerations

The WebSocket hub uses non-blocking channel sends, so publishing a transcription never stalls the main application
Each client has its own writer goroutine, so a slow client does not delay delivery to the others
The WebSocket hub is safe for concurrent use; shared state is guarded by mutexes
Periodic ping messages keep connections alive and help detect stale ones
Error handling includes cleanup, so resources are released when a connection closes
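
The non-blocking send mentioned above is commonly written in Go as a `select` with a `default` branch on a buffered channel. The sketch below illustrates the general pattern only; the `client` type, the `trySend` helper, and the buffer size are assumptions, not the hub's actual code.

```go
package main

import "fmt"

// client holds a buffered outbound queue. In a real hub, a per-client
// writer goroutine would drain this channel and write to the socket.
type client struct {
	send chan []byte
}

// trySend enqueues msg without blocking. It returns false when the
// client's buffer is full, leaving the caller to drop the message or
// disconnect the slow client.
func trySend(c *client, msg []byte) bool {
	select {
	case c.send <- msg:
		return true
	default:
		return false
	}
}

func main() {
	c := &client{send: make(chan []byte, 2)}
	// Nothing drains the channel here, so the third send finds it full.
	for i := 0; i < 3; i++ {
		fmt.Println(trySend(c, []byte("msg")))
	}
}
```

What to do when `trySend` returns false is a design choice: dropping interim messages is often acceptable, while repeated failures may justify closing the connection.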

Future Enhancements

Potential future enhancements to the real-time transcription system:

Authentication: Add token-based authentication for WebSocket connections
Compression: Support WebSocket compression for reduced bandwidth
Metrics: Add instrumentation for monitoring WebSocket connections and message throughput
Filtering: Allow clients to filter transcriptions by additional criteria (e.g., confidence level)
Batching: Optimize performance with client-side message batching for high-volume scenarios
