Real-Time Transcription WebSockets in SIPREC Server

This document provides an overview of the WebSocket-based real-time transcription feature in the SIPREC server.

Overview

The SIPREC server streams transcriptions to clients over WebSockets. Clients receive each transcription update as soon as the speech-to-text (STT) provider generates it.

Key Features

  • Real-time streaming: Both interim and final transcriptions are streamed as they become available
  • Call-specific subscriptions: Clients can subscribe to transcriptions for specific calls by UUID
  • Metadata enrichment: Transcriptions include metadata such as confidence scores, provider info, and word counts
  • Simple client interface: An HTML/JavaScript client is provided for easy testing and integration
  • Publish-subscribe architecture: Modular design with a transcription service and WebSocket hub

Architecture

The real-time transcription system consists of the following components:

  1. TranscriptionService: Central service that manages transcription events and notifies listeners
  2. TranscriptionListener: Interface for components that want to receive transcription updates
  3. WebSocketTranscriptionBridge: Bridge between the transcription service and WebSocket hub
  4. TranscriptionHub: Manages WebSocket connections and broadcasts messages to clients
  5. WebSocketHandler: HTTP handler for WebSocket connections

The flow is as follows:

  1. STT providers generate transcriptions as they process audio
  2. Providers publish transcriptions to the TranscriptionService
  3. The TranscriptionService notifies all registered listeners, including the WebSocketTranscriptionBridge
  4. The WebSocketTranscriptionBridge forwards messages to the TranscriptionHub
  5. The TranscriptionHub broadcasts messages to connected WebSocket clients
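The flow above can be sketched as a small publish-subscribe chain. This is a minimal illustration, not the server's actual code: the Event type, the OnTranscription method name, AddListener, the parameter types, and the choice to drop events when the hub queue is full are all assumptions made for the example. Only the component names and the PublishTranscription argument order come from this document.

```go
package main

import "sync"

// Event is a hypothetical payload type used only in this sketch.
type Event struct {
	CallUUID string
	Text     string
	IsFinal  bool
	Metadata map[string]interface{}
}

// TranscriptionListener receives every published event.
// The method name is an assumption.
type TranscriptionListener interface {
	OnTranscription(ev Event)
}

// TranscriptionService fans each published event out to its listeners.
type TranscriptionService struct {
	mu        sync.RWMutex
	listeners []TranscriptionListener
}

// AddListener registers a listener (hypothetical helper).
func (s *TranscriptionService) AddListener(l TranscriptionListener) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.listeners = append(s.listeners, l)
}

// PublishTranscription notifies all registered listeners (steps 2 and 3).
func (s *TranscriptionService) PublishTranscription(callUUID, text string, isFinal bool, metadata map[string]interface{}) {
	ev := Event{CallUUID: callUUID, Text: text, IsFinal: isFinal, Metadata: metadata}
	s.mu.RLock()
	defer s.mu.RUnlock()
	for _, l := range s.listeners {
		l.OnTranscription(ev)
	}
}

// WebSocketTranscriptionBridge forwards events to the hub's queue (step 4).
type WebSocketTranscriptionBridge struct {
	hub chan<- Event
}

// OnTranscription hands the event to the hub without blocking the provider.
// Dropping on a full queue is an assumption of this sketch.
func (b WebSocketTranscriptionBridge) OnTranscription(ev Event) {
	select {
	case b.hub <- ev:
	default:
	}
}
```

In this sketch the hub is represented only by its inbound channel; a real hub would read from that channel and write to each connected client.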

API

WebSocket Endpoint

ws://<server-host>:<server-port>/ws/transcriptions?call_uuid=<optional-call-uuid>
  • call_uuid: Optional parameter to subscribe to transcriptions for a specific call only
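As an illustration, a client could build the endpoint URL as follows. The helper name transcriptionURL is hypothetical, and the sketch assumes that omitting call_uuid subscribes the client to all calls.

```go
package main

import "net/url"

// transcriptionURL builds the WebSocket endpoint URL for the given host:port.
// An empty callUUID leaves out the query parameter entirely.
func transcriptionURL(hostPort, callUUID string) string {
	u := url.URL{Scheme: "ws", Host: hostPort, Path: "/ws/transcriptions"}
	if callUUID != "" {
		q := url.Values{}
		q.Set("call_uuid", callUUID) // Encode escapes the value if needed
		u.RawQuery = q.Encode()
	}
	return u.String()
}
```

Building the query with url.Values rather than string concatenation keeps the URL valid even if the identifier contains characters that need escaping.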

Message Format

{
  "call_uuid": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "transcription": "This is the transcription text",
  "is_final": true,
  "timestamp": "2023-06-01T12:34:56.789Z",
  "metadata": {
    "provider": "google",
    "confidence": 0.95,
    "word_count": 5
  }
}
  • call_uuid: Unique identifier for the call
  • transcription: The transcribed text
  • is_final: Whether this is a final (true) or interim (false) transcription
  • timestamp: ISO 8601 timestamp of when the transcription was generated
  • metadata: Additional information about the transcription
    • provider: The name of the STT provider that generated the transcription
    • confidence: Confidence score (0-1) for the transcription (final transcriptions only)
    • word_count: Number of words in the transcription (final transcriptions only)
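A Go client might decode this message into a struct like the one below. The type and function names are illustrative, not part of the server's API. Confidence and word count are modeled as pointers on the assumption that interim messages omit them, so a nil value means "not present".

```go
package main

import (
	"encoding/json"
	"time"
)

// Metadata mirrors the "metadata" object. The pointer fields stay nil
// when the key is absent from the JSON.
type Metadata struct {
	Provider   string   `json:"provider"`
	Confidence *float64 `json:"confidence,omitempty"`
	WordCount  *int     `json:"word_count,omitempty"`
}

// TranscriptionMessage mirrors one WebSocket message.
type TranscriptionMessage struct {
	CallUUID      string    `json:"call_uuid"`
	Transcription string    `json:"transcription"`
	IsFinal       bool      `json:"is_final"`
	Timestamp     time.Time `json:"timestamp"` // parsed from the RFC 3339 string
	Metadata      Metadata  `json:"metadata"`
}

// decodeMessage parses one raw JSON message from the stream.
func decodeMessage(raw []byte) (TranscriptionMessage, error) {
	var m TranscriptionMessage
	err := json.Unmarshal(raw, &m)
	return m, err
}
```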

WebSocket Client

A simple HTML client is provided for testing the WebSocket functionality:

http://<server-host>:<server-port>/websocket-client

This client allows you to:

  • Connect to the WebSocket endpoint
  • Subscribe to transcriptions for a specific call or all calls
  • See both interim and final transcriptions in real-time
  • View metadata for each transcription

Testing with Mock Provider

For testing purposes, a mock STT provider is included that generates random transcriptions. To use it:

  1. Start the SIPREC server
  2. Run the WebSocket test script: go run test_websocket.go
  3. Open the WebSocket client in your browser
  4. Use one of the call UUIDs displayed by the test script to subscribe to a specific call

Integration

Integrating with Custom STT Providers

To make a custom STT provider work with the real-time transcription system:

  1. Add a transcription service field to your provider struct:
type MyProvider struct {
    // ... existing fields
    transcriptionSvc *TranscriptionService
}
  2. Implement a method to set the transcription service:
func (p *MyProvider) SetTranscriptionService(svc *TranscriptionService) {
    p.transcriptionSvc = svc
}
  3. Publish transcriptions as they become available:
// For interim results
p.transcriptionSvc.PublishTranscription(callUUID, interim, false, metadata)

// For final results
p.transcriptionSvc.PublishTranscription(callUUID, transcription, true, metadata)

Integrating with Custom Clients

To create a custom client that consumes the WebSocket stream:

  1. Establish a WebSocket connection to the endpoint
  2. Handle incoming JSON messages
  3. Parse and process the transcription data as needed

Example JavaScript code:

const socket = new WebSocket('ws://localhost:9090/ws/transcriptions');

socket.addEventListener('message', function(event) {
    const data = JSON.parse(event.data);
    console.log('Transcription:', data.transcription);
    console.log('Final:', data.is_final);
    console.log('Metadata:', data.metadata);
});
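For clients written in Go, the "parse and process" step might look like the sketch below. It keeps a running transcript per call and assumes that each interim message replaces the previous interim text and that a final message supersedes it. That is common STT behavior but is not stated in this document. The Transcript type and its methods are hypothetical, and the WebSocket connection itself is left out.

```go
package main

import (
	"encoding/json"
	"strings"
)

// message holds only the fields this sketch needs.
type message struct {
	CallUUID      string `json:"call_uuid"`
	Transcription string `json:"transcription"`
	IsFinal       bool   `json:"is_final"`
}

// Transcript tracks finalized segments and the latest interim text per call.
type Transcript struct {
	final   map[string][]string
	interim map[string]string
}

// NewTranscript returns an empty Transcript.
func NewTranscript() *Transcript {
	return &Transcript{final: map[string][]string{}, interim: map[string]string{}}
}

// Handle processes one raw JSON message received from the WebSocket.
func (t *Transcript) Handle(raw []byte) error {
	var m message
	if err := json.Unmarshal(raw, &m); err != nil {
		return err
	}
	if m.IsFinal {
		t.final[m.CallUUID] = append(t.final[m.CallUUID], m.Transcription)
		delete(t.interim, m.CallUUID) // the final text supersedes the interim
	} else {
		t.interim[m.CallUUID] = m.Transcription
	}
	return nil
}

// Text returns the finalized segments followed by any pending interim text.
func (t *Transcript) Text(callUUID string) string {
	parts := append([]string{}, t.final[callUUID]...)
	if s := t.interim[callUUID]; s != "" {
		parts = append(parts, s)
	}
	return strings.Join(parts, " ")
}
```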

Performance Considerations

  • The WebSocket hub uses non-blocking channel operations, so broadcasting never stalls the main application
  • Each client has its own writer goroutine, so a slow client cannot delay the others
  • The WebSocket hub is safe for concurrent use; its shared state is guarded by mutexes
  • Regular ping messages maintain connection health
  • Error handling with proper cleanup ensures resources are released when connections close
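The non-blocking send mentioned in the first point is typically written in Go with a select statement that has a default case. The following is a generic sketch of that pattern, not the hub's actual code; trySend and broadcast are hypothetical names, and each client is represented only by its buffered outbound channel.

```go
package main

// trySend queues msg for one client without blocking. It reports false
// when the client's buffer is full, so the caller can drop the message
// or disconnect the slow client.
func trySend(ch chan<- []byte, msg []byte) bool {
	select {
	case ch <- msg:
		return true
	default:
		return false
	}
}

// broadcast offers msg to every client and returns how many accepted it.
func broadcast(clients []chan []byte, msg []byte) int {
	delivered := 0
	for _, ch := range clients {
		if trySend(ch, msg) {
			delivered++
		}
	}
	return delivered
}
```

Because a full buffer makes the send fail immediately instead of waiting, one stalled client costs the broadcaster nothing beyond the skipped message.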

Future Enhancements

Potential future enhancements to the real-time transcription system:

  1. Authentication: Add token-based authentication for WebSocket connections
  2. Compression: Support WebSocket compression for reduced bandwidth
  3. Metrics: Add instrumentation for monitoring WebSocket connections and message throughput
  4. Filtering: Allow clients to filter transcriptions by additional criteria (e.g., confidence level)
  5. Batching: Optimize performance with client-side message batching for high-volume scenarios