A FastAPI WebSocket service that relays live audio transcription between browser clients and Google Cloud Speech-to-Text v2.
CARE's Django backend cannot hold long-lived WebSocket connections, so this lightweight middleware sits between the frontend and Google's streaming STT gRPC API.
┌──────────┐ JWT + audio (WS) ┌──────────────────┐ gRPC streaming ┌─────────────┐
│ Client │ ─────────────────► │ care_scribe_mw │ ───────────────► │ Google STT │
│ (browser)│ ◄───────────────── │ (FastAPI) │ ◄─────────────── │ v2 API │
└──────────┘ transcripts └──────────────────┘ results └─────────────┘
cp .env.example .env
# Edit .env — set JWT_SECRET_KEY (must match care_be's DJANGO_SECRET_KEY)
# set GOOGLE_PROJECT_ID
# set GOOGLE_APPLICATION_CREDENTIALS (or use ADC)

pip install -r requirements.txt
uvicorn app.main:app --host 0.0.0.0 --port 8090 --reload

ws://localhost:8090/ws/transcribe?token=<JWT_ACCESS_TOKEN>
The JWT is verified using the same secret key as care_be (HS256 + DJANGO_SECRET_KEY).
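
To illustrate what HS256 verification with a shared secret involves, here is a minimal standard-library sketch. It is not the service's actual code, which may well use a JWT library instead. The function names and the `user_id` and `exp` claims in the usage note are assumptions made for the example.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url_decode(segment: str) -> bytes:
    # JWT segments are base64url without padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def sign_hs256(payload: dict, secret: str) -> str:
    """Build an HS256 token. Included only so the verifier can be exercised."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = _b64url_encode(json.dumps(payload).encode())
    signing_input = f"{header}.{body}".encode("ascii")
    signature = hmac.new(secret.encode(), signing_input, hashlib.sha256).digest()
    return f"{header}.{body}.{_b64url_encode(signature)}"


def verify_hs256(token: str, secret: str, now: Optional[float] = None) -> dict:
    """Return the claims if the signature and expiry check out, else raise ValueError."""
    try:
        header_b64, body_b64, sig_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        signature = _b64url_decode(sig_b64)
    except ValueError as exc:
        raise ValueError("malformed token") from exc
    # Pin the algorithm so a token cannot pick a weaker one for itself.
    if not isinstance(header, dict) or header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    signing_input = f"{header_b64}.{body_b64}".encode("ascii")
    expected = hmac.new(secret.encode(), signing_input, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking signature bytes through timing.
    if not hmac.compare_digest(signature, expected):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(body_b64))
    exp = claims.get("exp")
    current = time.time() if now is None else now
    if exp is not None and current >= exp:
        raise ValueError("token expired")
    return claims
```

For instance, `verify_hs256(sign_hs256({"user_id": 1, "exp": 2000}, "s3cret"), "s3cret", now=1000)` returns the claims, while the same call with a different secret raises `ValueError`.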
{
"language": "en-US",
"model": "long",
"sample_rate": 16000,
"interim_results": true
}

All fields are optional and fall back to server defaults.
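
The fallback behaviour could look roughly like the sketch below. `SERVER_DEFAULTS` and `resolve_config` are hypothetical names, and the default for `interim_results` is assumed from the example message rather than documented.

```python
import json

# Assumed defaults: the first three mirror the configuration table,
# interim_results is a guess based on the example message.
SERVER_DEFAULTS = {
    "language": "en-US",
    "model": "long",
    "sample_rate": 16000,
    "interim_results": True,
}


def resolve_config(raw_message: str) -> dict:
    """Merge a client config message over the server defaults."""
    client = json.loads(raw_message) if raw_message.strip() else {}
    if not isinstance(client, dict):
        raise ValueError("config message must be a JSON object")
    # Ignore unknown keys and explicit nulls so they cannot mask a default.
    overrides = {
        key: value
        for key, value in client.items()
        if key in SERVER_DEFAULTS and value is not None
    }
    return {**SERVER_DEFAULTS, **overrides}
```

Under these assumptions, a client that sends only `{"language": "hi-IN"}` still gets the `long` model at 16000 Hz.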
Send raw PCM16 mono audio at the configured sample rate. Recommended chunk size: 100-200 ms (~3200-6400 bytes at 16 kHz).
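
The byte counts follow from PCM16 using 2 bytes per sample: 16000 samples/s × 0.1 s × 2 bytes = 3200 bytes. A small sketch of that arithmetic, with a helper that slices a buffer into chunks (both function names are illustrative):

```python
from typing import Iterator

BYTES_PER_SAMPLE = 2  # PCM16 stores each sample as a 16-bit integer


def chunk_bytes(sample_rate_hz: int, chunk_ms: int, channels: int = 1) -> int:
    """Number of bytes of PCM16 audio covering chunk_ms milliseconds."""
    samples = sample_rate_hz * chunk_ms // 1000
    return samples * BYTES_PER_SAMPLE * channels


def iter_chunks(audio: bytes, size: int) -> Iterator[bytes]:
    """Yield consecutive slices of at most size bytes; the last may be shorter."""
    for start in range(0, len(audio), size):
        yield audio[start:start + size]


print(chunk_bytes(16000, 100))  # 3200
print(chunk_bytes(16000, 200))  # 6400
```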
{"type": "ready"}
{"type": "transcript", "text": "hello world", "is_final": false}
{"type": "transcript", "text": "hello world!", "is_final": true, "confidence": 0.95}

Send {"type": "stop"} or simply close the WebSocket.
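
A client might fold the server messages above into a running transcript along these lines. This is a sketch only: `collect_transcript` is a hypothetical helper, and joining final segments with a space is an assumption about how a client would want to display them.

```python
import json
from typing import Iterable


def collect_transcript(messages: Iterable[str]) -> dict:
    """Reduce a sequence of raw server messages to the client's view of the session."""
    ready = False
    finals = []   # text of segments the server has marked final
    pending = ""  # latest interim text, replaced by each new update
    for raw in messages:
        msg = json.loads(raw)
        kind = msg.get("type")
        if kind == "ready":
            ready = True
        elif kind == "transcript":
            if msg.get("is_final"):
                finals.append(msg["text"])
                pending = ""  # the interim text is superseded by the final one
            else:
                pending = msg["text"]
    return {"ready": ready, "final_text": " ".join(finals), "pending": pending}
```

Interim results overwrite each other, so only final segments are accumulated.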
GET /health → {"status": "ok"}
| Variable | Required | Default | Description |
|---|---|---|---|
| `JWT_SECRET_KEY` | yes | - | Must match `DJANGO_SECRET_KEY` in care_be |
| `JWT_ALGORITHM` | no | `HS256` | JWT signing algorithm |
| `GOOGLE_PROJECT_ID` | yes | - | GCP project ID |
| `GOOGLE_LOCATION` | no | `global` | GCP region for STT |
| `GOOGLE_APPLICATION_CREDENTIALS` | situational | - | Path to service account JSON (or use ADC) |
| `DEFAULT_LANGUAGE` | no | `en-US` | Default recognition language |
| `DEFAULT_MODEL` | no | `long` | Default recognition model |
| `DEFAULT_SAMPLE_RATE` | no | `16000` | Default audio sample rate (Hz) |
| `CORS_ALLOWED_ORIGINS` | no | `["*"]` | Allowed CORS origins |
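
As a rough sketch of how these variables could be read at startup (not the service's actual settings module): `load_settings` is a hypothetical helper, and parsing `CORS_ALLOWED_ORIGINS` as a JSON list is an assumption based on the shape of its default.

```python
import json
import os
from typing import Mapping, Optional

REQUIRED = ("JWT_SECRET_KEY", "GOOGLE_PROJECT_ID")
DEFAULTS = {
    "JWT_ALGORITHM": "HS256",
    "GOOGLE_LOCATION": "global",
    "DEFAULT_LANGUAGE": "en-US",
    "DEFAULT_MODEL": "long",
    "DEFAULT_SAMPLE_RATE": "16000",
    "CORS_ALLOWED_ORIGINS": '["*"]',
}


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read settings from env (os.environ by default) and apply the table's defaults."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing required settings: " + ", ".join(missing))
    settings = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    settings.update({name: env[name] for name in REQUIRED})
    # Environment values are strings, so convert the typed ones explicitly.
    settings["DEFAULT_SAMPLE_RATE"] = int(settings["DEFAULT_SAMPLE_RATE"])
    settings["CORS_ALLOWED_ORIGINS"] = json.loads(settings["CORS_ALLOWED_ORIGINS"])
    # Optional: when unset, Google client libraries fall back to ADC.
    settings["GOOGLE_APPLICATION_CREDENTIALS"] = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    return settings
```

Failing at startup when a required variable is missing surfaces a misconfiguration before the first client connects.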