ASR worker microservices

The orchestrator talks to ASR workers only through an OpenAI-compatible POST /v1/audio/transcriptions endpoint. One worker is bundled; the others are documented contracts that you can implement separately.

Contract

Request (multipart/form-data):

| Field | Required | Notes |
| --- | --- | --- |
| file | yes | Audio chunk (16 kHz mono WAV is what the orchestrator sends) |
| model | yes | Model identifier |
| language | no | BCP-47 hint, e.g. hi, en |
| prompt | no | Optional decoding bias |
| response_format | no | verbose_json is preferred; json is acceptable |
| timestamp_granularities[] | no | Repeated values; the orchestrator sends segment and word |
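As a rough illustration of the request fields above, the following Python sketch assembles the form fields and encodes them as a multipart/form-data body using only the standard library. The helper names `build_transcription_fields` and `encode_multipart` are hypothetical and not part of this repository, and the `audio/wav` content type for the file part is an assumption based on the WAV chunks the orchestrator sends.

```python
import uuid


def build_transcription_fields(model, language=None, prompt=None,
                               response_format="verbose_json",
                               granularities=("segment", "word")):
    """Return the non-file form fields as (name, value) pairs.

    Hypothetical helper: a list of pairs is used instead of a dict so that
    timestamp_granularities[] can appear once per value.
    """
    fields = [("model", model), ("response_format", response_format)]
    if language:
        fields.append(("language", language))
    if prompt:
        fields.append(("prompt", prompt))
    # The array field is repeated, not sent as one comma-joined string.
    fields.extend(("timestamp_granularities[]", g) for g in granularities)
    return fields


def encode_multipart(fields, filename, audio_bytes, boundary=None):
    """Encode the fields plus the audio file as a multipart/form-data body.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields:
        parts.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{value}\r\n").encode("utf-8")
        )
    # File part; audio/wav is an assumption, see the note above.
    parts.append(
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         "Content-Type: audio/wav\r\n\r\n").encode("utf-8")
        + audio_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"
```

The returned body and content type can be handed to any HTTP client, for example as the `data` argument and `Content-Type` header of a `urllib.request.Request` for the endpoint.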

Response (JSON):

{
  "text": "haan let's start karte hain",
  "language": "hi",
  "segments": [
    {"start": 0.0, "end": 1.5, "text": "haan let's start", "confidence": 0.91}
  ],
  "words": [
    {"start": 0.0, "end": 0.4, "word": "haan", "confidence": 0.95}
  ]
}

Timestamps in the response must be relative to the start of the audio chunk; the orchestrator adds the chunk's global start_offset to place them on the full recording's timeline.
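To make the offset step concrete, here is a minimal Python sketch of how chunk-local timestamps could be shifted onto the global timeline. The function name `to_global_timestamps` is hypothetical, and this is not the orchestrator's actual code; it only assumes the response shape shown above and a `start_offset` given in seconds.

```python
import copy


def to_global_timestamps(response, start_offset):
    """Return a copy of a worker response with start/end shifted by start_offset.

    Hypothetical helper: assumes segments and words carry numeric
    start and end fields in seconds, as in the example response.
    """
    shifted = copy.deepcopy(response)  # leave the worker's response untouched
    for key in ("segments", "words"):
        # Either list may be absent, e.g. when response_format is json.
        for item in shifted.get(key) or []:
            item["start"] += start_offset
            item["end"] += start_offset
    return shifted


chunk_response = {
    "text": "haan let's start karte hain",
    "language": "hi",
    "segments": [
        {"start": 0.0, "end": 1.5, "text": "haan let's start", "confidence": 0.91}
    ],
    "words": [
        {"start": 0.0, "end": 0.4, "word": "haan", "confidence": 0.95}
    ],
}

# A chunk that begins 30 seconds into the recording.
global_response = to_global_timestamps(chunk_response, start_offset=30.0)
```

Because the worker never sees `start_offset`, it can stay stateless: the same chunk always yields the same response regardless of where it sits in the recording.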

Sanity check with curl

curl -X POST http://localhost:8001/v1/audio/transcriptions \
  -F file=@chunk.wav \
  -F model=Qwen3-ASR-1.7B \
  -F language=hi \
  -F response_format=verbose_json \
  -F 'timestamp_granularities[]=segment' \
  -F 'timestamp_granularities[]=word'

Workers