tiiuae/aiccu-falcon-asr-api

Falcon Arabic ASR API

Flask API for Falcon ASR transcription workflows. The primary deployment is now split into two containers: a lightweight API service and a separate vLLM inference service.

Included Scope

  • POST /api/transcription/sessions
  • POST /api/transcription/sessions/<session_id>/chunks
  • GET /api/transcription/sessions/<session_id>/stream
  • POST /api/transcription/sessions/<session_id>/finalize
  • WS /api/transcription/sessions/ws
  • POST /api/transcription/transcribe
  • POST /api/transcription/feedback
  • GET /api/admin/transcription/stats
  • GET /health

The API owns sessions, streaming state, feedback records, uploaded audio artifacts, and database migrations. The vLLM service owns GPU inference.
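The per-session routes listed above share one prefix, so a client can derive them from a base URL and a session ID. The sketch below is illustrative only: `session_urls` is a hypothetical helper, not part of this repository, and the call order suggested by the key names (create, chunks, stream, finalize) is inferred from the route names rather than documented here.

```python
from urllib.parse import quote

# Route prefix taken from the endpoint list above.
BASE_PATH = "/api/transcription"


def session_urls(base_url: str, session_id: str) -> dict:
    """Build the session endpoint URLs for one session (hypothetical helper)."""
    root = f"{base_url.rstrip('/')}{BASE_PATH}/sessions"
    sid = quote(session_id, safe="")  # keep the ID to a single path segment
    return {
        "create": root,                        # POST
        "chunks": f"{root}/{sid}/chunks",      # POST
        "stream": f"{root}/{sid}/stream",      # GET
        "finalize": f"{root}/{sid}/finalize",  # POST
    }


urls = session_urls("http://127.0.0.1:5000", "123e4567-e89b-12d3-a456-426614174000")
print(urls["chunks"])
```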

Backend Policy

  • vllm is the primary backend for deployment.
  • torch is retained as a legacy self-contained runtime path.
  • mlx is retained as an experimental local backend.

Production-facing docs and build files should prefer ASR_BACKEND=vllm unless they explicitly describe a legacy or experimental path.
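As a rough illustration of this policy, a startup check might map each ASR_BACKEND value to its support tier and reject anything else. This is a sketch under assumptions: `resolve_backend` and `KNOWN_BACKENDS` are hypothetical names, and defaulting to vllm when the variable is unset is an assumption, not confirmed behavior of this codebase.

```python
import os

# Backend names and tiers as described in the policy above.
KNOWN_BACKENDS = {"vllm": "primary", "torch": "legacy", "mlx": "experimental"}


def resolve_backend(env=None):
    """Return (backend, tier) from ASR_BACKEND (hypothetical helper).

    Assumption: an unset ASR_BACKEND falls back to "vllm".
    """
    env = os.environ if env is None else env
    name = env.get("ASR_BACKEND", "vllm").strip().lower()
    if name not in KNOWN_BACKENDS:
        raise ValueError(
            f"Unsupported ASR_BACKEND={name!r}; expected one of {sorted(KNOWN_BACKENDS)}"
        )
    return name, KNOWN_BACKENDS[name]


print(resolve_backend({"ASR_BACKEND": "torch"}))
```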

Split Docker Deployment

Set the host path to the Falcon Audio v2 checkpoint, then start both services:

cp .env.example .env
# Edit FALCON_AUDIO_MODEL_HOST_PATH in .env if needed.
docker compose -f docker-compose.vllm.yml up --build

The compose stack builds:

  • Dockerfile.vllm as falcon-asr-vllm:local, serving OpenAI-compatible vLLM on port 8000.
  • Dockerfile.api as falcon-asr-api:local, serving the Flask API on port 5000 with ASR_BACKEND=vllm.

Health checks:

curl -fsS http://127.0.0.1:8000/health
curl -fsS http://127.0.0.1:5000/health
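Because the vLLM container may take a while to load the checkpoint, a script can poll both health endpoints instead of running curl by hand. The following is a minimal sketch using only the Python standard library. `is_healthy` and `wait_until_healthy` are hypothetical helpers, and the attempt count and delay are arbitrary defaults.

```python
import time
import urllib.error
import urllib.request


def is_healthy(url, timeout=2.0, opener=urllib.request.urlopen):
    """Return True if GET url answers with a 2xx status, like `curl -f`."""
    try:
        with opener(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, and HTTP 4xx/5xx responses.
        return False


def wait_until_healthy(urls, attempts=30, delay=2.0, check=is_healthy, sleep=time.sleep):
    """Poll until every URL is healthy; return False if attempts run out."""
    pending = list(urls)
    for _ in range(attempts):
        pending = [u for u in pending if not check(u)]
        if not pending:
            return True
        sleep(delay)
    return False
```

For the compose stack above, the call would be `wait_until_healthy(["http://127.0.0.1:8000/health", "http://127.0.0.1:5000/health"])`.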

See docs/two-container-vllm-deployment.md for individual docker build and docker run commands.

Local API Development

Run the API locally against an already-running vLLM service:

python3 -m venv .venv-api
source .venv-api/bin/activate
pip install -r requirements-api.txt

export ASR_BACKEND=vllm
export VLLM_BASE_URL=http://127.0.0.1:8000
export VLLM_MODEL=/models/falcon_audio_v2_vllm

python -m flask --app wsgi.py db upgrade
python -m flask --app wsgi.py run --host 127.0.0.1 --port 5001

Single-call transcription example:

curl -sS -X POST \
  -H 'Authorization: Bearer replace-me' \
  -F 'audio_file=@tests/fixtures/walt1-2.mp3;type=audio/mpeg' \
  http://127.0.0.1:5001/api/transcription/transcribe
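The same request can be assembled in Python without third-party packages. This sketch only mirrors the curl call above (the audio_file form field, the audio/mpeg type, and the bearer header). `build_transcribe_request` is a hypothetical helper, and the response format is not described here, so the sketch stops at building the request.

```python
import urllib.request
import uuid


def build_transcribe_request(base_url, api_key, filename, audio_bytes, mime="audio/mpeg"):
    """Build a multipart POST equivalent to the curl example (hypothetical helper)."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        # Field name "audio_file" matches the -F argument in the curl call.
        f'Content-Disposition: form-data; name="audio_file"; filename="{filename}"\r\n'.encode(),
        f"Content-Type: {mime}\r\n\r\n".encode(),
        audio_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/transcription/transcribe",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


# Placeholder bytes stand in for the contents of tests/fixtures/walt1-2.mp3.
req = build_transcribe_request("http://127.0.0.1:5001", "replace-me", "walt1-2.mp3", b"\x00\x01")
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it to a running API instance.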

Legacy Self-Contained Docker Image

Dockerfile and requirements.txt remain for the older CUDA/Torch image that runs the API and inference in one container. Use this only when intentionally validating or operating the legacy path:

docker build -f Dockerfile -t falcon-arab-asr-api:legacy .
docker run --rm --gpus all \
  -p 5000:5000 \
  -v falcon_asr_api_usage:/falcon_asr_api_usage \
  -e TRANSCRIPTION_API_KEYS=replace-me \
  -e TRANSCRIPTION_ADMIN_API_KEYS=replace-me-admin \
  falcon-arab-asr-api:legacy

Cloud Build

cloudbuild.yaml builds and publishes the split images:

gcloud builds submit --config cloudbuild.yaml

Default image names:

  • falcon-asr-api
  • falcon-asr-vllm

Override region, repo, or names with substitutions:

gcloud builds submit \
  --config cloudbuild.yaml \
  --substitutions=_AR_REGION=us-central1,_AR_REPO=falcon-asr,_API_IMAGE_NAME=falcon-asr-api,_VLLM_IMAGE_NAME=falcon-asr-vllm

Legacy unified-image release history is kept in docs/Docker-Release-History.md.

API Documentation

  • Customer-facing API guide: docs/Falcon-ASR-demo-API-documentation.md
  • Split deployment guide: docs/two-container-vllm-deployment.md
  • vLLM shim notes: docs/falcon_audio_v2_vllm_deployment.md
  • Experimental MLX notes: mlx_porting.md

Deployed environments authenticate with an Authorization: Bearer <key> header. The X-API-Key header is still accepted, but only for legacy compatibility.
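One way to picture the precedence between the two headers is a small extraction function that prefers the bearer token and falls back to X-API-Key. This is a sketch, not the repository's implementation: `extract_api_key` is a hypothetical name, and it takes a plain dict with exact header names for simplicity, whereas a real Flask request exposes case-insensitive headers.

```python
def extract_api_key(headers):
    """Return (key, source) from request headers, or (None, None) if absent.

    Hypothetical helper: prefers "Authorization: Bearer <key>" and falls
    back to the legacy "X-API-Key" header.
    """
    scheme, _, token = headers.get("Authorization", "").partition(" ")
    if scheme.lower() == "bearer" and token.strip():
        return token.strip(), "bearer"
    legacy = headers.get("X-API-Key", "").strip()
    if legacy:
        return legacy, "legacy"
    return None, None


print(extract_api_key({"Authorization": "Bearer replace-me"}))
```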

Validation Behavior

  • JSON content type enforced for session creation and feedback.
  • Multipart content type enforced for chunk uploads and one-shot transcription.
  • UUID validation for session_id and transcription_id.
  • Audio extension and MIME checks for uploads.
  • Consistent 4xx response envelope: error, code, status_code.
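Two of the checks above, UUID validation and the error envelope, can be sketched in a few lines. The function names and the `invalid_session_id` code string are hypothetical. Only the envelope keys (error, code, status_code) come from the list above.

```python
import uuid


def is_valid_uuid(value):
    """Return True only for a canonical, hyphenated UUID string."""
    try:
        return str(uuid.UUID(value)) == value.lower()
    except (ValueError, AttributeError, TypeError):
        return False


def error_envelope(message, code, status_code):
    """Build a 4xx error body with the keys error, code, status_code."""
    if not 400 <= status_code < 500:
        raise ValueError("the envelope is described for 4xx responses only")
    return {"error": message, "code": code, "status_code": status_code}


session_id = "not-a-uuid"
if not is_valid_uuid(session_id):
    # "invalid_session_id" is an illustrative code value, not a documented one.
    print(error_envelope("session_id must be a UUID", "invalid_session_id", 400))
```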

Tests

Install test dependencies in a local environment, then run:

python -m pytest -q tests/test_transcription_api.py tests/test_transcription_models.py tests/test_migrations.py tests/test_vllm_adapter.py
