Falcon Arabic ASR API

Flask API for Falcon ASR transcription workflows. The primary deployment runs as two containers: a lightweight API service and a separate vLLM inference service.

Included Scope

POST /api/transcription/sessions
POST /api/transcription/sessions/<session_id>/chunks
GET /api/transcription/sessions/<session_id>/stream
POST /api/transcription/sessions/<session_id>/finalize
WS /api/transcription/sessions/ws
POST /api/transcription/transcribe
POST /api/transcription/feedback
GET /api/admin/transcription/stats
GET /health

The API owns sessions, streaming state, feedback records, uploaded audio artifacts, and database migrations. The vLLM service owns GPU inference.
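
As a rough illustration of how the per-session routes fit together, the sketch below builds the URLs for one session. The helper name session_routes and the returned dictionary keys are hypothetical and not part of this repository; only the paths come from the list above.

```python
from urllib.parse import quote

API_PREFIX = "/api/transcription"


def session_routes(base_url: str, session_id: str) -> dict:
    """Build the per-session URLs from the route list above (hypothetical helper)."""
    root = f"{base_url.rstrip('/')}{API_PREFIX}/sessions"
    # Percent-encode the id so it cannot alter the path structure.
    sid = quote(session_id, safe="")
    return {
        "create": root,                        # POST: open a session
        "chunks": f"{root}/{sid}/chunks",      # POST: upload an audio chunk
        "stream": f"{root}/{sid}/stream",      # GET: follow streaming output
        "finalize": f"{root}/{sid}/finalize",  # POST: close the session
    }
```

A client would presumably call these in the order create, chunks (repeated), stream, finalize; the request and response bodies are described in the customer-facing API guide rather than here.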

Backend Policy

vllm is the primary backend for deployment.
torch is retained as a legacy self-contained runtime path.
mlx is retained as an experimental local backend.

Production-facing docs and build files should prefer ASR_BACKEND=vllm unless they explicitly describe a legacy or experimental path.
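
The policy above could be enforced at startup with a small check like the following. This is a sketch only: resolve_backend is a hypothetical helper, and defaulting to vllm when ASR_BACKEND is unset is an assumption rather than documented behavior.

```python
import os

# Backend names and tiers are taken from the policy above.
SUPPORTED_BACKENDS = {
    "vllm": "primary",
    "torch": "legacy",
    "mlx": "experimental",
}


def resolve_backend(env=None):
    """Return (backend_name, tier) from ASR_BACKEND (hypothetical helper)."""
    env = os.environ if env is None else env
    # Assumption: fall back to the primary backend when the variable is unset.
    name = env.get("ASR_BACKEND", "vllm").strip().lower()
    if name not in SUPPORTED_BACKENDS:
        allowed = ", ".join(sorted(SUPPORTED_BACKENDS))
        raise ValueError(f"Unsupported ASR_BACKEND {name!r}; expected one of: {allowed}")
    return name, SUPPORTED_BACKENDS[name]
```

Returning the tier alongside the name lets a caller log a warning when a legacy or experimental path is selected.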

Split Docker Deployment

Set the host path to the Falcon Audio v2 checkpoint, then start both services:

cp .env.example .env
# Edit FALCON_AUDIO_MODEL_HOST_PATH in .env if needed.
docker compose -f docker-compose.vllm.yml up --build

The compose stack builds:

Dockerfile.vllm as falcon-asr-vllm:local, serving OpenAI-compatible vLLM on port 8000.
Dockerfile.api as falcon-asr-api:local, serving the Flask API on port 5000 with ASR_BACKEND=vllm.

Health checks:

curl -fsS http://127.0.0.1:8000/health
curl -fsS http://127.0.0.1:5000/health
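
Because the vLLM container may take a while to load the checkpoint, a script that depends on both services might poll the health endpoints instead of checking once. The sketch below uses only the Python standard library; the function names, attempt count, and delay are illustrative assumptions.

```python
import time
import urllib.request


def probe(url: str, timeout: float = 2.0) -> bool:
    """Return True when GET url answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # URLError, HTTPError, refused connections and timeouts are all OSError subclasses.
        return False


def wait_until_healthy(urls, attempts=30, delay=2.0, check=probe, sleep=time.sleep):
    """Poll until every URL passes check, or give up after the given attempts."""
    pending = list(urls)
    for _ in range(attempts):
        # Keep only the services that are still failing.
        pending = [u for u in pending if not check(u)]
        if not pending:
            return True
        sleep(delay)
    return False
```

For the compose stack above this would be called as wait_until_healthy(["http://127.0.0.1:8000/health", "http://127.0.0.1:5000/health"]).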

See docs/two-container-vllm-deployment.md for individual docker build and docker run commands.

Local API Development

Run the API locally against an already-running vLLM service:

python3 -m venv .venv-api
source .venv-api/bin/activate
pip install -r requirements-api.txt

export ASR_BACKEND=vllm
export VLLM_BASE_URL=http://127.0.0.1:8000
export VLLM_MODEL=/models/falcon_audio_v2_vllm

python -m flask --app wsgi.py db upgrade
python -m flask --app wsgi.py run --host 127.0.0.1 --port 5001

Single-call transcription example:

curl -sS -X POST \
  -H 'Authorization: Bearer replace-me' \
  -F 'audio_file=@tests/fixtures/walt1-2.mp3;type=audio/mpeg' \
  http://127.0.0.1:5001/api/transcription/transcribe
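
The same request can be prepared from Python without third-party packages. The sketch below mirrors the curl call: a multipart form with a single audio_file part and a Bearer token. The function name build_transcribe_request is hypothetical, and the response format is not assumed here.

```python
import urllib.request
import uuid


def build_transcribe_request(url, api_key, filename, audio_bytes, mime="audio/mpeg"):
    """Prepare a multipart POST equivalent to the curl example (hypothetical helper)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="audio_file"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")

    req = urllib.request.Request(url, data=head + audio_bytes + tail, method="POST")
    req.add_header("Authorization", f"Bearer {api_key}")
    req.add_header("Content-Type", f"multipart/form-data; boundary={boundary}")
    return req
```

To send it, read the fixture bytes, pass the returned object to urllib.request.urlopen, and read the response body; see the customer-facing API guide for the fields the API returns.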

Legacy Self-Contained Docker Image

Dockerfile and requirements.txt remain for the older CUDA/Torch image that runs the API and inference in one container. Use this only when intentionally validating or operating the legacy path:

docker build -f Dockerfile -t falcon-arab-asr-api:legacy .
docker run --rm --gpus all \
  -p 5000:5000 \
  -v falcon_asr_api_usage:/falcon_asr_api_usage \
  -e TRANSCRIPTION_API_KEYS=replace-me \
  -e TRANSCRIPTION_ADMIN_API_KEYS=replace-me-admin \
  falcon-arab-asr-api:legacy

Cloud Build

cloudbuild.yaml builds and publishes the split images:

gcloud builds submit --config cloudbuild.yaml

Default image names:

falcon-asr-api
falcon-asr-vllm

Override region, repo, or names with substitutions:

gcloud builds submit \
  --config cloudbuild.yaml \
  --substitutions=_AR_REGION=us-central1,_AR_REPO=falcon-asr,_API_IMAGE_NAME=falcon-asr-api,_VLLM_IMAGE_NAME=falcon-asr-vllm

Legacy unified-image release history is kept in docs/Docker-Release-History.md.

API Documentation

Customer-facing API guide: docs/Falcon-ASR-demo-API-documentation.md
Split deployment guide: docs/two-container-vllm-deployment.md
vLLM shim notes: docs/falcon_audio_v2_vllm_deployment.md
Experimental MLX notes: mlx_porting.md

Deployed environments authenticate with Authorization: Bearer <key>. X-API-Key is still accepted, but only for legacy compatibility.
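
One way to express this precedence is shown below: prefer the Bearer token and fall back to X-API-Key only when no usable Bearer value is present. This is an illustrative sketch, not the repository's actual implementation; extract_api_key and the returned scheme labels are hypothetical.

```python
def extract_api_key(headers):
    """Return (key, scheme), preferring Bearer over the legacy header.

    headers is any mapping with .get(); a plain dict is case-sensitive,
    whereas Flask's request.headers matches names case-insensitively.
    """
    auth = headers.get("Authorization", "")
    scheme, _, token = auth.partition(" ")
    if scheme.lower() == "bearer" and token.strip():
        return token.strip(), "bearer"

    # Legacy compatibility path only.
    legacy = headers.get("X-API-Key", "").strip()
    if legacy:
        return legacy, "x-api-key"
    return None, None
```

Returning the scheme makes it straightforward to log or count legacy usage before eventually retiring the X-API-Key path.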

Validation Behavior

JSON content type enforced for session creation and feedback.
Multipart content type enforced for chunk uploads and one-shot transcription.
UUID validation for session_id and transcription_id.
Audio extension and MIME checks for uploads.
Consistent 4xx response envelope: error, code, status_code.
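
The UUID check and the error envelope listed above might look roughly like this. The envelope keys (error, code, status_code) come from this README; the function names, the message text, and the code value invalid_session_id are hypothetical placeholders.

```python
import uuid


def error_envelope(message: str, code: str, status_code: int) -> dict:
    """Build the 4xx envelope with the three keys named above."""
    return {"error": message, "code": code, "status_code": status_code}


def validate_session_id(raw):
    """Return (normalized_id, None) on success or (None, envelope) on failure."""
    try:
        return str(uuid.UUID(raw)), None
    except (ValueError, AttributeError, TypeError):
        # Hypothetical message and code; the real values live in the API source.
        return None, error_envelope(
            "session_id must be a valid UUID", "invalid_session_id", 400
        )
```

In a Flask view the envelope would typically be returned with jsonify and the same status code, so the body and the HTTP status stay in agreement.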

Tests

Install test dependencies in a local environment, then run:

python -m pytest -q tests/test_transcription_api.py tests/test_transcription_models.py tests/test_migrations.py tests/test_vllm_adapter.py
