**Warning:** This is purely experimental code. Use at your own risk.
This repository is a proof of concept (POC) for audio and realtime endpoints.
Make sure `uv` is installed:

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```
Create and activate a virtual environment:

```bash
uv venv --python=3.12 --managed-python
source .venv/bin/activate
```
Set up pre-commit hooks:

```bash
uv pip install pre-commit
pre-commit install          # install the git hooks
pre-commit                  # run pre-commit manually
pre-commit run --all-files  # if you forgot to install pre-commit previously
```
```bash
# fill in the API keys in the .env file
cp .env.example .env
```
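To sanity-check that the keys are actually picked up, a quick probe like the following can help (a minimal sketch, assuming the project reads `.env` via `python-dotenv`; the key name below is hypothetical, so copy the real names from `.env.example`):

```python
import os

from dotenv import load_dotenv  # assumes python-dotenv is installed

load_dotenv()  # reads .env from the current working directory

# The key name below is hypothetical; check .env.example for the real names.
print("SPEECHMATICS_API_KEY set:", bool(os.getenv("SPEECHMATICS_API_KEY")))
```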
```bash
# activate the venv first
source .venv/bin/activate

# start the menlo server
python run_serve.py

# in another terminal, start the whisper server (for eval purposes)
scripts/serve_whisper.sh
```
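For a quick end-to-end check beyond the bundled test scripts, something along these lines should work, assuming the server exposes an OpenAI-compatible transcription endpoint; the base URL, port, model name, and sample path below are all assumptions, so adjust them to your setup:

```python
from openai import OpenAI  # pip install openai

# Base URL, port, and model name are assumptions; adjust to your server config.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

with open("sample.wav", "rb") as audio_file:  # any short local audio clip
    result = client.audio.transcriptions.create(
        model="openai/whisper-large-v3",
        file=audio_file,
    )
print(result.text)
```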
- Core modules live under `audio_eval/`:
  - `common_voice_dataset.py` handles dataset sampling (defaults to the `test_100` split).
  - `asr_services.py` wraps each ASR provider.
  - `text_normalizer_utils.py` wires up the multilingual normalizers.
  - `wer_evaluator.py` coordinates transcription, logging, and WER scoring (HF `evaluate` + optional `tqdm`); see the WER sketch after this list.
- CLI entry point: `scripts/run_wer_eval.py`.
- Quick service checks: `scripts/test_transcribe_vllm.py` and `scripts/test_transcribe_speechmatics.py`.
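As a rough picture of what the evaluator does per service, here is a minimal WER computation with HF `evaluate`; the normalization step is simplified to lowercasing, whereas the real pipeline goes through `text_normalizer_utils.py`:

```python
import evaluate  # pip install evaluate jiwer

wer_metric = evaluate.load("wer")

# Toy reference/hypothesis pairs; the real run pulls these from the dataset
# and the ASR services. Normalization here is just lowercasing for brevity.
references = ["The quick brown fox", "hello world"]
predictions = ["the quick brown fox", "hello word"]

score = wer_metric.compute(
    references=[r.lower() for r in references],
    predictions=[p.lower() for p in predictions],
)
print(f"WER: {score:.3f}")
```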
Example run (progress bars + transcript dump enabled by default):

```bash
scripts/run_wer_eval.py \
  --dataset-path "$CV22_PATH" \
  --preset smoke \
  --services menlo_large-v3 vllm_openai-whisper-large-v3 speechmatics \
  --n-samples 5
```
To resume an interrupted run, point to the previous checkpoint (and optionally reuse the same results/log paths):

```bash
scripts/run_wer_eval.py \
  --dataset-path "$CV22_PATH" \
  --preset all \
  --checkpoint results/wer_checkpoint_all_20250923_233231.json \
  --results results/wer_results_all_20250923_233231.json \
  --log logs/wer_eval_all_20250923_233231.log
```
Outputs land in `results/`, `logs/`, and `logs/transcripts_<timestamp>.jsonl`.
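Each line of the transcript dump is one JSON record; a minimal reader looks like this (the record schema is not documented here, so inspect a line first to see the actual fields):

```python
import json
from pathlib import Path

# The timestamped filename is an example; pick the actual file from logs/.
path = Path("logs/transcripts_20250923_233231.jsonl")

with path.open() as f:
    for line in f:
        record = json.loads(line)
        # Field names are not documented; print record.keys() to see the schema.
        print(record)
```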
This project adapts the multilingual text normalizer from the Hugging Face Open ASR Leaderboard, itself derived from the Whisper repository. We thank the maintainers for releasing those utilities under the Apache 2.0 license.
We also rely on community implementations of audio models such as faster-whisper, vLLM, and NVIDIA's NeMo framework, which power parts of our backend.