Client-Side Audio Transcription

A browser-based AI transcription playground powered by Whisper and Transformers.js. No installation, registration, or payment required.

🚀 Overview

This project is a client-side transcription web app built with React, TypeScript, and Vite. It runs Whisper directly in the browser through @huggingface/transformers, so media files are processed locally instead of being uploaded to a backend for transcription.

The current implementation supports selecting a Whisper model in the UI, choosing a local media file, loading the selected model on demand, and displaying the recognized text in a read-only transcript area.

✨ Features

Client-side speech-to-text
The React app calls the automatic-speech-recognition pipeline from @huggingface/transformers directly in the browser, so transcription runs entirely on the client.
Simple 3-step workflow
The UI guides you through three steps, with a clear status message for each:
1. Choosing a Whisper model and a media file.
2. Checking model status.
3. Reading the transcription result.
Multilingual Whisper model selection in the UI
Supported built-in model options:
- Xenova/whisper-tiny
- Xenova/whisper-base
- Xenova/whisper-small
Client-side audio decoding to 16 kHz via AudioContext
Stereo-to-mono mixing before inference
Chunked transcription settings for longer media:
- chunk_length_s: 20
- stride_length_s: 5
File input accepts:
- audio/*
- video/mp4
- video/webm
- video/ogg
- .mp4
- .webm
- .ogv
- .m4v
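
To make the chunked transcription settings from the feature list more concrete, here is a minimal sketch of how a 20-second chunk with a 5-second stride could tile a longer recording. It assumes the stride overlaps both sides of each chunk, which is how the Hugging Face pipelines describe stride_length_s; the README itself does not spell this out. The helper name planChunkWindows and the ChunkWindow type are hypothetical and not part of the app.

```typescript
// Hypothetical helper, not taken from the app's source.
// Lists the [start, end] windows (in seconds) a chunked pass would cover.
// Assumption: the stride overlaps BOTH sides of a chunk, so consecutive
// windows advance by chunkLengthS - 2 * strideLengthS.
interface ChunkWindow {
  startS: number;
  endS: number;
}

function planChunkWindows(
  durationS: number,
  chunkLengthS = 20,
  strideLengthS = 5,
): ChunkWindow[] {
  if (durationS <= 0) return [];
  const jumpS = chunkLengthS - 2 * strideLengthS;
  if (jumpS <= 0) {
    throw new Error("chunkLengthS must be larger than twice strideLengthS");
  }
  const windows: ChunkWindow[] = [];
  for (let startS = 0; ; startS += jumpS) {
    // Clamp the final window to the end of the audio.
    const endS = Math.min(startS + chunkLengthS, durationS);
    windows.push({ startS, endS });
    if (startS + chunkLengthS >= durationS) break; // this window reached the end
  }
  return windows;
}
```

Under that assumption, a 45-second clip is covered by four windows (0-20, 10-30, 20-40, and 30-45 seconds), and a clip of 20 seconds or less fits in a single window.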

🧱 Tech stack

Frontend: React + TypeScript + Vite
ML runtime: @huggingface/transformers
Inference task: automatic-speech-recognition
Browser audio handling: Web Audio API (AudioContext)
Testing: Jest + Testing Library
Container tooling: Docker + Docker Compose

How it works

1. App layout

App.tsx renders the app shell, title, subtitle, SettingsBar, and HomeScreen.

The settings bar currently displays the runtime summary:

Transformers.js + Whisper

2. Model and file selection

HomeScreen.tsx provides a 3-step UI:

1. Choose a model and media file
2. Check model status
3. Read the transcription result

The screen includes:

A Whisper model dropdown
A hidden file input triggered by a button
Status text and spinner while processing
A transcript textarea
A Clear button

3. Transcription hook

useTranscription.ts is the core implementation.

It exposes:

status
error
transcript
availableModels
selectedModelId
setSelectedModelId(modelId)
transcribeFile(file)
reset()

Behavior:

The selected Whisper model is loaded lazily on first use
The pipeline instance is cached and reused if the same model remains selected
Browser-friendly ONNX WASM settings are applied before model loading
The selected file is read as an ArrayBuffer
Audio is decoded with AudioContext({ sampleRate: 16000 })
Multi-channel audio is mixed down to mono
Whisper runs with automatic language detection because language is intentionally left unset
The recognized text is written to the transcript state
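
The mono mix-down step in the list above can be sketched as a plain average of the channels. This is an illustration only: the README does not say how the hook combines channels, so averaging is an assumption, and mixToMono is a hypothetical name. In the browser the inputs would come from AudioBuffer.getChannelData(i); plain Float32Array values are used here so the sketch runs anywhere.

```typescript
// Hypothetical helper: averages N channels into one mono channel.
// Assumption: equal-weight averaging; the app may weight channels differently.
function mixToMono(channels: Float32Array[]): Float32Array {
  if (channels.length === 0) return new Float32Array(0);
  // Use the shortest channel length so every index is valid for all channels.
  const length = Math.min(...channels.map((c) => c.length));
  const mono = new Float32Array(length);
  for (const channel of channels) {
    for (let i = 0; i < length; i++) {
      mono[i] += channel[i] / channels.length;
    }
  }
  return mono;
}
```

A single-channel input passes through unchanged, so the same function can be called for mono and stereo files alike.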

4. Status messages

The current UI reports user-facing states such as:

idle: choose a model and a file
loading: first model load may be slow
ready: model loaded and ready
transcribing: local browser transcription is running
done: transcription finished
error: failure message shown below the status block
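
One way to model the states listed above is a string union with a lookup table for the user-facing text. The sketch below is an assumption about shape, not the app's actual code: the type name TranscriptionStatus, the message wording, and the isBusy helper (for deciding when to show the spinner) are all hypothetical.

```typescript
// The six states named in this README.
type TranscriptionStatus =
  | "idle"
  | "loading"
  | "ready"
  | "transcribing"
  | "done"
  | "error";

// Hypothetical wording; the app's real strings may differ.
const STATUS_MESSAGES: Record<TranscriptionStatus, string> = {
  idle: "Choose a model and a file.",
  loading: "Loading the model. The first load may be slow.",
  ready: "Model loaded and ready.",
  transcribing: "Transcribing locally in your browser.",
  done: "Transcription finished.",
  error: "Something went wrong. See the message below.",
};

// Assumption: the spinner is shown while a model loads or a file is transcribed.
function isBusy(current: TranscriptionStatus): boolean {
  return current === "loading" || current === "transcribing";
}
```

Typing the table as Record<TranscriptionStatus, string> means the compiler flags any state that is added to the union without a matching message.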

Supported media notes

The UI text says users can select audio or video files and that Whisper can detect speech in supported media such as MP3 or MP4.

The implementation, however, decodes the selected file with AudioContext.decodeAudioData(), so the set of supported formats is whatever the user's browser can decode. A file that passes the file picker's filter can still fail to decode if the browser lacks the codec.
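
To illustrate why passing the file picker is not the same as being decodable, the sketch below mimics how an accept list like the one under Features is matched: by MIME type and file extension only, never by codec. The browser does this filtering natively through the input's accept attribute; the matchesAccept function here is a hypothetical stand-in written purely for illustration.

```typescript
// The accept tokens listed under Features.
const ACCEPT_TOKENS = [
  "audio/*",
  "video/mp4",
  "video/webm",
  "video/ogg",
  ".mp4",
  ".webm",
  ".ogv",
  ".m4v",
];

// Hypothetical stand-in for the browser's own accept matching.
function matchesAccept(
  fileName: string,
  mimeType: string,
  tokens: string[] = ACCEPT_TOKENS,
): boolean {
  const lowerName = fileName.toLowerCase();
  const lowerMime = mimeType.toLowerCase();
  return tokens.some((token) => {
    if (token.startsWith(".")) return lowerName.endsWith(token); // extension token
    if (token.endsWith("/*")) return lowerMime.startsWith(token.slice(0, -1)); // wildcard such as audio/*
    return lowerMime === token; // exact MIME type
  });
}
```

A file named clip.m4v matches on its extension alone, whatever codec it actually contains, so the decode step still needs its own error handling.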

🚀 Getting Started

Local development

Prerequisites

Node.js 20+ recommended
npm

Run locally with npm

cd frontend/app
npm ci
npm run dev -- --host 0.0.0.0 --port 5173

Run locally with Docker Compose

docker compose build
docker compose up

This starts the frontend container and serves the Vite app on port 5173.

Testing

Run tests locally

cd frontend/app
npm ci
npm test -- --ci --runInBand --coverage --verbose

Docker Compose development

Prerequisites

Docker Compose

Build and start all services:

# Build the image
docker compose build

# Run the container
docker compose up

Test:

docker compose -f docker-compose.test.yml up --build --exit-code-from frontend_test

Notes and limitations

Model loading happens in the browser and may take time on first use
Larger models use more memory
Transcription speed depends on the browser and device
Media decoding support depends on browser codec support
The current app has no backend transcription service; transcription is performed client-side
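
Because the first model load is the slow part, the hook is described as caching the pipeline and reusing it while the same model stays selected. A minimal sketch of such a single-slot cache follows. The names createModelCache and getPipeline are hypothetical, and the loader is injected so the example does not depend on any ML library; in the app it would presumably wrap a call such as pipeline('automatic-speech-recognition', modelId) from @huggingface/transformers.

```typescript
// Hypothetical single-slot cache: remembers one loaded pipeline and its model id.
// `load` is injected, so this sketch makes no assumption about the ML runtime.
function createModelCache<T>(load: (modelId: string) => Promise<T>) {
  let cachedId: string | null = null;
  let cached: Promise<T> | null = null;

  return function getPipeline(modelId: string): Promise<T> {
    if (cached === null || cachedId !== modelId) {
      // Slow path: first use, or the selected model changed.
      cachedId = modelId;
      cached = load(modelId);
    }
    // Fast path: the same promise is returned for repeated requests.
    return cached;
  };
}
```

Caching the promise rather than the resolved value means two requests made while a load is still in flight share one download. In this sketch a failed load stays cached; a real hook would likely clear the slot when the promise rejects.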

License

Apache License 2.0
