Unlock fast, on-device speech recognition with Ryzen AI and OpenAI's Whisper. This demo walks you through preparing and running OpenAI's Whisper models (base, small, medium) for fast, local ASR on the AMD NPU.
- 🚀 Download NPU-optimized Whisper ONNX models from Hugging Face
- ⚡ Run ASR locally on CPU or NPU
- 📊 Evaluate ASR on LibriSpeech samples and report WER/CER
- 🎧 Supports transcription of audio files and microphone input
- ⏱️ Reports performance using real-time factor (RTF) and time to first token (TTFT)
- Install Ryzen AI SDK: follow the Ryzen AI documentation to install the SDK and drivers.
- Activate the environment:

  ```bash
  conda activate ryzen-ai-<version>
  ```

- Clone the repository:

  ```bash
  git clone https://github.com/amd/RyzenAI-SW.git
  cd RyzenAI-SW/demo/ASR/Whisper
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Offloads compute from the CPU onto the NPU, freeing the CPU for other tasks.
- Delivers higher throughput and lower power consumption when running AI workloads.
- Optimized execution of Whisper's encoder and decoder models.
- Runs models with BFP16 precision for near-FP32 accuracy and INT8-like performance.
When running inference on the NPU, 100% of the encoder operators and 93.4% of the decoder operators are executed on the NPU, as shown in the Vitis AI EP logs below:
```
# encoder operations
[Vitis AI EP] No. of Operators : VAIML 225
[Vitis AI EP] No. of Subgraphs : VAIML 1

# decoder operations
[Vitis AI EP] No. of Operators : CPU 24 VAIML 341
[Vitis AI EP] No. of Subgraphs : VAIML 2
```
- Edit `config/model_config.json` to specify Execution Providers.
- For NPU:
  - Set `cache_key` and `cache_dir`
  - Use the corresponding `vitisai_config` from `config/`

Example:

```json
{
    "config_file": "config/vitisai_config_whisper_decoder.json",
    "cache_dir": "./cache",
    "cache_key": "whisper_medium_decoder"
}
```

When running whisper-medium on the NPU, it is recommended to add the following flags to `config/vitisai_config_whisper_encoder.json` in case of compilation issues:
"vaiml_config": {
"optimize_level": 3,
"aiecompiler_args": "--system-stack-size=512"
}These settings:
- `optimize_level=3`: Enables aggressive optimizations for larger models.
- `--system-stack-size=512`: Increases the AI Engine system stack size to handle Whisper-Medium's higher resource demand.
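For context, this is roughly how the fields in `model_config.json` map onto an ONNX Runtime session using the Vitis AI Execution Provider. A minimal sketch, assuming ONNX Runtime's Python API; the model path and provider-option key names are illustrative and may differ from what `run_whisper.py` actually uses:

```python
import onnxruntime as ort

# Hypothetical session setup for the decoder; the model path is a placeholder
# and the provider-option keys are assumptions based on the config above.
session = ort.InferenceSession(
    "models/whisper_medium_decoder.onnx",
    providers=["VitisAIExecutionProvider"],
    provider_options=[{
        "config_file": "config/vitisai_config_whisper_decoder.json",
        "cacheDir": "./cache",
        "cacheKey": "whisper_medium_decoder",
    }],
)
print(session.get_providers())  # confirms which EPs the session resolved to
```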
Use this to transcribe a pre-recorded `.wav` file into text using the Whisper model:
```bash
python run_whisper.py \
    --model-type <whisper-type> \
    --device npu \
    --input path/to/audio.wav
```
- Replace `<whisper-type>` with `whisper-base`, `whisper-small`, or `whisper-medium`.
- Replace `path/to/audio.wav` with your audio file.
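For a sense of what the script does internally, the sketch below runs just the encoder half: compute log-mel features, then execute the exported encoder session. The file names and the use of the Hugging Face `WhisperProcessor` are assumptions for illustration; `run_whisper.py` handles this wiring for you.

```python
import onnxruntime as ort
import soundfile as sf
from transformers import WhisperProcessor  # assumption: HF processor for log-mel features

# Audio must be 16 kHz mono (LibriSpeech clips already are).
audio, sr = sf.read("audio_files/sample.wav")

processor = WhisperProcessor.from_pretrained("openai/whisper-medium")
features = processor(audio, sampling_rate=sr, return_tensors="np").input_features

# Hypothetical encoder model path; the demo resolves this from model_config.json.
encoder = ort.InferenceSession(
    "models/whisper_medium_encoder.onnx",
    providers=["VitisAIExecutionProvider"],
)
(hidden_states,) = encoder.run(None, {encoder.get_inputs()[0].name: features})
print(hidden_states.shape)  # encoder output consumed by the decoder at each step
```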
For example, to run whisper-large-v3-turbo:

```bash
python run_whisper.py --model-type whisper-large-v3-turbo --device npu --input audio_files\1089-134686-0000.wav
```

Run real-time speech-to-text by capturing audio from your microphone. This allows you to speak and see live transcription:
```bash
python run_whisper.py \
    --model-type <whisper-type> \
    --device npu \
    --input mic \
    --duration 0
```
- `--duration 0` means continuous recording until stopped (Ctrl+C) or until silence is detected for a set duration.
- Ideal for demos and testing live ASR performance.
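A rough sketch of how continuous microphone capture can feed such a pipeline, assuming the `sounddevice` package (the demo's actual recording backend may differ):

```python
import queue

import numpy as np
import sounddevice as sd  # assumption: any 16 kHz-capable recording backend works

SAMPLE_RATE = 16000  # Whisper expects 16 kHz mono audio
chunks = queue.Queue()

def on_audio(indata, frames, time_info, status):
    # Runs on the audio thread; just hand chunks to the main loop.
    chunks.put(indata.copy())

# Record until interrupted, analogous to --duration 0.
recorded = []
with sd.InputStream(samplerate=SAMPLE_RATE, channels=1, callback=on_audio):
    try:
        while True:
            recorded.append(chunks.get())
    except KeyboardInterrupt:
        pass

waveform = np.concatenate(recorded).ravel()  # float32 waveform ready for transcription
```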
Run batch evaluation on a dataset (e.g., LibriSpeech samples) to measure model performance with metrics like WER, CER, and RTF:
```bash
python run_whisper.py \
    --model-type <whisper-type> \
    --device npu \
    --eval-dir eval_dataset/LibriSpeech-samples \
    --results-dir results
```
- `--eval-dir` specifies the dataset directory.
- `--results-dir` is where evaluation reports (WER, CER, TTFT, RTF) will be saved (see the metrics sketch after this list).
- Useful for benchmarking and validating models.
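For reference, this is how WER, CER, and RTF are typically computed for a single clip. A minimal sketch assuming the `jiwer` metrics package and a `transcribe` callable standing in for the demo's pipeline; neither is necessarily what `run_whisper.py` uses internally:

```python
import time

import jiwer            # assumption: a common ASR-metrics package
import soundfile as sf  # assumption: used only to get the clip duration

def evaluate_clip(transcribe, wav_path, reference):
    """Score one clip; `transcribe` is any callable returning hypothesis text."""
    audio, sr = sf.read(wav_path)
    duration = len(audio) / sr

    start = time.perf_counter()
    hypothesis = transcribe(wav_path)
    elapsed = time.perf_counter() - start

    return {
        "wer": jiwer.wer(reference, hypothesis),  # word error rate
        "cer": jiwer.cer(reference, hypothesis),  # character error rate
        "rtf": elapsed / duration,                # real-time factor: compute time / audio time
    }
```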
- First run on NPU may take ~15 min for model compilation.
- Ensure paths for encoder, decoder, and config files are correct.
- Supports CPU and NPU devices.
