
Ryzen™ AI Automatic Speech Recognition

Automatic Speech Recognition using OpenAI Whisper

Unlock fast, on-device speech recognition with Ryzen AI and OpenAI’s Whisper. This demo walks you through preparing and running Whisper (base, small, medium) for local ASR on the AMD NPU.

Features

  • 🚀 Download NPU-optimized Whisper ONNX models from Hugging Face
  • ⚡ Run ASR locally on CPU or NPU
  • 📊 Evaluate ASR on LibriSpeech samples and report WER/CER
  • 🎧 Transcribe audio files and microphone input
  • ⏱️ Report performance using real-time factor (RTF) and time to first token (TTFT); see the sketch below
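
RTF and TTFT are computed from wall-clock timings. A minimal sketch of the standard definitions (variable names and values here are illustrative, not taken from this repo):

   import time

   # Illustrative timings only; values are hypothetical, not from this repo.
   audio_duration_s = 10.0                    # length of the input clip in seconds

   start = time.perf_counter()
   # ... model emits its first token here ...
   first_token_time = time.perf_counter()
   # ... decoding finishes here ...
   end = time.perf_counter()

   ttft_s = first_token_time - start          # time to first token
   rtf = (end - start) / audio_duration_s     # real-time factor; < 1.0 is faster than real time
   print(f"TTFT: {ttft_s:.3f}s  RTF: {rtf:.3f}")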


📦 Prerequisites

  1. Install the Ryzen AI SDK. Follow the Ryzen AI documentation to install the SDK and drivers.

  2. Activate the environment:

    conda activate ryzen-ai-<version>

  3. Clone the repository:

    git clone https://github.com/amd/RyzenAI-SW.git
    cd RyzenAI-SW/demo/ASR/Whisper

  4. Install dependencies:

    pip install -r requirements.txt
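
To verify the setup, you can check that the Vitis AI execution provider is visible to ONNX Runtime; a quick sanity check, assuming the Ryzen AI build of onnxruntime is installed:

   # VitisAIExecutionProvider should appear in this list
   # on a correctly configured Ryzen AI machine.
   import onnxruntime as ort
   print(ort.get_available_providers())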

⚡ Accelerate Whisper on AMD NPU

Why run on NPU?

  • Offloads compute from the CPU to the NPU, freeing the CPU for other tasks.
  • Delivers higher throughput and lower power consumption for AI workloads.
  • Optimizes execution of Whisper’s encoder and decoder models.
  • Runs models in BFP16 precision for near-FP32 accuracy at INT8-like performance.

NPU Run for Whisper-Base

When running inference on the NPU, 100% of the encoder operators and 93.4% (341 of 365) of the decoder operators execute on the NPU, as reported by the Vitis AI EP:

   #encoder operations
   [Vitis AI EP] No. of Operators : VAIML   225
   [Vitis AI EP] No. of Subgraphs : VAIML     1

   #decoder operations
   [Vitis AI EP] No. of Operators :   CPU    24  VAIML   341
   [Vitis AI EP] No. of Subgraphs : VAIML     2

Set up Vitis AI EP Configuration for NPU

  • Edit config/model_config.json to specify the execution providers.

  • For NPU:

    • Set cache_key and cache_dir
    • Use the corresponding vitisai_config file from config/

Example:

{
  "config_file": "config/vitisai_config_whisper_decoder.json",
  "cache_dir": "./cache",
  "cache_key": "whisper_medium_decoder"
}
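
For reference, here is roughly how these options map onto an ONNX Runtime session using the Vitis AI execution provider. This is a minimal sketch; run_whisper.py wires this up for you, the model path is hypothetical, and provider-option key names can vary between SDK versions:

   import onnxruntime as ort

   # Minimal sketch; the option keys mirror the JSON above but may
   # differ by Ryzen AI SDK version. The model path is hypothetical.
   session = ort.InferenceSession(
       "models/whisper_medium_decoder.onnx",
       providers=["VitisAIExecutionProvider"],
       provider_options=[{
           "config_file": "config/vitisai_config_whisper_decoder.json",
           "cache_dir": "./cache",
           "cache_key": "whisper_medium_decoder",
       }],
   )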

⚠️ Special Instructions for Whisper-Medium

When running whisper-medium on the NPU, it is recommended to add the following flags to config/vitisai_config_whisper_encoder.json in case of compilation issues:

"vaiml_config": {
  "optimize_level": 3,
  "aiecompiler_args": "--system-stack-size=512"
}

These settings:

  • optimize_level=3: Enables aggressive optimizations for larger models.
  • --system-stack-size=512: Increases the AI Engine system stack size to handle Whisper-Medium’s higher resource demand.

🚀 Usage

Transcribe Audio File

Use this to transcribe a pre-recorded .wav file into text with the selected Whisper model:

python run_whisper.py \
  --model-type <whisper-type> \
  --device npu \
  --input path/to/audio.wav

  • Replace <whisper-type> with whisper-base, whisper-small, or whisper-medium.

  • Replace path/to/audio.wav with your audio file.

For example, to run whisper-large-v3-turbo:

python run_whisper.py --model-type whisper-large-v3-turbo --device npu --input audio_files\1089-134686-0000.wav
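
Whisper models expect 16 kHz mono audio. If your file is in another format, a quick conversion may help; a sketch assuming librosa and soundfile are installed (whether run_whisper.py resamples internally is not stated here):

   import librosa
   import soundfile as sf

   # Resample to 16 kHz mono, the input format Whisper models expect.
   audio, sr = librosa.load("input.wav", sr=16000, mono=True)
   sf.write("audio_16k.wav", audio, sr)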

Transcribe from Microphone

Run real-time speech-to-text by capturing audio from your microphone. This allows you to speak and see live transcription:

python run_whisper.py \
  --model-type <whisper-type> \
  --device npu \
  --input mic \
  --duration 0

  • --duration 0 records continuously until you stop it (Ctrl+C) or silence is detected for a set duration.

  • Ideal for demos and testing live ASR performance.
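
If you prefer to capture a clip yourself and pass it in as a file, here is a generic sketch using the sounddevice package (illustrative only; not necessarily how run_whisper.py implements mic input):

   import sounddevice as sd
   import soundfile as sf

   # Record 5 seconds of 16 kHz mono audio and save it for transcription.
   duration_s, sr = 5, 16000
   recording = sd.rec(int(duration_s * sr), samplerate=sr, channels=1)
   sd.wait()                                  # block until recording finishes
   sf.write("mic_clip.wav", recording, sr)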

Evaluate on Dataset

Run batch evaluation on a dataset (e.g., LibriSpeech samples) to measure model performance with metrics like WER, CER, and RTF:

python run_whisper.py \
  --model-type <whisper-type> \
  --device npu \
  --eval-dir eval_dataset/LibriSpeech-samples \
  --results-dir results

  • --eval-dir specifies the dataset directory.

  • --results-dir is where evaluation reports (WER, CER, TTFT, RTF) will be saved.

  • Useful for benchmarking and validating models.
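
For context, WER and CER are edit-distance metrics between reference and hypothesis transcripts. A minimal sketch using the jiwer package (illustrative; the repo’s evaluation code may compute these differently):

   import jiwer

   reference  = "he hoped there would be stew for dinner"
   hypothesis = "he hoped there would be stu for dinner"

   print("WER:", jiwer.wer(reference, hypothesis))   # word error rate
   print("CER:", jiwer.cer(reference, hypothesis))   # character error rate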

Notes

  • The first run on the NPU may take ~15 minutes for model compilation.
  • Ensure the paths for the encoder, decoder, and config files are correct.
  • Supports CPU and NPU devices.