Feature Request: Intel NPU acceleration via OpenVINO #103

@Coffeebeans6932

Description

What

Enable Intel NPU hardware acceleration for local Whisper inference by building whisper.cpp with -DGGML_OPENVINO=ON.

Why

Modern Intel CPUs (Core Ultra series) ship dedicated NPUs rated at up to 48 TOPS that currently sit idle during inference. Offloading the Whisper encoder to the NPU would mean faster inference, lower CPU usage, and better battery life, all of which matter for a local-first dictation app.

How

whisper.cpp already supports OpenVINO as a backend ([docs](https://github.com/ggml-org/whisper.cpp#openvino)). The main work would be:

  1. Build the whisper.cpp addon with GGML_OPENVINO=ON for the Windows x64 target
  2. Auto-detect NPU availability at runtime and fall back to CPU if unavailable
  3. Generate/cache the OpenVINO encoder model on first launch (or bundle pre-converted models)
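For step 2, the detection-and-fallback logic could look roughly like the sketch below. This assumes the OpenVINO runtime's Python API (`openvino.Core().available_devices`, which reports strings such as `"CPU"`, `"GPU.0"`, `"NPU"`); the `select_device` helper is hypothetical, not part of whisper.cpp:

```python
def select_device(available_devices):
    """Pick the OpenVINO device for the Whisper encoder.

    `available_devices` is the list reported by the runtime,
    e.g. openvino.Core().available_devices -> ["CPU", "GPU.0", "NPU"].
    Prefer the NPU when present; otherwise fall back to CPU.
    """
    for device in available_devices:
        if device.startswith("NPU"):
            return device
    return "CPU"


if __name__ == "__main__":
    try:
        # Requires the OpenVINO runtime package to be installed.
        from openvino import Core
        devices = Core().available_devices
    except ImportError:
        # Runtime not installed: behave as if only the CPU exists.
        devices = ["CPU"]
    print(f"Whisper encoder device: {select_device(devices)}")
```

In the actual addon the chosen device string would be handed to whisper.cpp's OpenVINO encoder initialization; the key point is that a missing NPU (or missing runtime) degrades silently to the current CPU path rather than failing.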

Environment

  • Intel Core Ultra (Lunar Lake) with NPU, 32 GB RAM, Windows 11
  • Amical v1.0.0

Metadata

Labels: enhancement (New feature or request)