Skip to content

[JS] Implement WhisperPipeline#3399

Merged
as-suvorov merged 14 commits intoopenvinotoolkit:masterfrom
Retribution98:whisper_js
Mar 9, 2026
Merged

[JS] Implement WhisperPipeline#3399
as-suvorov merged 14 commits intoopenvinotoolkit:masterfrom
Retribution98:whisper_js

Conversation

@Retribution98
Copy link
Contributor

Description

  • Implemented WhisperPipeline
  • Added WhisperPipeline samples

CVS-179940

Checklist:

  • This PR follows GenAI Contributing guidelines.
  • Tests have been updated or added to cover the new code.
  • This PR fully addresses the ticket.
  • I have made corresponding changes to the documentation.

Signed-off-by: Kirill Suvorov <kirill.suvorov@intel.com>
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements WhisperPipeline for JavaScript/Node.js bindings, adding speech recognition capabilities to the openvino-genai-node package. The implementation follows the existing patterns established by VLMPipeline and other pipeline implementations, providing comprehensive support for Whisper models including word-level timestamps, language detection, translation, and streaming.

Changes:

  • Added WhisperPipeline implementation with C++ N-API bindings and TypeScript wrapper
  • Created JavaScript samples demonstrating Whisper speech recognition with timestamps
  • Added comprehensive test coverage for WhisperPipeline API and configuration
  • Updated documentation with JavaScript examples for all Whisper features

Reviewed changes

Copilot reviewed 30 out of 30 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
src/js/src/whisper_pipeline/pipeline_wrapper.cpp C++ N-API wrapper for WhisperPipeline with async generation support
src/js/src/whisper_pipeline/perf_metrics.cpp Performance metrics wrapper for Whisper-specific metrics
src/js/src/whisper_pipeline/init_worker.cpp Async worker for pipeline initialization
src/js/src/helper.cpp Added type conversions for WhisperGenerationConfig, lang_to_id, alignment_heads, and float vectors
src/js/src/addon.cpp Registered WhisperPipeline and WhisperPerfMetrics classes; contains critical bug
src/js/lib/pipelines/whisperPipeline.ts TypeScript WhisperPipeline class with generate() and stream() methods
src/js/lib/utils.ts Added WhisperGenerationConfig type definitions; contains duplicate definitions
src/js/lib/perfMetrics.ts Added WhisperPerfMetrics and WhisperRawMetrics interfaces
src/js/lib/decodedResults.ts Added WhisperDecodedResults, WhisperDecodedResultChunk, and WhisperWordTiming types
src/js/lib/index.ts Exported WhisperPipeline factory and related types
src/js/lib/addon.ts Added WhisperPipeline interface definitions
src/js/tests/whisperPipeline.test.js Comprehensive tests for WhisperPipeline; contains hardcoded path
src/js/tests/generationConfig.test.js Added round-trip tests for WhisperGenerationConfig
src/js/tests/utils.js Added createTestRawSpeech helper function
src/js/tests/setup.py Added WHISPER_MODEL configuration for test model setup
samples/js/whisper_speech_recognition/whisper_speech_recognition.js Sample demonstrating Whisper features (timestamps, language, task)
samples/js/whisper_speech_recognition/README.md Comprehensive documentation for Whisper sample
samples/js/ffmpeg_utils.js Audio decoding utilities using ffmpeg-static
samples/js/package.json Added ffmpeg-static and ffprobe-static dependencies
tests/python_tests/samples/test_whisper_speech_recognition.py Added JS sample to cross-language testing
site/docs/use-cases/speech-recognition/_sections/_usage_options/index.mdx Added JavaScript examples for all Whisper features
site/docs/use-cases/speech-recognition/_sections/_run_model/index.mdx Added JS tab to run model section
src/js/include/* Header files for WhisperPipeline wrapper classes
src/js/eslint.config.cjs Added Whisper-specific snake_case identifiers to camelcase allow list

Copilot AI review requested due to automatic review settings February 26, 2026 12:00
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 30 out of 30 changed files in this pull request and generated 5 comments.

try {
js_callback.Call(
{Napi::Error::New(env, "whisperPerformInferenceThread error. " + message).Value(), env.Null()});
} catch (std::exception& err) {
Copy link

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The catch block uses std::exception& err (non-const reference) instead of const std::exception& err. According to C++ Core Guidelines (E.15), exceptions should be caught by const reference to avoid unnecessary copies and to follow best practices.

Suggested change
} catch (std::exception& err) {
} catch (const std::exception& err) {

Copilot uses AI. Check for mistakes.
context->callback_tsfn.BlockingCall([result, &report_error](Napi::Env env, Napi::Function js_callback) {
try {
js_callback.Call({env.Null(), to_whisper_decoded_result(env, result)});
} catch (std::exception& err) {
Copy link

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The catch block uses std::exception& err (non-const reference) instead of const std::exception& err. According to C++ Core Guidelines (E.15), exceptions should be caught by const reference to avoid unnecessary copies and to follow best practices.

Copilot uses AI. Check for mistakes.

it("stream(rawSpeech, options) accepts generation config", async () => {
const chunks = [];
const stream = pipeline.stream(rawSpeech, { language: "<|en|>", task: "transcribe" });
Copy link

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test passes { language: "<|en|>", task: "transcribe" } directly to stream(), but according to the type signature in whisperPipeline.ts (lines 67-69), the second parameter should be WhisperGenerateOptions which expects { generationConfig?: WhisperGenerationConfig }. The test should pass { generationConfig: { language: "<|en|>", task: "transcribe" } } instead.

Suggested change
const stream = pipeline.stream(rawSpeech, { language: "<|en|>", task: "transcribe" });
const stream = pipeline.stream(rawSpeech, {
generationConfig: { language: "<|en|>", task: "transcribe" },
});

Copilot uses AI. Check for mistakes.
Comment on lines +41 to +58
async function probeMediaInfo(mediaPath, streamEntries) {
const ffprobePath = ffprobeModule.path;
if (!ffprobePath) {
throw new Error("ffprobe-static binary not found. Ensure ffprobe-static is installed.");
}

const stdout = await spawnChild(ffprobePath, [
"-v", "error",
"-select_streams", "v:0",
"-count_frames",
"-show_entries", `stream=${streamEntries}`,
"-of", "json",
mediaPath,
]);

const parsed = JSON.parse(stdout.toString("utf8"));
return parsed?.streams?.[0] ?? {};
}
Copy link

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The probeMediaInfo function is defined but never used in this file. It appears to be unused code. According to custom guideline 1000000 item 19, unused functions aren't allowed except for in debug_utils.hpp. Consider removing this function or ensuring it has a purpose.

Copilot generated this review using guidance from repository custom instructions.
import { spawn } from "node:child_process";
import ffmpegPath from "ffmpeg-static";
import ffprobeModule from "ffprobe-static";
import { addon as ov } from "openvino-node";
Copy link

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The import import { addon as ov } from "openvino-node" at line 7 is not used anywhere in this file. Remove unused imports to keep the code clean.

Suggested change
import { addon as ov } from "openvino-node";

Copilot uses AI. Check for mistakes.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 4, 2026 12:40
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 30 out of 31 changed files in this pull request and generated 8 comments.

Files not reviewed (1)
  • samples/js/package-lock.json: Language not supported

export type Phi4ReasoningParser = IPhi4ReasoningParser;
export type Llama3PythonicToolParser = ILlama3PythonicToolParser;
export type Llama3JsonToolParser = ILlama3JsonToolParser;
export type { WhisperGenerationConfig, WhisperPipelineProperties } from "./utils.js";
Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-exporting WhisperGenerationConfig / WhisperPipelineProperties from utils.js here will collide with src/js/lib/index.ts which already does export * from "./utils.js" and export * from "./addon.js". This typically triggers TS2308 (duplicate re-export) during type checking. Prefer exporting these types from a single place (usually utils.ts) and remove this re-export from addon.ts.

Suggested change
export type { WhisperGenerationConfig, WhisperPipelineProperties } from "./utils.js";

Copilot uses AI. Check for mistakes.
} = PipelineFactory;
export { DecodedResults, VLMDecodedResults, WhisperDecodedResults } from "./decodedResults.js";
export type { WhisperDecodedResultChunk, WhisperWordTiming } from "./decodedResults.js";
export type { WhisperGenerationConfig, WhisperPipelineProperties } from "./addon.js";
Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WhisperGenerationConfig / WhisperPipelineProperties are already re-exported via export * from "./utils.js" (and also via export * from "./addon.js"). This explicit re-export introduces duplicate/ambiguous exports and may fail TypeScript builds (TS2308). Remove the explicit export type { ... } from "./addon.js" line (and ensure the symbols are exported from only one module).

Suggested change
export type { WhisperGenerationConfig, WhisperPipelineProperties } from "./addon.js";

Copilot uses AI. Check for mistakes.

import { describe, it, before } from "node:test";
import assert from "node:assert/strict";
import { resolve } from "node:path";
Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolve is imported but never used in this test file. Please remove the unused import to satisfy linting/avoid dead code.

Suggested change
import { resolve } from "node:path";

Copilot uses AI. Check for mistakes.
assert.strictEqual(config.word_timestamps, true);
});

it("generate() without generationConfig doesn't returns chunks but returns words", async () => {
Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test description has a grammar issue: "doesn't returns" should be "doesn't return".

Suggested change
it("generate() without generationConfig doesn't returns chunks but returns words", async () => {
it("generate() without generationConfig doesn't return chunks but returns words", async () => {

Copilot uses AI. Check for mistakes.
Comment on lines +19 to +38
@@ -26,7 +30,12 @@ function valuesEqual(a, b) {
if (!Array.isArray(b)) return false;
return a.length === b.length && a.every((v, i) => valuesEqual(v, b[i]));
}
return false;
try {
assert.deepStrictEqual(Object.keys(a).sort(), Object.keys(b).sort());
return true;
} catch {
return false;
}
Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

valuesEqual() returns true for plain objects as long as their keys match, ignoring the values entirely. This can make the round-trip assertions pass even when the conversion corrupts object/map values (e.g., lang_to_id). Update the object comparison branch to recursively compare each property value (and ensure both objects have the same key set).

Copilot uses AI. Check for mistakes.
Copilot AI review requested due to automatic review settings March 4, 2026 17:19
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 29 out of 29 changed files in this pull request and generated 2 comments.

Comment on lines +90 to +97
if (error) {
if (rejectPromise) {
rejectPromise(error);
resolvePromise = null;
rejectPromise = null;
}
return;
}
Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In stream(), if the native generate finishes with an error before the consumer is awaiting next(), the error is silently dropped (because rejectPromise is null) and the iterator can hang forever. Persist the error state (or enqueue an error sentinel) so the next next() rejects, and consider also ending the iterator (done: true) after an error.

Copilot uses AI. Check for mistakes.
report_error(combined_error);
} else {
napi_status status =
context->callback_tsfn.BlockingCall([result, &report_error](Napi::Env env, Napi::Function js_callback) {
Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

result is captured by value in the final callback_tsfn.BlockingCall, which copies potentially large vectors (texts/scores/metrics) unnecessarily. Capture by reference (since BlockingCall is synchronous) or move the result into the lambda to avoid extra allocations/copies on the inference thread hot path.

Suggested change
context->callback_tsfn.BlockingCall([result, &report_error](Napi::Env env, Napi::Function js_callback) {
context->callback_tsfn.BlockingCall([&result, &report_error](Napi::Env env, Napi::Function js_callback) {

Copilot uses AI. Check for mistakes.
Copilot AI review requested due to automatic review settings March 5, 2026 12:17
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 29 out of 29 changed files in this pull request and generated 3 comments.

const auto& object = value.ToObject();
const auto& keys = object.GetPropertyNames();
for (uint32_t i = 0; i < keys.Length(); ++i) {
const std::string& key_name = keys.Get(i).ToString();
Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const std::string& key_name = keys.Get(i).ToString(); binds a reference to a temporary std::string, so key_name dangles immediately (undefined behavior). Store the string by value instead (e.g. std::string key_name = ...) before using it as an object key / map key. (C++ Core Guidelines: avoid dangling references / lifetime issues, e.g. ES.65).

Suggested change
const std::string& key_name = keys.Get(i).ToString();
std::string key_name = keys.Get(i).ToString();

Copilot uses AI. Check for mistakes.
Comment on lines +20 to +24
private:
std::shared_ptr<ov::genai::WhisperPipeline> pipe = nullptr;
std::shared_ptr<bool> is_initializing = std::make_shared<bool>(false);
std::shared_ptr<bool> is_generating = std::make_shared<bool>(false);
};
Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is_generating / is_initializing are std::shared_ptr<bool> but are read/written from both the main thread (N-API call) and a worker thread (whisperPerformInferenceThread). This is a C++ data race (undefined behavior). Use an atomic flag (e.g. std::atomic_bool or std::shared_ptr<std::atomic_bool>) or protect access with a mutex. (C++ Core Guidelines: CP.2/CP.8 — avoid data races).

Copilot uses AI. Check for mistakes.
Comment on lines 264 to +270
result_map[key_name] = js_to_cpp<std::vector<std::shared_ptr<ov::genai::Parser>>>(env, value_by_key);
} else if (key_name == STOP_CRITERIA_KEY) {
result_map[key_name] = ov::Any(js_to_cpp<ov::genai::StopCriteria>(env, value_by_key));
} else if (key_name == LANG_TO_ID_KEY) {
result_map[key_name] = js_to_cpp<std::map<std::string, int64_t>>(env, value_by_key);
} else if (key_name == ALIGNMENT_HEADS_KEY) {
result_map[key_name] = js_to_cpp<std::vector<std::pair<size_t, size_t>>>(env, value_by_key);
Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this property-iteration loop, key_name is currently obtained via const std::string& key_name = keys.Get(i).ToString(); (earlier in the same loop). That binds a reference to a temporary std::string, so key_name is dangling when used in these comparisons and as a map key (undefined behavior). Make key_name a std::string value instead of a reference. (C++ Core Guidelines: avoid dangling references / lifetime issues, e.g. ES.65).

Copilot uses AI. Check for mistakes.
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 29 out of 29 changed files in this pull request and generated 4 comments.

Comment on lines +21 to +24
void WhisperInitWorker::Execute() {
this->pipe = std::make_shared<ov::genai::WhisperPipeline>(std::filesystem::path(this->model_path),
this->device,
this->properties);
Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

init_worker.cpp constructs std::filesystem::path but doesn’t include <filesystem>. This works only if a transitive include happens to pull it in; please include <filesystem> directly to avoid fragile builds.

Copilot uses AI. Check for mistakes.
Comment on lines +179 to +182
OPENVINO_ASSERT(this->pipe, "WhisperPipeline is not initialized");
OPENVINO_ASSERT(!*this->is_generating, "Another generate is already in progress");
*this->is_generating = true;

Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is_generating is a shared_ptr<bool> that is written from the JS thread (in generate()) and from the native worker thread (in whisperPerformInferenceThread). This is a data race (C++ Core Guidelines CP.2) and can lead to undefined behavior. Use std::atomic_bool (or shared_ptr<std::atomic_bool>) and .load()/.store(), or guard updates with a mutex.

Copilot uses AI. Check for mistakes.
resolvePromise = null;
rejectPromise = null;
} else {
throw error;
Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In stream(), if generate() completes with an error before the consumer has started awaiting next(), rejectPromise will be null and the code throw error; will throw from an async callback context (likely crashing / becoming an unhandled exception). Instead, enqueue the error and make the next next() reject (or store it and reject immediately once next() is awaited).

Suggested change
throw error;
// No pending consumer: enqueue a terminal item instead of throwing
queue.push({ done: true, chunk: "" });

Copilot uses AI. Check for mistakes.

#include <future>
#include <optional>
#include <thread>
Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pipeline_wrapper.cpp uses std::cerr/std::endl but does not include <iostream>. Relying on transitive includes is brittle and can break builds when headers change; include the required standard header explicitly.

Suggested change
#include <thread>
#include <thread>
#include <iostream>

Copilot uses AI. Check for mistakes.
@Retribution98 Retribution98 requested a review from Copilot March 5, 2026 16:57
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copy link
Contributor

@almilosz almilosz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, small comments

Copilot AI review requested due to automatic review settings March 6, 2026 13:04
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 29 out of 29 changed files in this pull request and generated 1 comment.

const generationConfig = options?.generationConfig ?? {};

let streamingStatus = StreamingStatus.RUNNING;
let handledError: Error | null;
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

handledError is declared but never initialized, and is read in next() before any assignment. TypeScript typically reports this as “variable is used before being assigned”. Initialize it to null (and include undefined in the type only if you intentionally want that state).

Suggested change
let handledError: Error | null;
let handledError: Error | null = null;

Copilot uses AI. Check for mistakes.
@as-suvorov as-suvorov added this pull request to the merge queue Mar 9, 2026
Merged via the queue into openvinotoolkit:master with commit 9d967d9 Mar 9, 2026
151 of 153 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

category: GGUF GGUF file reader category: GH Pages Docs Github Pages documentation category: JS API GenAI JS API category: whisper Whisper pipeline no-match-files

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants