[JS] Implement WhisperPipeline#3399
Conversation
Signed-off-by: Kirill Suvorov <kirill.suvorov@intel.com>
There was a problem hiding this comment.
Pull request overview
This PR implements WhisperPipeline for JavaScript/Node.js bindings, adding speech recognition capabilities to the openvino-genai-node package. The implementation follows the existing patterns established by VLMPipeline and other pipeline implementations, providing comprehensive support for Whisper models including word-level timestamps, language detection, translation, and streaming.
Changes:
- Added WhisperPipeline implementation with C++ N-API bindings and TypeScript wrapper
- Created JavaScript samples demonstrating Whisper speech recognition with timestamps
- Added comprehensive test coverage for WhisperPipeline API and configuration
- Updated documentation with JavaScript examples for all Whisper features
Reviewed changes
Copilot reviewed 30 out of 30 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| src/js/src/whisper_pipeline/pipeline_wrapper.cpp | C++ N-API wrapper for WhisperPipeline with async generation support |
| src/js/src/whisper_pipeline/perf_metrics.cpp | Performance metrics wrapper for Whisper-specific metrics |
| src/js/src/whisper_pipeline/init_worker.cpp | Async worker for pipeline initialization |
| src/js/src/helper.cpp | Added type conversions for WhisperGenerationConfig, lang_to_id, alignment_heads, and float vectors |
| src/js/src/addon.cpp | Registered WhisperPipeline and WhisperPerfMetrics classes; contains critical bug |
| src/js/lib/pipelines/whisperPipeline.ts | TypeScript WhisperPipeline class with generate() and stream() methods |
| src/js/lib/utils.ts | Added WhisperGenerationConfig type definitions; contains duplicate definitions |
| src/js/lib/perfMetrics.ts | Added WhisperPerfMetrics and WhisperRawMetrics interfaces |
| src/js/lib/decodedResults.ts | Added WhisperDecodedResults, WhisperDecodedResultChunk, and WhisperWordTiming types |
| src/js/lib/index.ts | Exported WhisperPipeline factory and related types |
| src/js/lib/addon.ts | Added WhisperPipeline interface definitions |
| src/js/tests/whisperPipeline.test.js | Comprehensive tests for WhisperPipeline; contains hardcoded path |
| src/js/tests/generationConfig.test.js | Added round-trip tests for WhisperGenerationConfig |
| src/js/tests/utils.js | Added createTestRawSpeech helper function |
| src/js/tests/setup.py | Added WHISPER_MODEL configuration for test model setup |
| samples/js/whisper_speech_recognition/whisper_speech_recognition.js | Sample demonstrating Whisper features (timestamps, language, task) |
| samples/js/whisper_speech_recognition/README.md | Comprehensive documentation for Whisper sample |
| samples/js/ffmpeg_utils.js | Audio decoding utilities using ffmpeg-static |
| samples/js/package.json | Added ffmpeg-static and ffprobe-static dependencies |
| tests/python_tests/samples/test_whisper_speech_recognition.py | Added JS sample to cross-language testing |
| site/docs/use-cases/speech-recognition/_sections/_usage_options/index.mdx | Added JavaScript examples for all Whisper features |
| site/docs/use-cases/speech-recognition/_sections/_run_model/index.mdx | Added JS tab to run model section |
| src/js/include/* | Header files for WhisperPipeline wrapper classes |
| src/js/eslint.config.cjs | Added Whisper-specific snake_case identifiers to camelcase allow list |
| try { | ||
| js_callback.Call( | ||
| {Napi::Error::New(env, "whisperPerformInferenceThread error. " + message).Value(), env.Null()}); | ||
| } catch (std::exception& err) { |
There was a problem hiding this comment.
The catch block uses std::exception& err (non-const reference) instead of const std::exception& err. According to C++ Core Guidelines (E.15), exceptions should be caught by const reference to avoid unnecessary copies and to follow best practices.
| } catch (std::exception& err) { | |
| } catch (const std::exception& err) { |
| context->callback_tsfn.BlockingCall([result, &report_error](Napi::Env env, Napi::Function js_callback) { | ||
| try { | ||
| js_callback.Call({env.Null(), to_whisper_decoded_result(env, result)}); | ||
| } catch (std::exception& err) { |
There was a problem hiding this comment.
The catch block uses std::exception& err (non-const reference) instead of const std::exception& err. According to C++ Core Guidelines (E.15), exceptions should be caught by const reference to avoid unnecessary copies and to follow best practices.
src/js/tests/whisperPipeline.test.js
Outdated
|
|
||
| it("stream(rawSpeech, options) accepts generation config", async () => { | ||
| const chunks = []; | ||
| const stream = pipeline.stream(rawSpeech, { language: "<|en|>", task: "transcribe" }); |
There was a problem hiding this comment.
The test passes { language: "<|en|>", task: "transcribe" } directly to stream(), but according to the type signature in whisperPipeline.ts (lines 67-69), the second parameter should be WhisperGenerateOptions which expects { generationConfig?: WhisperGenerationConfig }. The test should pass { generationConfig: { language: "<|en|>", task: "transcribe" } } instead.
| const stream = pipeline.stream(rawSpeech, { language: "<|en|>", task: "transcribe" }); | |
| const stream = pipeline.stream(rawSpeech, { | |
| generationConfig: { language: "<|en|>", task: "transcribe" }, | |
| }); |
samples/js/ffmpeg_utils.js
Outdated
| async function probeMediaInfo(mediaPath, streamEntries) { | ||
| const ffprobePath = ffprobeModule.path; | ||
| if (!ffprobePath) { | ||
| throw new Error("ffprobe-static binary not found. Ensure ffprobe-static is installed."); | ||
| } | ||
|
|
||
| const stdout = await spawnChild(ffprobePath, [ | ||
| "-v", "error", | ||
| "-select_streams", "v:0", | ||
| "-count_frames", | ||
| "-show_entries", `stream=${streamEntries}`, | ||
| "-of", "json", | ||
| mediaPath, | ||
| ]); | ||
|
|
||
| const parsed = JSON.parse(stdout.toString("utf8")); | ||
| return parsed?.streams?.[0] ?? {}; | ||
| } |
There was a problem hiding this comment.
The probeMediaInfo function is defined but never used in this file. It appears to be unused code. According to custom guideline 1000000 item 19, unused functions aren't allowed except for in debug_utils.hpp. Consider removing this function or ensuring it has a purpose.
samples/js/ffmpeg_utils.js
Outdated
| import { spawn } from "node:child_process"; | ||
| import ffmpegPath from "ffmpeg-static"; | ||
| import ffprobeModule from "ffprobe-static"; | ||
| import { addon as ov } from "openvino-node"; |
There was a problem hiding this comment.
The import import { addon as ov } from "openvino-node" at line 7 is not used anywhere in this file. Remove unused imports to keep the code clean.
| import { addon as ov } from "openvino-node"; |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
429120a to
8a65b80
Compare
src/js/lib/addon.ts
Outdated
| export type Phi4ReasoningParser = IPhi4ReasoningParser; | ||
| export type Llama3PythonicToolParser = ILlama3PythonicToolParser; | ||
| export type Llama3JsonToolParser = ILlama3JsonToolParser; | ||
| export type { WhisperGenerationConfig, WhisperPipelineProperties } from "./utils.js"; |
There was a problem hiding this comment.
Re-exporting WhisperGenerationConfig / WhisperPipelineProperties from utils.js here will collide with src/js/lib/index.ts which already does export * from "./utils.js" and export * from "./addon.js". This typically triggers TS2308 (duplicate re-export) during type checking. Prefer exporting these types from a single place (usually utils.ts) and remove this re-export from addon.ts.
| export type { WhisperGenerationConfig, WhisperPipelineProperties } from "./utils.js"; |
src/js/lib/index.ts
Outdated
| } = PipelineFactory; | ||
| export { DecodedResults, VLMDecodedResults, WhisperDecodedResults } from "./decodedResults.js"; | ||
| export type { WhisperDecodedResultChunk, WhisperWordTiming } from "./decodedResults.js"; | ||
| export type { WhisperGenerationConfig, WhisperPipelineProperties } from "./addon.js"; |
There was a problem hiding this comment.
WhisperGenerationConfig / WhisperPipelineProperties are already re-exported via export * from "./utils.js" (and also via export * from "./addon.js"). This explicit re-export introduces duplicate/ambiguous exports and may fail TypeScript builds (TS2308). Remove the explicit export type { ... } from "./addon.js" line (and ensure the symbols are exported from only one module).
| export type { WhisperGenerationConfig, WhisperPipelineProperties } from "./addon.js"; |
src/js/tests/whisperPipeline.test.js
Outdated
|
|
||
| import { describe, it, before } from "node:test"; | ||
| import assert from "node:assert/strict"; | ||
| import { resolve } from "node:path"; |
There was a problem hiding this comment.
resolve is imported but never used in this test file. Please remove the unused import to satisfy linting/avoid dead code.
| import { resolve } from "node:path"; |
src/js/tests/whisperPipeline.test.js
Outdated
| assert.strictEqual(config.word_timestamps, true); | ||
| }); | ||
|
|
||
| it("generate() without generationConfig doesn't returns chunks but returns words", async () => { |
There was a problem hiding this comment.
Test description has a grammar issue: "doesn't returns" should be "doesn't return".
| it("generate() without generationConfig doesn't returns chunks but returns words", async () => { | |
| it("generate() without generationConfig doesn't return chunks but returns words", async () => { |
| @@ -26,7 +30,12 @@ function valuesEqual(a, b) { | |||
| if (!Array.isArray(b)) return false; | |||
| return a.length === b.length && a.every((v, i) => valuesEqual(v, b[i])); | |||
| } | |||
| return false; | |||
| try { | |||
| assert.deepStrictEqual(Object.keys(a).sort(), Object.keys(b).sort()); | |||
| return true; | |||
| } catch { | |||
| return false; | |||
| } | |||
There was a problem hiding this comment.
valuesEqual() returns true for plain objects as long as their keys match, ignoring the values entirely. This can make the round-trip assertions pass even when the conversion corrupts object/map values (e.g., lang_to_id). Update the object comparison branch to recursively compare each property value (and ensure both objects have the same key set).
| if (error) { | ||
| if (rejectPromise) { | ||
| rejectPromise(error); | ||
| resolvePromise = null; | ||
| rejectPromise = null; | ||
| } | ||
| return; | ||
| } |
There was a problem hiding this comment.
In stream(), if the native generate finishes with an error before the consumer is awaiting next(), the error is silently dropped (because rejectPromise is null) and the iterator can hang forever. Persist the error state (or enqueue an error sentinel) so the next next() rejects, and consider also ending the iterator (done: true) after an error.
| report_error(combined_error); | ||
| } else { | ||
| napi_status status = | ||
| context->callback_tsfn.BlockingCall([result, &report_error](Napi::Env env, Napi::Function js_callback) { |
There was a problem hiding this comment.
result is captured by value in the final callback_tsfn.BlockingCall, which copies potentially large vectors (texts/scores/metrics) unnecessarily. Capture by reference (since BlockingCall is synchronous) or move the result into the lambda to avoid extra allocations/copies on the inference thread hot path.
| context->callback_tsfn.BlockingCall([result, &report_error](Napi::Env env, Napi::Function js_callback) { | |
| context->callback_tsfn.BlockingCall([&result, &report_error](Napi::Env env, Napi::Function js_callback) { |
src/js/src/helper.cpp
Outdated
| const auto& object = value.ToObject(); | ||
| const auto& keys = object.GetPropertyNames(); | ||
| for (uint32_t i = 0; i < keys.Length(); ++i) { | ||
| const std::string& key_name = keys.Get(i).ToString(); |
There was a problem hiding this comment.
const std::string& key_name = keys.Get(i).ToString(); binds a reference to a temporary std::string, so key_name dangles immediately (undefined behavior). Store the string by value instead (e.g. std::string key_name = ...) before using it as an object key / map key. (C++ Core Guidelines: avoid dangling references / lifetime issues, e.g. ES.65).
| const std::string& key_name = keys.Get(i).ToString(); | |
| std::string key_name = keys.Get(i).ToString(); |
| private: | ||
| std::shared_ptr<ov::genai::WhisperPipeline> pipe = nullptr; | ||
| std::shared_ptr<bool> is_initializing = std::make_shared<bool>(false); | ||
| std::shared_ptr<bool> is_generating = std::make_shared<bool>(false); | ||
| }; |
There was a problem hiding this comment.
is_generating / is_initializing are std::shared_ptr<bool> but are read/written from both the main thread (N-API call) and a worker thread (whisperPerformInferenceThread). This is a C++ data race (undefined behavior). Use an atomic flag (e.g. std::atomic_bool or std::shared_ptr<std::atomic_bool>) or protect access with a mutex. (C++ Core Guidelines: CP.2/CP.8 — avoid data races).
| result_map[key_name] = js_to_cpp<std::vector<std::shared_ptr<ov::genai::Parser>>>(env, value_by_key); | ||
| } else if (key_name == STOP_CRITERIA_KEY) { | ||
| result_map[key_name] = ov::Any(js_to_cpp<ov::genai::StopCriteria>(env, value_by_key)); | ||
| } else if (key_name == LANG_TO_ID_KEY) { | ||
| result_map[key_name] = js_to_cpp<std::map<std::string, int64_t>>(env, value_by_key); | ||
| } else if (key_name == ALIGNMENT_HEADS_KEY) { | ||
| result_map[key_name] = js_to_cpp<std::vector<std::pair<size_t, size_t>>>(env, value_by_key); |
There was a problem hiding this comment.
In this property-iteration loop, key_name is currently obtained via const std::string& key_name = keys.Get(i).ToString(); (earlier in the same loop). That binds a reference to a temporary std::string, so key_name is dangling when used in these comparisons and as a map key (undefined behavior). Make key_name a std::string value instead of a reference. (C++ Core Guidelines: avoid dangling references / lifetime issues, e.g. ES.65).
| void WhisperInitWorker::Execute() { | ||
| this->pipe = std::make_shared<ov::genai::WhisperPipeline>(std::filesystem::path(this->model_path), | ||
| this->device, | ||
| this->properties); |
There was a problem hiding this comment.
init_worker.cpp constructs std::filesystem::path but doesn’t include <filesystem>. This works only if a transitive include happens to pull it in; please include <filesystem> directly to avoid fragile builds.
| OPENVINO_ASSERT(this->pipe, "WhisperPipeline is not initialized"); | ||
| OPENVINO_ASSERT(!*this->is_generating, "Another generate is already in progress"); | ||
| *this->is_generating = true; | ||
|
|
There was a problem hiding this comment.
is_generating is a shared_ptr<bool> that is written from the JS thread (in generate()) and from the native worker thread (in whisperPerformInferenceThread). This is a data race (C++ Core Guidelines CP.2) and can lead to undefined behavior. Use std::atomic_bool (or shared_ptr<std::atomic_bool>) and .load()/.store(), or guard updates with a mutex.
| resolvePromise = null; | ||
| rejectPromise = null; | ||
| } else { | ||
| throw error; |
There was a problem hiding this comment.
In stream(), if generate() completes with an error before the consumer has started awaiting next(), rejectPromise will be null and the code throw error; will throw from an async callback context (likely crashing / becoming an unhandled exception). Instead, enqueue the error and make the next next() reject (or store it and reject immediately once next() is awaited).
| throw error; | |
| // No pending consumer: enqueue a terminal item instead of throwing | |
| queue.push({ done: true, chunk: "" }); |
|
|
||
| #include <future> | ||
| #include <optional> | ||
| #include <thread> |
There was a problem hiding this comment.
pipeline_wrapper.cpp uses std::cerr/std::endl but does not include <iostream>. Relying on transitive includes is brittle and can break builds when headers change; include the required standard header explicitly.
| #include <thread> | |
| #include <thread> | |
| #include <iostream> |
| const generationConfig = options?.generationConfig ?? {}; | ||
|
|
||
| let streamingStatus = StreamingStatus.RUNNING; | ||
| let handledError: Error | null; |
There was a problem hiding this comment.
handledError is declared but never initialized, and is read in next() before any assignment. TypeScript typically reports this as “variable is used before being assigned”. Initialize it to null (and include undefined in the type only if you intentionally want that state).
| let handledError: Error | null; | |
| let handledError: Error | null = null; |
9d967d9
Description
CVS-179940
Checklist: