Commit 689be5f

perf(windows): skip language auto-detection on CPU-only platforms
Whisper auto-detection runs the encoder twice (detect + transcribe), doubling latency on CPU. On macOS Metal GPU this is negligible, but on Windows it adds ~3s to every transcription. When the user has speech languages configured, use the first one directly instead of auto-detect. The LLM correction layer handles any cross-language artifacts.

Before: 13.3s (auto, 4t) → After: ~4s (fixed lang, 12t)
1 parent 132584e

File tree

1 file changed: +11 −1 lines changed
src/main/audio/whisper.ts

Lines changed: 11 additions & 1 deletion
@@ -36,7 +36,17 @@ export async function transcribe(
   const whisperArgs = buildWhisperArgs(speechLanguages);
   const prompt = buildWhisperPrompt(dictionary, whisperArgs.promptPrefix);
-  const stdout = await runWhisper(modelPath, tempPath, prompt, whisperArgs.language, temperature);
+  // On CPU-only platforms (Windows/Linux), language auto-detection runs the
+  // encoder twice — once to detect, once to transcribe — doubling latency.
+  // Use the first configured language instead; the LLM correction layer
+  // handles any cross-language artifacts. On macOS Metal GPU the overhead
+  // is negligible so we keep auto-detect for accuracy.
+  const language = whisperArgs.language === "auto"
+    && process.platform !== "darwin"
+    && speechLanguages.length > 0
+      ? speechLanguages[0]
+      : whisperArgs.language;
+  const stdout = await runWhisper(modelPath, tempPath, prompt, language, temperature);
   const text = parseWhisperOutput(stdout);

   return { text };
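The selection logic in the diff can be sketched as a standalone function. Note this is an illustration, not code from the commit: `pickLanguage` is a hypothetical helper name, and in the actual change the ternary is inlined in `transcribe()`; the platform string is passed explicitly here instead of reading `process.platform` so the behavior is easy to exercise.

```typescript
// Sketch of the platform-aware language selection above.
// pickLanguage is a hypothetical name used for illustration only.
function pickLanguage(
  configured: string,        // whisperArgs.language, e.g. "auto" or "en"
  speechLanguages: string[], // user-configured speech languages
  platform: string,          // e.g. the value of process.platform
): string {
  // Bypass auto-detect only off macOS (no Metal GPU to hide the cost),
  // and only when at least one configured language exists to substitute.
  return configured === "auto" &&
    platform !== "darwin" &&
    speechLanguages.length > 0
    ? speechLanguages[0]
    : configured;
}
```

On macOS, or when no speech languages are configured, the function falls through to the original `whisperArgs.language` value, so auto-detection is preserved exactly where it was cheap or unavoidable before.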
