This repository was archived by the owner on Jul 30, 2025. It is now read-only.
x10 slower in release build #226
Description
Running Whisper transcription in a release build takes approximately 10x as long as in a debug build. I tested this with the audio_transcription.rs example, using the jfk.wav file and the ggml-medium.en-q5_0.bin model.
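Note that Measure-Command { cargo run } folds cargo's own startup and any compilation into the total (small here, since both builds finished compiling in about a second, but nonzero). To rule that out, one could time only the transcription call inside the program itself. A minimal sketch of that timing pattern follows; expensive_step is a stand-in for the real call (in the example that would be the state.full(...) invocation), not actual whisper-rs code:

```rust
use std::time::{Duration, Instant};

// Stand-in workload; in audio_transcription.rs this would be the
// transcription call itself (state.full(params, &audio_data)).
fn expensive_step() -> u64 {
    (0..1_000_000u64).sum()
}

fn main() {
    let t0 = Instant::now();
    let result = expensive_step();
    let elapsed: Duration = t0.elapsed();
    // Timing only this call excludes cargo startup, compilation, and
    // model loading, which Measure-Command includes in its total.
    println!("step produced {result} in {elapsed:?}");
}
```

Wrapping the inference call this way in both debug and release builds would confirm whether the 10x gap is in the inference itself rather than in setup.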
In Debug
PS C:\Repositories\whisper-rs> Measure-Command { cargo run --example audio_transcription }
Compiling whisper-rs v0.14.2 (C:\Repositories\whisper-rs)
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.94s
Running `target\debug\examples\audio_transcription.exe`
whisper_init_from_file_with_params_no_state: loading model from 'ggml-medium.en-q5_0.bin'
whisper_init_with_params_no_state: use gpu = 0
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw = 1
whisper_init_with_params_no_state: devices = 1
whisper_init_with_params_no_state: backends = 1
whisper_model_load: loading model
whisper_model_load: n_vocab = 51864
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 1024
whisper_model_load: n_audio_head = 16
whisper_model_load: n_audio_layer = 24
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 1024
whisper_model_load: n_text_head = 16
whisper_model_load: n_text_layer = 24
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 8
whisper_model_load: qntvr = 1
whisper_model_load: type = 4 (medium)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: n_langs = 99
whisper_model_load: CPU total size = 538.59 MB
whisper_model_load: model size = 538.59 MB
whisper_init_state: kv self size = 50.33 MB
whisper_init_state: kv cross size = 150.99 MB
whisper_init_state: kv pad size = 6.29 MB
whisper_init_state: alignment heads masks size = 1152 B
whisper_init_state: compute buffer (conv) = 28.55 MB
whisper_init_state: compute buffer (encode) = 170.15 MB
whisper_init_state: compute buffer (cross) = 7.72 MB
whisper_init_state: compute buffer (decode) = 395.07 MB
whisper_full_with_state: strategy = 0, decoding with 1 decoders, temperature = 0.00
whisper_full_with_state: prompt[0] = [_SOT_]
whisper_full_with_state: id = 0, decoder = 0, token = 50363, p = 0.825, ts = [_BEG_], 0.825, result_len = 0 '[_BEG_]'
whisper_full_with_state: id = 1, decoder = 0, token = 843, p = 0.648, ts = [?], 0.000, result_len = 0 ' And'
whisper_full_with_state: id = 2, decoder = 0, token = 523, p = 0.992, ts = [?], 0.009, result_len = 0 ' so'
whisper_full_with_state: id = 3, decoder = 0, token = 616, p = 0.653, ts = [?], 0.006, result_len = 0 ' my'
whisper_full_with_state: id = 4, decoder = 0, token = 5891, p = 0.997, ts = [?], 0.006, result_len = 0 ' fellow'
whisper_full_with_state: id = 5, decoder = 0, token = 3399, p = 0.912, ts = [?], 0.056, result_len = 0 ' Americans'
whisper_full_with_state: id = 6, decoder = 0, token = 11, p = 0.562, ts = [?], 0.048, result_len = 0 ','
whisper_full_with_state: id = 7, decoder = 0, token = 1265, p = 0.852, ts = [?], 0.044, result_len = 0 ' ask'
whisper_full_with_state: id = 8, decoder = 0, token = 407, p = 0.896, ts = [?], 0.087, result_len = 0 ' not'
whisper_full_with_state: id = 9, decoder = 0, token = 644, p = 0.911, ts = [?], 0.048, result_len = 0 ' what'
whisper_full_with_state: id = 10, decoder = 0, token = 534, p = 0.982, ts = [_TT_278], 0.164, result_len = 0 ' your'
whisper_full_with_state: id = 11, decoder = 0, token = 1499, p = 0.979, ts = [?], 0.087, result_len = 0 ' country'
whisper_full_with_state: id = 12, decoder = 0, token = 460, p = 0.988, ts = [?], 0.080, result_len = 0 ' can'
whisper_full_with_state: id = 13, decoder = 0, token = 466, p = 0.996, ts = [_TT_332], 0.105, result_len = 0 ' do'
whisper_full_with_state: id = 14, decoder = 0, token = 329, p = 0.995, ts = [_TT_340], 0.111, result_len = 0 ' for'
whisper_full_with_state: id = 15, decoder = 0, token = 345, p = 0.988, ts = [?], 0.086, result_len = 0 ' you'
whisper_full_with_state: id = 16, decoder = 0, token = 11, p = 0.542, ts = [?], 0.088, result_len = 0 ','
whisper_full_with_state: id = 17, decoder = 0, token = 1265, p = 0.623, ts = [?], 0.067, result_len = 0 ' ask'
whisper_full_with_state: id = 18, decoder = 0, token = 644, p = 0.988, ts = [?], 0.087, result_len = 0 ' what'
whisper_full_with_state: id = 19, decoder = 0, token = 345, p = 0.950, ts = [_TT_437], 0.203, result_len = 0 ' you'
whisper_full_with_state: id = 20, decoder = 0, token = 460, p = 0.939, ts = [_TT_450], 0.301, result_len = 0 ' can'
whisper_full_with_state: id = 21, decoder = 0, token = 466, p = 0.989, ts = [_TT_466], 0.146, result_len = 0 ' do'
whisper_full_with_state: id = 22, decoder = 0, token = 329, p = 0.776, ts = [_TT_484], 0.148, result_len = 0 ' for'
whisper_full_with_state: id = 23, decoder = 0, token = 534, p = 0.990, ts = [_TT_488], 0.137, result_len = 0 ' your'
whisper_full_with_state: id = 24, decoder = 0, token = 1499, p = 0.995, ts = [_TT_500], 0.153, result_len = 0 ' country'
whisper_full_with_state: id = 25, decoder = 0, token = 13, p = 0.775, ts = [?], 0.053, result_len = 0 '.'
whisper_full_with_state: id = 26, decoder = 0, token = 50913, p = 0.148, ts = [_TT_550], 0.148, result_len = 27 '[_TT_550]'
whisper_full_with_state: decoder 0 completed
whisper_full_with_state: decoder 0: score = -0.21573, result_len = 27, avg_logprobs = -0.21573, entropy = 2.83374
whisper_full_with_state: best decoder = 0
single timestamp ending - skip entire chunk
seek = 1099, seek_delta = 1099
Days : 0
Hours : 0
Minutes : 0
Seconds : 13
Milliseconds : 367
Ticks : 133674667
TotalDays : 0.000154716049768519
TotalHours : 0.00371318519444444
TotalMinutes : 0.222791111666667
TotalSeconds : 13.3674667
TotalMilliseconds : 13367.4667
In Release
PS C:\Repositories\whisper-rs> Measure-Command { cargo run --example audio_transcription --release }
Compiling whisper-rs v0.14.2 (C:\Repositories\whisper-rs)
Finished `release` profile [optimized] target(s) in 1.02s
Running `target\release\examples\audio_transcription.exe`
whisper_init_from_file_with_params_no_state: loading model from 'ggml-medium.en-q5_0.bin'
whisper_init_with_params_no_state: use gpu = 0
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw = 1
register_backend: registered backend CPU (1 devices)
register_device: registered device CPU (AMD Ryzen 9 5950X 16-Core Processor )
whisper_init_with_params_no_state: devices = 1
whisper_init_with_params_no_state: backends = 1
whisper_model_load: loading model
whisper_model_load: n_vocab = 51864
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 1024
whisper_model_load: n_audio_head = 16
whisper_model_load: n_audio_layer = 24
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 1024
whisper_model_load: n_text_head = 16
whisper_model_load: n_text_layer = 24
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 8
whisper_model_load: qntvr = 1
whisper_model_load: type = 4 (medium)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: n_langs = 99
whisper_model_load: CPU total size = 538.59 MB
whisper_model_load: model size = 538.59 MB
whisper_init_state: kv self size = 50.33 MB
whisper_init_state: kv cross size = 150.99 MB
whisper_init_state: kv pad size = 6.29 MB
whisper_init_state: alignment heads masks size = 1152 B
whisper_init_state: compute buffer (conv) = 28.55 MB
whisper_init_state: compute buffer (encode) = 170.15 MB
whisper_init_state: compute buffer (cross) = 7.72 MB
whisper_init_state: compute buffer (decode) = 395.07 MB
Days : 0
Hours : 0
Minutes : 2
Seconds : 48
Milliseconds : 70
Ticks : 1680705152
TotalDays : 0.00194526059259259
TotalHours : 0.0466862542222222
TotalMinutes : 2.80117525333333
TotalSeconds : 168.0705152
TotalMilliseconds : 168070.5152