This repository was archived by the owner on Jul 30, 2025. It is now read-only.

10x slower in release build #226

@sw-vish

Description


Running Whisper transcription in a release build takes approximately 10x as long as in a debug build. I tested this with the audio_transcription.rs example, using the jfk.wav file and the ggml-medium.en-q5_0.bin model.
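Note that Measure-Command as used below also includes cargo's compile-and-launch step. A cleaner comparison would time the prebuilt binary directly (a sketch; same example, working directory, and input files assumed):

PS C:\Repositories\whisper-rs> cargo build --release --example audio_transcription
PS C:\Repositories\whisper-rs> Measure-Command { .\target\release\examples\audio_transcription.exe }

The compile step finishes in about a second in both runs below, so it doesn't explain the gap, but timing the executable alone rules it out entirely.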

In Debug
PS C:\Repositories\whisper-rs> Measure-Command { cargo run --example audio_transcription }
   Compiling whisper-rs v0.14.2 (C:\Repositories\whisper-rs)
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.94s                                                                                                                                                                                                                                            
     Running `target\debug\examples\audio_transcription.exe`
whisper_init_from_file_with_params_no_state: loading model from 'ggml-medium.en-q5_0.bin'
whisper_init_with_params_no_state: use gpu    = 0
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw        = 1
whisper_init_with_params_no_state: devices    = 1
whisper_init_with_params_no_state: backends   = 1
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 1024
whisper_model_load: n_audio_head  = 16
whisper_model_load: n_audio_layer = 24
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 1024
whisper_model_load: n_text_head   = 16
whisper_model_load: n_text_layer  = 24
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 8
whisper_model_load: qntvr         = 1
whisper_model_load: type          = 4 (medium)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: n_langs       = 99
whisper_model_load:      CPU total size =   538.59 MB
whisper_model_load: model size    =  538.59 MB
whisper_init_state: kv self size  =   50.33 MB
whisper_init_state: kv cross size =  150.99 MB
whisper_init_state: kv pad  size  =    6.29 MB
whisper_init_state: alignment heads masks size = 1152 B
whisper_init_state: compute buffer (conv)   =   28.55 MB
whisper_init_state: compute buffer (encode) =  170.15 MB
whisper_init_state: compute buffer (cross)  =    7.72 MB
whisper_init_state: compute buffer (decode) =  395.07 MB

whisper_full_with_state: strategy = 0, decoding with 1 decoders, temperature = 0.00


whisper_full_with_state: prompt[0] = [_SOT_]


whisper_full_with_state: id =   0, decoder = 0, token =  50363, p =  0.825, ts =    [_BEG_],  0.825, result_len =    0 '[_BEG_]'
whisper_full_with_state: id =   1, decoder = 0, token =    843, p =  0.648, ts =        [?],  0.000, result_len =    0 ' And'
whisper_full_with_state: id =   2, decoder = 0, token =    523, p =  0.992, ts =        [?],  0.009, result_len =    0 ' so'
whisper_full_with_state: id =   3, decoder = 0, token =    616, p =  0.653, ts =        [?],  0.006, result_len =    0 ' my'
whisper_full_with_state: id =   4, decoder = 0, token =   5891, p =  0.997, ts =        [?],  0.006, result_len =    0 ' fellow'
whisper_full_with_state: id =   5, decoder = 0, token =   3399, p =  0.912, ts =        [?],  0.056, result_len =    0 ' Americans'
whisper_full_with_state: id =   6, decoder = 0, token =     11, p =  0.562, ts =        [?],  0.048, result_len =    0 ','
whisper_full_with_state: id =   7, decoder = 0, token =   1265, p =  0.852, ts =        [?],  0.044, result_len =    0 ' ask'
whisper_full_with_state: id =   8, decoder = 0, token =    407, p =  0.896, ts =        [?],  0.087, result_len =    0 ' not'
whisper_full_with_state: id =   9, decoder = 0, token =    644, p =  0.911, ts =        [?],  0.048, result_len =    0 ' what'
whisper_full_with_state: id =  10, decoder = 0, token =    534, p =  0.982, ts =  [_TT_278],  0.164, result_len =    0 ' your'
whisper_full_with_state: id =  11, decoder = 0, token =   1499, p =  0.979, ts =        [?],  0.087, result_len =    0 ' country'
whisper_full_with_state: id =  12, decoder = 0, token =    460, p =  0.988, ts =        [?],  0.080, result_len =    0 ' can'
whisper_full_with_state: id =  13, decoder = 0, token =    466, p =  0.996, ts =  [_TT_332],  0.105, result_len =    0 ' do'
whisper_full_with_state: id =  14, decoder = 0, token =    329, p =  0.995, ts =  [_TT_340],  0.111, result_len =    0 ' for'
whisper_full_with_state: id =  15, decoder = 0, token =    345, p =  0.988, ts =        [?],  0.086, result_len =    0 ' you'
whisper_full_with_state: id =  16, decoder = 0, token =     11, p =  0.542, ts =        [?],  0.088, result_len =    0 ','
whisper_full_with_state: id =  17, decoder = 0, token =   1265, p =  0.623, ts =        [?],  0.067, result_len =    0 ' ask'
whisper_full_with_state: id =  18, decoder = 0, token =    644, p =  0.988, ts =        [?],  0.087, result_len =    0 ' what'
whisper_full_with_state: id =  19, decoder = 0, token =    345, p =  0.950, ts =  [_TT_437],  0.203, result_len =    0 ' you'
whisper_full_with_state: id =  20, decoder = 0, token =    460, p =  0.939, ts =  [_TT_450],  0.301, result_len =    0 ' can'
whisper_full_with_state: id =  21, decoder = 0, token =    466, p =  0.989, ts =  [_TT_466],  0.146, result_len =    0 ' do'
whisper_full_with_state: id =  22, decoder = 0, token =    329, p =  0.776, ts =  [_TT_484],  0.148, result_len =    0 ' for'
whisper_full_with_state: id =  23, decoder = 0, token =    534, p =  0.990, ts =  [_TT_488],  0.137, result_len =    0 ' your'
whisper_full_with_state: id =  24, decoder = 0, token =   1499, p =  0.995, ts =  [_TT_500],  0.153, result_len =    0 ' country'
whisper_full_with_state: id =  25, decoder = 0, token =     13, p =  0.775, ts =        [?],  0.053, result_len =    0 '.'
whisper_full_with_state: id =  26, decoder = 0, token =  50913, p =  0.148, ts =  [_TT_550],  0.148, result_len =   27 '[_TT_550]'
whisper_full_with_state: decoder 0 completed
whisper_full_with_state: decoder  0: score = -0.21573, result_len =  27, avg_logprobs = -0.21573, entropy =  2.83374
whisper_full_with_state: best decoder = 0
single timestamp ending - skip entire chunk
seek = 1099, seek_delta = 1099


Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 13
Milliseconds      : 367
Ticks             : 133674667
TotalDays         : 0.000154716049768519
TotalHours        : 0.00371318519444444
TotalMinutes      : 0.222791111666667
TotalSeconds      : 13.3674667
TotalMilliseconds : 13367.4667
In Release
PS C:\Repositories\whisper-rs> Measure-Command { cargo run --example audio_transcription --release }
   Compiling whisper-rs v0.14.2 (C:\Repositories\whisper-rs)
    Finished `release` profile [optimized] target(s) in 1.02s                                                                                                                                                                                                                                                      
     Running `target\release\examples\audio_transcription.exe`
whisper_init_from_file_with_params_no_state: loading model from 'ggml-medium.en-q5_0.bin'
whisper_init_with_params_no_state: use gpu    = 0
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw        = 1
register_backend: registered backend CPU (1 devices)
register_device: registered device CPU (AMD Ryzen 9 5950X 16-Core Processor            )
whisper_init_with_params_no_state: devices    = 1
whisper_init_with_params_no_state: backends   = 1
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 1024
whisper_model_load: n_audio_head  = 16
whisper_model_load: n_audio_layer = 24
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 1024
whisper_model_load: n_text_head   = 16
whisper_model_load: n_text_layer  = 24
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 8
whisper_model_load: qntvr         = 1
whisper_model_load: type          = 4 (medium)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: n_langs       = 99
whisper_model_load:      CPU total size =   538.59 MB
whisper_model_load: model size    =  538.59 MB
whisper_init_state: kv self size  =   50.33 MB
whisper_init_state: kv cross size =  150.99 MB
whisper_init_state: kv pad  size  =    6.29 MB
whisper_init_state: alignment heads masks size = 1152 B
whisper_init_state: compute buffer (conv)   =   28.55 MB
whisper_init_state: compute buffer (encode) =  170.15 MB
whisper_init_state: compute buffer (cross)  =    7.72 MB
whisper_init_state: compute buffer (decode) =  395.07 MB


Days              : 0
Hours             : 0
Minutes           : 2
Seconds           : 48
Milliseconds      : 70
Ticks             : 1680705152
TotalDays         : 0.00194526059259259
TotalHours        : 0.0466862542222222
TotalMinutes      : 2.80117525333333
TotalSeconds      : 168.0705152
TotalMilliseconds : 168070.5152
