Description
Whenever I use voice mode / live chat the app either freezes or crashes. I tried the medium and small models and the issue repeats. I am on Alpaca 8.4.4, running on an Ubuntu-based distro. Prior to the crash it pushed my 10-core Xeon CPU to 100% utilization and made my PC slow to a crawl.
Here is the output the terminal produced:
RuntimeError: The size of tensor a (3) must match the size of tensor b (0) at non-singleton dimension 1
Exception in thread Thread-615 (recognize_audio):
Traceback (most recent call last):
Traceback (most recent call last):
File "/usr/lib/python3.13/threading.py", line 1043, in _bootstrap_inner
self.run()
~~~~~~~~^^
File "/usr/lib/python3.13/threading.py", line 994, in run
self._target(*self._args, **self._kwargs)
~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/share/Alpaca/alpaca/widgets/voice.py", line 185, in recognize_audio
result = model.transcribe(audio_data, language=language)
File "/app/lib/python3.13/site-packages/whisper/transcribe.py", line 295, in transcribe
result: DecodingResult = decode_with_fallback(mel_segment)
~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
File "/app/lib/python3.13/site-packages/whisper/transcribe.py", line 201, in decode_with_fallback
decode_result = model.decode(segment, options)
File "/app/lib/python3.13/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/app/lib/python3.13/site-packages/whisper/decoding.py", line 824, in decode
result = DecodingTask(model, options).run(mel)
File "/app/lib/python3.13/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/app/lib/python3.13/site-packages/whisper/decoding.py", line 737, in run
tokens, sum_logprobs, no_speech_probs = self._main_loop(audio_features, tokens)
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/lib/python3.13/site-packages/whisper/decoding.py", line 687, in _main_loop
logits = self.inference.logits(tokens, audio_features)
File "/app/lib/python3.13/site-packages/whisper/decoding.py", line 163, in logits
return self.model.decoder(tokens, audio_features, kv_cache=self.kv_cache)
~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "/app/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
File "/app/lib/python3.13/site-packages/whisper/model.py", line 242, in forward
x = block(x, xa, mask=self.mask, kv_cache=kv_cache)
File "/app/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "/app/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
File "/app/lib/python3.13/site-packages/whisper/model.py", line 169, in forward
x = x + self.cross_attn(self.cross_attn_ln(x), xa, kv_cache=kv_cache)[0]
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "/app/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
File "/app/lib/python3.13/site-packages/whisper/model.py", line 109, in forward
v = kv_cache[self.value]
~~~~~~~~^^^^^^^^^^^^
KeyError: Linear(in_features=1024, out_features=1024, bias=True)
File "/usr/lib/python3.13/threading.py", line 1043, in _bootstrap_inner
self.run()
~~~~~~~~^^
File "/usr/lib/python3.13/threading.py", line 994, in run
self._target(*self._args, **self._kwargs)
~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/share/Alpaca/alpaca/widgets/voice.py", line 185, in recognize_audio
result = model.transcribe(audio_data, language=language)
File "/app/lib/python3.13/site-packages/whisper/transcribe.py", line 295, in transcribe
result: DecodingResult = decode_with_fallback(mel_segment)
~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
File "/app/lib/python3.13/site-packages/whisper/transcribe.py", line 201, in decode_with_fallback
decode_result = model.decode(segment, options)
File "/app/lib/python3.13/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/app/lib/python3.13/site-packages/whisper/decoding.py", line 824, in decode
result = DecodingTask(model, options).run(mel)
File "/app/lib/python3.13/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/app/lib/python3.13/site-packages/whisper/decoding.py", line 737, in run
tokens, sum_logprobs, no_speech_probs = self._main_loop(audio_features, tokens)
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/lib/python3.13/site-packages/whisper/decoding.py", line 687, in _main_loop
logits = self.inference.logits(tokens, audio_features)
File "/app/lib/python3.13/site-packages/whisper/decoding.py", line 163, in logits
return self.model.decoder(tokens, audio_features, kv_cache=self.kv_cache)
~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "/app/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
File "/app/lib/python3.13/site-packages/whisper/model.py", line 236, in forward
self.token_embedding(x)
~~~~~~~~~~~~~~~~~~~~~~~
+ self.positional_embedding[offset : offset + x.shape[-1]]
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~