Self Checks
- This template is only for bug reports. For questions, please visit Discussions.
- I have thoroughly reviewed the project documentation (installation, training, inference) but couldn't find information to solve my problem.
- I have searched for existing issues, including closed ones.
- I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
- [FOR CHINESE USERS] Please be sure to submit issues in English, otherwise they will be closed. Thank you! :)
- Please do not modify this template and fill in all required fields.
Cloud or Self Hosted
Self Hosted (Docker)
Environment Details
GPU: Tesla T4
Steps to Reproduce
"python", "-m", "tools.api", \
"--listen", "0.0.0.0:8080", \
"--llama-checkpoint-path", "checkpoints/fish-speech-1.4", \
"--decoder-checkpoint-path", "checkpoints/fish-speech-1.5/firefly-gan-vq-fsq-8x1024-21hz-generator.pth", \
"--decoder-config-name", "firefly_gan_vq", \
"--compile", \
"--half" \
✔️ Expected Behavior
When the input is short, e.g. "hi" or "he", the correct audio should be generated stably on every request.
❌ Actual Behavior
Requests randomly succeed or fail. On failure, the server returns HTTP 500 and the log shows AssertionError: Negative code found. The full logs below show the same kind of short input sometimes failing ("hi.", "he.", "Hee") and sometimes succeeding ("what", "Oh .", "Hi.").
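For context, the check that fires is the one visible in the traceback (tools/llama/generate.py, line 848): the LLaMA stage is expected to emit only non-negative VQ token ids, and any negative id (for example, a special token id that maps below zero) aborts the request. Schematically, with made-up tensor values for illustration:

import torch

# codes: [n_codebooks, T] VQ token ids sampled by the LLaMA stage.
# The -1 entry below is hypothetical; it trips the assert and
# reproduces the exact error message seen in the logs.
codes = torch.tensor([[421,  17, -1],
                      [ 88, 902,  5]])
assert (codes >= 0).all(), "Negative code found"  # raises AssertionError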
2025-01-04 15:59:09.779 | INFO | tools.llama.generate:generate_long:759 - Encoded text: hi.
2025-01-04 15:59:09.779 | INFO | tools.llama.generate:generate_long:777 - Generating sentence 1/1 of sample 1/1
1%| | 11/1023 [00:00<00:11, 87.47it/s]
2025-01-04 15:59:09.989 | INFO | tools.llama.generate:generate_long:823 - Compilation time: 0.21 seconds
2025-01-04 15:59:09.989 | INFO | tools.llama.generate:generate_long:832 - Generated 13 tokens in 0.21 seconds, 62.07 tokens/sec
2025-01-04 15:59:09.989 | INFO | tools.llama.generate:generate_long:835 - Bandwidth achieved: 30.69 GB/s
2025-01-04 15:59:09.990 | INFO | tools.llama.generate:generate_long:840 - GPU Memory used: 2.43 GB
Traceback (most recent call last):
File "/usr/local/lib/python3.12/site-packages/kui/asgi/exceptions.py", line 27, in wrapper
return await endpoint()
^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/kui/asgi/views.py", line 29, in wrapper
return await function()
^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/kui/asgi/parameters.py", line 119, in callback_with_auto_bound_params
result = await result
^^^^^^^^^^^^
File "/opt/fish-speech/tools/api.py", line 756, in api_invoke_model
fake_audios = next(inference(req))
^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 36, in generator_context
response = gen.send(None)
^^^^^^^^^^^^^^
File "/opt/fish-speech/tools/api.py", line 683, in inference
raise result.response
File "/opt/fish-speech/tools/llama/generate.py", line 904, in worker
for chunk in generate_long(
^^^^^^^^^^^^^^
File "/opt/fish-speech/tools/llama/generate.py", line 848, in generate_long
assert (codes >= 0).all(), f"Negative code found"
^^^^^^^^^^^^^^^^^^
AssertionError: Negative code found
INFO: 10.0.3.136:35684 - "POST /v1/tts HTTP/1.1" 500 Internal Server Error
2025-01-04 15:59:16.891 | INFO | tools.api:inference:623 - Use same references
2025-01-04 15:59:16.894 | INFO | tools.llama.generate:generate_long:759 - Encoded text: he.
2025-01-04 15:59:16.894 | INFO | tools.llama.generate:generate_long:777 - Generating sentence 1/1 of sample 1/1
0%| | 2/1023 [00:00<00:15, 64.00it/s]
2025-01-04 15:59:17.010 | INFO | tools.llama.generate:generate_long:823 - Compilation time: 0.12 seconds
2025-01-04 15:59:17.010 | INFO | tools.llama.generate:generate_long:832 - Generated 4 tokens in 0.12 seconds, 34.65 tokens/sec
2025-01-04 15:59:17.010 | INFO | tools.llama.generate:generate_long:835 - Bandwidth achieved: 17.13 GB/s
2025-01-04 15:59:17.011 | INFO | tools.llama.generate:generate_long:840 - GPU Memory used: 2.43 GB
[traceback identical to the first failure above, ending in AssertionError: Negative code found]
INFO: 10.0.3.136:45164 - "POST /v1/tts HTTP/1.1" 500 Internal Server Error
2025-01-04 15:59:21.064 | INFO | tools.api:inference:623 - Use same references
2025-01-04 15:59:21.066 | INFO | tools.llama.generate:generate_long:759 - Encoded text: Hee
2025-01-04 15:59:21.067 | INFO | tools.llama.generate:generate_long:777 - Generating sentence 1/1 of sample 1/1
1%| | 12/1023 [00:00<00:11, 87.44it/s]
2025-01-04 15:59:21.289 | INFO | tools.llama.generate:generate_long:823 - Compilation time: 0.22 seconds
2025-01-04 15:59:21.289 | INFO | tools.llama.generate:generate_long:832 - Generated 14 tokens in 0.22 seconds, 63.06 tokens/sec
2025-01-04 15:59:21.289 | INFO | tools.llama.generate:generate_long:835 - Bandwidth achieved: 31.18 GB/s
2025-01-04 15:59:21.290 | INFO | tools.llama.generate:generate_long:840 - GPU Memory used: 2.43 GB
[traceback identical to the first failure above, ending in AssertionError: Negative code found]
INFO: 10.0.3.136:45178 - "POST /v1/tts HTTP/1.1" 500 Internal Server Error
2025-01-04 16:01:01.233 | INFO | tools.api:inference:623 - Use same references
2025-01-04 16:01:01.236 | INFO | tools.llama.generate:generate_long:759 - Encoded text: what
2025-01-04 16:01:01.236 | INFO | tools.llama.generate:generate_long:777 - Generating sentence 1/1 of sample 1/1
1%|▏ | 15/1023 [00:00<00:11, 89.25it/s]
2025-01-04 16:01:01.488 | INFO | tools.llama.generate:generate_long:823 - Compilation time: 0.25 seconds
2025-01-04 16:01:01.489 | INFO | tools.llama.generate:generate_long:832 - Generated 17 tokens in 0.25 seconds, 67.47 tokens/sec
2025-01-04 16:01:01.489 | INFO | tools.llama.generate:generate_long:835 - Bandwidth achieved: 33.36 GB/s
2025-01-04 16:01:01.489 | INFO | tools.llama.generate:generate_long:840 - GPU Memory used: 2.43 GB
2025-01-04 16:01:01.490 | INFO | tools.api:decode_vq_tokens:189 - VQ features: torch.Size([8, 16])
INFO: 10.0.3.136:52978 - "POST /v1/tts HTTP/1.1" 200 OK
2025-01-04 16:01:10.114 | INFO | tools.api:inference:623 - Use same references
2025-01-04 16:01:10.116 | INFO | tools.llama.generate:generate_long:759 - Encoded text: Oh .
2025-01-04 16:01:10.117 | INFO | tools.llama.generate:generate_long:777 - Generating sentence 1/1 of sample 1/1
2%|▏ | 18/1023 [00:00<00:11, 89.46it/s]
2025-01-04 16:01:10.403 | INFO | tools.llama.generate:generate_long:823 - Compilation time: 0.29 seconds
2025-01-04 16:01:10.403 | INFO | tools.llama.generate:generate_long:832 - Generated 20 tokens in 0.29 seconds, 69.87 tokens/sec
2025-01-04 16:01:10.404 | INFO | tools.llama.generate:generate_long:835 - Bandwidth achieved: 34.55 GB/s
2025-01-04 16:01:10.404 | INFO | tools.llama.generate:generate_long:840 - GPU Memory used: 2.43 GB
2025-01-04 16:01:10.405 | INFO | tools.api:decode_vq_tokens:189 - VQ features: torch.Size([8, 19])
INFO: 10.0.3.136:52980 - "POST /v1/tts HTTP/1.1" 200 OK
2025-01-04 16:01:22.751 | INFO | tools.api:inference:623 - Use same references
2025-01-04 16:01:22.754 | INFO | tools.llama.generate:generate_long:759 - Encoded text: Hi.
2025-01-04 16:01:22.755 | INFO | tools.llama.generate:generate_long:777 - Generating sentence 1/1 of sample 1/1
2%|▏ | 24/1023 [00:00<00:11, 90.80it/s]
2025-01-04 16:01:23.104 | INFO | tools.llama.generate:generate_long:823 - Compilation time: 0.35 seconds
2025-01-04 16:01:23.104 | INFO | tools.llama.generate:generate_long:832 - Generated 26 tokens in 0.35 seconds, 74.49 tokens/sec
2025-01-04 16:01:23.104 | INFO | tools.llama.generate:generate_long:835 - Bandwidth achieved: 36.83 GB/s
2025-01-04 16:01:23.105 | INFO | tools.llama.generate:generate_long:840 - GPU Memory used: 2.43 GB
2025-01-04 16:01:23.106 | INFO | tools.api:decode_vq_tokens:189 - VQ features: torch.Size([8, 25])
INFO: 10.0.3.136:46966 - "POST /v1/tts HTTP/1.1" 200 OK
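Until the root cause is found, the flakiness can be papered over client-side: the logs show the same short text succeeding on a later attempt ("Hi." returns 200 after "hi." returned 500), so retrying on HTTP 500 is a plausible stopgap. A minimal sketch, reusing the assumed request body from the Steps to Reproduce section:

import time
import requests

def tts_with_retry(text, url="http://localhost:8080/v1/tts", attempts=5):
    # Retry on the intermittent 500 ("Negative code found"); the logs
    # show the same short input succeeding on a later attempt.
    for i in range(attempts):
        resp = requests.post(url, json={"text": text, "format": "wav"}, timeout=60)
        if resp.ok:
            return resp.content
        time.sleep(0.5 * (i + 1))  # small backoff between attempts
    resp.raise_for_status()

audio = tts_with_retry("hi")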