Description
Self Checks
- This template is only for bug reports. For questions, please visit Discussions.
- I have thoroughly reviewed the project documentation (installation, training, inference) but couldn't find information to solve my problem.
- I have searched for existing issues, including closed ones. Search issues
- I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
- [FOR CHINESE USERS] Please submit issues in English, or they will be closed. Thank you! :)
- Please do not modify this template and fill in all required fields.
Cloud or Self Hosted
Self Hosted (Source)
Environment Details
Official environment configuration
Steps to Reproduce
Run the API server with uvicorn (workers=4).
When running inference over many texts (for example, 50,000 texts), the following error always occurs at some point:
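To localize which kernel actually faults (device-side asserts are reported asynchronously, so the Python stack below can point at the wrong call), the run can be repeated with launch blocking enabled. This is a standard PyTorch debugging step, not project-specific:

```python
import os

# CUDA_LAUNCH_BLOCKING must be set before torch initializes CUDA, so it
# has to happen before the first `import torch` in the entry point, or be
# exported in the shell before uvicorn is started.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
```

With this set, the RuntimeError is raised at the exact op that triggered the assert instead of at a later, unrelated CUDA call.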
-----generate_long-----
llama-236 ------generate------
llama-186 num_new_tokens:1023
0%| | 0/1023 [00:00<?, ?it/s]
2%|▏ | 25/1023 [00:00<00:04, 249.48it/s]
5%|▍ | 50/1023 [00:00<00:03, 249.70it/s]
7%|▋ | 68/1023 [00:00<00:03, 246.03it/s]
2025-04-04 23:07:17.245 | INFO | tools.llama.generate:generate_long:507 - Compilation time: 0.35 seconds
2025-04-04 23:07:17.245 | INFO | tools.llama.generate:generate_long:516 - Generated 70 tokens in 0.35 seconds, 197.90 tokens/sec
2025-04-04 23:07:17.246 | INFO | tools.llama.generate:generate_long:519 - Bandwidth achieved: 97.86 GB/s
2025-04-04 23:07:17.246 | INFO | tools.llama.generate:generate_long:524 - GPU Memory used: 3.34 GB
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [67,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [8,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
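The assertion above means an index into the FSQ codebook fell outside `[0, codebook_size)`. As a hedged sketch (the function and parameter names here are my own, not the project's API), the generated VQ indices could be range-checked before they are handed to the decoder, so the failure surfaces as a catchable Python exception instead of a device-side assert:

```python
def check_vq_indices(indices, codebook_size):
    """Raise a catchable error for out-of-range VQ codes.

    `indices` can be a torch.Tensor or a flat sequence of ints; the
    device-side assert in IndexKernel.cu fires when any code falls
    outside [0, codebook_size) during the codebook lookup.
    """
    flat = indices.flatten().tolist() if hasattr(indices, "flatten") else list(indices)
    lo, hi = min(flat), max(flat)
    if lo < 0 or hi >= codebook_size:
        raise ValueError(
            f"VQ indices out of range [0, {codebook_size}): min={lo}, max={hi}"
        )
```

With such a check in place, one malformed generation fails only its own request, instead of corrupting the CUDA context for the whole worker.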
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/root/miniconda3/envs/fish142/lib/python3.10/site-packages/starlette/responses.py", line 259, in __call__
await wrap(partial(self.listen_for_disconnect, receive))
File "/root/miniconda3/envs/fish142/lib/python3.10/site-packages/starlette/responses.py", line 255, in wrap
await func()
File "/root/miniconda3/envs/fish142/lib/python3.10/site-packages/starlette/responses.py", line 232, in listen_for_disconnect
message = await receive()
File "/root/miniconda3/envs/fish142/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 555, in receive
await self.message_event.wait()
File "/root/miniconda3/envs/fish142/lib/python3.10/asyncio/locks.py", line 214, in wait
await fut
asyncio.exceptions.CancelledError: Cancelled by cancel scope 7f95185496f0
During handling of the above exception, another exception occurred:
- Exception Group Traceback (most recent call last):
| File "/root/miniconda3/envs/fish142/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 401, in run_asgi
| result = await app( # type: ignore[func-returns-value]
| File "/root/miniconda3/envs/fish142/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 70, in __call__
| return await self.app(scope, receive, send)
| File "/root/miniconda3/envs/fish142/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in __call__
| await super().__call__(scope, receive, send)
| File "/root/miniconda3/envs/fish142/lib/python3.10/site-packages/starlette/applications.py", line 113, in __call__
| await self.middleware_stack(scope, receive, send)
| File "/root/miniconda3/envs/fish142/lib/python3.10/site-packages/starlette/middleware/errors.py", line 187, in __call__
| raise exc
| File "/root/miniconda3/envs/fish142/lib/python3.10/site-packages/starlette/middleware/errors.py", line 165, in __call__
| await self.app(scope, receive, _send)
| File "/root/miniconda3/envs/fish142/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
| await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
| File "/root/miniconda3/envs/fish142/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
| raise exc
| File "/root/miniconda3/envs/fish142/lib/python3.10/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
| await app(scope, receive, sender)
| File "/root/miniconda3/envs/fish142/lib/python3.10/site-packages/starlette/routing.py", line 715, in __call__
| await self.middleware_stack(scope, receive, send)
| File "/root/miniconda3/envs/fish142/lib/python3.10/site-packages/starlette/routing.py", line 735, in app
| await route.handle(scope, receive, send)
| File "/root/miniconda3/envs/fish142/lib/python3.10/site-packages/starlette/routing.py", line 288, in handle
| await self.app(scope, receive, send)
| File "/root/miniconda3/envs/fish142/lib/python3.10/site-packages/starlette/routing.py", line 76, in app
| await wrap_app_handling_exceptions(app, request)(scope, receive, send)
| File "/root/miniconda3/envs/fish142/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
| raise exc
| File "/root/miniconda3/envs/fish142/lib/python3.10/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
| await app(scope, receive, sender)
| File "/root/miniconda3/envs/fish142/lib/python3.10/site-packages/starlette/routing.py", line 74, in app
| await response(scope, receive, send)
| File "/root/miniconda3/envs/fish142/lib/python3.10/site-packages/starlette/responses.py", line 252, in __call__
| async with anyio.create_task_group() as task_group:
| File "/root/miniconda3/envs/fish142/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 680, in __aexit__
| raise BaseExceptionGroup(
| exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
+-+---------------- 1 ----------------
| Traceback (most recent call last):
| File "/data/fish_speech/fish-speech-1.4.1/tools/main.py", line 656, in inference
| fake_audios = decode_vq_tokens(
| File "/data/fish_speech/fish-speech-1.4.1/tools/main.py", line 357, in decode_vq_tokens
| return decoder_model.decode(
| File "/data/fish_speech/fish-speech-1.4.1/fish_speech/models/vqgan/modules/firefly.py", line 582, in decode
| z = self.quantizer.decode(indices) * mel_masks_float_conv
| File "/data/fish_speech/fish-speech-1.4.1/fish_speech/models/vqgan/modules/fsq.py", line 114, in decode
| z_q = self.residual_fsq.get_output_from_indices(indices)
| File "/root/miniconda3/envs/fish142/lib/python3.10/site-packages/vector_quantize_pytorch/residual_fsq.py", line 248, in get_output_from_indices
| outputs = tuple(rvq.get_output_from_indices(chunk_indices) for rvq, chunk_indices in zip(self.rvqs, indices))
| File "/root/miniconda3/envs/fish142/lib/python3.10/site-packages/vector_quantize_pytorch/residual_fsq.py", line 248, in <genexpr>
| outputs = tuple(rvq.get_output_from_indices(chunk_indices) for rvq, chunk_indices in zip(self.rvqs, indices))
| File "/root/miniconda3/envs/fish142/lib/python3.10/site-packages/vector_quantize_pytorch/residual_fsq.py", line 134, in get_output_from_indices
| codes = self.get_codes_from_indices(indices)
| File "/root/miniconda3/envs/fish142/lib/python3.10/site-packages/vector_quantize_pytorch/residual_fsq.py", line 116, in get_codes_from_indices
| all_codes = get_at('q [c] d, b n q -> q b n d', self.codebooks, indices)
| File "/root/miniconda3/envs/fish142/lib/python3.10/site-packages/einx/traceback_util.py", line 71, in func_with_reraise
| raise e.with_traceback(tb) from None
| File "", line 3, in op1
| File "/root/miniconda3/envs/fish142/lib/python3.10/site-packages/torch/_functorch/vmap.py", line 480, in _flat_vmap
| batched_outputs = func(*batched_inputs, **kwargs)
| File "/root/miniconda3/envs/fish142/lib/python3.10/site-packages/torch/_functorch/vmap.py", line 48, in fn
| return f(*args, **kwargs)
| File "/root/miniconda3/envs/fish142/lib/python3.10/site-packages/torch/_functorch/vmap.py", line 331, in vmap_impl
| return _flat_vmap(
| File "/root/miniconda3/envs/fish142/lib/python3.10/site-packages/torch/_functorch/apis.py", line 201, in wrapped
| return vmap_impl(
| File "/root/miniconda3/envs/fish142/lib/python3.10/site-packages/torch/_functorch/vmap.py", line 480, in _flat_vmap
| batched_outputs = func(*batched_inputs, **kwargs)
| File "/root/miniconda3/envs/fish142/lib/python3.10/site-packages/torch/_functorch/vmap.py", line 48, in fn
| return f(*args, **kwargs)
| File "/root/miniconda3/envs/fish142/lib/python3.10/site-packages/torch/_functorch/vmap.py", line 331, in vmap_impl
| return _flat_vmap(
| File "/root/miniconda3/envs/fish142/lib/python3.10/site-packages/torch/_functorch/apis.py", line 201, in wrapped
| return vmap_impl(
| File "", line 10, in op0
| RuntimeError: CUDA error: device-side assert triggered
| Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
|
|
| During handling of the above exception, another exception occurred:
|
| Traceback (most recent call last):
| File "/root/miniconda3/envs/fish142/lib/python3.10/site-packages/starlette/responses.py", line 255, in wrap
| await func()
| File "/root/miniconda3/envs/fish142/lib/python3.10/site-packages/starlette/responses.py", line 244, in stream_response
| async for chunk in self.body_iterator:
| File "/root/miniconda3/envs/fish142/lib/python3.10/site-packages/starlette/concurrency.py", line 62, in iterate_in_threadpool
| yield await anyio.to_thread.run_sync(_next, as_iterator)
| File "/root/miniconda3/envs/fish142/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
| return await get_async_backend().run_sync_in_worker_thread(
| File "/root/miniconda3/envs/fish142/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2177, in run_sync_in_worker_thread
| return await future
| File "/root/miniconda3/envs/fish142/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 859, in run
| result = context.run(func, *args)
| File "/root/miniconda3/envs/fish142/lib/python3.10/site-packages/starlette/concurrency.py", line 51, in _next
| return next(iterator)
| File "/root/miniconda3/envs/fish142/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 57, in generator_context
| response = gen.send(request)
| File "/data/fish_speech/fish-speech-1.4.1/tools/main.py", line 733, in inference
| torch.cuda.empty_cache()
| File "/root/miniconda3/envs/fish142/lib/python3.10/site-packages/torch/cuda/memory.py", line 170, in empty_cache
| torch._C._cuda_emptyCache()
| RuntimeError: CUDA error: device-side assert triggered
| Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
|
+------------------------------------
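Note the second traceback above: once a device-side assert fires, the CUDA context of that process is corrupted, so even the later `torch.cuda.empty_cache()` in tools/main.py raises the same error, and the worker keeps failing every subsequent request. A hedged recovery sketch (helper names are hypothetical, not part of the project) is to detect this state and let the worker exit so the uvicorn supervisor respawns a clean process:

```python
import os
import signal

def is_cuda_context_poisoned(exc: BaseException) -> bool:
    # After "device-side assert triggered", every later CUDA call in the
    # same process fails too (as seen with torch.cuda.empty_cache above).
    return "device-side assert" in str(exc)

def restart_worker() -> None:
    # Exiting lets the uvicorn --workers supervisor spawn a fresh process
    # instead of serving every later request from a poisoned context.
    os.kill(os.getpid(), signal.SIGTERM)
```

An exception handler around the inference call could then call `restart_worker()` when `is_cuda_context_poisoned(exc)` is true, rather than returning errors indefinitely.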
✔️ Expected Behavior
Inference completes successfully for the whole batch of texts. How can we solve this bug?
❌ Actual Behavior
A CUDA device-side assert ("index out of bounds" in the VQ codebook lookup) is triggered partway through the batch, after which the worker process can no longer execute any CUDA call.