
PermissionError and Segmentation Fault with torch.compile (Inductor) during Model Warm-up when --compile is set #932

@MaxRubby

Description

Self Checks

  • This template is only for bug reports. For questions, please visit Discussions.
  • I have thoroughly reviewed the project documentation (installation, training, inference) but couldn't find information to solve my problem.
  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [FOR CHINESE USERS] Please be sure to submit issues in English, otherwise they will be closed. Thank you! :)
  • Please do not modify this template and fill in all required fields.

Cloud or Self Hosted

Self Hosted (Source)

Environment Details

  • OS: WSL Ubuntu 22.04
  • Python Version: 3.10.16
  • PyTorch Version: 2.4.1+cu121
  • CUDA Version:
    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2021 NVIDIA Corporation
    Built on Thu_Nov_18_09:45:30_PST_2021
    Cuda compilation tools, release 11.5, V11.5.119
    Build cuda_11.5.r11.5/compiler.30672275_0
  • GPU: NVIDIA GeForce RTX 3060
  • Fish-Speech Version: 1.5
  • torch version: 2.4.1
  • torchvision version: 0.19.1+cu121
  • torchaudio version: 0.19.1+cu121
  • torchtext version: 2.4.1+cu121

Steps to Reproduce

I have started the server successfully with --compile before and it worked fine, but I don't know why it doesn't work this time:

  1. Set up the fish-speech environment (including dependencies).

  2. Run the API server with the --compile flag:

    python -m tools.api_server \
        --listen 0.0.0.0:7865 \
        --llama-checkpoint-path checkpoints/fish-speech-1.5 \
        --decoder-checkpoint-path checkpoints/fish-speech-1.5/firefly-gan-vq-fsq-8x1024-21hz-generator.pth \
        --decoder-config-name firefly_gan_vq \
        --compile

✔️ Expected Behavior

The application starts up successfully.

❌ Actual Behavior

I'm encountering a PermissionError followed by a Segmentation Fault when running a TTS model (fish-speech) with torch.compile using the Inductor backend. The error occurs during the model warm-up phase (model_manager.warm_up). The issue seems related to file-access permissions in the temporary directory used by TorchInductor, even after attempting to adjust permissions and to use shutil.move as a workaround, as referenced in unslothai/unsloth#1999.
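
For reference, the workaround attempted was roughly along these lines (a minimal sketch, not the exact code; TORCHINDUCTOR_CACHE_DIR is PyTorch's environment variable for relocating the Inductor cache, and /tmp/torchinductor_koma is the path that appears in the log below):

    import os
    import shutil

    # Remove the stale Inductor cache so it is recreated with the right ownership,
    # and/or point Inductor at a fresh, user-writable location instead.
    stale_cache = "/tmp/torchinductor_koma"  # path taken from the error log below
    if os.path.isdir(stale_cache):
        shutil.rmtree(stale_cache, ignore_errors=True)

    # Must be set before the first torch.compile call in the process.
    os.environ["TORCHINDUCTOR_CACHE_DIR"] = os.path.expanduser("~/.cache/torchinductor")

Relocating the cache avoids the default /tmp/torchinductor_<user> directory entirely, which is where the failing rename in the log below is writing.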

Error Log:

Run as a regular user:

(tts) koma@LAPTOP-UFED71OD:~/fish-speech$ python -m tools.api_server --listen 0.0.0.0:7865 --llama-checkpoint-path checkpoints/fish-speech-1.5 --decoder-checkpoint-path checkpoints/fish-speech-1.5/firefly-gan-vq-fsq-8x1024-21hz-generator.pth --decoder-config-name firefly_gan_vq --compile
INFO:     Started server process [1330]
INFO:     Waiting for application startup.
2025-03-23 11:28:20.578 | INFO     | fish_speech.models.text2semantic.inference:load_model:681 - Restored model from checkpoint
2025-03-23 11:28:20.578 | INFO     | fish_speech.models.text2semantic.inference:load_model:687 - Using DualARTransformer
2025-03-23 11:28:20.578 | INFO     | fish_speech.models.text2semantic.inference:load_model:695 - Compiling function...
2025-03-23 11:28:22.062 | INFO     | tools.server.model_manager:load_llama_model:99 - LLAMA model loaded.
/home/koma/.conda/envs/tts/lib/python3.10/site-packages/vector_quantize_pytorch/vector_quantize_pytorch.py:445: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  @autocast(enabled = False)
/home/koma/.conda/envs/tts/lib/python3.10/site-packages/vector_quantize_pytorch/vector_quantize_pytorch.py:630: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  @autocast(enabled = False)
/home/koma/.conda/envs/tts/lib/python3.10/site-packages/vector_quantize_pytorch/finite_scalar_quantization.py:147: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  @autocast(enabled = False)
/home/koma/.conda/envs/tts/lib/python3.10/site-packages/vector_quantize_pytorch/lookup_free_quantization.py:209: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  @autocast(enabled = False)
2025-03-23 11:28:27.765 | INFO     | fish_speech.models.vqgan.inference:load_model:46 - Loaded model: <All keys matched successfully>
2025-03-23 11:28:27.766 | INFO     | tools.server.model_manager:load_decoder_model:107 - Decoder model loaded.
2025-03-23 11:28:27.795 | INFO     | fish_speech.models.text2semantic.inference:generate_long:788 - Encoded text: Hello world.
2025-03-23 11:28:27.797 | INFO     | fish_speech.models.text2semantic.inference:generate_long:806 - Generating sentence 1/1 of sample 1/1
  0%|                                                                                  | 0/1023 [00:00<?, ?it/s]/home/koma/.conda/envs/tts/lib/python3.10/contextlib.py:103: FutureWarning: `torch.backends.cuda.sdp_kernel()` is deprecated. In the future, this context manager will be removed. Please see `torch.nn.attention.sdpa_kernel()` for the new context manager, with updated signature.
  self.gen = func(*args, **kwds)
W0323 11:30:52.749000 140116945266240 torch/fx/experimental/symbolic_shapes.py:4449] [0/0] xindex is not in var_ranges, defaulting to unknown range.
Segmentation fault (core dumped) /tmp/torchinductor_koma/triton/0/e6989009f1faa05ac5f7f5257fe2b85e8fead7015504203974aa7e1e4d122d7c/triton_red_fused__softmax_add_bmm_index_logical_not_masked_fill_zeros_like_8.ttir.tmp.pid_1465_a4aea7d5-c578-456b-a4e9-f2c3298911a3 -> /tmp/torchinductor_koma/triton/0/e6989009f1faa05ac5f7f5257fe2b85e8fead7015504203974aa7e1e4d122d7c/triton_red_fused__softmax_add_bmm_index_logical_not_masked_fill_zeros_like_8.ttir
Segmentation fault (core dumped) /tmp/torchinductor_koma/triton/0/e6989009f1faa05ac5f7f5257fe2b85e8fead7015504203974aa7e1e4d122d7c/triton_red_fused__softmax_add_bmm_index_logical_not_masked_fill_zeros_like_8.ttgir.tmp.pid_1465_15dc99d5-a380-438c-b1da-c05e1e98490f -> /tmp/torchinductor_koma/triton/0/e6989009f1faa05ac5f7f5257fe2b85e8fead7015504203974aa7e1e4d122d7c/triton_red_fused__softmax_add_bmm_index_logical_not_masked_fill_zeros_like_8.ttgir
Segmentation fault (core dumped) /tmp/torchinductor_koma/lo/.1330.140116945266240.tmp -> /tmp/torchinductor_koma/lo/clogj3r7bsakyus6wz3yqefmjlhto65qemf2qfe4ns7mp52pxd6n.py
  0%|                                                                                  | 0/1023 [02:41<?, ?it/s]
ERROR:    Traceback (most recent call last):
  File "/home/koma/.conda/envs/tts/lib/python3.10/site-packages/kui/asgi/lifespan.py", line 36, in __call__
    await result
  File "/home/koma/fish-speech/tools/api_server.py", line 82, in initialize_app
    app.state.model_manager = ModelManager(
  File "/home/koma/fish-speech/tools/server/model_manager.py", line 65, in __init__
    self.warm_up(self.tts_inference_engine)
  File "/home/koma/fish-speech/tools/server/model_manager.py", line 121, in warm_up
    list(inference(request, tts_inference_engine))
  File "/home/koma/fish-speech/tools/server/inference.py", line 25, in inference_wrapper
    raise HTTPException(
baize.exceptions.HTTPException: (500, '\'backend=\\\'inductor\\\' raised:\\nPermissionError: [Errno 13] Permission denied: \\\'/tmp/torchinductor_koma/lo/.1330.140116945266240.tmp\\\' -> \\\'/tmp/torchinductor_koma/lo/clogj3r7bsakyus6wz3yqefmjlhto65qemf2qfe4ns7mp52pxd6n.py\\\'\\n\\nSet TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information\\n\\n\\nYou can suppress this exception and fall back to eager by setting:\\n    import torch._dynamo\\n    torch._dynamo.config.suppress_errors = True\\n\'')

Segmentation fault (core dumped) /tmp/torchinductor_koma/triton/0/e6989009f1faa05ac5f7f5257fe2b85e8fead7015504203974aa7e1e4d122d7c/triton_red_fused__softmax_add_bmm_index_logical_not_masked_fill_zeros_like_8.llir.tmp.pid_1465_7cc0f06c-2609-4d20-a45e-ae625a161c39 -> /tmp/torchinductor_koma/triton/0/e6989009f1faa05ac5f7f5257fe2b85e8fead7015504203974aa7e1e4d122d7c/triton_red_fused__softmax_add_bmm_index_logical_not_masked_fill_zeros_like_8.llir
ERROR:    Application startup failed. Exiting.
Segmentation fault (core dumped) /tmp/torchinductor_koma/triton/0/e6989009f1faa05ac5f7f5257fe2b85e8fead7015504203974aa7e1e4d122d7c/triton_red_fused__softmax_add_bmm_index_logical_not_masked_fill_zeros_like_8.ptx.tmp.pid_1465_aaab7612-5f9d-4669-bf5e-1f685a9eb20a -> /tmp/torchinductor_koma/triton/0/e6989009f1faa05ac5f7f5257fe2b85e8fead7015504203974aa7e1e4d122d7c/triton_red_fused__softmax_add_bmm_index_logical_not_masked_fill_zeros_like_8.ptx
Segmentation fault (core dumped) /tmp/torchinductor_koma/triton/0/e6989009f1faa05ac5f7f5257fe2b85e8fead7015504203974aa7e1e4d122d7c/triton_red_fused__softmax_add_bmm_index_logical_not_masked_fill_zeros_like_8.cubin.tmp.pid_1465_7e66aa04-7c5f-4624-a304-84f1c6297651 -> /tmp/torchinductor_koma/triton/0/e6989009f1faa05ac5f7f5257fe2b85e8fead7015504203974aa7e1e4d122d7c/triton_red_fused__softmax_add_bmm_index_logical_not_masked_fill_zeros_like_8.cubin
Segmentation fault (core dumped) /tmp/torchinductor_koma/triton/0/e6989009f1faa05ac5f7f5257fe2b85e8fead7015504203974aa7e1e4d122d7c/triton_red_fused__softmax_add_bmm_index_logical_not_masked_fill_zeros_like_8.json.tmp.pid_1465_c426a30f-3d7c-4382-ae50-3281c8bfc682 -> /tmp/torchinductor_koma/triton/0/e6989009f1faa05ac5f7f5257fe2b85e8fead7015504203974aa7e1e4d122d7c/triton_red_fused__softmax_add_bmm_index_logical_not_masked_fill_zeros_like_8.json
Segmentation fault (core dumped) /tmp/torchinductor_koma/triton/0/e6989009f1faa05ac5f7f5257fe2b85e8fead7015504203974aa7e1e4d122d7c/__grp__triton_red_fused__softmax_add_bmm_index_logical_not_masked_fill_zeros_like_8.json.tmp.pid_1465_3c4ed8ff-5f40-46aa-bad2-06ea8a68ff97 -> /tmp/torchinductor_koma/triton/0/e6989009f1faa05ac5f7f5257fe2b85e8fead7015504203974aa7e1e4d122d7c/__grp__triton_red_fused__softmax_add_bmm_index_logical_not_masked_fill_zeros_like_8.json
Segmentation fault (core dumped) /tmp/torchinductor_koma/triton/0/bb6ccef264892b035b94d267793ba6f1e10d66cfd971c9abaca345ffab4a88d0/triton_per_fused__softmax_add_index_logical_not_masked_fill_mul_zeros_41.ttir.tmp.pid_1465_8b7a4a8b-f4fe-4a5d-b211-dcb7b802a116 -> /tmp/torchinductor_koma/triton/0/bb6ccef264892b035b94d267793ba6f1e10d66cfd971c9abaca345ffab4a88d0/triton_per_fused__softmax_add_index_logical_not_masked_fill_mul_zeros_41.ttir
Segmentation fault (core dumped) /tmp/torchinductor_koma/triton/0/bb6ccef264892b035b94d267793ba6f1e10d66cfd971c9abaca345ffab4a88d0/triton_per_fused__softmax_add_index_logical_not_masked_fill_mul_zeros_41.ttgir.tmp.pid_1465_09c6789f-443a-4516-8364-6ed1dca9a77e -> /tmp/torchinductor_koma/triton/0/bb6ccef264892b035b94d267793ba6f1e10d66cfd971c9abaca345ffab4a88d0/triton_per_fused__softmax_add_index_logical_not_masked_fill_mul_zeros_41.ttgir
Segmentation fault (core dumped) /tmp/torchinductor_koma/triton/0/3874684ef2364a59aab56e08e3ed32022564fa8382c2943dc87961046f181f38/triton_per_fused__softmax_add_index_logical_not_masked_fill_mul_zeros_46.ttir.tmp.pid_1463_63dde67e-2ce1-4689-9dd6-201faa09fbbd -> /tmp/torchinductor_koma/triton/0/3874684ef2364a59aab56e08e3ed32022564fa8382c2943dc87961046f181f38/triton_per_fused__softmax_add_index_logical_not_masked_fill_mul_zeros_46.ttir
Segmentation fault (core dumped) /tmp/torchinductor_koma/triton/0/3874684ef2364a59aab56e08e3ed32022564fa8382c2943dc87961046f181f38/triton_per_fused__softmax_add_index_logical_not_masked_fill_mul_zeros_46.ttgir.tmp.pid_1463_937aec16-747f-4c7d-ad14-03e631c2c8fb -> /tmp/torchinductor_koma/triton/0/3874684ef2364a59aab56e08e3ed32022564fa8382c2943dc87961046f181f38/triton_per_fused__softmax_add_index_logical_not_masked_fill_mul_zeros_46.ttgir
Segmentation fault (core dumped) /tmp/torchinductor_koma/triton/0/bb6ccef264892b035b94d267793ba6f1e10d66cfd971c9abaca345ffab4a88d0/triton_per_fused__softmax_add_index_logical_not_masked_fill_mul_zeros_41.llir.tmp.pid_1465_c3a159f6-b382-4c1a-a151-ad6ec86f6930 -> /tmp/torchinductor_koma/triton/0/bb6ccef264892b035b94d267793ba6f1e10d66cfd971c9abaca345ffab4a88d0/triton_per_fused__softmax_add_index_logical_not_masked_fill_mul_zeros_41.llir
Segmentation fault (core dumped) /tmp/torchinductor_koma/triton/0/bb6ccef264892b035b94d267793ba6f1e10d66cfd971c9abaca345ffab4a88d0/triton_per_fused__softmax_add_index_logical_not_masked_fill_mul_zeros_41.ptx.tmp.pid_1465_b79ce8db-924e-43b3-8d93-efc4e1b15264 -> /tmp/torchinductor_koma/triton/0/bb6ccef264892b035b94d267793ba6f1e10d66cfd971c9abaca345ffab4a88d0/triton_per_fused__softmax_add_index_logical_not_masked_fill_mul_zeros_41.ptx
Segmentation fault (core dumped) /tmp/torchinductor_koma/triton/0/3874684ef2364a59aab56e08e3ed32022564fa8382c2943dc87961046f181f38/triton_per_fused__softmax_add_index_logical_not_masked_fill_mul_zeros_46.llir.tmp.pid_1463_ab90bde9-214a-4ff5-a32a-3d6a7ef50ced -> /tmp/torchinductor_koma/triton/0/3874684ef2364a59aab56e08e3ed32022564fa8382c2943dc87961046f181f38/triton_per_fused__softmax_add_index_logical_not_masked_fill_mul_zeros_46.llir
Segmentation fault (core dumped) /tmp/torchinductor_koma/triton/0/3874684ef2364a59aab56e08e3ed32022564fa8382c2943dc87961046f181f38/triton_per_fused__softmax_add_index_logical_not_masked_fill_mul_zeros_46.ptx.tmp.pid_1463_7a0e89a4-53ff-4fd9-835b-1a8e88d92ee6 -> /tmp/torchinductor_koma/triton/0/3874684ef2364a59aab56e08e3ed32022564fa8382c2943dc87961046f181f38/triton_per_fused__softmax_add_index_logical_not_masked_fill_mul_zeros_46.ptx
Segmentation fault (core dumped) /tmp/torchinductor_koma/triton/0/3874684ef2364a59aab56e08e3ed32022564fa8382c2943dc87961046f181f38/triton_per_fused__softmax_add_index_logical_not_masked_fill_mul_zeros_46.cubin.tmp.pid_1463_2b7ccc2a-537c-4bc7-9c26-1428a6db4caa -> /tmp/torchinductor_koma/triton/0/3874684ef2364a59aab56e08e3ed32022564fa8382c2943dc87961046f181f38/triton_per_fused__softmax_add_index_logical_not_masked_fill_mul_zeros_46.cubin
Segmentation fault (core dumped) /tmp/torchinductor_koma/triton/0/3874684ef2364a59aab56e08e3ed32022564fa8382c2943dc87961046f181f38/triton_per_fused__softmax_add_index_logical_not_masked_fill_mul_zeros_46.json.tmp.pid_1463_f222d114-66c1-4eca-a03a-0bb33e9bd35d -> /tmp/torchinductor_koma/triton/0/3874684ef2364a59aab56e08e3ed32022564fa8382c2943dc87961046f181f38/triton_per_fused__softmax_add_index_logical_not_masked_fill_mul_zeros_46.json
Segmentation fault (core dumped) /tmp/torchinductor_koma/triton/0/3874684ef2364a59aab56e08e3ed32022564fa8382c2943dc87961046f181f38/__grp__triton_per_fused__softmax_add_index_logical_not_masked_fill_mul_zeros_46.json.tmp.pid_1463_739c28d6-8df0-4417-b096-a23454ab5497 -> /tmp/torchinductor_koma/triton/0/3874684ef2364a59aab56e08e3ed32022564fa8382c2943dc87961046f181f38/__grp__triton_per_fused__softmax_add_index_logical_not_masked_fill_mul_zeros_46.json
Segmentation fault (core dumped) /tmp/torchinductor_koma/triton/0/bb6ccef264892b035b94d267793ba6f1e10d66cfd971c9abaca345ffab4a88d0/triton_per_fused__softmax_add_index_logical_not_masked_fill_mul_zeros_41.cubin.tmp.pid_1465_65ecf108-ed86-4aca-83ff-afb63f98f017 -> /tmp/torchinductor_koma/triton/0/bb6ccef264892b035b94d267793ba6f1e10d66cfd971c9abaca345ffab4a88d0/triton_per_fused__softmax_add_index_logical_not_masked_fill_mul_zeros_41.cubin
Segmentation fault (core dumped) /tmp/torchinductor_koma/triton/0/bb6ccef264892b035b94d267793ba6f1e10d66cfd971c9abaca345ffab4a88d0/triton_per_fused__softmax_add_index_logical_not_masked_fill_mul_zeros_41.json.tmp.pid_1465_87b5dcad-2b6f-4cf1-93ef-7e3748fde4cc -> /tmp/torchinductor_koma/triton/0/bb6ccef264892b035b94d267793ba6f1e10d66cfd971c9abaca345ffab4a88d0/triton_per_fused__softmax_add_index_logical_not_masked_fill_mul_zeros_41.json
Segmentation fault (core dumped) /tmp/torchinductor_koma/triton/0/bb6ccef264892b035b94d267793ba6f1e10d66cfd971c9abaca345ffab4a88d0/__grp__triton_per_fused__softmax_add_index_logical_not_masked_fill_mul_zeros_41.json.tmp.pid_1465_82aa0069-a027-4475-be22-dcd0977f4a30 -> /tmp/torchinductor_koma/triton/0/bb6ccef264892b035b94d267793ba6f1e10d66cfd971c9abaca345ffab4a88d0/__grp__triton_per_fused__softmax_add_index_logical_not_masked_fill_mul_zeros_41.json
Segmentation fault (core dumped) /tmp/torchinductor_koma/triton/0/1cc3eba260b69750ae0548a7dc165b3318610eb430423fac402263c1ac4d2109/triton_per_fused__softmax_add_index_logical_not_masked_fill_mul_zeros_46.ttir.tmp.pid_1463_7efde184-dd6f-445f-bbbe-be559b7db26f -> /tmp/torchinductor_koma/triton/0/1cc3eba260b69750ae0548a7dc165b3318610eb430423fac402263c1ac4d2109/triton_per_fused__softmax_add_index_logical_not_masked_fill_mul_zeros_46.ttir
Segmentation fault (core dumped) /tmp/torchinductor_koma/triton/0/1cc3eba260b69750ae0548a7dc165b3318610eb430423fac402263c1ac4d2109/triton_per_fused__softmax_add_index_logical_not_masked_fill_mul_zeros_46.ttgir.tmp.pid_1463_e4bcbad9-34eb-4fe8-a5ef-990efcb22576 -> /tmp/torchinductor_koma/triton/0/1cc3eba260b69750ae0548a7dc165b3318610eb430423fac402263c1ac4d2109/triton_per_fused__softmax_add_index_logical_not_masked_fill_mul_zeros_46.ttgir
Segmentation fault (core dumped) /tmp/torchinductor_koma/triton/0/edb0032f3eae2bf52320059fa36d9885afbf10dfab09fe554572050365c84131/triton_per_fused__softmax_add_index_logical_not_masked_fill_mul_zeros_41.ttir.tmp.pid_1465_0fac9c6a-a91a-4b3c-ba58-42e239e93e9f -> /tmp/torchinductor_koma/triton/0/edb0032f3eae2bf52320059fa36d9885afbf10dfab09fe554572050365c84131/triton_per_fused__softmax_add_index_logical_not_masked_fill_mul_zeros_41.ttir
Segmentation fault (core dumped) /tmp/torchinductor_koma/triton/0/edb0032f3eae2bf52320059fa36d9885afbf10dfab09fe554572050365c84131/triton_per_fused__softmax_add_index_logical_not_masked_fill_mul_zeros_41.ttgir.tmp.pid_1465_bc0f6074-6542-4b19-a7e6-0d09e123720f -> /tmp/torchinductor_koma/triton/0/edb0032f3eae2bf52320059fa36d9885afbf10dfab09fe554572050365c84131/triton_per_fused__softmax_add_index_logical_not_masked_fill_mul_zeros_41.ttgir
Segmentation fault (core dumped) /tmp/torchinductor_koma/triton/0/1cc3eba260b69750ae0548a7dc165b3318610eb430423fac402263c1ac4d2109/triton_per_fused__softmax_add_index_logical_not_masked_fill_mul_zeros_46.llir.tmp.pid_1463_1c8ec382-65e8-4066-b82c-ccdfec37d317 -> /tmp/torchinductor_koma/triton/0/1cc3eba260b69750ae0548a7dc165b3318610eb430423fac402263c1ac4d2109/triton_per_fused__softmax_add_index_logical_not_masked_fill_mul_zeros_46.llir
Segmentation fault (core dumped) /tmp/torchinductor_koma/triton/0/1cc3eba260b69750ae0548a7dc165b3318610eb430423fac402263c1ac4d2109/triton_per_fused__softmax_add_index_logical_not_masked_fill_mul_zeros_46.ptx.tmp.pid_1463_a6b78858-ec93-4366-b195-adadf53f3575 -> /tmp/torchinductor_koma/triton/0/1cc3eba260b69750ae0548a7dc165b3318610eb430423fac402263c1ac4d2109/triton_per_fused__softmax_add_index_logical_not_masked_fill_mul_zeros_46.ptx
Segmentation fault (core dumped) /tmp/torchinductor_koma/triton/0/edb0032f3eae2bf52320059fa36d9885afbf10dfab09fe554572050365c84131/triton_per_fused__softmax_add_index_logical_not_masked_fill_mul_zeros_41.llir.tmp.pid_1465_6ca938da-2740-4e83-9180-7f9e2199d222 -> /tmp/torchinductor_koma/triton/0/edb0032f3eae2bf52320059fa36d9885afbf10dfab09fe554572050365c84131/triton_per_fused__softmax_add_index_logical_not_masked_fill_mul_zeros_41.llir
Segmentation fault (core dumped) /tmp/torchinductor_koma/triton/0/edb0032f3eae2bf52320059fa36d9885afbf10dfab09fe554572050365c84131/triton_per_fused__softmax_add_index_logical_not_masked_fill_mul_zeros_41.ptx.tmp.pid_1465_abae5c09-5d18-4d8b-ad58-52b30fc1b2b9 -> /tmp/torchinductor_koma/triton/0/edb0032f3eae2bf52320059fa36d9885afbf10dfab09fe554572050365c84131/triton_per_fused__softmax_add_index_logical_not_masked_fill_mul_zeros_41.ptx
Segmentation fault (core dumped) /tmp/torchinductor_koma/triton/0/1cc3eba260b69750ae0548a7dc165b3318610eb430423fac402263c1ac4d2109/triton_per_fused__softmax_add_index_logical_not_masked_fill_mul_zeros_46.cubin.tmp.pid_1463_058654c1-bc84-4853-84e0-46b42ac6931c -> /tmp/torchinductor_koma/triton/0/1cc3eba260b69750ae0548a7dc165b3318610eb430423fac402263c1ac4d2109/triton_per_fused__softmax_add_index_logical_not_masked_fill_mul_zeros_46.cubin
Segmentation fault (core dumped) /tmp/torchinductor_koma/triton/0/1cc3eba260b69750ae0548a7dc165b3318610eb430423fac402263c1ac4d2109/triton_per_fused__softmax_add_index_logical_not_masked_fill_mul_zeros_46.json.tmp.pid_1463_eeb3c09f-052e-422e-ad6e-981dc43232d4 -> /tmp/torchinductor_koma/triton/0/1cc3eba260b69750ae0548a7dc165b3318610eb430423fac402263c1ac4d2109/triton_per_fused__softmax_add_index_logical_not_masked_fill_mul_zeros_46.json
Segmentation fault (core dumped) /tmp/torchinductor_koma/triton/0/1cc3eba260b69750ae0548a7dc165b3318610eb430423fac402263c1ac4d2109/__grp__triton_per_fused__softmax_add_index_logical_not_masked_fill_mul_zeros_46.json.tmp.pid_1463_10e1f3f6-e8af-4ce6-b46a-68061ee4355c -> /tmp/torchinductor_koma/triton/0/1cc3eba260b69750ae0548a7dc165b3318610eb430423fac402263c1ac4d2109/__grp__triton_per_fused__softmax_add_index_logical_not_masked_fill_mul_zeros_46.json
Segmentation fault (core dumped) /tmp/torchinductor_koma/triton/0/edb0032f3eae2bf52320059fa36d9885afbf10dfab09fe554572050365c84131/triton_per_fused__softmax_add_index_logical_not_masked_fill_mul_zeros_41.cubin.tmp.pid_1465_9e493fbf-9a1f-4b56-b7b6-261a2b6bead1 -> /tmp/torchinductor_koma/triton/0/edb0032f3eae2bf52320059fa36d9885afbf10dfab09fe554572050365c84131/triton_per_fused__softmax_add_index_logical_not_masked_fill_mul_zeros_41.cubin
Segmentation fault (core dumped) /tmp/torchinductor_koma/triton/0/edb0032f3eae2bf52320059fa36d9885afbf10dfab09fe554572050365c84131/triton_per_fused__softmax_add_index_logical_not_masked_fill_mul_zeros_41.json.tmp.pid_1465_49237752-09ac-450d-b429-970e476f09af -> /tmp/torchinductor_koma/triton/0/edb0032f3eae2bf52320059fa36d9885afbf10dfab09fe554572050365c84131/triton_per_fused__softmax_add_index_logical_not_masked_fill_mul_zeros_41.json
Segmentation fault (core dumped) /tmp/torchinductor_koma/triton/0/edb0032f3eae2bf52320059fa36d9885afbf10dfab09fe554572050365c84131/__grp__triton_per_fused__softmax_add_index_logical_not_masked_fill_mul_zeros_41.json.tmp.pid_1465_75cf45bf-0969-43d4-b9f1-c2107e319764 -> /tmp/torchinductor_koma/triton/0/edb0032f3eae2bf52320059fa36d9885afbf10dfab09fe554572050365c84131/__grp__triton_per_fused__softmax_add_index_logical_not_masked_fill_mul_zeros_41.json
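
As the exception text above suggests, the server can at least be started while this is debugged by suppressing compile errors and falling back to eager mode (this sidesteps the permission problem rather than fixing it):

    # From the error message above: suppress Inductor errors and fall back to eager.
    # Placed before the model is compiled, e.g. near the top of tools/api_server.py.
    import torch._dynamo
    torch._dynamo.config.suppress_errors = True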

Run with sudo:

(tts) koma@DESKTOP-O5IAPM2:~/fish-speech$ sudo /home/koma/.conda/envs/tts/bin/python  -m tools.api_server --listen 0.0.0.0:7865 --llama-checkpoint-path checkpoints/fish-speech-1.5 --decoder-checkpoint-path checkpoints/fish-speech-1.5/firefly-gan-vq-fsq-8x1024-21hz-generator.pth --decoder-config-name firefly_gan_vq --compile
INFO:     Started server process [8528]
INFO:     Waiting for application startup.
2025-03-23 10:45:56.124 | INFO     | fish_speech.models.text2semantic.inference:load_model:681 - Restored model from checkpoint
2025-03-23 10:45:56.124 | INFO     | fish_speech.models.text2semantic.inference:load_model:687 - Using DualARTransformer
2025-03-23 10:45:56.124 | INFO     | fish_speech.models.text2semantic.inference:load_model:695 - Compiling function...
2025-03-23 10:45:56.842 | INFO     | tools.server.model_manager:load_llama_model:99 - LLAMA model loaded.
Traceback (most recent call last):
  File "/home/koma/.conda/envs/tts/lib/python3.10/site-packages/torch/_inductor/compile_worker/__main__.py", line 7, in <module>
    from torch._inductor.async_compile import pre_fork_setup
  File "/home/koma/.conda/envs/tts/lib/python3.10/site-packages/torch/__init__.py", line 2263, in <module>
    _logging._init_logs()
  File "/home/koma/.conda/envs/tts/lib/python3.10/site-packages/torch/_logging/_internal.py", line 884, in _init_logs
    _update_log_state_from_env()
  File "/home/koma/.conda/envs/tts/lib/python3.10/site-packages/torch/_logging/_internal.py", line 716, in _update_log_state_from_env
    log_state = _parse_log_settings(log_setting)
  File "/home/koma/.conda/envs/tts/lib/python3.10/site-packages/torch/_logging/_internal.py", line 660, in _parse_log_settings
    raise ValueError(_invalid_settings_err_msg(settings))
ValueError:
Invalid log settings: torch._dynamo=DEBUG, must be a comma separated list of fully
qualified module names, registered log names or registered artifact names.
For more info on various settings, try TORCH_LOGS="help"
Valid settings:
all, dynamo, aot, autograd, inductor, dynamic, torch, distributed, c10d, ddp, pp, fsdp, onnx, export, aot_graphs, graph_sizes, bytecode, graph_code, not_implemented, custom_format_test_artifact, graph_breaks, cudagraphs, kernel_code, fusion, recompiles, output_code, onnx_diagnostics, recompiles_verbose, trace_bytecode, compiled_autograd, schedule, trace_source, overlap, perf_hints, trace_call, sym_node, ddp_graphs, verbose_guards, graph, compiled_autograd_verbose, guards, aot_joint_graph, post_grad_graphs

/home/koma/.conda/envs/tts/lib/python3.10/site-packages/vector_quantize_pytorch/vector_quantize_pytorch.py:445: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  @autocast(enabled = False)
/home/koma/.conda/envs/tts/lib/python3.10/site-packages/vector_quantize_pytorch/vector_quantize_pytorch.py:630: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  @autocast(enabled = False)
/home/koma/.conda/envs/tts/lib/python3.10/site-packages/vector_quantize_pytorch/finite_scalar_quantization.py:147: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  @autocast(enabled = False)
/home/koma/.conda/envs/tts/lib/python3.10/site-packages/vector_quantize_pytorch/lookup_free_quantization.py:209: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  @autocast(enabled = False)
2025-03-23 10:45:57.498 | INFO     | fish_speech.models.vqgan.inference:load_model:46 - Loaded model: <All keys matched successfully>
2025-03-23 10:45:57.499 | INFO     | tools.server.model_manager:load_decoder_model:107 - Decoder model loaded.
2025-03-23 10:45:57.511 | INFO     | fish_speech.models.text2semantic.inference:generate_long:788 - Encoded text: Hello world.
2025-03-23 10:45:57.511 | INFO     | fish_speech.models.text2semantic.inference:generate_long:806 - Generating sentence 1/1 of sample 1/1
  0%|                                                                                                                                                              | 0/1023 [00:00<?, ?it/s]/home/koma/.conda/envs/tts/lib/python3.10/contextlib.py:103: FutureWarning: `torch.backends.cuda.sdp_kernel()` is deprecated. In the future, this context manager will be removed. Please see `torch.nn.attention.sdpa_kernel()` for the new context manager, with updated signature.
  self.gen = func(*args, **kwds)
W0323 10:46:39.124000 139948787234368 torch/fx/experimental/symbolic_shapes.py:4449] [0/0] xindex is not in var_ranges, defaulting to unknown range.
Segmentation fault (core dumped) /tmp/torchinductor_root/q7/.8528.139948787234368.tmp -> /tmp/torchinductor_root/q7/cq7aqs2ot34rpqjm36euezlogdt6eptsfb2ihhipmgx4f3prrecf.py
  0%|                                                                                                                                                              | 0/1023 [00:47<?, ?it/s]
ERROR:    Traceback (most recent call last):
  File "/home/koma/.conda/envs/tts/lib/python3.10/site-packages/kui/asgi/lifespan.py", line 36, in __call__
    await result
  File "/home/koma/fish-speech/tools/api_server.py", line 100, in initialize_app
    app.state.model_manager = ModelManager(
  File "/home/koma/fish-speech/tools/server/model_manager.py", line 65, in __init__
    self.warm_up(self.tts_inference_engine)
  File "/home/koma/fish-speech/tools/server/model_manager.py", line 121, in warm_up
    list(inference(request, tts_inference_engine))
  File "/home/koma/fish-speech/tools/server/inference.py", line 36, in inference_wrapper
    raise HTTPException(
baize.exceptions.HTTPException: (500, '\'backend=\\\'inductor\\\' raised:\\nPermissionError: [Errno 13] Permission denied: \\\'/tmp/torchinductor_root/q7/.8528.139948787234368.tmp\\\' -> \\\'/tmp/torchinductor_root/q7/cq7aqs2ot34rpqjm36euezlogdt6eptsfb2ihhipmgx4f3prrecf.py\\\'\\n\\nSet TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information\\n\'')

ERROR:    Application startup failed. Exiting.
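
Note that the sudo run also hit a ValueError in the Inductor compile worker, apparently from a TORCH_LOGS value of torch._dynamo=DEBUG left in the environment. The extra diagnostics recommended by the first traceback would instead be enabled like this (a sketch; the variables must be in the environment before torch is imported, or exported in the shell before launching the server):

    import os

    # Settings recommended by the PermissionError traceback above.
    os.environ["TORCH_LOGS"] = "+dynamo"
    os.environ["TORCHDYNAMO_VERBOSE"] = "1"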
