Describe the bug
On Windows, torch.compile(..., mode="reduce-overhead") does not throw during model loading but silently generates code paths that depend on Triton at inference time. Since Triton has no official Windows support, TTS generation fails with:
TTS engine stopped mid-generation. This usually means it ran out of memory.
Underlying error: Cannot find a working triton installation.
To Reproduce
- Run OmniVoice-Studio on Windows with CUDA
- The model loads successfully and logs: torch.compile applied.
- Attempt TTS generation
- Generation fails immediately with the OOM-style message and Triton error shown above
Root Cause
- torch.compile's default Inductor backend generates Triton kernels for CUDA devices, so the compiled path (including mode="reduce-overhead") depends on Triton
- Triton is not available on Windows (no official Windows support upstream)
- torch.compile compiles lazily, on the first call to the wrapped model, so the try/except in _load_model_sync() only catches errors raised while wrapping; the failure happens later, at runtime during inference
- This leads to a confusing error that looks like an OOM, masking the actual Triton dependency issue
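To illustrate why the load-time try/except misses the failure, here is a small pure-Python sketch of deferred compilation. LazyCompiled and load_model are hypothetical stand-ins written for this illustration; they are not torch's implementation and not the project's actual _load_model_sync(). They only mimic the ordering of events described above.

```python
class LazyCompiled:
    """Hypothetical stand-in for a lazily compiled callable (not torch's code)."""

    def __init__(self, fn, backend_available):
        self._fn = fn
        self._backend_available = backend_available
        self._compiled = None

    def __call__(self, *args):
        if self._compiled is None:
            # Compilation is deferred until the first call, so a missing
            # backend is only discovered here, not at wrap time.
            if not self._backend_available:
                raise RuntimeError("Cannot find a working triton installation.")
            self._compiled = self._fn
        return self._compiled(*args)


def load_model():
    """Mimics a load-time try/except: wrapping never raises, so it reports success."""
    try:
        model = LazyCompiled(lambda x: x * 2, backend_available=False)
        status = "torch.compile applied."
    except Exception:
        model = lambda x: x * 2
        status = "torch.compile skipped."
    return model, status


model, status = load_model()

# The error only appears on the first "inference" call.
try:
    model(3)
    first_call_error = None
except RuntimeError as exc:
    first_call_error = str(exc)
```

In this sketch the load step reports success and the error is raised by the first call, which matches the sequence in the reproduction steps.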
Expected Behavior
On platforms without Triton (Windows), torch.compile should be skipped gracefully so TTS inference falls back to eager mode.
Environment
- OS: Windows
- GPU: Any CUDA-capable
- PyTorch: any recent version
- Triton: not installed / not available
Suggested Fix
Check for Triton availability before calling torch.compile:
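import importlib.util  # plain "import importlib" does not guarantee the util submodule is loaded
if importlib.util.find_spec("triton") is not None:
    _model.llm = torch.compile(_model.llm, mode="reduce-overhead")
# otherwise leave _model.llm uncompiled so inference runs in eager mode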
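The check could also be factored into a small helper so the decision is testable without a GPU. The following is a minimal sketch using only the standard library; triton_available and maybe_compile are hypothetical names introduced here, and compile_fn is assumed to be something like lambda m: torch.compile(m, mode="reduce-overhead").

```python
import importlib.util


def triton_available() -> bool:
    """Return True if a module named 'triton' can be located by this interpreter."""
    try:
        return importlib.util.find_spec("triton") is not None
    except (ImportError, ValueError):
        # find_spec can raise for a broken or half-initialized module entry.
        return False


def maybe_compile(module, compile_fn, has_triton=None):
    """Apply compile_fn only when Triton appears to be installed.

    Returns (module_to_use, was_compiled). The has_triton argument exists
    so the decision can be overridden in tests.
    """
    if has_triton is None:
        has_triton = triton_available()
    if not has_triton:
        return module, False  # stay in eager mode
    try:
        return compile_fn(module), True
    except Exception:
        # Wrapping itself failed; fall back to the uncompiled module.
        return module, False
```

A hypothetical call site would be: _model.llm, compiled = maybe_compile(_model.llm, lambda m: torch.compile(m, mode="reduce-overhead")). Note that locating the module does not prove Triton actually works on the machine, so this guards against the missing-package case only.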