[Bug] torch.compile on Windows causes TTS OOM due to missing Triton #65

@SummerSec

Describe the bug

On Windows, torch.compile(..., mode="reduce-overhead") does not raise during model loading. Compilation is deferred until the first inference call, where the generated CUDA code path requires Triton. Since Triton has no official Windows support, TTS generation then fails with:

TTS engine stopped mid-generation. This usually means it ran out of memory.
Underlying error: Cannot find a working triton installation.

To Reproduce

  1. Run OmniVoice-Studio on Windows with CUDA
  2. The model loads successfully and logs torch.compile applied.
  3. Attempt TTS generation
  4. Generation fails immediately with the OOM message and the underlying Triton error shown above

Root Cause

  • On CUDA, torch.compile with mode="reduce-overhead" goes through the default Inductor backend, which generates Triton kernels
  • Triton is not available on Windows (the Triton project has no official Windows support)
  • The try/except in _load_model_sync() only catches errors raised by the torch.compile call itself. That call returns a lazily compiled wrapper, so the real compilation, and the failure, happen at runtime on the first inference call
  • The result is a confusing error that looks like an OOM and masks the actual Triton dependency issue
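
To illustrate the third point without requiring PyTorch, here is a minimal sketch in plain Python. `LazyCompiled` and `load_model` are hypothetical stand-ins, not the real torch.compile wrapper or the project's `_load_model_sync()`; they only mimic the deferred behaviour described above, where wrapping succeeds and the backend is first needed on the first call.

```python
class LazyCompiled:
    """Hypothetical stand-in for a lazily compiled wrapper.

    Creating the wrapper is cheap and never touches the backend;
    the backend is only required on the first call.
    """

    def __init__(self, fn, backend_available):
        self._fn = fn
        self._backend_available = backend_available
        self._compiled = False

    def __call__(self, *args, **kwargs):
        if not self._compiled:
            # Deferred "compilation": this is where a missing backend surfaces.
            if not self._backend_available:
                raise RuntimeError("Cannot find a working triton installation.")
            self._compiled = True
        return self._fn(*args, **kwargs)


def load_model(fn, backend_available):
    """Hypothetical loader: the try/except only guards the wrapping step."""
    try:
        model = LazyCompiled(fn, backend_available)
        status = "compile applied"
    except Exception:
        model, status = fn, "compile skipped"
    return model, status


# Loading "succeeds" even though the backend is missing.
model, status = load_model(lambda x: x * 2, backend_available=False)

# The failure only appears later, at call time, outside the try/except.
try:
    model(21)
    failure = None
except RuntimeError as exc:
    failure = str(exc)
```

In this sketch `status` reports that compilation was applied, while `failure` holds the Triton message raised on the first call, which mirrors the sequence in the reproduction steps.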

Expected Behavior

On platforms without Triton (Windows), torch.compile should be skipped gracefully so TTS inference falls back to eager mode.

Environment

  • OS: Windows
  • GPU: Any CUDA-capable
  • PyTorch: any recent version
  • Triton: not installed / not available

Suggested Fix

Check for Triton availability before calling torch.compile:

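import importlib.util  # a bare "import importlib" does not guarantee importlib.util is loaded

# Only compile when a triton package can be found; otherwise stay in eager mode.
if importlib.util.find_spec("triton") is not None:
    _model.llm = torch.compile(_model.llm, mode="reduce-overhead")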
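
One caveat: find_spec only shows that a package named triton can be located, not that it works, and the error above mentions a "working" installation. A possible complement is a runtime fallback that switches to the eager callable the first time the compiled one fails. The sketch below is an assumption about how that could look; `EagerFallback` is a hypothetical helper, not part of PyTorch or OmniVoice-Studio, and the stand-in functions replace real models so the example runs without a GPU.

```python
class EagerFallback:
    """Hypothetical wrapper: try the compiled callable, fall back to eager.

    After the first failure the compiled path is disabled permanently,
    so later calls do not pay for repeated failed attempts.
    """

    def __init__(self, eager, compiled):
        self.eager = eager
        self.compiled = compiled
        self.use_compiled = True

    def __call__(self, *args, **kwargs):
        if self.use_compiled:
            try:
                return self.compiled(*args, **kwargs)
            except Exception:
                # Broad on purpose for the sketch; real code may want to
                # narrow this so genuine OOM errors are not swallowed.
                self.use_compiled = False
        return self.eager(*args, **kwargs)


def eager_forward(x):
    """Stand-in for the uncompiled model."""
    return x + 1


def broken_compiled_forward(x):
    """Stand-in for a compiled model whose backend is missing."""
    raise RuntimeError("Cannot find a working triton installation.")


forward = EagerFallback(eager_forward, broken_compiled_forward)
first = forward(1)   # compiled path fails, eager result is returned
second = forward(2)  # compiled path is no longer attempted
```

With a real model this would require keeping a reference to the original module before calling torch.compile, so that the eager path stays available.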
