fix(tts): /generate honors the selected TTS engine (#312)#324

Merged
debpalash merged 2 commits into main from fix/312-generate-engine
Jun 11, 2026

Conversation

@debpalash

@debpalash debpalash commented Jun 11, 2026

Owner

Summary

POST /generate always ran the OmniVoice model directly via get_model(), silently ignoring the engine selected in Settings (and answering issue #312's question: no, it wasn't intended). It now:

  • resolves the active backend per request: explicit engine form field > OMNIVOICE_TTS_BACKEND env > Settings selection > OmniVoice default — the same pattern /ws/tts (engine) and /v1/audio/speech (model) already use
  • keeps the OmniVoice default path byte-identical (native path with the full advanced parameter surface: t_shift, layer/position/class controls) — existing API consumers see zero change
  • reuses the per-process engine instance cache shared with the health-check route, so weights load once
  • keeps inline [pause Nms] markers ([Feature] Pause in Transcript #276) working on every engine (silence stitching is model-free)
  • honors applies_own_mastering so studio engines skip the broadcast mastering chain (closing the TODO left by 5d602c8), while loudness normalization still applies
  • returns an actionable 400 for unknown or unavailable engines (the unknown-engine message points to GET /engines/tts for the valid ids)
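
The precedence described in the first bullet can be sketched as follows. This is a minimal illustration, not the code from the PR: `resolve_engine_id`, `DEFAULT_ENGINE`, and the `"omnivoice"` id are hypothetical names chosen for the example; only the `OMNIVOICE_TTS_BACKEND` variable and the ordering come from the description above.

```python
import os
from typing import Mapping, Optional

# Assumed id for the OmniVoice default; the real id is not shown in this PR page.
DEFAULT_ENGINE = "omnivoice"


def resolve_engine_id(
    form_engine: Optional[str],
    settings_engine: Optional[str],
    env: Mapping[str, str] = os.environ,
) -> str:
    """Pick the engine id using the order stated in the PR description:
    form field > OMNIVOICE_TTS_BACKEND env > Settings selection > default."""
    candidates = (
        form_engine,
        env.get("OMNIVOICE_TTS_BACKEND"),
        settings_engine,
    )
    for candidate in candidates:
        # Treat None, empty, and whitespace-only values as "not set".
        if candidate and candidate.strip():
            return candidate.strip()
    return DEFAULT_ENGINE
```

Passing the environment as a parameter keeps the helper easy to test without mutating `os.environ`.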

Fixes #312

Tests

tests/test_generate_engine.py (new, 6 tests, engine layer stubbed as in test_api.py): Settings-selected engine generates; per-request engine overrides; unknown → 400; unavailable → 400; default path unchanged; own-mastering engines skip the chain. Full file passes locally; CI runs the complete suite.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Per-request TTS engine override and engine-aware inference path.
    • Language normalization (e.g., "auto" → undetermined) and consistent sample-rate handling.
    • Shared audio effect chain with optional raw passthrough and conditional mastering when engines apply their own mastering.
  • Bug Fixes

    • Clearer errors for unknown or unavailable engines and improved out-of-memory error handling.
  • Tests

    • Added tests covering engine selection, overrides, availability errors, and mastering behavior.

The /generate route always ran the OmniVoice model directly, ignoring both
the Settings engine selection and any per-request override. It now resolves
the active backend (env var > Settings selection > default), supports an
explicit `engine` form field (same pattern as /ws/tts and /v1/audio/speech),
reuses the per-process engine instance cache, keeps inline [pause Nms]
markers working on every engine, and honors applies_own_mastering so studio
engines skip the broadcast mastering chain. The OmniVoice default path is
byte-identical to the old behavior — existing API consumers see no change.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jun 11, 2026

Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: b4313ba3-f99c-4892-a158-e56b81762e9b

📥 Commits

Reviewing files that changed from the base of the PR and between 3c01403 and bffe12b.

📒 Files selected for processing (1)
  • tests/test_generate_engine.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/test_generate_engine.py

📝 Walkthrough

Extended POST /generate to support per-request TTS engine selection via an optional engine parameter. Refactored audio post-processing into reusable _apply_effect_chain() and _oom_friendly_reraise() helpers. Implemented engine-aware inference path that routes requests through pluggable backend adapters while preserving OmniVoice as the default. Updated output handling to use the resolved backend or model sample rate consistently. Added comprehensive test coverage validating engine selection, error handling, backward compatibility, and mastering-skip behavior.

Changes

Engine-aware /generate endpoint with refactored audio processing

| Layer / File(s) | Summary |
|---|---|
| Audio DSP refactoring and error handling (backend/api/routers/generation.py) | Extracted post-processing into _apply_effect_chain() that validates effect presets, supports raw passthrough, conditionally applies mastering, and normalizes loudness. Introduced _oom_friendly_reraise() for standardized out-of-memory error handling with cache cleanup. Updated _run_inference to delegate DSP logic to the shared effect chain. |
| Backend-aware inference implementation (backend/api/routers/generation.py) | Added _run_backend_inference() to execute generation through pluggable backend adapters: sets seed, normalizes "auto" language to None, parses pause markers to stitch span audio, generates via the backend, and applies DSP effects with mastering conditionally skipped based on backend's applies_own_mastering flag. |
| Engine selection and resolution in /generate (backend/api/routers/generation.py) | Extended endpoint to accept optional engine form field. Implemented resolution logic that determines engine ID from parameter or settings default, validates against backend registry (HTTP 400 for unknown engines), checks availability (HTTP 400 with reason if unavailable), and loads either native OmniVoice model or cached backend instance. |
| Generation execution branching and output normalization (backend/api/routers/generation.py) | Updated main execution to conditionally route: runs _run_backend_inference() for adapters or _run_inference() for OmniVoice. Resolved sample rate from backend or model. Updated watermark embedding, file saving, and WAV serialization to use resolved sample rate consistently instead of assuming OmniVoice's fixed rate. |
| Test fixtures and setup (tests/test_generate_engine.py) | Established deterministic test environment with OmniVoice settings and created reusable fixtures: _make_fake_engine() helper that dynamically defines stub TTSBackend subclasses with configurable availability and mastering behavior; client FastAPI TestClient wrapper; and no_omnivoice_model to prevent fallback to OmniVoice path during tests. |
| Engine selection and override behavior validation (tests/test_generate_engine.py) | Validated that /generate honors Settings-selected engine (with "Auto" language passed as None to engine), allows per-request engine form field to override settings, and rejects unknown engines with HTTP 400 and clear error message. |
| Error handling, availability validation, and backward compatibility (tests/test_generate_engine.py) | Validated that unavailable engines return HTTP 400 with unavailability reason (engine not invoked), confirmed backward-compatible default behavior still runs OmniVoice native path with x-audio-id header, and verified mastering-skip behavior: apply_mastering not called for engines with applies_own_mastering=True but called for regular engines. |
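
The pause-marker handling mentioned in the walkthrough can be illustrated with a small, model-free sketch. The real helpers are `parse_pause_markers` and `_render_with_pauses`, whose implementations are not shown on this page; the functions below (`split_on_pause_markers`, `stitch_with_silence`) are hypothetical stand-ins, and the `(text, pause_ms)` tuple shape is only inferred from the `segments[0][1] > 0` check in the quoted code.

```python
import re
from typing import Callable, List, Tuple

# Assumed marker syntax, based on the "[pause Nms]" form named in the PR.
PAUSE_RE = re.compile(r"\[pause\s+(\d+)\s*ms\]", re.IGNORECASE)


def split_on_pause_markers(text: str) -> List[Tuple[str, int]]:
    """Split text into (span_text, pause_ms_after_span) pairs."""
    segments: List[Tuple[str, int]] = []
    cursor = 0
    for match in PAUSE_RE.finditer(text):
        span = text[cursor:match.start()].strip()
        segments.append((span, int(match.group(1))))
        cursor = match.end()
    tail = text[cursor:].strip()
    if tail or not segments:
        segments.append((tail, 0))
    return segments


def stitch_with_silence(
    gen_span: Callable[[str], List[float]],
    segments: List[Tuple[str, int]],
    sample_rate: int,
) -> List[float]:
    """Generate each span and append zero-valued samples for each pause."""
    out: List[float] = []
    for span_text, pause_ms in segments:
        if span_text:
            out.extend(gen_span(span_text))
        # Silence needs no model: it is just sample_rate * seconds zeros.
        out.extend([0.0] * (sample_rate * pause_ms // 1000))
    return out
```

Because the silence is plain zeros, the same stitching works for any engine that can generate a span, which is the sense in which the PR calls it model-free.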

Sequence Diagram

sequenceDiagram
  participant Client
  participant GenerateEndpoint
  participant Settings
  participant BackendRegistry
  participant OmniVoiceModel
  participant BackendAdapter
  participant AudioDSP

  Client->>GenerateEndpoint: POST /generate (optional engine)
  GenerateEndpoint->>Settings: Resolve engine ID (or use default)
  GenerateEndpoint->>BackendRegistry: Validate engine exists
  alt Unknown engine
    GenerateEndpoint-->>Client: HTTP 400 (unknown)
  else
    GenerateEndpoint->>BackendRegistry: Check engine availability
    alt Unavailable engine
      GenerateEndpoint-->>Client: HTTP 400 (unavailable)
    else
      alt Engine is OmniVoice (or default)
        GenerateEndpoint->>OmniVoiceModel: Load model
        OmniVoiceModel->>OmniVoiceModel: Generate audio
      else
        GenerateEndpoint->>BackendAdapter: Load cached instance
        BackendAdapter->>BackendAdapter: Generate audio
      end
      GenerateEndpoint->>AudioDSP: Apply effect chain (skip mastering if backend applies it)
      GenerateEndpoint->>GenerateEndpoint: Watermark + serialize with resolved sample_rate
      GenerateEndpoint-->>Client: HTTP 200 (audio WAV)
    end
  end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • debpalash/OmniVoice-Studio#311: Implements the applies_own_mastering backend flag that the main PR uses to conditionally skip the shared broadcast DSP mastering chain.
  • debpalash/OmniVoice-Studio#109: Extends the generation pipeline with effect_preset and raw passthrough options that the main PR refactors into the shared _apply_effect_chain() helper.
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 47.83% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and specifically describes the main change: /generate endpoint now honors the selected TTS engine, directly addressing the issue #312. |
| Description check | ✅ Passed | The description covers the Summary section well, lists key changes, identifies the type as a bug fix, mentions testing approach, and includes issue reference, though the template checklist itself is not filled in. |
| Linked Issues check | ✅ Passed | The PR fully addresses issue #312 by implementing engine resolution with proper precedence, maintaining backward compatibility, and adding comprehensive tests for engine selection behavior. |
| Out of Scope Changes check | ✅ Passed | All changes are scoped to resolving engine selection in /generate and related test infrastructure; no unrelated modifications detected. |


Comment on lines +174 to +178
def _run_backend_inference(
    backend, text, language, ref_audio_path, ref_text, instruct, duration,
    num_step, guidance_scale, speed, denoise, postprocess_output,
    used_seed, effect_preset="broadcast",
):
@greptile-apps

greptile-apps Bot commented Jun 11, 2026

Contributor

Greptile Summary

This PR fixes issue #312 where POST /generate always routed to the OmniVoice model regardless of the engine selected in Settings. The engine resolution now follows the same priority chain (engine form field → OMNIVOICE_TTS_BACKEND env → Settings selection → OmniVoice default) used by /ws/tts and /v1/audio/speech, while keeping the OmniVoice native path byte-identical for existing API consumers.

  • _apply_effect_chain and _oom_friendly_reraise are extracted as shared helpers; a new _run_backend_inference function handles the pluggable-engine path, honoring applies_own_mastering to skip broadcast mastering for studio engines.
  • Two issues flagged in prior review threads remain open: sr = backend.sample_rate is read before generation in _run_backend_inference (could produce wrong pause timing and DSP calibration for lazy-loading backends), and _oom_friendly_reraise raises without from e, dropping the original traceback.
  • Six new tests cover engine selection, per-request override, unknown/unavailable 400s, the unchanged OmniVoice default path, and mastering chain gating.
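
The mastering gate described in the first bullet can be sketched roughly as below. This is an illustration under stated assumptions, not the project's `_apply_effect_chain`: the function name, the `mastering` callable, and the use of simple peak normalization as a stand-in for loudness normalization are all hypothetical, and whether raw passthrough also skips normalization is assumed here.

```python
from typing import Callable, List, Sequence

Samples = List[float]


def apply_effect_chain_sketch(
    audio: Sequence[float],
    mastering: Callable[[Samples], Samples],
    skip_mastering: bool = False,
    raw: bool = False,
    target_peak: float = 0.9,
) -> Samples:
    """Run mastering unless the engine already applied its own, then normalize."""
    samples = list(audio)
    if raw:
        # Assumed: raw passthrough returns the audio untouched.
        return samples
    if not skip_mastering:
        samples = mastering(samples)
    # Peak normalization stands in for the real loudness normalization step,
    # which the PR says still applies even when mastering is skipped.
    peak = max((abs(s) for s in samples), default=0.0)
    if peak > 0.0:
        gain = target_peak / peak
        samples = [s * gain for s in samples]
    return samples
```

A caller on the pluggable-engine path would pass `skip_mastering=getattr(backend, "applies_own_mastering", False)`, matching the call quoted in the review comments.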

Confidence Score: 4/5

Safe to merge for the primary goal of engine routing; two known deficiencies in _run_backend_inference (sample rate read order, exception chain) are carried forward from prior review threads and should be addressed as follow-ups.

The engine routing logic is well-structured and mirrors the existing pattern on the other two TTS endpoints. The OmniVoice native path is provably unchanged. The two open items from prior threads are real defects on the backend inference path but scoped to non-OmniVoice engines.

backend/api/routers/generation.py — specifically _run_backend_inference's early sample_rate read (line 200) and _oom_friendly_reraise's exception chain (line 109), both flagged in prior review threads.

Important Files Changed

| Filename | Overview |
|---|---|
| backend/api/routers/generation.py | Adds engine-aware routing to /generate: resolves engine via form field > env var > Settings > OmniVoice default, extracts _apply_effect_chain and _oom_friendly_reraise helpers, and introduces _run_backend_inference for non-OmniVoice paths. Two open issues (sample rate read before lazy load, bare RuntimeError raise) were flagged in prior review threads. |
| tests/test_generate_engine.py | New test module covering 6 cases: Settings-selected engine runs, per-request override, unknown/unavailable 400s, OmniVoice default path unchanged, applies_own_mastering skips mastering chain. Uses fresh stub classes per call and run-time module resolution to avoid fixture pollution; instance-cache isolation depends on _get_engine_instance keying by class identity rather than engine id string. |
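
The note about instance-cache isolation can be made concrete with a small sketch. It assumes a cache keyed by the class object, as the overview suggests; `get_engine_instance_sketch`, `_ENGINE_INSTANCES`, and `make_stub_class` are hypothetical names, not the project's `_get_engine_instance`.

```python
from typing import Dict

# Per-process cache keyed by class identity (assumption taken from the overview).
_ENGINE_INSTANCES: Dict[type, object] = {}


def get_engine_instance_sketch(backend_cls: type) -> object:
    """Return the cached instance for this class, creating it on first use."""
    instance = _ENGINE_INSTANCES.get(backend_cls)
    if instance is None:
        instance = backend_cls()
        _ENGINE_INSTANCES[backend_cls] = instance
    return instance


def make_stub_class(engine_id: str) -> type:
    """Build a fresh stub class per call, as a test helper might."""
    return type("StubBackend", (), {"id": engine_id})
```

Two stub classes built with the same engine id are still distinct keys, so each test gets its own instance. If the cache were keyed by the id string instead, a stub from an earlier test could be reused by a later one.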

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[POST /generate] --> B{engine form field?}
    B -->|yes| C[engine_id = engine]
    B -->|no| D[engine_id = active_backend_id\nenv > prefs > default]
    C --> E[get_backend_class engine_id]
    D --> E
    E -->|ValueError| F[400 Unknown engine]
    E --> G{backend_cls is\nOmniVoiceBackend?}
    G -->|yes| H[get_model\nnative path]
    G -->|no| I{is_available?}
    I -->|no| J[400 Engine unavailable\nwith masked reason]
    I -->|yes| K[_get_engine_instance\nper-process cache]
    H --> L[run_in_executor\n_run_inference\nfull advanced params]
    K --> M[run_in_executor\n_run_backend_inference\nadapter protocol]
    L --> N[_apply_effect_chain\nskip_mastering=False]
    M --> O{applies_own_mastering?}
    O -->|yes| P[_apply_effect_chain\nskip_mastering=True]
    O -->|no| N
    N --> Q[embed_watermark\nsample_rate after generation]
    P --> Q
    Q --> R[save WAV + DB insert\nStreamingResponse]

Reviews (2): Last reviewed commit: "test(312): resolve modules at run time, ..." | Re-trigger Greptile

Comment on lines +196 to +220
        language=language, ref_audio=ref_audio_path, ref_text=ref_text,
        instruct=instruct, num_step=num_step, guidance_scale=guidance_scale,
        speed=speed, denoise=denoise, postprocess_output=postprocess_output,
    )
    sr = backend.sample_rate

    # Inline [pause Nms] markers (issue #276) work for every engine — the
    # silence stitching is model-free.
    from omnivoice.utils.text import parse_pause_markers
    segments = parse_pause_markers(text)
    has_pause = len(segments) > 1 or (segments and segments[0][1] > 0)

    if has_pause:
        def _gen_span(span_text):
            # Per-span duration is left to the engine; an explicit overall
            # `duration` can't be meaningfully split across spans.
            return backend.generate(span_text, duration=None, **gen_kwargs)
        audio_out = _render_with_pauses(_gen_span, segments, sr)
    else:
        audio_out = backend.generate(text, duration=duration, **gen_kwargs)

    return _apply_effect_chain(
        audio_out, sr, effect_preset,
        skip_mastering=getattr(backend, "applies_own_mastering", False),
    )

P1 Sample rate captured before lazy model load

sr = backend.sample_rate is read before any call to backend.generate(). For backends with lazy-loading sample rates — SherpaOnnxBackend returns a hardcoded placeholder 22050 before its ONNX model is loaded, and CosyVoiceBackend/MLXAudioBackend similarly use placeholder values — both the silence-stitching calculation in _render_with_pauses (pause-marker path) and the DSP effects in _apply_effect_chain receive the wrong sample rate on the first request before weights are loaded. A 500 ms pause with a placeholder rate of 22050 but a true model rate of 24000 would produce ~459 ms of silence; more critically, the broadcast mastering chain (compressor, reverb) would be calibrated against the wrong rate.

The generate_speech caller is already aware of this: its comment "Read after generation: engines with lazy model loading report their real rate only once weights are up" is exactly why sample_rate = _backend.sample_rate is read after the executor returns. The same deferred read needs to happen inside _run_backend_inference — at minimum, move sr = backend.sample_rate to after the audio_out = backend.generate(...) call on the non-pause path so _apply_effect_chain always gets the true rate.
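
The ~459 ms figure above follows from simple arithmetic: silence is sized in samples using the assumed rate, but played back at the true rate. A short worked example, with `effective_pause_ms` as a hypothetical helper name:

```python
def effective_pause_ms(requested_ms: int, assumed_rate: int, true_rate: int) -> float:
    """Duration actually heard when silence is sized with the wrong sample rate."""
    # Samples are computed from the placeholder rate...
    silence_samples = requested_ms * assumed_rate // 1000
    # ...but the file is written and played at the model's real rate.
    return silence_samples * 1000 / true_rate


# 500 ms requested, placeholder 22050 Hz, true rate 24000 Hz.
print(effective_pause_ms(500, 22050, 24000))  # 459.375
```

The error scales with the ratio of the two rates (22050 / 24000 ≈ 0.919), so every pause in the first request would be about 8% short.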

Comment thread backend/api/routers/generation.py

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
backend/api/routers/generation.py (2)

262-271: ⚡ Quick win

Preserve exception chain for better debugging.

The HTTPException should be raised with from err to preserve the exception chain, aiding debugging when an unknown engine is provided.

Proposed fix
-    except ValueError:
+    except ValueError as err:
         raise HTTPException(
             status_code=400,
             detail=(
                 f"Unknown TTS engine: {engine_id!r}. "
                 "See GET /engines/tts for the list of valid engine ids."
             ),
-        )
+        ) from err
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/api/routers/generation.py` around lines 262 - 271, The code catches
ValueError from get_backend_class(engine_id) but re-raises HTTPException without
preserving the original exception chain; update the except block to capture the
ValueError (e.g. except ValueError as err:) and re-raise the HTTPException using
"raise HTTPException(... ) from err" so the original traceback is preserved;
keep the same status_code and detail message and reference get_backend_class,
engine_id, HTTPException, and ValueError when making the change.

99-112: ⚡ Quick win

Add NoReturn type hint to silence static analysis warning.

This function always raises but lacks a type annotation. Static analysis tools (CodeQL) flag implicit returns in callers (_run_inference, _run_backend_inference) because they can't infer that this function never returns.

Proposed fix
+from typing import NoReturn
+
-def _oom_friendly_reraise(e):
+def _oom_friendly_reraise(e: Exception) -> NoReturn:
     """Best-effort cache flush + the user-facing OOM hint shared by both
     inference paths."""
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/api/routers/generation.py` around lines 99 - 112, The
_oom_friendly_reraise helper always raises but lacks a return-type annotation,
causing static analysis to complain in callers like _run_inference and
_run_backend_inference; fix it by importing NoReturn from typing (or
typing_extensions if you prefer) and annotate the function signature as def
_oom_friendly_reraise(e) -> NoReturn: so analyzers know it never returns,
leaving the body unchanged.
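
A minimal sketch of the `NoReturn` pattern, using hypothetical names (`reraise_with_hint`, `run_step`) rather than the project's helper. It also chains with `from e`, which is the separate traceback concern raised in the Greptile summary.

```python
from typing import Callable, NoReturn


def reraise_with_hint(e: Exception) -> NoReturn:
    """Always raises; NoReturn tells type checkers control never comes back."""
    raise RuntimeError(f"Generation failed: {e}") from e


def run_step(step: Callable[[], int]) -> int:
    try:
        return step()
    except Exception as e:
        # Without NoReturn, analyzers may report a missing return on this path.
        reraise_with_hint(e)
```

The annotation changes nothing at run time; it only lets static analysis see that the `except` branch cannot fall through to an implicit `return None`.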
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 607e2bee-3317-4616-b520-dcdc10d8ff69

📥 Commits

Reviewing files that changed from the base of the PR and between 226aeaa and 3c01403.

📒 Files selected for processing (2)
  • backend/api/routers/generation.py
  • tests/test_generate_engine.py

Comment on lines +200 to +220
    sr = backend.sample_rate

    # Inline [pause Nms] markers (issue #276) work for every engine — the
    # silence stitching is model-free.
    from omnivoice.utils.text import parse_pause_markers
    segments = parse_pause_markers(text)
    has_pause = len(segments) > 1 or (segments and segments[0][1] > 0)

    if has_pause:
        def _gen_span(span_text):
            # Per-span duration is left to the engine; an explicit overall
            # `duration` can't be meaningfully split across spans.
            return backend.generate(span_text, duration=None, **gen_kwargs)
        audio_out = _render_with_pauses(_gen_span, segments, sr)
    else:
        audio_out = backend.generate(text, duration=duration, **gen_kwargs)

    return _apply_effect_chain(
        audio_out, sr, effect_preset,
        skip_mastering=getattr(backend, "applies_own_mastering", False),
    )

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if any TTSBackend implementations have lazy sample_rate that could change post-loading
echo "=== Searching for sample_rate property implementations in backends ==="
rg -n -A 10 '@property' --type=py | rg -A 10 'sample_rate'

echo ""
echo "=== Checking TTSBackend base class definition ==="
ast-grep --pattern $'class TTSBackend:
  $$$
  sample_rate
  $$$
'

Repository: debpalash/OmniVoice-Studio

Length of output: 16563


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Locate _run_backend_inference in generation router ==="
rg -n "_run_backend_inference" backend/api/routers/generation.py

echo
echo "=== Show generation.py around _run_backend_inference ==="
python3 - <<'PY'
import itertools,sys,os,re
path="backend/api/routers/generation.py"
with open(path,'r',encoding='utf-8') as f:
    lines=f.readlines()
for i,l in enumerate(lines, start=1):
    if "_run_backend_inference" in l:
        start=max(1,i-80)
        end=min(len(lines), i+140)
        for j in range(start,end+1):
            print(f"{j:5d}:{lines[j-1].rstrip()}")
        break
else:
    print("Not found")
PY

echo
echo "=== Show _render_with_pauses definition and where sr used ==="
rg -n "_render_with_pauses" backend/api/routers/generation.py
python3 - <<'PY'
import sys
path="backend/api/routers/generation.py"
with open(path,'r',encoding='utf-8') as f:
    lines=f.readlines()
idx=[i for i,l in enumerate(lines, start=1) if "_render_with_pauses" in l]
if not idx:
    print("No occurrences")
    sys.exit(0)
for i in idx[:3]:
    start=max(1,i-40); end=min(len(lines), i+120)
    print("\n--- around line",i,"---")
    for j in range(start,end+1):
        print(f"{j:5d}:{lines[j-1].rstrip()}")
PY

echo
echo "=== Search where sample_rate is read in generation.py ==="
rg -n "sample_rate" backend/api/routers/generation.py

Repository: debpalash/OmniVoice-Studio

Length of output: 33607


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Locate backend classes with conditional sample_rate ==="
rg -n "class (OmniVoiceBackend|CosyVoiceBackend|Sherpa|SherpaONNX|Vits|TTSBackend)" backend/services/tts_backend.py

echo
echo "=== OmniVoiceBackend: sample_rate + _ensure_loaded + generate ==="
python3 - <<'PY'
import re
path="backend/services/tts_backend.py"
with open(path,'r',encoding='utf-8') as f: lines=f.readlines()

targets=["class OmniVoiceBackend","class CosyVoiceBackend","class SherpaONNX","class Sherpa-ONNX","class Sherpa"]
for t in targets:
    for i,l in enumerate(lines):
        if t in l:
            start=max(1,i-5); end=min(len(lines), i+140)
            print(f"\n--- {t} around line {i+1} ---")
            for j in range(start,end+1):
                print(f"{j:5d}:{lines[j-1].rstrip()}")
            break

PY

echo
echo "=== Search for _ensure_loaded calls inside generate implementations ==="
rg -n "def generate\\(|_ensure_loaded\\(" backend/services/tts_backend.py | head -n 200

echo
echo "=== Show base adapter protocol (TTSBackend) for generate/sample_rate coupling ==="
rg -n "class TTSBackend|def generate\\(" backend/services/tts_backend.py

Repository: debpalash/OmniVoice-Studio

Length of output: 21343


🏁 Script executed:

#!/bin/bash
set -euo pipefail
python3 - <<'PY'
path="backend/api/routers/generation.py"
with open(path,'r',encoding='utf-8') as f: lines=f.readlines()
# show around where sample_rate assigned after inference (reported around 360)
for center in [330,350,360,365,370,380,390]:
    start=max(1,center-20); end=min(len(lines),center+20)
    print(f"\n===== around line {center} =====")
    for i in range(start,end+1):
        print(f"{i:5d}:{lines[i-1].rstrip()}")
PY

Repository: debpalash/OmniVoice-Studio

Length of output: 16347


Fix pause-marker silence timing when backend sample_rate is lazy-loaded

backend/api/routers/generation.py _run_backend_inference captures sr = backend.sample_rate before calling backend.generate(), and _render_with_pauses uses that sr to compute silence samples. However, generate_speech re-reads sample_rate = _backend.sample_rate after generation, and lazy backends change their reported rate once weights are loaded (e.g., OmniVoiceBackend.sample_rate is 24000 until _model is loaded; CosyVoiceBackend similarly; SherpaOnnxBackend is 22050 until _tts is loaded).

This can make pause-marker stitching durations wrong on the first request after lazy model load, even though the final WAV saving uses the updated sample_rate.

Consider either:

  1. Ensuring the backend is loaded (e.g., force _ensure_loaded() or equivalent) before capturing sr for _render_with_pauses, or
  2. Capturing/deriving sr only after the backend has loaded (e.g., after the first span is generated) and using that same sr for stitching and for saving/watermarking.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/api/routers/generation.py` around lines 200 - 220, The pause-marker
stitching uses sr = backend.sample_rate before the backend may lazily load and
change its sample_rate, causing incorrect silence durations; fix by ensuring the
backend is fully loaded before capturing sr: call the backend's load routine
(e.g., backend._ensure_loaded() or the appropriate ensure/load method) at the
start of _run_backend_inference (before sr = backend.sample_rate and before any
generate calls) so _render_with_pauses and subsequent saving/watermarking use
the same, final sample_rate; alternatively, if no ensure method exists, defer
computing sr until after the first span generation (use the sample_rate reported
after backend.generate in _gen_span) and pass that same sr into
_render_with_pauses and into _apply_effect_chain.
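
The second option (reading the rate only after generation) could look roughly like the sketch below. `LazyBackend` and `render_with_deferred_rate` are hypothetical; the placeholder and true rates reuse the 22050 / 24000 example from the review, and the real backends' loading behavior is only assumed to resemble this.

```python
from typing import List, Tuple


class LazyBackend:
    """Hypothetical engine that reports a placeholder rate until weights load."""

    def __init__(self) -> None:
        self._loaded = False

    @property
    def sample_rate(self) -> int:
        return 24000 if self._loaded else 22050

    def generate(self, text: str) -> List[float]:
        self._loaded = True  # first call "loads the model"
        return [0.1] * len(text)


def render_with_deferred_rate(
    backend: LazyBackend, segments: List[Tuple[str, int]]
) -> Tuple[List[float], int]:
    """Generate every span first, then size the silences with the real rate."""
    rendered = [
        (backend.generate(text) if text else [], pause_ms)
        for text, pause_ms in segments
    ]
    sr = backend.sample_rate  # read after generation, so lazy loads have run
    out: List[float] = []
    for audio, pause_ms in rendered:
        out.extend(audio)
        out.extend([0.0] * (sr * pause_ms // 1000))
    return out, sr
```

Generating all spans before stitching also covers a leading pause, where no audio exists yet when the first silence would otherwise be sized. If every span is empty, no load is triggered and the placeholder rate would still be used, so an explicit load call remains the more robust of the two options.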

…full-suite isolation

tests/backend/** runs before tests/test_*.py and pollutes sys.modules
(re-imports the services tree), so module-level imports bound at pytest
collection pointed at a stale services.tts_backend — registry patches
landed on a dict the routes no longer read ('Unknown TTS engine' in CI).
Modules are now resolved through sys.modules inside each test. The client
fixture also drops the module-scoped lifespan context manager that bound
event_bus queues to this module's loop (teardown 'Queue bound to a
different event loop') — plain function-scoped TestClient, the
test_api.py pattern.
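
The stale-binding failure described in this commit message can be reproduced in miniature. The sketch below fabricates a throwaway module (`fake_tts_backend_sketch`, a made-up name) instead of touching the real services tree; it only demonstrates why a reference captured at import time stops matching `sys.modules` after a re-import.

```python
import sys
import types

MODULE_NAME = "fake_tts_backend_sketch"  # hypothetical module name


def install_fake_module(name: str) -> types.ModuleType:
    """Create a module with an empty REGISTRY dict and register it."""
    module = types.ModuleType(name)
    module.REGISTRY = {}
    sys.modules[name] = module
    return module


# A test file binds the module at collection time...
stale = install_fake_module(MODULE_NAME)
# ...then another test re-imports the tree, replacing the sys.modules entry.
install_fake_module(MODULE_NAME)

# Patching through the stale binding does not reach the module the app reads.
stale.REGISTRY["fake-engine"] = object
patched_stale_visible = "fake-engine" in sys.modules[MODULE_NAME].REGISTRY

# Resolving through sys.modules inside the test patches the live module.
sys.modules[MODULE_NAME].REGISTRY["fake-engine"] = object
patched_live_visible = "fake-engine" in sys.modules[MODULE_NAME].REGISTRY

del sys.modules[MODULE_NAME]  # clean up the throwaway entry
```

Here `patched_stale_visible` ends up `False` and `patched_live_visible` ends up `True`, which mirrors the 'Unknown TTS engine' symptom and its fix.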

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@debpalash

Owner Author

Thanks — the approach looks right (resolution order matches /ws/tts and /v1/audio/speech, and keeping the OmniVoice default path byte-identical is the right call), but CI is red and the failures are in this PR's own new tests:

  • All 4 engine-stub tests in tests/test_generate_engine.py fail with 400 — Unknown TTS engine: 'fake-engine': the stubbed engine isn't registered when the full suite runs (works in isolation per the PR body, fails in CI). Looks like the engine registry the stub patches isn't the one the route reads at request time — or another test's setup/teardown resets the registry. Worth registering the stub through the same registry the route consults, via a fixture that re-applies per test.
  • One teardown error: RuntimeError: <Queue> is bound to a different event loop in test_generate_respects_applies_own_mastering — suggests the test creates async state across event loops; the patterns in test_api.py (TestClient-per-test) avoid this.

CI run: https://github.com/debpalash/OmniVoice-Studio/actions/runs/27321203190

Happy to re-review once the suite is green.

@debpalash debpalash merged commit 433f1ba into main Jun 11, 2026
15 checks passed
debpalash pushed a commit that referenced this pull request Jun 11, 2026
…ith #324 engine routing

PR #324 (engine resolution in /generate) landed on main and refactored the
shared OOM error path in backend/api/routers/generation.py into the
_oom_friendly_reraise() helper, and added a _run_backend_inference() twin for
non-default engines. PR #278's compile-failure detection had been inlined in
_run_inference's `except Exception` block, so the two collided at the same
call site.

Resolution: fold #278's _is_compile_runtime_failure() check into the shared
_oom_friendly_reraise() helper rather than the now-removed inline block. This
keeps both features:
- The OmniVoice native path (_run_inference) still surfaces the
  torch.compile/Triton-specific message instead of mislabeling it as OOM —
  #278's intent preserved.
- #324's _run_backend_inference (non-default engines) is left unchanged and
  now also benefits from the same detection (a no-op for non-compile errors,
  and torch.compile failures aren't OmniVoice-specific anyway).
- The actual compile-fallback wrapper (_install_compile_fallback /
  _is_compile_runtime_failure in model_manager.py, plus the arch gate and
  OMNIVOICE_FORCE_TORCH_COMPILE override in engine_env.py) merged cleanly and
  is untouched — it still wraps get_model()'s generate on the native path.

Validation (targeted): tests/test_generate_engine.py,
tests/test_compile_fallback.py, tests/test_torch_compile_gate.py — 27 passed.
`python -c "import api.routers.generation"` — OK.
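
The resolution described above, one shared helper that first checks for a compile failure and otherwise falls back to the OOM hint, might be shaped like this. Everything in the sketch is hypothetical: the substring heuristic in `looks_like_compile_failure`, the message wording, and the `friendly_reraise` name are illustrative assumptions, not the contents of `_is_compile_runtime_failure()` or `_oom_friendly_reraise()`.

```python
from typing import NoReturn


def looks_like_compile_failure(e: Exception) -> bool:
    """Assumed heuristic: match on text typical of torch.compile/Triton errors."""
    message = str(e).lower()
    return "torch.compile" in message or "triton" in message


def friendly_reraise(e: Exception) -> NoReturn:
    """Shared error path for both inference functions (illustrative only)."""
    if looks_like_compile_failure(e):
        # Surface the compile-specific hint instead of labeling it as OOM.
        raise RuntimeError("Model compilation failed; see the original error.") from e
    raise RuntimeError("Generation failed, possibly out of memory.") from e
```

Putting the check in the shared helper means both call sites pick it up without duplicating the branch, and errors that do not match the predicate take the same path as before.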

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Development

Successfully merging this pull request may close these issues.

/generate ignores the selected TTS engine (always uses OmniVoice) — intended?

2 participants