fix(tts): /generate honors the selected TTS engine (#312) by debpalash · Pull Request #324 · debpalash/OmniVoice-Studio

debpalash · 2026-06-11T03:10:22Z

Summary

POST /generate always ran the OmniVoice model directly via get_model(), silently ignoring the engine selected in Settings. To answer the question in issue #312: no, that was not intended. It now:

- resolves the active backend per request: explicit engine form field > OMNIVOICE_TTS_BACKEND env > Settings selection > OmniVoice default. This is the same pattern /ws/tts (engine) and /v1/audio/speech (model) already use
- keeps the OmniVoice default path byte-identical (native path with the full advanced parameter surface: t_shift, layer/position/class controls), so existing API consumers see zero change
- reuses the per-process engine instance cache shared with the health-check route, so weights load once
- keeps inline [pause Nms] markers (#276, "[Feature] Pause in Transcript") working on every engine (silence stitching is model-free)
- honors applies_own_mastering so studio engines skip the broadcast mastering chain (closing the TODO left by 5d602c8), while loudness normalization still applies
- unknown/unavailable engines return an actionable 400 listing valid ids
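
The precedence described above can be sketched as a small resolver. This is an illustration only: `resolve_engine_id`, the `settings_engine` argument, and the `"omnivoice"` default id are hypothetical names that do not come from the PR. Only the `OMNIVOICE_TTS_BACKEND` variable and the ordering are taken from the description.

```python
import os
from typing import Mapping, Optional

DEFAULT_ENGINE = "omnivoice"  # hypothetical id for the OmniVoice default


def resolve_engine_id(
    form_engine: Optional[str],
    settings_engine: Optional[str],
    env: Mapping[str, str] = os.environ,
) -> str:
    """Return the first non-empty candidate in precedence order."""
    candidates = (
        form_engine,                       # explicit per-request form field
        env.get("OMNIVOICE_TTS_BACKEND"),  # process-wide override
        settings_engine,                   # selection stored in Settings
    )
    for candidate in candidates:
        if candidate and candidate.strip():
            return candidate.strip()
    return DEFAULT_ENGINE                  # nothing set: fall back to default
```

Passing the environment as a mapping keeps the sketch easy to test without mutating `os.environ`.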

Fixes #312

Tests

tests/test_generate_engine.py (new, 6 tests, engine layer stubbed as in test_api.py): Settings-selected engine generates; per-request engine overrides; unknown → 400; unavailable → 400; default path unchanged; own-mastering engines skip the chain. Full file passes locally; CI runs the complete suite.

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Per-request TTS engine override and engine-aware inference path.
- Language normalization (e.g., "auto" → undetermined) and consistent sample-rate handling.
- Shared audio effect chain with optional raw passthrough and conditional mastering when engines apply their own mastering.
Bug Fixes
- Clearer errors for unknown or unavailable engines and improved out-of-memory error handling.
Tests
- Added tests covering engine selection, overrides, availability errors, and mastering behavior.

The /generate route always ran the OmniVoice model directly, ignoring both the Settings engine selection and any per-request override. It now resolves the active backend (env var > Settings selection > default), supports an explicit `engine` form field (same pattern as /ws/tts and /v1/audio/speech), reuses the per-process engine instance cache, keeps inline [pause Nms] markers working on every engine, and honors applies_own_mastering so studio engines skip the broadcast mastering chain. The OmniVoice default path is byte-identical to the old behavior; existing API consumers see no change.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

coderabbitai · 2026-06-11T03:10:33Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: b4313ba3-f99c-4892-a158-e56b81762e9b

📥 Commits

Reviewing files that changed from the base of the PR and between 3c01403 and bffe12b.

📒 Files selected for processing (1)

tests/test_generate_engine.py

🚧 Files skipped from review as they are similar to previous changes (1)

tests/test_generate_engine.py

📝 Walkthrough

Walkthrough

Extended POST /generate to support per-request TTS engine selection via an optional engine parameter. Refactored audio post-processing into reusable _apply_effect_chain() and _oom_friendly_reraise() helpers. Implemented engine-aware inference path that routes requests through pluggable backend adapters while preserving OmniVoice as the default. Updated output handling to use the resolved backend or model sample rate consistently. Added comprehensive test coverage validating engine selection, error handling, backward compatibility, and mastering-skip behavior.

Changes

Engine-aware /generate endpoint with refactored audio processing

Layer / File(s)	Summary
Audio DSP refactoring and error handling `backend/api/routers/generation.py`	Extracted post-processing into `_apply_effect_chain()` that validates effect presets, supports raw passthrough, conditionally applies mastering, and normalizes loudness. Introduced `_oom_friendly_reraise()` for standardized out-of-memory error handling with cache cleanup. Updated `_run_inference` to delegate DSP logic to the shared effect chain.
Backend-aware inference implementation `backend/api/routers/generation.py`	Added `_run_backend_inference()` to execute generation through pluggable backend adapters: sets seed, normalizes "auto" language to `None`, parses pause markers to stitch span audio, generates via the backend, and applies DSP effects with mastering conditionally skipped based on backend's `applies_own_mastering` flag.
Engine selection and resolution in /generate `backend/api/routers/generation.py`	Extended endpoint to accept optional `engine` form field. Implemented resolution logic that determines engine ID from parameter or settings default, validates against backend registry (HTTP 400 for unknown engines), checks availability (HTTP 400 with reason if unavailable), and loads either native OmniVoice model or cached backend instance.
Generation execution branching and output normalization `backend/api/routers/generation.py`	Updated main execution to conditionally route: runs `_run_backend_inference()` for adapters or `_run_inference()` for OmniVoice. Resolved sample rate from backend or model. Updated watermark embedding, file saving, and WAV serialization to use resolved sample rate consistently instead of assuming OmniVoice's fixed rate.
Test fixtures and setup `tests/test_generate_engine.py`	Established deterministic test environment with OmniVoice settings and created reusable fixtures: `_make_fake_engine()` helper that dynamically defines stub TTSBackend subclasses with configurable availability and mastering behavior; `client` FastAPI TestClient wrapper; and `no_omnivoice_model` to prevent fallback to OmniVoice path during tests.
Engine selection and override behavior validation `tests/test_generate_engine.py`	Validated that `/generate` honors Settings-selected engine (with "Auto" language passed as `None` to engine), allows per-request `engine` form field to override settings, and rejects unknown engines with HTTP 400 and clear error message.
Error handling, availability validation, and backward compatibility `tests/test_generate_engine.py`	Validated that unavailable engines return HTTP 400 with unavailability reason (engine not invoked), confirmed backward-compatible default behavior still runs OmniVoice native path with x-audio-id header, and verified mastering-skip behavior: `apply_mastering` not called for engines with `applies_own_mastering=True` but called for regular engines.
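
The shared effect chain summarized in the table can be pictured roughly as follows. This is a sketch under stated assumptions, not the code in `generation.py`: the `master` and `normalize` callables are placeholders for the real DSP stages, and `"raw"` is a hypothetical preset name standing in for the raw passthrough option. Only the `"broadcast"` preset name and the `skip_mastering` flag appear in the PR.

```python
from typing import Callable, List, Sequence

Audio = List[float]


def apply_effect_chain(
    audio: Audio,
    preset: str,
    master: Callable[[Audio], Audio],
    normalize: Callable[[Audio], Audio],
    skip_mastering: bool = False,
    valid_presets: Sequence[str] = ("broadcast", "raw"),
) -> Audio:
    """Validate the preset, then run mastering (optional) and normalization."""
    if preset not in valid_presets:
        raise ValueError(f"Unknown effect preset: {preset!r}")
    if preset == "raw":
        return audio              # raw passthrough: no DSP at all
    if not skip_mastering:
        audio = master(audio)     # skipped for engines that master themselves
    return normalize(audio)       # loudness normalization still applies
```

With this shape, an engine whose `applies_own_mastering` flag is true would be called with `skip_mastering=True` and still receive loudness normalization.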

Sequence Diagram

sequenceDiagram
  participant Client
  participant GenerateEndpoint
  participant Settings
  participant BackendRegistry
  participant OmniVoiceModel
  participant BackendAdapter
  participant AudioDSP

  Client->>GenerateEndpoint: POST /generate (optional engine)
  GenerateEndpoint->>Settings: Resolve engine ID (or use default)
  GenerateEndpoint->>BackendRegistry: Validate engine exists
  alt Unknown engine
    GenerateEndpoint-->>Client: HTTP 400 (unknown)
  else
    GenerateEndpoint->>BackendRegistry: Check engine availability
    alt Unavailable engine
      GenerateEndpoint-->>Client: HTTP 400 (unavailable)
    else
      alt Engine is OmniVoice (or default)
        GenerateEndpoint->>OmniVoiceModel: Load model
        OmniVoiceModel->>OmniVoiceModel: Generate audio
      else
        GenerateEndpoint->>BackendAdapter: Load cached instance
        BackendAdapter->>BackendAdapter: Generate audio
      end
      GenerateEndpoint->>AudioDSP: Apply effect chain (skip mastering if backend applies it)
      GenerateEndpoint->>GenerateEndpoint: Watermark + serialize with resolved sample_rate
      GenerateEndpoint-->>Client: HTTP 200 (audio WAV)
    end
  end
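
The two rejection branches in the diagram can be sketched without any web framework. Everything named here is a stand-in: `EngineError` plays the role of an HTTP 400 response, and the registry contents plus the `(ok, reason)` return shape of `is_available()` are assumptions made for the example.

```python
from typing import Dict, Optional, Tuple, Type


class EngineError(Exception):
    """Stand-in for an HTTP error; carries a status code and a message."""

    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


class FakeEngine:
    """Hypothetical adapter that is always usable."""

    @classmethod
    def is_available(cls) -> Tuple[bool, Optional[str]]:
        return True, None


class BrokenEngine(FakeEngine):
    """Hypothetical adapter whose dependencies are missing."""

    @classmethod
    def is_available(cls) -> Tuple[bool, Optional[str]]:
        return False, "weights not installed"


REGISTRY: Dict[str, Type[FakeEngine]] = {"fake": FakeEngine, "broken": BrokenEngine}


def select_engine(engine_id: str) -> Type[FakeEngine]:
    """Reject unknown ids first, then unavailable engines, both with 400."""
    try:
        backend_cls = REGISTRY[engine_id]
    except KeyError as err:
        raise EngineError(400, f"Unknown TTS engine: {engine_id!r}") from err
    ok, reason = backend_cls.is_available()
    if not ok:
        raise EngineError(400, f"Engine {engine_id!r} is unavailable: {reason}")
    return backend_cls
```

Checking existence before availability means the engine itself is never instantiated on either error path.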

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

debpalash/OmniVoice-Studio#311: Implements the applies_own_mastering backend flag that the main PR uses to conditionally skip the shared broadcast DSP mastering chain.
debpalash/OmniVoice-Studio#109: Extends the generation pipeline with effect_preset and raw passthrough options that the main PR refactors into the shared _apply_effect_chain() helper.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 47.83% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and specifically describes the main change: /generate endpoint now honors the selected TTS engine, directly addressing the issue `#312`.
Description check	✅ Passed	The description covers the Summary section well, lists key changes, identifies the type as a bug fix, mentions testing approach, and includes issue reference, though the template checklist itself is not filled in.
Linked Issues check	✅ Passed	The PR fully addresses issue `#312` by implementing engine resolution with proper precedence, maintaining backward compatibility, and adding comprehensive tests for engine selection behavior.
Out of Scope Changes check	✅ Passed	All changes are scoped to resolving engine selection in /generate and related test infrastructure; no unrelated modifications detected.


+def _run_backend_inference(
+    backend, text, language, ref_audio_path, ref_text, instruct, duration,
+    num_step, guidance_scale, speed, denoise, postprocess_output,
+    used_seed, effect_preset="broadcast",
+):


greptile-apps · 2026-06-11T03:16:02Z

Greptile Summary

This PR fixes issue #312 where POST /generate always routed to the OmniVoice model regardless of the engine selected in Settings. The engine resolution now follows the same priority chain (engine form field → OMNIVOICE_TTS_BACKEND env → Settings selection → OmniVoice default) used by /ws/tts and /v1/audio/speech, while keeping the OmniVoice native path byte-identical for existing API consumers.

- _apply_effect_chain and _oom_friendly_reraise are extracted as shared helpers; a new _run_backend_inference function handles the pluggable-engine path, honoring applies_own_mastering to skip broadcast mastering for studio engines.
- Two issues flagged in prior review threads remain open: sr = backend.sample_rate is read before generation in _run_backend_inference (could produce wrong pause timing and DSP calibration for lazy-loading backends), and _oom_friendly_reraise raises without from e, dropping the original traceback.
- Six new tests cover engine selection, per-request override, unknown/unavailable 400s, the unchanged OmniVoice default path, and mastering chain gating.

Confidence Score: 4/5

Safe to merge for the primary goal of engine routing; two known deficiencies in _run_backend_inference (sample rate read order, exception chain) are carried forward from prior review threads and should be addressed as follow-ups.

The engine routing logic is well-structured and mirrors the existing pattern on the other two TTS endpoints. The OmniVoice native path is provably unchanged. The two open items from prior threads are real defects on the backend inference path but scoped to non-OmniVoice engines.

backend/api/routers/generation.py — specifically _run_backend_inference's early sample_rate read (line 200) and _oom_friendly_reraise's exception chain (line 109), both flagged in prior review threads.

Important Files Changed

Filename	Overview
backend/api/routers/generation.py	Adds engine-aware routing to /generate: resolves engine via form field > env var > Settings > OmniVoice default, extracts _apply_effect_chain and _oom_friendly_reraise helpers, and introduces _run_backend_inference for non-OmniVoice paths. Two open issues (sample rate read before lazy load, bare RuntimeError raise) were flagged in prior review threads.
tests/test_generate_engine.py	New test module covering 6 cases: Settings-selected engine runs, per-request override, unknown/unavailable 400s, OmniVoice default path unchanged, applies_own_mastering skips mastering chain. Uses fresh stub classes per call and run-time module resolution to avoid fixture pollution; instance-cache isolation depends on _get_engine_instance keying by class identity rather than engine id string.
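
The note about cache isolation depending on class identity can be illustrated with a toy cache. This is a sketch only: `get_engine_instance` and `make_stub` are hypothetical helpers, and the real `_get_engine_instance` may be implemented differently.

```python
from typing import Dict

_INSTANCES: Dict[type, object] = {}


def get_engine_instance(backend_cls: type) -> object:
    """Create each backend once per process, keyed by the class object."""
    if backend_cls not in _INSTANCES:
        _INSTANCES[backend_cls] = backend_cls()
    return _INSTANCES[backend_cls]


def make_stub(engine_id: str) -> type:
    """Build a brand-new stub class on every call, as a test helper might."""
    return type("StubEngine", (), {"id": engine_id})


first = make_stub("fake-engine")
second = make_stub("fake-engine")   # same id string, different class object
```

Because the key is the class object, `first` and `second` get separate cached instances even though they share an id. A cache keyed by the id string would hand the second test the first test's instance.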

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[POST /generate] --> B{engine form field?}
    B -->|yes| C[engine_id = engine]
    B -->|no| D[engine_id = active_backend_id\nenv > prefs > default]
    C --> E[get_backend_class engine_id]
    D --> E
    E -->|ValueError| F[400 Unknown engine]
    E --> G{backend_cls is\nOmniVoiceBackend?}
    G -->|yes| H[get_model\nnative path]
    G -->|no| I{is_available?}
    I -->|no| J[400 Engine unavailable\nwith masked reason]
    I -->|yes| K[_get_engine_instance\nper-process cache]
    H --> L[run_in_executor\n_run_inference\nfull advanced params]
    K --> M[run_in_executor\n_run_backend_inference\nadapter protocol]
    L --> N[_apply_effect_chain\nskip_mastering=False]
    M --> O{applies_own_mastering?}
    O -->|yes| P[_apply_effect_chain\nskip_mastering=True]
    O -->|no| N
    N --> Q[embed_watermark\nsample_rate after generation]
    P --> Q
    Q --> R[save WAV + DB insert\nStreamingResponse]

Reviews (2): Last reviewed commit: "test(312): resolve modules at run time, ..."

greptile-apps · 2026-06-11T03:16:06Z

+            language=language, ref_audio=ref_audio_path, ref_text=ref_text,
+            instruct=instruct, num_step=num_step, guidance_scale=guidance_scale,
+            speed=speed, denoise=denoise, postprocess_output=postprocess_output,
+        )
+        sr = backend.sample_rate
+
+        # Inline [pause Nms] markers (issue #276) work for every engine — the
+        # silence stitching is model-free.
+        from omnivoice.utils.text import parse_pause_markers
+        segments = parse_pause_markers(text)
+        has_pause = len(segments) > 1 or (segments and segments[0][1] > 0)
+
+        if has_pause:
+            def _gen_span(span_text):
+                # Per-span duration is left to the engine; an explicit overall
+                # `duration` can't be meaningfully split across spans.
+                return backend.generate(span_text, duration=None, **gen_kwargs)
+            audio_out = _render_with_pauses(_gen_span, segments, sr)
+        else:
+            audio_out = backend.generate(text, duration=duration, **gen_kwargs)
+
+        return _apply_effect_chain(
+            audio_out, sr, effect_preset,
+            skip_mastering=getattr(backend, "applies_own_mastering", False),
+        )


Sample rate captured before lazy model load

sr = backend.sample_rate is read before any call to backend.generate(). For backends with lazy-loading sample rates — SherpaOnnxBackend returns a hardcoded placeholder 22050 before its ONNX model is loaded, and CosyVoiceBackend/MLXAudioBackend similarly use placeholder values — both the silence-stitching calculation in _render_with_pauses (pause-marker path) and the DSP effects in _apply_effect_chain receive the wrong sample rate on the first request before weights are loaded. A 500 ms pause with a placeholder rate of 22050 but a true model rate of 24000 would produce ~459 ms of silence; more critically, the broadcast mastering chain (compressor, reverb) would be calibrated against the wrong rate.

The generate_speech caller is already aware of this: its comment "Read after generation: engines with lazy model loading report their real rate only once weights are up" is exactly why sample_rate = _backend.sample_rate is read after the executor returns. The same deferred read needs to happen inside _run_backend_inference — at minimum, move sr = backend.sample_rate to after the audio_out = backend.generate(...) call on the non-pause path so _apply_effect_chain always gets the true rate.
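
The ~459 ms figure can be checked with a few lines of arithmetic. The helper names below are made up for the illustration; the rates are the ones quoted in the comment (placeholder 22050, true rate 24000).

```python
def silence_samples(pause_ms: int, sample_rate: int) -> int:
    """Number of zero samples needed for a pause at the given rate."""
    return round(pause_ms * sample_rate / 1000)


def heard_ms(samples: int, true_rate: int) -> float:
    """How long those samples last when played back at the true rate."""
    return samples * 1000 / true_rate


placeholder_rate = 22050   # reported before the weights are loaded
true_rate = 24000          # rate of the audio actually produced

n = silence_samples(500, placeholder_rate)
print(n, heard_ms(n, true_rate))   # 11025 459.375
```

The silence is sized for 22050 Hz but played back at 24000 Hz, so a requested 500 ms pause shrinks by the ratio 22050/24000.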

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

backend/api/routers/generation.py (2)

262-271: ⚡ Quick win

Preserve exception chain for better debugging.

The HTTPException should be raised with from err to preserve the exception chain, aiding debugging when an unknown engine is provided.

Proposed fix

-    except ValueError:
+    except ValueError as err:
         raise HTTPException(
             status_code=400,
             detail=(
                 f"Unknown TTS engine: {engine_id!r}. "
                 "See GET /engines/tts for the list of valid engine ids."
             ),
-        )
+        ) from err

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/api/routers/generation.py` around lines 262 - 271, The code catches
ValueError from get_backend_class(engine_id) but re-raises HTTPException without
preserving the original exception chain; update the except block to capture the
ValueError (e.g. except ValueError as err:) and re-raise the HTTPException using
"raise HTTPException(... ) from err" so the original traceback is preserved;
keep the same status_code and detail message and reference get_backend_class,
engine_id, HTTPException, and ValueError when making the change.

99-112: ⚡ Quick win

Add NoReturn type hint to silence static analysis warning.

This function always raises but lacks a type annotation. Static analysis tools (CodeQL) flag implicit returns in callers (_run_inference, _run_backend_inference) because they can't infer that this function never returns.

Proposed fix

+from typing import NoReturn
+
-def _oom_friendly_reraise(e):
+def _oom_friendly_reraise(e: Exception) -> NoReturn:
     """Best-effort cache flush + the user-facing OOM hint shared by both
     inference paths."""

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/api/routers/generation.py` around lines 99 - 112, The
_oom_friendly_reraise helper always raises but lacks a return-type annotation,
causing static analysis to complain in callers like _run_inference and
_run_backend_inference; fix it by importing NoReturn from typing (or
typing_extensions if you prefer) and annotate the function signature as def
_oom_friendly_reraise(e) -> NoReturn: so analyzers know it never returns,
leaving the body unchanged.
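
The effect of the annotation on callers can be sketched like this. The error message and the `run` wrapper are invented for the example and are not the helper's real body.

```python
from typing import Callable, NoReturn


def oom_friendly_reraise(e: Exception) -> NoReturn:
    """Always raises; the annotation tells type checkers so."""
    if "out of memory" in str(e).lower():
        # Hypothetical user-facing hint; the original error is kept as cause.
        raise RuntimeError("Out of memory: try a shorter text") from e
    raise e


def run(generate: Callable[[], bytes]) -> bytes:
    try:
        return generate()
    except Exception as e:
        oom_friendly_reraise(e)
        # No return statement is needed here: a checker that understands
        # NoReturn knows control never continues past the call above.
```

Without the annotation, a checker sees a code path in `run` that falls off the end and implicitly returns `None`.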

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@backend/api/routers/generation.py`:
- Around line 200-220: The pause-marker stitching uses sr = backend.sample_rate
before the backend may lazily load and change its sample_rate, causing incorrect
silence durations; fix by ensuring the backend is fully loaded before capturing
sr: call the backend's load routine (e.g., backend._ensure_loaded() or the
appropriate ensure/load method) at the start of _run_backend_inference (before
sr = backend.sample_rate and before any generate calls) so _render_with_pauses
and subsequent saving/watermarking use the same, final sample_rate;
alternatively, if no ensure method exists, defer computing sr until after the
first span generation (use the sample_rate reported after backend.generate in
_gen_span) and pass that same sr into _render_with_pauses and into
_apply_effect_chain.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 607e2bee-3317-4616-b520-dcdc10d8ff69

📥 Commits

Reviewing files that changed from the base of the PR and between 226aeaa and 3c01403.

📒 Files selected for processing (2)

backend/api/routers/generation.py
tests/test_generate_engine.py

coderabbitai · 2026-06-11T03:19:11Z

+        sr = backend.sample_rate
+
+        # Inline [pause Nms] markers (issue #276) work for every engine — the
+        # silence stitching is model-free.
+        from omnivoice.utils.text import parse_pause_markers
+        segments = parse_pause_markers(text)
+        has_pause = len(segments) > 1 or (segments and segments[0][1] > 0)
+
+        if has_pause:
+            def _gen_span(span_text):
+                # Per-span duration is left to the engine; an explicit overall
+                # `duration` can't be meaningfully split across spans.
+                return backend.generate(span_text, duration=None, **gen_kwargs)
+            audio_out = _render_with_pauses(_gen_span, segments, sr)
+        else:
+            audio_out = backend.generate(text, duration=duration, **gen_kwargs)
+
+        return _apply_effect_chain(
+            audio_out, sr, effect_preset,
+            skip_mastering=getattr(backend, "applies_own_mastering", False),
+        )


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if any TTSBackend implementations have lazy sample_rate that could change post-loading
echo "=== Searching for sample_rate property implementations in backends ==="
rg -n -A 10 '`@property`' --type=py | rg -A 10 'sample_rate'
echo ""
echo "=== Checking TTSBackend base class definition ==="
ast-grep --pattern $'class TTSBackend: $$$ sample_rate $$$ '

Repository: debpalash/OmniVoice-Studio

Length of output: 16563

🏁 Script executed:

#!/bin/bash
set -euo pipefail
echo "=== Locate _run_backend_inference in generation router ==="
rg -n "_run_backend_inference" backend/api/routers/generation.py
echo
echo "=== Show generation.py around _run_backend_inference ==="
python3 - <<'PY'
import itertools,sys,os,re
path="backend/api/routers/generation.py"
with open(path,'r',encoding='utf-8') as f:
    lines=f.readlines()
for i,l in enumerate(lines, start=1):
    if "_run_backend_inference" in l:
        start=max(1,i-80)
        end=min(len(lines), i+140)
        for j in range(start,end+1):
            print(f"{j:5d}:{lines[j-1].rstrip()}")
        break
else:
    print("Not found")
PY
echo
echo "=== Show _render_with_pauses definition and where sr used ==="
rg -n "_render_with_pauses" backend/api/routers/generation.py
python3 - <<'PY'
path="backend/api/routers/generation.py"
with open(path,'r',encoding='utf-8') as f:
    lines=f.readlines()
idx=[i for i,l in enumerate(lines, start=1) if "_render_with_pauses" in l]
if not idx:
    print("No occurrences")
    sys.exit(0)
for i in idx[:3]:
    start=max(1,i-40); end=min(len(lines), i+120)
    print("\n--- around line",i,"---")
    for j in range(start,end+1):
        print(f"{j:5d}:{lines[j-1].rstrip()}")
PY
echo
echo "=== Search where sample_rate is read in generation.py ==="
rg -n "sample_rate" backend/api/routers/generation.py

Repository: debpalash/OmniVoice-Studio

Length of output: 33607

🏁 Script executed:

#!/bin/bash
set -euo pipefail
echo "=== Locate backend classes with conditional sample_rate ==="
rg -n "class (OmniVoiceBackend|CosyVoiceBackend|Sherpa|SherpaONNX|Vits|TTSBackend)" backend/services/tts_backend.py
echo
echo "=== OmniVoiceBackend: sample_rate + _ensure_loaded + generate ==="
python3 - <<'PY'
import re
path="backend/services/tts_backend.py"
with open(path,'r',encoding='utf-8') as f:
    lines=f.readlines()
targets=["class OmniVoiceBackend","class CosyVoiceBackend","class SherpaONNX","class Sherpa-ONNX","class Sherpa"]
for t in targets:
    for i,l in enumerate(lines):
        if t in l:
            start=max(1,i-5); end=min(len(lines), i+140)
            print(f"\n--- {t} around line {i+1} ---")
            for j in range(start,end+1):
                print(f"{j:5d}:{lines[j-1].rstrip()}")
            break
PY
echo
echo "=== Search for _ensure_loaded calls inside generate implementations ==="
rg -n "def generate\\(|_ensure_loaded\\(" backend/services/tts_backend.py | head -n 200
echo
echo "=== Show base adapter protocol (TTSBackend) for generate/sample_rate coupling ==="
rg -n "class TTSBackend|def generate\\(" backend/services/tts_backend.py

Repository: debpalash/OmniVoice-Studio

Length of output: 21343

🏁 Script executed:

#!/bin/bash
set -euo pipefail
python3 - <<'PY'
path="backend/api/routers/generation.py"
with open(path,'r',encoding='utf-8') as f:
    lines=f.readlines()
# show around where sample_rate assigned after inference (reported around 360)
for center in [330,350,360,365,370,380,390]:
    start=max(1,center-20); end=min(len(lines),center+20)
    print(f"\n===== around line {center} =====")
    for i in range(start,end+1):
        print(f"{i:5d}:{lines[i-1].rstrip()}")
PY

Repository: debpalash/OmniVoice-Studio

Length of output: 16347

Fix pause-marker silence timing when backend sample_rate is lazy-loaded

backend/api/routers/generation.py _run_backend_inference captures sr = backend.sample_rate before calling backend.generate(), and _render_with_pauses uses that sr to compute silence samples. However, generate_speech re-reads sample_rate = _backend.sample_rate after generation, and lazy backends change their reported rate once weights are loaded (e.g., OmniVoiceBackend.sample_rate is 24000 until _model is loaded; CosyVoiceBackend similarly; SherpaOnnxBackend is 22050 until _tts is loaded).

This can make pause-marker stitching durations wrong on the first request, when sr is read before the lazy model load has happened, even though the final WAV saving uses the updated sample_rate.

Consider either:

Ensuring the backend is loaded (e.g., force _ensure_loaded() or equivalent) before capturing sr for _render_with_pauses, or

Capturing/deriving sr only after the backend has loaded (e.g., after the first span is generated) and using that same sr for stitching and for saving/watermarking.
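
The second option (reading the rate only after generation) can be sketched as follows. `LazyBackend` is a hypothetical engine invented for the example, and the `(text, pause_ms)` segment shape, with the pause placed after each span, is an assumption rather than the actual output format of `parse_pause_markers`.

```python
from typing import List, Sequence, Tuple


class LazyBackend:
    """Hypothetical engine that reports a placeholder rate until first use."""

    def __init__(self) -> None:
        self._loaded = False

    @property
    def sample_rate(self) -> int:
        return 24000 if self._loaded else 22050

    def generate(self, text: str) -> List[float]:
        self._loaded = True              # weights load on first call
        return [0.1] * len(text)         # dummy audio, one sample per char


def render_with_pauses(
    backend: LazyBackend, segments: Sequence[Tuple[str, int]]
) -> Tuple[List[float], int]:
    """Generate every span first, then size the silences with the real rate."""
    spans = [backend.generate(text) for text, _ in segments]
    sr = backend.sample_rate             # read after generation, not before
    out: List[float] = []
    for span, (_, pause_ms) in zip(spans, segments):
        out.extend(span)
        out.extend([0.0] * round(pause_ms * sr / 1000))
    return out, sr
```

Returning `sr` alongside the audio lets the caller pass the same value to the effect chain and to the WAV writer, so all three stages agree on the rate.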

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/api/routers/generation.py` around lines 200 - 220, The pause-marker stitching uses sr = backend.sample_rate before the backend may lazily load and change its sample_rate, causing incorrect silence durations; fix by ensuring the backend is fully loaded before capturing sr: call the backend's load routine (e.g., backend._ensure_loaded() or the appropriate ensure/load method) at the start of _run_backend_inference (before sr = backend.sample_rate and before any generate calls) so _render_with_pauses and subsequent saving/watermarking use the same, final sample_rate; alternatively, if no ensure method exists, defer computing sr until after the first span generation (use the sample_rate reported after backend.generate in _gen_span) and pass that same sr into _render_with_pauses and into _apply_effect_chain.

…full-suite isolation

tests/backend/** runs before tests/test_*.py and pollutes sys.modules (re-imports the services tree), so module-level imports bound at pytest collection pointed at a stale services.tts_backend; registry patches landed on a dict the routes no longer read ('Unknown TTS engine' in CI). Modules are now resolved through sys.modules inside each test.

The client fixture also drops the module-scoped lifespan context manager that bound event_bus queues to this module's loop (teardown 'Queue bound to a different event loop'); it is now a plain function-scoped TestClient, the test_api.py pattern.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
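
The stale-module failure described in this commit message can be reproduced in miniature. The module name and `REGISTRY` attribute below are invented for the demonstration; the real code patches the registry in `services.tts_backend`.

```python
import sys
import types

NAME = "demo_tts_backend"  # hypothetical module name


def fresh_module(name: str) -> types.ModuleType:
    """Simulate an import: build a new module object and register it."""
    mod = types.ModuleType(name)
    mod.REGISTRY = {}
    sys.modules[name] = mod
    return mod


stale = fresh_module(NAME)   # reference bound once, as at pytest collection
fresh_module(NAME)           # another test re-imports: a new module object

# Patching through the stale reference lands on a dict nobody reads anymore.
stale.REGISTRY["fake-engine"] = object()
print("fake-engine" in sys.modules[NAME].REGISTRY)   # False

# Resolving through sys.modules at run time reaches the live registry.
sys.modules[NAME].REGISTRY["fake-engine"] = object()
print("fake-engine" in sys.modules[NAME].REGISTRY)   # True
```

The first lookup fails for the same reason the CI tests saw 'Unknown TTS engine': the route and the test were holding two different module objects.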

debpalash · 2026-06-11T06:40:46Z

Thanks — the approach looks right (resolution order matches /ws/tts and /v1/audio/speech, and keeping the OmniVoice default path byte-identical is the right call), but CI is red and the failures are in this PR's own new tests:

All 4 engine-stub tests in tests/test_generate_engine.py fail with 400 — Unknown TTS engine: 'fake-engine': the stubbed engine isn't registered when the full suite runs (works in isolation per the PR body, fails in CI). Looks like the engine registry the stub patches isn't the one the route reads at request time — or another test's setup/teardown resets the registry. Worth registering the stub through the same registry the route consults, via a fixture that re-applies per test.
One teardown error: RuntimeError: <Queue> is bound to a different event loop in test_generate_respects_applies_own_mastering — suggests the test creates async state across event loops; the patterns in test_api.py (TestClient-per-test) avoid this.

CI run: https://github.com/debpalash/OmniVoice-Studio/actions/runs/27321203190

Happy to re-review once the suite is green.

…ith #324 engine routing

PR #324 (engine resolution in /generate) landed on main and refactored the shared OOM error path in backend/api/routers/generation.py into the _oom_friendly_reraise() helper, and added a _run_backend_inference() twin for non-default engines. PR #278's compile-failure detection had been inlined in _run_inference's `except Exception` block, so the two collided at the same call site.

Resolution: fold #278's _is_compile_runtime_failure() check into the shared _oom_friendly_reraise() helper rather than the now-removed inline block. This keeps both features:

- The OmniVoice native path (_run_inference) still surfaces the torch.compile/Triton-specific message instead of mislabeling it as OOM; #278's intent preserved.
- #324's _run_backend_inference (non-default engines) is left unchanged and now also benefits from the same detection (a no-op for non-compile errors, and torch.compile failures aren't OmniVoice-specific anyway).
- The actual compile-fallback wrapper (_install_compile_fallback / _is_compile_runtime_failure in model_manager.py, plus the arch gate and OMNIVOICE_FORCE_TORCH_COMPILE override in engine_env.py) merged cleanly and is untouched; it still wraps get_model()'s generate on the native path.

Validation (targeted): tests/test_generate_engine.py, tests/test_compile_fallback.py, tests/test_torch_compile_gate.py: 27 passed. `python -c "import api.routers.generation"`: OK.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

github-advanced-security AI found potential problems Jun 11, 2026


Comment thread backend/api/routers/generation.py

Comment on lines +174 to +178

def _run_backend_inference(
    backend, text, language, ref_audio_path, ref_text, instruct, duration,
    num_step, guidance_scale, speed, denoise, postprocess_output,
    used_seed, effect_preset="broadcast",
):

greptile-apps Bot reviewed Jun 11, 2026

coderabbitai Bot reviewed Jun 11, 2026

debpalash merged commit 433f1ba into main Jun 11, 2026
15 checks passed

debpalash merged 2 commits into main from fix/312-generate-engine
