Add per-segment audio effects DSP preset selector to dub pipeline by 4shil · Pull Request #68 · debpalash/OmniVoice-Studio

4shil · 2026-05-16T17:42:16Z

Summary

Adds a per-segment audio effects DSP preset selector to the dub pipeline. Users can now choose from 6 broadcast-grade DSP presets (broadcast, cinematic, podcast, warm, bright, raw) that are applied to TTS-generated audio before mixing.

Changes

New GET /engines/effects/presets endpoint exposing existing EFFECT_PRESETS
DubSegment schema now accepts effect_preset field (default: "broadcast")
Dub generation hot loop applies apply_effects_chain() per segment
Batch TTS path supports per-segment effect presets
Standalone /generate endpoint accepts effect_preset form field
Frontend: effect presets loaded from API, stored in dubSlice (state only, no UI in this PR)
Segment fingerprint (effect_preset in _GEN_INPUT_FIELDS) updated so preset changes trigger regen
12 new unit tests for effects chain (pure functions, no GPU needed)

Type

New feature

Testing

Unit tests added (tests/test_effects_chain.py, 12 test cases)
Full test suite passes (tests/ + backend/tests/)
Frontend typecheck passes

Edge Cases

pedalboard not installed: apply_effects_chain() returns audio unmodified (existing graceful degradation)
Invalid preset ID: Pydantic validator on DubSegment returns 422
"raw" preset: Empty chain, processing skipped for efficiency
Backward compatibility: Defaults to broadcast, existing clients unaffected

Summary by CodeRabbit

New Features
- Per-segment audio effect presets for dubbing (broadcast, cinematic, podcast, warm, bright, raw); "raw" returns un‑mastered output.
- Endpoint and frontend support to list/select presets; presets are submitted with generation requests.
Behavior
- Preset selection affects generation and caching—changing a segment's preset will trigger regenerated audio.
Tests
- Added tests validating preset listings, effect chains, DSP behavior, and clipping prevention.

coderabbitai · 2026-05-16T17:42:27Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Adds per-segment DSP effect-preset support: schemas and validation, discovery endpoint, frontend types/store, and runtime plumbing to apply presets (including raw bypass) during generation and batched processing; fingerprints now include effect_preset; DSP tests added.

Changes

Per-Segment Audio Effect Presets

Layer / File(s)	Summary
Effect preset data models and validation `backend/api/schemas.py`, `backend/schemas/requests.py`, `frontend/src/api/types.ts`	`EffectPresetEntry` and `EffectPresetsResponse` schemas added; `DubSegment` gains `effect_preset` plus a validator enforcing values from `EFFECT_PRESETS`.
Effect preset discovery API `backend/api/routers/engines.py`, `frontend/src/api/engines.ts`	New `GET /engines/effects/presets` endpoint returns available presets; frontend `fetchEffectPresets()` fetches and types the response.
Effect application in generation pipeline `backend/api/routers/generation.py`	`/generate` and `_run_inference()` now accept `effect_preset` (default `"broadcast"`) and conditionally apply an effect chain after mastering unless preset is `"raw"`.
Effect application in dub generation `backend/api/routers/dub_generate.py`	Per-segment dub generation computes sample rate defensively, applies per-segment effect chains post-mastering (skip when `"raw"`), narrows OOM exception handling, and threads `effect_preset` into segment fingerprints.
Effect application in batched processing `backend/services/batched_tts.py`	`SegmentSpec` includes `effect_preset`; batched TTS applies per-segment effect chains between mastering and normalization when preset != `"raw"`.
Incremental cache invalidation `backend/services/incremental.py`	Deterministic segment fingerprint inputs include `effect_preset`, causing regeneration when presets change.
Frontend state management `frontend/src/store/dubSlice.ts`	Zustand store extended with `segmentEffectPresets` and `availableEffectPresets` plus setters `setSegmentEffectPreset()` and `setAvailableEffectPresets()`.
DSP effect chain validation `tests/test_effects_chain.py`	Pytest suite validates preset metadata, chain resolution, DSP execution, output shape preservation, clipping prevention for limiter-based presets, and differences from raw.

Sequence Diagram

sequenceDiagram
  participant Client as DubClient
  participant Store as ZustandStore
  participant API as Generation/Dub API
  participant DSP as audio_dsp
  Client->>Store: setSegmentEffectPreset(segId, presetId)
  Client->>API: POST /generate or /dub_generate (effect_preset)
  API->>DSP: list_effect_presets()
  DSP-->>API: presets[]
  API->>DSP: get_effect_chain(effect_preset)
  DSP-->>API: effect_chain (or [])
  API->>DSP: apply_mastering(raw_audio)
  DSP-->>API: mastered_audio
  alt effect_preset != "raw"
    API->>DSP: apply_effects_chain(mastered_audio, chain)
    DSP-->>API: processed_audio
  else raw
    API-->>API: skip effects
  end
  API->>DSP: normalize_audio(processed_or_mastered)
  DSP-->>API: final_audio
  API-->>Client: generated audio

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues

Feature: Per-segment audio effects DSP preset selector for dub pipeline #67: Implements the same per-segment effect_preset feature across backend and frontend and adds matching DSP tests.

Poem

🐇 I mixed a little sonic stew tonight,

Broadcast, warm, bright, and cinematic light;
Per-segment presets hop and play,
Fingerprints remember the rabbit way,
Now every dub finds its perfect bite.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 29.03% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and concisely summarizes the main change: adding per-segment audio effects DSP preset selection to the dub pipeline.
Description check	✅ Passed	The PR description includes all required template sections: clear Summary, detailed Changes, Type checkbox marked, comprehensive Testing details, completed Checklist items, and covers edge cases thoroughly.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/test_effects_chain.py`:
- Around line 97-114: Add shape assertions to the three tests so they verify the
output preserves input shape like the other tests: in
test_podcast_preset_returns_tensor, test_warm_preset_returns_tensor, and
test_bright_preset_returns_tensor, after calling _make_test_audio(),
get_effect_chain(...), and result = apply_effects_chain(...), add an assertion
that result.shape == audio.shape; reference the test function names and the
helpers _make_test_audio, get_effect_chain, and apply_effects_chain to locate
and update the tests.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 0d41aa1b-e270-4c18-acb1-625829ad4166

📥 Commits

Reviewing files that changed from the base of the PR and between ba4cf6a and 472d1e5.

📒 Files selected for processing (11)

backend/api/routers/dub_generate.py
backend/api/routers/engines.py
backend/api/routers/generation.py
backend/api/schemas.py
backend/schemas/requests.py
backend/services/batched_tts.py
backend/services/incremental.py
frontend/src/api/engines.ts
frontend/src/api/types.ts
frontend/src/store/dubSlice.ts
tests/test_effects_chain.py

Copilot

Pull request overview

Adds end-to-end plumbing for selecting an audio DSP “effect preset” per dub segment (and for standalone generation), exposing available presets via a new backend API route and storing them in frontend state.

Changes:

Backend: expose effect presets via GET /engines/effects/presets, validate DubSegment.effect_preset, and include effect_preset in incremental regen fingerprints.
Backend: apply apply_effects_chain() in both dub generation and batched TTS paths; accept effect_preset on standalone /generate.
Frontend: add types + API client for effect presets and store preset state in dubSlice; add unit tests for effects chain helpers.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 7 comments.

Show a summary per file

File	Description
`backend/api/routers/engines.py`	Adds `GET /engines/effects/presets` endpoint.
`backend/api/schemas.py`	Introduces response schemas for effect preset listing.
`backend/schemas/requests.py`	Adds `effect_preset` field + validator to `DubSegment`.
`backend/api/routers/dub_generate.py`	Applies per-segment effect chain during dub TTS generation and includes preset in fingerprint input.
`backend/services/batched_tts.py`	Adds `effect_preset` to batched segment spec and applies effect chain per segment.
`backend/api/routers/generation.py`	Accepts `effect_preset` form field and applies effect chain in standalone generation.
`backend/services/incremental.py`	Adds `effect_preset` to segment fingerprint inputs.
`frontend/src/api/engines.ts`	Adds `EffectPreset` types and `fetchEffectPresets()` API call.
`frontend/src/api/types.ts`	Adds `DubSegment` interface including optional `effect_preset`.
`frontend/src/store/dubSlice.ts`	Stores per-segment preset selections + available presets in Zustand slice.
`tests/test_effects_chain.py`	Adds unit tests for preset listing, chain retrieval, and effects application.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+                    mastered_audio = apply_mastering(audio_out, sample_rate=sr)
+
+                    # Apply per-segment DSP effect preset (default: broadcast)
+                    seg_effect_preset = getattr(seg, "effect_preset", None) or "broadcast"
+                    if seg_effect_preset != "raw":
+                        effect_chain = get_effect_chain(seg_effect_preset)
+                        if effect_chain:
+                            mastered_audio = apply_effects_chain(
+                                mastered_audio,
+                                sample_rate=sr,
+                                chain=effect_chain,
+                            )
+


                mastered = apply_mastering(audio_out, sample_rate=sr)
+
+                # Apply per-segment DSP effect preset (default: broadcast)
+                seg_effect_preset = getattr(s, "effect_preset", None) or "broadcast"
+                if seg_effect_preset != "raw":
+                    effect_chain = get_effect_chain(seg_effect_preset)
+                    if effect_chain:
+                        mastered = apply_effects_chain(
+                            mastered, sample_rate=sr, chain=effect_chain
+                        )
+


+    effect_preset: str = Form("broadcast"),
 ):
    _model = await get_model()


+_GEN_INPUT_FIELDS = ("text", "target_lang", "profile_id", "instruct", "speed", "direction", "effect_preset")



+import sys
+import os
+
+# Ensure the backend package is importable from tests/
+sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "backend"))
+


+# Skip all tests if pedalboard is not installed
+pedalboard = pytest.importorskip("pedalboard")
+


+        mastered_audio = apply_mastering(audio_out, sample_rate=sr)
+
+        # Apply DSP effect preset
+        _effect_preset = effect_preset or "broadcast"
+        if _effect_preset != "raw":
+            _chain = get_effect_chain(_effect_preset)
+            if _chain:
+                mastered_audio = apply_effects_chain(
+                    mastered_audio, sample_rate=sr, chain=_chain,
+                )


…duplicate sys.path

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

backend/api/routers/dub_generate.py (1)

154-193: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don’t remap all _gen failures to “GPU out of memory.”

The new DSP calls (get_effect_chain / apply_effects_chain) are now inside the broad except Exception, so non-OOM failures are surfaced as OOM and become hard to diagnose.

Suggested fix

-                try:
-                    audios = _model.generate(
+                try:
+                    audios = _model.generate(
                         text=text, language=lang if lang != "Auto" else None,
                         ref_audio=ref_audio, ref_text=ref_text,
                         instruct=instruct_str if instruct_str else None,
                         duration=dur_s, num_step=nstep, guidance_scale=cfg,
                         speed=spd, denoise=True, postprocess_output=True,
                     )
-                    audio_out = audios[0]
+                except Exception as e:
+                    import gc
+                    gc.collect()
+                    if torch.backends.mps.is_available():
+                        torch.mps.empty_cache()
+                    elif torch.cuda.is_available():
+                        torch.cuda.empty_cache()
+                    raise RuntimeError(
+                        "Ran out of GPU memory generating this segment. "
+                        "Try the Flush button in the header to free VRAM, or switch to CPU in Settings. "
+                        f"Underlying error: {e}"
+                    )
+
+                try:
+                    audio_out = audios[0]
                     sr = _model.sampling_rate if hasattr(_model, 'sampling_rate') else 24000
                     # ... DSP path ...
                     return normalize_audio(mastered_audio, target_dBFS=-2.0)
                 except Exception:
-                    import gc
-                    gc.collect()
-                    if torch.backends.mps.is_available():
-                        torch.mps.empty_cache()
-                    elif torch.cuda.is_available():
-                        torch.cuda.empty_cache()
-                    raise RuntimeError(
-                        f"Ran out of GPU memory generating this segment. "
-                        f"Try the Flush button in the header to free VRAM, or switch to CPU in Settings. "
-                        f"Underlying error: {e}"
-                    )
+                    raise

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/api/routers/dub_generate.py` around lines 154 - 193, The catch-all
except around the entire block remaps any failure in DSP (get_effect_chain,
apply_effects_chain, apply_mastering, normalize_audio) to a GPU OOM error; fix
by narrowing the try/except to only wrap the model generation step
(_model.generate) and only translate genuine OOM errors (e.g.,
torch.cuda.OutOfMemoryError or RuntimeError with "out of memory") into the
user-facing RuntimeError after clearing caches; perform DSP (apply_mastering,
get_effect_chain, apply_effects_chain, normalize_audio) outside that try so
their exceptions propagate normally, or if you must keep a catch, re-raise
non-OOM exceptions unchanged.

🧹 Nitpick comments (1)

backend/api/routers/dub_generate.py (1)

110-110: ⚡ Quick win

Avoid closing over loop variable seg in executor worker.

_gen captures seg from the outer loop at line 166 (seg.effect_preset), which triggers Ruff B023 and creates fragility if execution patterns change. Pass effect_preset explicitly as a parameter instead.

Suggested fix

-            def _gen(text, lang, instruct_str, dur_s, nstep, cfg, spd, profile_id=None):
+            def _gen(text, lang, instruct_str, dur_s, nstep, cfg, spd, profile_id=None, effect_preset="broadcast"):
@@
-                    seg_effect_preset = getattr(seg, "effect_preset", None) or "broadcast"
+                    seg_effect_preset = effect_preset or "broadcast"
@@
                 audio_tensor = await loop.run_in_executor(
                     _gpu_pool, _gen,
                     seg.text, seg_lang, seg_instruct, seg_duration,
-                    _num_step, req.guidance_scale, seg_speed, seg_profile,
+                    _num_step, req.guidance_scale, seg_speed, seg_profile, getattr(seg, "effect_preset", None),
                 )

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/api/routers/dub_generate.py` at line 110, The helper _gen function is
closing over the loop variable seg (used via seg.effect_preset), which causes a
late-binding bug (Ruff B023); modify _gen to accept effect_preset (or preset) as
an explicit parameter and pass seg.effect_preset from the loop when submitting
to the executor, update all calls to _gen to supply this new argument, and
remove any direct references to seg inside _gen so it no longer captures the
outer loop variable.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@backend/api/routers/dub_generate.py`:
- Around line 154-193: The catch-all except around the entire block remaps any
failure in DSP (get_effect_chain, apply_effects_chain, apply_mastering,
normalize_audio) to a GPU OOM error; fix by narrowing the try/except to only
wrap the model generation step (_model.generate) and only translate genuine OOM
errors (e.g., torch.cuda.OutOfMemoryError or RuntimeError with "out of memory")
into the user-facing RuntimeError after clearing caches; perform DSP
(apply_mastering, get_effect_chain, apply_effects_chain, normalize_audio)
outside that try so their exceptions propagate normally, or if you must keep a
catch, re-raise non-OOM exceptions unchanged.

---

Nitpick comments:
In `@backend/api/routers/dub_generate.py`:
- Line 110: The helper _gen function is closing over the loop variable seg (used
via seg.effect_preset), which causes a late-binding bug (Ruff B023); modify _gen
to accept effect_preset (or preset) as an explicit parameter and pass
seg.effect_preset from the loop when submitting to the executor, update all
calls to _gen to supply this new argument, and remove any direct references to
seg inside _gen so it no longer captures the outer loop variable.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: e3be0b46-db8c-46ce-a2c6-8883f8cc3163

📥 Commits

Reviewing files that changed from the base of the PR and between 0c257fe and fd7b908.

📒 Files selected for processing (5)

backend/api/routers/dub_generate.py
backend/api/routers/generation.py
backend/services/batched_tts.py
backend/services/incremental.py
tests/test_effects_chain.py

🚧 Files skipped from review as they are similar to previous changes (4)

backend/services/batched_tts.py
backend/services/incremental.py
backend/api/routers/generation.py
tests/test_effects_chain.py

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@backend/api/routers/dub_generate.py`:
- Around line 162-173: The except block that catches torch.cuda.OutOfMemoryError
and torch.mps.MPSError should preserve the original exception context when
rethrowing; change the RuntimeError raise in that except handler (the block
catching torch.cuda.OutOfMemoryError, torch.mps.MPSError in dub_generate.py) to
use exception chaining (raise RuntimeError(... ) from e) so the original
traceback and error details are retained for diagnostics while keeping the same
GC and backend cache clearing logic.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 1869ab5d-c3a9-4a50-83db-0a39567d8d84

📥 Commits

Reviewing files that changed from the base of the PR and between fd7b908 and 943c811.

📒 Files selected for processing (1)

backend/api/routers/dub_generate.py

coderabbitai

🧹 Nitpick comments (1)

backend/api/routers/dub_generate.py (1)

110-110: ⚡ Quick win

Bind effect_preset explicitly instead of capturing loop variable seg.

Line 179 closes over seg inside _gen (defined in the loop), which is exactly the B023 footgun and can apply the wrong preset if execution semantics change. Pass the preset as an explicit _gen argument.

Proposed fix

-            def _gen(text, lang, instruct_str, dur_s, nstep, cfg, spd, profile_id=None):
+            def _gen(text, lang, instruct_str, dur_s, nstep, cfg, spd, profile_id=None, effect_preset="broadcast"):
@@
-                seg_effect_preset = getattr(seg, "effect_preset", None) or "broadcast"
-                if seg_effect_preset == "raw":
+                if effect_preset == "raw":
                     # Raw: skip all DSP — return raw model output
                     return audio_out

                 mastered_audio = apply_mastering(audio_out, sample_rate=sr)
-                effect_chain = get_effect_chain(seg_effect_preset)
+                effect_chain = get_effect_chain(effect_preset)
@@
                 audio_tensor = await loop.run_in_executor(
                     _gpu_pool, _gen,
                     seg.text, seg_lang, seg_instruct, seg_duration,
-                    _num_step, req.guidance_scale, seg_speed, seg_profile,
+                    _num_step, req.guidance_scale, seg_speed, seg_profile,
+                    (getattr(seg, "effect_preset", None) or "broadcast"),
                 )

Also applies to: 179-190, 229-233

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/api/routers/dub_generate.py` at line 110, The closure _gen (defined
as def _gen(text, lang, instruct_str, dur_s, nstep, cfg, spd, profile_id=None))
captures the loop variable seg and its effect_preset, which risks the B023
loop-variable capture bug; change _gen signature to accept effect_preset (e.g.,
add a parameter effect_preset) and pass the current seg.effect_preset when
calling _gen from inside the loop so the preset is bound explicitly rather than
captured from seg; update all call sites in the loop (and the similar blocks
around lines referenced) to supply the new effect_preset argument.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@backend/api/routers/dub_generate.py`:
- Line 110: The closure _gen (defined as def _gen(text, lang, instruct_str,
dur_s, nstep, cfg, spd, profile_id=None)) captures the loop variable seg and its
effect_preset, which risks the B023 loop-variable capture bug; change _gen
signature to accept effect_preset (e.g., add a parameter effect_preset) and pass
the current seg.effect_preset when calling _gen from inside the loop so the
preset is bound explicitly rather than captured from seg; update all call sites
in the loop (and the similar blocks around lines referenced) to supply the new
effect_preset argument.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 413004e7-8dc1-4ce9-b974-40653772f2b0

📥 Commits

Reviewing files that changed from the base of the PR and between 943c811 and 6b325b2.

📒 Files selected for processing (1)

backend/api/routers/dub_generate.py

… OOS deferrals - GATE-06: mark #53 + #61 merged (2026-05-16); add #62 (Wave 1 quick wins) to gate set - INST-01: note PR #62 implements setuptools pin (closes #58) - INST-04: note PR #62 lands README docs for #56 workaround - INST-12: new requirement for #65 Windows Triton/torch.compile OOM (filed post-planning) - Out of Scope: defer #67/PR #68 (audio effects), #64 (custom model dir), PR #66 zh-CN (i18n milestone), #63 (empty-template bug) PR #62 is the user's own Wave 1 work landed as a separate PR while GSD planning ran in parallel. Merging it eliminates duplicate work in Phase 1. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ase smoke (#71) * docs: initialize OmniVoice stabilization milestone project * chore: add project config (yolo + balanced) * docs: domain research for stabilization milestone * docs: define v1 requirements for stabilization milestone * docs: add GGUF + singing engine spike requirements (Phase 4 new) * docs: roadmap revision + CLAUDE.md (7 phases, 62 reqs, +GGUF/SING spikes) * docs(phase-0): add Gates phase RESEARCH.md Phase 0 research synthesizes the cross-platform CI matrix, frozen omnivoice_data fixture, installer post-build smoke, SHA-256 checksum publishing, and PR-template extension into copy-paste-ready YAML and Python snippets composed entirely from existing in-repo patterns. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(phase-0): add Gates phase CONTEXT, PATTERNS, and PLAN Phase 0 — Gates is the hard pre-condition for v0.3.x stabilization. Lays cross-platform CI matrix (macos-14/windows-2022/ubuntu-22.04), regression fixture (≤200 KB), installer smoke on tag push, SHA-256 checksums in release body + per-OS SHA256SUMS-*.txt assets, PR template with RC cadence + fixture line, and the open-PR landing for #51. Plan covers GATE-01..06; structured into 7 slices (A–G) with explicit Slice C → Slice G dependency reordering so the new smoke-matrix lands on main before PR #51 (CONTEXT.md L86 interleave decision). Plan-checker iteration 2: APPROVED — all 3 BLOCKERs + 3 MAJORs from iteration 1 resolved (file truncation/Slice-G missing, GATE-06 sibling PR verification, Slice C ordering, Truth #5 wording, macOS Tauri WebView avoidance per Pitfall #5, Windows taskkill per Pitfall #2). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(00-gates): seed regression fixture (GATE-01) - scripts/seed-test-fixture.py — deterministic builder for tests/fixtures/omnivoice_data/ - wipes + rebuilds; fixed created_at=1700000000.0; all-zero PCM for byte-deterministic diffs - calls backend.core.db.init_db() directly (alembic versions/ is empty — see CONTEXT.md) - checkpoints WAL → DELETE on close so no -shm/-wal sidecars pollute git status - exits non-zero if fixture > 200 KB - tests/fixtures/omnivoice_data/{omnivoice.db, README.md} — 8-table empty DB + 1 voice_profiles row - tests/fixtures/omnivoice_data/voices/test-voice/{profile.json, sample.wav} — 1-sec 24 kHz mono silence - .gitignore — explicit allow-list (!tests/fixtures/omnivoice_data/**) so the existing omnivoice_data/, *.db, *.wav patterns don't hide the fixture from git Verifies: du = 144 KB on disk; sqlite_master lists 8 init_db tables + sqlite_sequence; voice_profiles has exactly 1 row id='test-voice'; 0 rows in generation_history. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(00-gates): add tests/smoke/test_boot_smoke.py (GATE-01) - tests/smoke/__init__.py — package marker so pytest treats tests/smoke/ as a module - tests/smoke/test_boot_smoke.py — 4 in-process FastAPI TestClient smoke tests: * test_health_returns_ok — /health returns 200 + {status:ok, device:...} * test_profiles_endpoint_lists_fixture_voice — /profiles surfaces the seeded test-voice row (validates OMNIVOICE_DATA_DIR wiring → DB_PATH → init_db schema) * test_system_info_includes_data_dir — /system/info resolves data_dir * test_history_endpoint_empty — /history reaches DB and returns [] Test isolation env vars (OMNIVOICE_MODEL=test, OMNIVOICE_DISABLE_FILE_LOG=1) set at module top BEFORE any backend import — pattern from tests/test_router_smoke.py. Fixture is copied to a per-session temp dir so the test never mutates the checked-in artifact (SQLite file-change counter + runtime subdirs like dub_jobs/ would otherwise dirty `git status` after every run). Failure mode: if tests/fixtures/omnivoice_data/ is missing, pytest.fail at import time with the regenerate command. - .gitignore — tighten the GATE-01 allow-list to ONLY the seed-produced files (README.md, omnivoice.db, voices/test-voice/profile.json, sample.wav). Prevents future runtime subdirs the backend may create under the fixture from being accidentally committed. Verifies: `uv run pytest tests/smoke/ -q --tb=short` → 4 passed in 1.31 s (target was < 30 s). `git status` clean after a test run. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(triage): record post-planning GitHub state — PR #62, new issues, OOS deferrals - GATE-06: mark #53 + #61 merged (2026-05-16); add #62 (Wave 1 quick wins) to gate set - INST-01: note PR #62 implements setuptools pin (closes #58) - INST-04: note PR #62 lands README docs for #56 workaround - INST-12: new requirement for #65 Windows Triton/torch.compile OOM (filed post-planning) - Out of Scope: defer #67/PR #68 (audio effects), #64 (custom model dir), PR #66 zh-CN (i18n milestone), #63 (empty-template bug) PR #62 is the user's own Wave 1 work landed as a separate PR while GSD planning ran in parallel. Merging it eliminates duplicate work in Phase 1. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * ci(00-gates): add cross-platform smoke matrix (GATE-02) - New smoke-matrix job on macos-14, windows-2022, ubuntu-22.04 - needs: test, fail-fast: false, timeout-minutes: 10 - Pinned actions: checkout@v4, setup-python@v5, setup-uv@v3 (cache enabled) - Per-OS ffmpeg + libsndfile install (brew/choco/apt via awalsh128 cache) - UV_HTTP_TIMEOUT=120, UV_HTTP_RETRIES=5 for restricted-network resilience - Narrow scope: uv run pytest tests/smoke/ -q --tb=short - Existing `test` and `tauri-cross-platform` jobs untouched Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * ci: add workflow_dispatch to ci.yml so smoke-matrix can run on feature branches * feat(00-gates): add --health-check CLI flag to backend entrypoint (GATE-03) - argparse on __main__ block; --health-check boots uvicorn in a daemon thread and polls http://127.0.0.1:3900/health every 5s for up to 60s. - Prints 'OK — /health responded 200 after Ns' and exits 0 on first 200. - Prints 'FAIL — /health did not respond 200 within 60s' to stderr and exits 1 on timeout. Default invocation behavior unchanged. - No new deps (stdlib argparse/threading/time/urllib.request/sys + uvicorn). - Consumed by per-OS installer-smoke step in .github/workflows/release.yml. Verified locally: exits 0 in 5s against tests/fixtures/omnivoice_data/. * ci(00-gates): add per-OS installer smoke to release.yml (GATE-03) Adds three matrix-leg-specific steps after 'Build + release (Tauri)', each gated by runner.os with timeout-minutes: 5: - macOS (macos-14): hdiutil attach DMG → locate bundled Python backend inside *.app/Contents (NOT the Tauri WebView shell — RESEARCH Pitfall #5: WebView hangs on headless runners) → invoke --health-check → hdiutil detach. Falls back to *.app/Contents/Resources and hard-fails with a directory listing if no backend binary found. - Windows (windows-2022): msiexec /quiet install → find backend.exe under 'C:/Program Files/OmniVoice Studio' → invoke --health-check in background, wait, then taskkill //F //T //PID to cleanup orphaned PyInstaller child processes on port 3900 (RESEARCH Pitfall #2). - Linux (ubuntu-22.04): --appimage-extract (no FUSE on GH runners), locate binary or AppRun, run under xvfb-run -a. Bundle-only regressions (PyInstaller missing-module, Tauri sidecar path mismatch) are invisible to ci.yml's in-process smoke matrix — this step closes that gap before any release is published. Verified: YAML parses; all three steps present; gating + timeout correct; Pitfall #2/#5 mitigations preserved. * ci(00-gates): publish SHA-256 checksums in release body + as asset (GATE-05) - Add 'Compute SHA-256 checksums' step writing SHA256SUMS-<label>.txt per matrix leg using native shasum/sha256sum (Git Bash on Windows). - Add 'Append checksums to release + attach SHA256SUMS file' step using softprops/action-gh-release@v2 with append_body: true so the hashes land in the release body alongside tauri-action's content (not replacing it) and the file is uploaded as a release asset for 'shasum -c SHA256SUMS-<label>.txt' verification. - Both steps gated by 'github.event_name == push && refs/tags/v*' so workflow_dispatch dry-runs do not attempt to attach to a non-existent release (per CONTEXT.md L70 + RESEARCH Pitfall #7 deferral of any aggregate cross-leg SHA256SUMS job). - fail_on_unmatched_files: true to surface path-resolution errors loudly. * docs(00-gates): document RC cadence + regression-fixture check in PR template (GATE-04) * docs(setup): add HF token persistence guide for macOS/Windows/Linux (DOCS-05) Covers two persistent paths: - Method A — canonical ~/.cache/huggingface/token via huggingface-cli login - Method B — shell env var (~/.zshrc / ~/.bashrc / Windows User scope) Documents the v0.2.7 "session only" in-app behavior + notes that Phase 1 AUTH-03 will make in-app pastes write to the canonical file. Bundled with Phase 0 PR per user request. Strictly DOCS-05 scope — zero code changes, no engine touches. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * spec(auth): redesign HF token resolution as 3-source cascade with fallback (AUTH-01..06) Replaces the env_store.py file-based design with a SQLite-backed app store + cascade resolver that checks app → env var → ~/.cache/huggingface/token in priority order, with automatic fallback to next source on HTTP 401. User-explicit design decision: - App-stored token (SQLite settings table, AES-GCM encrypted) wins - Env var ($HF_TOKEN) second - Global huggingface-cli login file third - All three sources visible in Settings → API Keys with "Active" badge - Save action populates BOTH app store AND canonical HF file (defense in depth) New requirement: - AUTH-06 — on 401, auto-retry next source in cascade before erroring Also: traceability count corrected (62 → 74 — undercount at planning + INST-12 + AUTH-06 added post-planning). All 74 v1 reqs mapped. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(auth): backend recognizes HF token from canonical file, not just env var Two call sites were only checking $HF_TOKEN env var, missing the canonical ~/.cache/huggingface/token file written by `huggingface-cli login` (or the app's future Save action): - system.py `/system/info` `has_hf_token` flag — UI showed "No HF token" even when `huggingface-cli login` had populated the file. - model_manager.get_diarization_pipeline — pyannote diarization silently returned None when only the canonical file was set. This is the bug behind issue #35 (speaker diarization setup failure). Both fixes use the same pattern: env var > huggingface_hub.get_token() (which reads the canonical file). Adds a local _has_hf_token() helper to system.py with a comment marking it as prelude to the AUTH-01..06 cascade (Phase 1 token_resolver.py will layer SQLite app-store on top). Closes #35 sub-issue (canonical token invisible to diarization). Cross-cuts AUTH-02 + AUTH-06 design for Phase 1. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(dictation): make pill-widget mode reachable from GUI + scripts (INST-13) The dictation widget infrastructure shipped in PR #40 but was only reachable via the undocumented --pill CLI flag. Adds three discovery paths: 1. Tray menu: "Switch to Dictation Widget" (studio mode) — saves launch_as_widget=true to config, relaunches with --pill, exits current. Mirrors the existing "Open Studio" path in pill-mode tray. 2. Persistent config: AppConfig.launch_as_widget (bool, default false). Read at startup via load_config_pre_app() (uses dirs-next, no AppHandle required). CLI --pill still takes precedence when explicitly passed. 3. Tauri commands: get_launch_as_widget / set_launch_as_widget for the Phase 2 Settings UI to bind a checkbox to. 4. Scripts: bun desktop-prod:pill / desktop-prod:run:pill — forward --pill to the bundled app launch. macOS uses `open -n --args` to spawn fresh instance with the flag. Closes the GUI half of INST-13. Phase 2 closes the Settings UI half. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(dictation): show widget unconditionally on pill-mode launch + visible Suspense fallback Before: pill mode set up correctly but the widget window stayed hidden until ⌘⇧Space was pressed. New users saw absolutely nothing on launch (no main window, no dock icon, hidden widget) and assumed the app failed. If global-shortcut Accessibility permission wasn't granted, they had no path to discover the widget at all. Two changes: 1. lib.rs: in pill_mode_setup, explicitly show + position + focus the widget window after hiding main. With per-call error logging so we can diagnose failures (and a clear error log if widget window wasn't created at all — points at tauri.conf.json regression). 2. main-app.jsx: Suspense fallback was `null`, which combined with widget's transparent+decorations:false config made any lazy-import delay or failure invisible. Now renders a dark pill saying "Loading dictation…" so even if CaptureWidget lazy-import stalls, the user sees the window exists. Studio mode behavior unchanged — widget stays hidden until hotkey or tray click triggers it (existing show() call in the shortcut/ menu handlers is preserved). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(dictation): create widget window programmatically; Tauri 2 silently dropped config-array creation Root cause: declaring the widget window in tauri.conf.json's app.windows[] silently failed in Tauri 2 — get_webview_window("widget") returned None even though the config was syntactically valid. Probable culprit was the transparent + decorations:false + visible:false combo, but Tauri offered no error message either at startup or via webview_windows() enumeration. Diagnosed by adding webview_windows() enumeration logging at setup start (only ["main"] ever appeared) and a programmatic WebviewWindowBuilder fallback that surfaces real Result errors. Fix: - tauri.conf.json: widget entry now has `create: false` to make the config-vs-programmatic handoff explicit. - lib.rs setup(): call WebviewWindowBuilder::new(app, "widget", ...).build() with the exact same surface attributes the config used to declare. - capabilities/default.json: include "widget" in windows array so the new window inherits the same Tauri permissions as main. - tauri.conf.json: remove the invalid `"url": "/?window=widget"` field — WebviewUrl::App takes a path only, query strings aren't supported. Both windows now load index.html. - main-app.jsx: replace URL-query-based widget detection with getCurrentWindow().label === 'widget' via @tauri-apps/api/window. This is the Tauri 2-recommended pattern for multi-window apps and works regardless of URL routing. Closes the immediate UX bug behind the dictation widget being invisible. Builds cleanly + manually verified: pill widget visible on screen at top-center after `bun desktop-prod:pill`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

debpalash · 2026-05-18T11:40:05Z

Really like the shape of this PR @4shil — scoped to one feature (per-segment DSP presets), schema + API + frontend + tests all in proportion, and the fingerprint update in incremental.py to invalidate cache on preset changes is exactly right. CodeRabbit pass; tests look solid (12 cases with pytest.importorskip("pedalboard") graceful skip).

One blocker before merge — at backend/api/routers/dub_generate.py:163:

except (torch.cuda.OutOfMemoryError, torch.mps.MPSError) as e:

torch.mps.MPSError doesn't exist in PyTorch — hasattr(torch.mps, 'MPSError') is False. This will AttributeError at the except clause evaluation on every dub generation, not just MPS hosts. The previous code at this line was except Exception (broad), so narrowing here is correct intent, just the wrong type.

One-line fix — change to:

except (torch.cuda.OutOfMemoryError, RuntimeError) as e:

RuntimeError is the actual exception PyTorch raises for OOM on MPS (per the PyTorch MPS docs). Combined with the "out of memory" in str(e).lower() substring check you already have, this catches both CUDA + MPS OOM correctly without missing them.

If you'd rather I push the fix to your branch directly, just say the word — happy to. Otherwise, push the fix and I'll squash-merge as soon as CI is green.

Thanks for the work — looking forward to having presets in the dub pipeline.

4shil · 2026-05-18T12:23:22Z

Thanks for the heads up — fixed! Swapped for in the OOM catch clause (commit 9ea6879). PyTorch MPS raises for OOM, so this now correctly handles both CUDA and MPS hosts without the attribute error. CI should pick it up automatically.

…effect presets

4shil · 2026-05-18T15:59:36Z

Merged upstream/main (commit 027f9b2) and resolved the conflict in dub_generate.py. The upstream OOM retry logic with the full exception-based is_oom check is preserved. Per-segment DSP effect presets are applied in both the normal generation path and the OOM retry path. seg_instruct assignment also restored.

Add per-segment audio effects DSP preset selector to dub pipeline

472d1e5

Copilot AI review requested due to automatic review settings May 16, 2026 17:42

Copilot started reviewing on behalf of 4shil May 16, 2026 17:42 View session

coderabbitai Bot reviewed May 16, 2026

View reviewed changes

Comment thread tests/test_effects_chain.py

Copilot AI reviewed May 16, 2026

View reviewed changes

4shil added 2 commits May 16, 2026 23:26

Add shape assertions to podcast, warm, and bright preset tests

0c257fe

Fix raw preset semantics, add preset validation, update docs, remove …

fd7b908

…duplicate sys.path

coderabbitai Bot reviewed May 16, 2026

View reviewed changes

Narrow OOM catch to model.generate only in dub_generate

943c811

coderabbitai Bot reviewed May 16, 2026

View reviewed changes

Comment thread backend/api/routers/dub_generate.py Outdated

Preserve original OOM exception context in dub_generate

6b325b2

coderabbitai Bot reviewed May 16, 2026

View reviewed changes

Bind effect_preset to _gen via explicit parameter to avoid loop capture

924f842

Catch RuntimeError instead of torch.mps.MPSError for MPS OOM

9ea6879

Merge upstream/main — resolve conflict in dub_generate, preserve DSP …

027f9b2

…effect presets

		_GEN_INPUT_FIELDS = ("text", "target_lang", "profile_id", "instruct", "speed", "direction", "effect_preset")

		# Skip all tests if pedalboard is not installed
		pedalboard = pytest.importorskip("pedalboard")

Conversation

4shil commented May 16, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Type

Testing

Edge Cases

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Possibly related issues

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

debpalash commented May 18, 2026

Uh oh!

4shil commented May 18, 2026

Uh oh!

4shil commented May 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

4shil commented May 16, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 16, 2026 •

edited

Loading