Skip to content

Add per-segment audio effects DSP preset selector to dub pipeline#68

Open
4shil wants to merge 8 commits into
debpalash:mainfrom
4shil:feat/audio-effects-dsp-presets
Open

Add per-segment audio effects DSP preset selector to dub pipeline#68
4shil wants to merge 8 commits into
debpalash:mainfrom
4shil:feat/audio-effects-dsp-presets

Conversation

@4shil
Copy link
Copy Markdown

@4shil 4shil commented May 16, 2026

Summary

Adds a per-segment audio effects DSP preset selector to the dub pipeline. Users can now choose from 6 broadcast-grade DSP presets (broadcast, cinematic, podcast, warm, bright, raw) that are applied to TTS-generated audio before mixing.

Changes

  • New GET /engines/effects/presets endpoint exposing existing EFFECT_PRESETS
  • DubSegment schema now accepts effect_preset field (default: "broadcast")
  • Dub generation hot loop applies apply_effects_chain() per segment
  • Batch TTS path supports per-segment effect presets
  • Standalone /generate endpoint accepts effect_preset form field
  • Frontend: effect presets loaded from API, stored in dubSlice (state only, no UI in this PR)
  • Segment fingerprint (effect_preset in _GEN_INPUT_FIELDS) updated so preset changes trigger regen
  • 12 new unit tests for effects chain (pure functions, no GPU needed)

Type

  • New feature

Testing

  • Unit tests added (tests/test_effects_chain.py, 12 test cases)
  • Full test suite passes (tests/ + backend/tests/)
  • Frontend typecheck passes

Edge Cases

  • pedalboard not installed: apply_effects_chain() returns audio unmodified (existing graceful degradation)
  • Invalid preset ID: Pydantic validator on DubSegment returns 422
  • "raw" preset: Empty chain, processing skipped for efficiency
  • Backward compatibility: Defaults to broadcast, existing clients unaffected

Summary by CodeRabbit

  • New Features

    • Per-segment audio effect presets for dubbing (broadcast, cinematic, podcast, warm, bright, raw); "raw" returns un‑mastered output.
    • Endpoint and frontend support to list/select presets; presets are submitted with generation requests.
  • Behavior

    • Preset selection affects generation and caching—changing a segment's preset will trigger regenerated audio.
  • Tests

    • Added tests validating preset listings, effect chains, DSP behavior, and clipping prevention.

Review Change Stack

Copilot AI review requested due to automatic review settings May 16, 2026 17:42
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 16, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds per-segment DSP effect-preset support: schemas and validation, discovery endpoint, frontend types/store, and runtime plumbing to apply presets (including raw bypass) during generation and batched processing; fingerprints now include effect_preset; DSP tests added.

Changes

Per-Segment Audio Effect Presets

Layer / File(s) Summary
Effect preset data models and validation
backend/api/schemas.py, backend/schemas/requests.py, frontend/src/api/types.ts
EffectPresetEntry and EffectPresetsResponse schemas added; DubSegment gains effect_preset plus a validator enforcing values from EFFECT_PRESETS.
Effect preset discovery API
backend/api/routers/engines.py, frontend/src/api/engines.ts
New GET /engines/effects/presets endpoint returns available presets; frontend fetchEffectPresets() fetches and types the response.
Effect application in generation pipeline
backend/api/routers/generation.py
/generate and _run_inference() now accept effect_preset (default "broadcast") and conditionally apply an effect chain after mastering unless preset is "raw".
Effect application in dub generation
backend/api/routers/dub_generate.py
Per-segment dub generation computes sample rate defensively, applies per-segment effect chains post-mastering (skip when "raw"), narrows OOM exception handling, and threads effect_preset into segment fingerprints.
Effect application in batched processing
backend/services/batched_tts.py
SegmentSpec includes effect_preset; batched TTS applies per-segment effect chains between mastering and normalization when preset != "raw".
Incremental cache invalidation
backend/services/incremental.py
Deterministic segment fingerprint inputs include effect_preset, causing regeneration when presets change.
Frontend state management
frontend/src/store/dubSlice.ts
Zustand store extended with segmentEffectPresets and availableEffectPresets plus setters setSegmentEffectPreset() and setAvailableEffectPresets().
DSP effect chain validation
tests/test_effects_chain.py
Pytest suite validates preset metadata, chain resolution, DSP execution, output shape preservation, clipping prevention for limiter-based presets, and differences from raw.

Sequence Diagram

sequenceDiagram
  participant Client as DubClient
  participant Store as ZustandStore
  participant API as Generation/Dub API
  participant DSP as audio_dsp
  Client->>Store: setSegmentEffectPreset(segId, presetId)
  Client->>API: POST /generate or /dub_generate (effect_preset)
  API->>DSP: list_effect_presets()
  DSP-->>API: presets[]
  API->>DSP: get_effect_chain(effect_preset)
  DSP-->>API: effect_chain (or [])
  API->>DSP: apply_mastering(raw_audio)
  DSP-->>API: mastered_audio
  alt effect_preset != "raw"
    API->>DSP: apply_effects_chain(mastered_audio, chain)
    DSP-->>API: processed_audio
  else raw
    API-->>API: skip effects
  end
  API->>DSP: normalize_audio(processed_or_mastered)
  DSP-->>API: final_audio
  API-->>Client: generated audio
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues

Poem

🐇 I mixed a little sonic stew tonight,

Broadcast, warm, bright, and cinematic light;
Per-segment presets hop and play,
Fingerprints remember the rabbit way,
Now every dub finds its perfect bite.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 29.03% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the main change: adding per-segment audio effects DSP preset selection to the dub pipeline.
Description check ✅ Passed The PR description includes all required template sections: clear Summary, detailed Changes, Type checkbox marked, comprehensive Testing details, completed Checklist items, and covers edge cases thoroughly.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/test_effects_chain.py`:
- Around line 97-114: Add shape assertions to the three tests so they verify the
output preserves input shape like the other tests: in
test_podcast_preset_returns_tensor, test_warm_preset_returns_tensor, and
test_bright_preset_returns_tensor, after calling _make_test_audio(),
get_effect_chain(...), and result = apply_effects_chain(...), add an assertion
that result.shape == audio.shape; reference the test function names and the
helpers _make_test_audio, get_effect_chain, and apply_effects_chain to locate
and update the tests.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 0d41aa1b-e270-4c18-acb1-625829ad4166

📥 Commits

Reviewing files that changed from the base of the PR and between ba4cf6a and 472d1e5.

📒 Files selected for processing (11)
  • backend/api/routers/dub_generate.py
  • backend/api/routers/engines.py
  • backend/api/routers/generation.py
  • backend/api/schemas.py
  • backend/schemas/requests.py
  • backend/services/batched_tts.py
  • backend/services/incremental.py
  • frontend/src/api/engines.ts
  • frontend/src/api/types.ts
  • frontend/src/store/dubSlice.ts
  • tests/test_effects_chain.py

Comment thread tests/test_effects_chain.py
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds end-to-end plumbing for selecting an audio DSP “effect preset” per dub segment (and for standalone generation), exposing available presets via a new backend API route and storing them in frontend state.

Changes:

  • Backend: expose effect presets via GET /engines/effects/presets, validate DubSegment.effect_preset, and include effect_preset in incremental regen fingerprints.
  • Backend: apply apply_effects_chain() in both dub generation and batched TTS paths; accept effect_preset on standalone /generate.
  • Frontend: add types + API client for effect presets and store preset state in dubSlice; add unit tests for effects chain helpers.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
backend/api/routers/engines.py Adds GET /engines/effects/presets endpoint.
backend/api/schemas.py Introduces response schemas for effect preset listing.
backend/schemas/requests.py Adds effect_preset field + validator to DubSegment.
backend/api/routers/dub_generate.py Applies per-segment effect chain during dub TTS generation and includes preset in fingerprint input.
backend/services/batched_tts.py Adds effect_preset to batched segment spec and applies effect chain per segment.
backend/api/routers/generation.py Accepts effect_preset form field and applies effect chain in standalone generation.
backend/services/incremental.py Adds effect_preset to segment fingerprint inputs.
frontend/src/api/engines.ts Adds EffectPreset types and fetchEffectPresets() API call.
frontend/src/api/types.ts Adds DubSegment interface including optional effect_preset.
frontend/src/store/dubSlice.ts Stores per-segment preset selections + available presets in Zustand slice.
tests/test_effects_chain.py Adds unit tests for preset listing, chain retrieval, and effects application.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread backend/api/routers/dub_generate.py Outdated
Comment on lines +164 to +176
mastered_audio = apply_mastering(audio_out, sample_rate=sr)

# Apply per-segment DSP effect preset (default: broadcast)
seg_effect_preset = getattr(seg, "effect_preset", None) or "broadcast"
if seg_effect_preset != "raw":
effect_chain = get_effect_chain(seg_effect_preset)
if effect_chain:
mastered_audio = apply_effects_chain(
mastered_audio,
sample_rate=sr,
chain=effect_chain,
)

Comment thread backend/services/batched_tts.py Outdated
Comment on lines +147 to +157
mastered = apply_mastering(audio_out, sample_rate=sr)

# Apply per-segment DSP effect preset (default: broadcast)
seg_effect_preset = getattr(s, "effect_preset", None) or "broadcast"
if seg_effect_preset != "raw":
effect_chain = get_effect_chain(seg_effect_preset)
if effect_chain:
mastered = apply_effects_chain(
mastered, sample_rate=sr, chain=effect_chain
)

Comment on lines +99 to 101
effect_preset: str = Form("broadcast"),
):
_model = await get_model()
Comment on lines +23 to 24
_GEN_INPUT_FIELDS = ("text", "target_lang", "profile_id", "instruct", "speed", "direction", "effect_preset")

Comment thread tests/test_effects_chain.py Outdated
Comment on lines +11 to +16
import sys
import os

# Ensure the backend package is importable from tests/
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "backend"))

Comment thread tests/test_effects_chain.py Outdated
Comment on lines +17 to +19
# Skip all tests if pedalboard is not installed
pedalboard = pytest.importorskip("pedalboard")

Comment thread backend/api/routers/generation.py Outdated
Comment on lines +51 to +60
mastered_audio = apply_mastering(audio_out, sample_rate=sr)

# Apply DSP effect preset
_effect_preset = effect_preset or "broadcast"
if _effect_preset != "raw":
_chain = get_effect_chain(_effect_preset)
if _chain:
mastered_audio = apply_effects_chain(
mastered_audio, sample_rate=sr, chain=_chain,
)
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/api/routers/dub_generate.py (1)

154-193: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don’t remap all _gen failures to “GPU out of memory.”

The new DSP calls (get_effect_chain / apply_effects_chain) are now inside the broad except Exception, so non-OOM failures are surfaced as OOM and become hard to diagnose.

Suggested fix
-                try:
-                    audios = _model.generate(
+                try:
+                    audios = _model.generate(
                         text=text, language=lang if lang != "Auto" else None,
                         ref_audio=ref_audio, ref_text=ref_text,
                         instruct=instruct_str if instruct_str else None,
                         duration=dur_s, num_step=nstep, guidance_scale=cfg,
                         speed=spd, denoise=True, postprocess_output=True,
                     )
-                    audio_out = audios[0]
+                except Exception as e:
+                    import gc
+                    gc.collect()
+                    if torch.backends.mps.is_available():
+                        torch.mps.empty_cache()
+                    elif torch.cuda.is_available():
+                        torch.cuda.empty_cache()
+                    raise RuntimeError(
+                        "Ran out of GPU memory generating this segment. "
+                        "Try the Flush button in the header to free VRAM, or switch to CPU in Settings. "
+                        f"Underlying error: {e}"
+                    )
+
+                try:
+                    audio_out = audios[0]
                     sr = _model.sampling_rate if hasattr(_model, 'sampling_rate') else 24000
                     # ... DSP path ...
                     return normalize_audio(mastered_audio, target_dBFS=-2.0)
                 except Exception:
-                    import gc
-                    gc.collect()
-                    if torch.backends.mps.is_available():
-                        torch.mps.empty_cache()
-                    elif torch.cuda.is_available():
-                        torch.cuda.empty_cache()
-                    raise RuntimeError(
-                        f"Ran out of GPU memory generating this segment. "
-                        f"Try the Flush button in the header to free VRAM, or switch to CPU in Settings. "
-                        f"Underlying error: {e}"
-                    )
+                    raise
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/api/routers/dub_generate.py` around lines 154 - 193, The catch-all
except around the entire block remaps any failure in DSP (get_effect_chain,
apply_effects_chain, apply_mastering, normalize_audio) to a GPU OOM error; fix
by narrowing the try/except to only wrap the model generation step
(_model.generate) and only translate genuine OOM errors (e.g.,
torch.cuda.OutOfMemoryError or RuntimeError with "out of memory") into the
user-facing RuntimeError after clearing caches; perform DSP (apply_mastering,
get_effect_chain, apply_effects_chain, normalize_audio) outside that try so
their exceptions propagate normally, or if you must keep a catch, re-raise
non-OOM exceptions unchanged.
🧹 Nitpick comments (1)
backend/api/routers/dub_generate.py (1)

110-110: ⚡ Quick win

Avoid closing over loop variable seg in executor worker.

_gen captures seg from the outer loop at line 166 (seg.effect_preset), which triggers Ruff B023 and creates fragility if execution patterns change. Pass effect_preset explicitly as a parameter instead.

Suggested fix
-            def _gen(text, lang, instruct_str, dur_s, nstep, cfg, spd, profile_id=None):
+            def _gen(text, lang, instruct_str, dur_s, nstep, cfg, spd, profile_id=None, effect_preset="broadcast"):
@@
-                    seg_effect_preset = getattr(seg, "effect_preset", None) or "broadcast"
+                    seg_effect_preset = effect_preset or "broadcast"
@@
                 audio_tensor = await loop.run_in_executor(
                     _gpu_pool, _gen,
                     seg.text, seg_lang, seg_instruct, seg_duration,
-                    _num_step, req.guidance_scale, seg_speed, seg_profile,
+                    _num_step, req.guidance_scale, seg_speed, seg_profile, getattr(seg, "effect_preset", None),
                 )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/api/routers/dub_generate.py` at line 110, The helper _gen function is
closing over the loop variable seg (used via seg.effect_preset), which causes a
late-binding bug (Ruff B023); modify _gen to accept effect_preset (or preset) as
an explicit parameter and pass seg.effect_preset from the loop when submitting
to the executor, update all calls to _gen to supply this new argument, and
remove any direct references to seg inside _gen so it no longer captures the
outer loop variable.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@backend/api/routers/dub_generate.py`:
- Around line 154-193: The catch-all except around the entire block remaps any
failure in DSP (get_effect_chain, apply_effects_chain, apply_mastering,
normalize_audio) to a GPU OOM error; fix by narrowing the try/except to only
wrap the model generation step (_model.generate) and only translate genuine OOM
errors (e.g., torch.cuda.OutOfMemoryError or RuntimeError with "out of memory")
into the user-facing RuntimeError after clearing caches; perform DSP
(apply_mastering, get_effect_chain, apply_effects_chain, normalize_audio)
outside that try so their exceptions propagate normally, or if you must keep a
catch, re-raise non-OOM exceptions unchanged.

---

Nitpick comments:
In `@backend/api/routers/dub_generate.py`:
- Line 110: The helper _gen function is closing over the loop variable seg (used
via seg.effect_preset), which causes a late-binding bug (Ruff B023); modify _gen
to accept effect_preset (or preset) as an explicit parameter and pass
seg.effect_preset from the loop when submitting to the executor, update all
calls to _gen to supply this new argument, and remove any direct references to
seg inside _gen so it no longer captures the outer loop variable.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: e3be0b46-db8c-46ce-a2c6-8883f8cc3163

📥 Commits

Reviewing files that changed from the base of the PR and between 0c257fe and fd7b908.

📒 Files selected for processing (5)
  • backend/api/routers/dub_generate.py
  • backend/api/routers/generation.py
  • backend/services/batched_tts.py
  • backend/services/incremental.py
  • tests/test_effects_chain.py
🚧 Files skipped from review as they are similar to previous changes (4)
  • backend/services/batched_tts.py
  • backend/services/incremental.py
  • backend/api/routers/generation.py
  • tests/test_effects_chain.py

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@backend/api/routers/dub_generate.py`:
- Around line 162-173: The except block that catches torch.cuda.OutOfMemoryError
and torch.mps.MPSError should preserve the original exception context when
rethrowing; change the RuntimeError raise in that except handler (the block
catching torch.cuda.OutOfMemoryError, torch.mps.MPSError in dub_generate.py) to
use exception chaining (raise RuntimeError(... ) from e) so the original
traceback and error details are retained for diagnostics while keeping the same
GC and backend cache clearing logic.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 1869ab5d-c3a9-4a50-83db-0a39567d8d84

📥 Commits

Reviewing files that changed from the base of the PR and between fd7b908 and 943c811.

📒 Files selected for processing (1)
  • backend/api/routers/dub_generate.py

Comment thread backend/api/routers/dub_generate.py Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
backend/api/routers/dub_generate.py (1)

110-110: ⚡ Quick win

Bind effect_preset explicitly instead of capturing loop variable seg.

Line 179 closes over seg inside _gen (defined in the loop), which is exactly the B023 footgun and can apply the wrong preset if execution semantics change. Pass the preset as an explicit _gen argument.

Proposed fix
-            def _gen(text, lang, instruct_str, dur_s, nstep, cfg, spd, profile_id=None):
+            def _gen(text, lang, instruct_str, dur_s, nstep, cfg, spd, profile_id=None, effect_preset="broadcast"):
@@
-                seg_effect_preset = getattr(seg, "effect_preset", None) or "broadcast"
-                if seg_effect_preset == "raw":
+                if effect_preset == "raw":
                     # Raw: skip all DSP — return raw model output
                     return audio_out

                 mastered_audio = apply_mastering(audio_out, sample_rate=sr)
-                effect_chain = get_effect_chain(seg_effect_preset)
+                effect_chain = get_effect_chain(effect_preset)
@@
                 audio_tensor = await loop.run_in_executor(
                     _gpu_pool, _gen,
                     seg.text, seg_lang, seg_instruct, seg_duration,
-                    _num_step, req.guidance_scale, seg_speed, seg_profile,
+                    _num_step, req.guidance_scale, seg_speed, seg_profile,
+                    (getattr(seg, "effect_preset", None) or "broadcast"),
                 )

Also applies to: 179-190, 229-233

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/api/routers/dub_generate.py` at line 110, The closure _gen (defined
as def _gen(text, lang, instruct_str, dur_s, nstep, cfg, spd, profile_id=None))
captures the loop variable seg and its effect_preset, which risks the B023
loop-variable capture bug; change _gen signature to accept effect_preset (e.g.,
add a parameter effect_preset) and pass the current seg.effect_preset when
calling _gen from inside the loop so the preset is bound explicitly rather than
captured from seg; update all call sites in the loop (and the similar blocks
around lines referenced) to supply the new effect_preset argument.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@backend/api/routers/dub_generate.py`:
- Line 110: The closure _gen (defined as def _gen(text, lang, instruct_str,
dur_s, nstep, cfg, spd, profile_id=None)) captures the loop variable seg and its
effect_preset, which risks the B023 loop-variable capture bug; change _gen
signature to accept effect_preset (e.g., add a parameter effect_preset) and pass
the current seg.effect_preset when calling _gen from inside the loop so the
preset is bound explicitly rather than captured from seg; update all call sites
in the loop (and the similar blocks around lines referenced) to supply the new
effect_preset argument.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 413004e7-8dc1-4ce9-b974-40653772f2b0

📥 Commits

Reviewing files that changed from the base of the PR and between 943c811 and 6b325b2.

📒 Files selected for processing (1)
  • backend/api/routers/dub_generate.py

debpalash added a commit that referenced this pull request May 16, 2026
… OOS deferrals

- GATE-06: mark #53 + #61 merged (2026-05-16); add #62 (Wave 1 quick wins) to gate set
- INST-01: note PR #62 implements setuptools pin (closes #58)
- INST-04: note PR #62 lands README docs for #56 workaround
- INST-12: new requirement for #65 Windows Triton/torch.compile OOM (filed post-planning)
- Out of Scope: defer #67/PR #68 (audio effects), #64 (custom model dir),
  PR #66 zh-CN (i18n milestone), #63 (empty-template bug)

PR #62 is the user's own Wave 1 work landed as a separate PR while
GSD planning ran in parallel. Merging it eliminates duplicate work
in Phase 1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
debpalash added a commit that referenced this pull request May 17, 2026
…ase smoke (#71)

* docs: initialize OmniVoice stabilization milestone project

* chore: add project config (yolo + balanced)

* docs: domain research for stabilization milestone

* docs: define v1 requirements for stabilization milestone

* docs: add GGUF + singing engine spike requirements (Phase 4 new)

* docs: roadmap revision + CLAUDE.md (7 phases, 62 reqs, +GGUF/SING spikes)

* docs(phase-0): add Gates phase RESEARCH.md

Phase 0 research synthesizes the cross-platform CI matrix, frozen
omnivoice_data fixture, installer post-build smoke, SHA-256 checksum
publishing, and PR-template extension into copy-paste-ready YAML and
Python snippets composed entirely from existing in-repo patterns.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(phase-0): add Gates phase CONTEXT, PATTERNS, and PLAN

Phase 0 — Gates is the hard pre-condition for v0.3.x stabilization.
Lays cross-platform CI matrix (macos-14/windows-2022/ubuntu-22.04),
regression fixture (≤200 KB), installer smoke on tag push, SHA-256
checksums in release body + per-OS SHA256SUMS-*.txt assets, PR
template with RC cadence + fixture line, and the open-PR landing
for #51.

Plan covers GATE-01..06; structured into 7 slices (A–G) with explicit
Slice C → Slice G dependency reordering so the new smoke-matrix lands
on main before PR #51 (CONTEXT.md L86 interleave decision).

Plan-checker iteration 2: APPROVED — all 3 BLOCKERs + 3 MAJORs from
iteration 1 resolved (file truncation/Slice-G missing, GATE-06 sibling
PR verification, Slice C ordering, Truth #5 wording, macOS Tauri
WebView avoidance per Pitfall #5, Windows taskkill per Pitfall #2).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(00-gates): seed regression fixture (GATE-01)

- scripts/seed-test-fixture.py — deterministic builder for tests/fixtures/omnivoice_data/
  - wipes + rebuilds; fixed created_at=1700000000.0; all-zero PCM for byte-deterministic diffs
  - calls backend.core.db.init_db() directly (alembic versions/ is empty — see CONTEXT.md)
  - checkpoints WAL → DELETE on close so no -shm/-wal sidecars pollute git status
  - exits non-zero if fixture > 200 KB
- tests/fixtures/omnivoice_data/{omnivoice.db, README.md} — 8-table empty DB + 1 voice_profiles row
- tests/fixtures/omnivoice_data/voices/test-voice/{profile.json, sample.wav} — 1-sec 24 kHz mono silence
- .gitignore — explicit allow-list (!tests/fixtures/omnivoice_data/**) so the existing
  omnivoice_data/, *.db, *.wav patterns don't hide the fixture from git

Verifies: du = 144 KB on disk; sqlite_master lists 8 init_db tables + sqlite_sequence;
voice_profiles has exactly 1 row id='test-voice'; 0 rows in generation_history.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(00-gates): add tests/smoke/test_boot_smoke.py (GATE-01)

- tests/smoke/__init__.py — package marker so pytest treats tests/smoke/ as a module
- tests/smoke/test_boot_smoke.py — 4 in-process FastAPI TestClient smoke tests:
    * test_health_returns_ok — /health returns 200 + {status:ok, device:...}
    * test_profiles_endpoint_lists_fixture_voice — /profiles surfaces the seeded
      test-voice row (validates OMNIVOICE_DATA_DIR wiring → DB_PATH → init_db schema)
    * test_system_info_includes_data_dir — /system/info resolves data_dir
    * test_history_endpoint_empty — /history reaches DB and returns []
  Test isolation env vars (OMNIVOICE_MODEL=test, OMNIVOICE_DISABLE_FILE_LOG=1)
  set at module top BEFORE any backend import — pattern from tests/test_router_smoke.py.
  Fixture is copied to a per-session temp dir so the test never mutates the
  checked-in artifact (SQLite file-change counter + runtime subdirs like dub_jobs/
  would otherwise dirty `git status` after every run).
  Failure mode: if tests/fixtures/omnivoice_data/ is missing, pytest.fail at
  import time with the regenerate command.
- .gitignore — tighten the GATE-01 allow-list to ONLY the seed-produced files
  (README.md, omnivoice.db, voices/test-voice/profile.json, sample.wav).
  Prevents future runtime subdirs the backend may create under the fixture
  from being accidentally committed.

Verifies: `uv run pytest tests/smoke/ -q --tb=short` → 4 passed in 1.31 s
(target was < 30 s). `git status` clean after a test run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(triage): record post-planning GitHub state — PR #62, new issues, OOS deferrals

- GATE-06: mark #53 + #61 merged (2026-05-16); add #62 (Wave 1 quick wins) to gate set
- INST-01: note PR #62 implements setuptools pin (closes #58)
- INST-04: note PR #62 lands README docs for #56 workaround
- INST-12: new requirement for #65 Windows Triton/torch.compile OOM (filed post-planning)
- Out of Scope: defer #67/PR #68 (audio effects), #64 (custom model dir),
  PR #66 zh-CN (i18n milestone), #63 (empty-template bug)

PR #62 is the user's own Wave 1 work landed as a separate PR while
GSD planning ran in parallel. Merging it eliminates duplicate work
in Phase 1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci(00-gates): add cross-platform smoke matrix (GATE-02)

- New smoke-matrix job on macos-14, windows-2022, ubuntu-22.04
- needs: test, fail-fast: false, timeout-minutes: 10
- Pinned actions: checkout@v4, setup-python@v5, setup-uv@v3 (cache enabled)
- Per-OS ffmpeg + libsndfile install (brew/choco/apt via awalsh128 cache)
- UV_HTTP_TIMEOUT=120, UV_HTTP_RETRIES=5 for restricted-network resilience
- Narrow scope: uv run pytest tests/smoke/ -q --tb=short
- Existing `test` and `tauri-cross-platform` jobs untouched

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci: add workflow_dispatch to ci.yml so smoke-matrix can run on feature branches

* feat(00-gates): add --health-check CLI flag to backend entrypoint (GATE-03)

- argparse on __main__ block; --health-check boots uvicorn in a daemon
  thread and polls http://127.0.0.1:3900/health every 5s for up to 60s.
- Prints 'OK — /health responded 200 after Ns' and exits 0 on first 200.
- Prints 'FAIL — /health did not respond 200 within 60s' to stderr and
  exits 1 on timeout. Default invocation behavior unchanged.
- No new deps (stdlib argparse/threading/time/urllib.request/sys + uvicorn).
- Consumed by per-OS installer-smoke step in .github/workflows/release.yml.

Verified locally: exits 0 in 5s against tests/fixtures/omnivoice_data/.

* ci(00-gates): add per-OS installer smoke to release.yml (GATE-03)

Adds three matrix-leg-specific steps after 'Build + release (Tauri)',
each gated by runner.os with timeout-minutes: 5:

- macOS (macos-14): hdiutil attach DMG → locate bundled Python backend
  inside *.app/Contents (NOT the Tauri WebView shell — RESEARCH Pitfall
  #5: WebView hangs on headless runners) → invoke --health-check →
  hdiutil detach. Falls back to *.app/Contents/Resources and hard-fails
  with a directory listing if no backend binary found.

- Windows (windows-2022): msiexec /quiet install → find backend.exe
  under 'C:/Program Files/OmniVoice Studio' → invoke --health-check in
  background, wait, then taskkill //F //T //PID to cleanup orphaned
  PyInstaller child processes on port 3900 (RESEARCH Pitfall #2).

- Linux (ubuntu-22.04): --appimage-extract (no FUSE on GH runners),
  locate binary or AppRun, run under xvfb-run -a.

Bundle-only regressions (PyInstaller missing-module, Tauri sidecar
path mismatch) are invisible to ci.yml's in-process smoke matrix —
this step closes that gap before any release is published.

Verified: YAML parses; all three steps present; gating + timeout
correct; Pitfall #2/#5 mitigations preserved.

* ci(00-gates): publish SHA-256 checksums in release body + as asset (GATE-05)

- Add 'Compute SHA-256 checksums' step writing SHA256SUMS-<label>.txt
  per matrix leg using native shasum/sha256sum (Git Bash on Windows).
- Add 'Append checksums to release + attach SHA256SUMS file' step using
  softprops/action-gh-release@v2 with append_body: true so the hashes
  land in the release body alongside tauri-action's content (not
  replacing it) and the file is uploaded as a release asset for
  'shasum -c SHA256SUMS-<label>.txt' verification.
- Both steps gated by 'github.event_name == push && refs/tags/v*' so
  workflow_dispatch dry-runs do not attempt to attach to a non-existent
  release (per CONTEXT.md L70 + RESEARCH Pitfall #7 deferral of any
  aggregate cross-leg SHA256SUMS job).
- fail_on_unmatched_files: true to surface path-resolution errors loudly.

* docs(00-gates): document RC cadence + regression-fixture check in PR template (GATE-04)

* docs(setup): add HF token persistence guide for macOS/Windows/Linux (DOCS-05)

Covers two persistent paths:
- Method A — canonical ~/.cache/huggingface/token via huggingface-cli login
- Method B — shell env var (~/.zshrc / ~/.bashrc / Windows User scope)

Documents the v0.2.7 "session only" in-app behavior + notes that
Phase 1 AUTH-03 will make in-app pastes write to the canonical file.

Bundled with Phase 0 PR per user request. Strictly DOCS-05 scope —
zero code changes, no engine touches.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* spec(auth): redesign HF token resolution as 3-source cascade with fallback (AUTH-01..06)

Replaces the env_store.py file-based design with a SQLite-backed app
store + cascade resolver that checks app → env var → ~/.cache/huggingface/token
in priority order, with automatic fallback to next source on HTTP 401.

User-explicit design decision:
- App-stored token (SQLite settings table, AES-GCM encrypted) wins
- Env var ($HF_TOKEN) second
- Global huggingface-cli login file third
- All three sources visible in Settings → API Keys with "Active" badge
- Save action populates BOTH app store AND canonical HF file (defense in depth)

New requirement:
- AUTH-06 — on 401, auto-retry next source in cascade before erroring

Also: traceability count corrected (62 → 74 — undercount at planning +
INST-12 + AUTH-06 added post-planning). All 74 v1 reqs mapped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(auth): backend recognizes HF token from canonical file, not just env var

Two call sites were only checking $HF_TOKEN env var, missing the canonical
~/.cache/huggingface/token file written by `huggingface-cli login` (or the
app's future Save action):

- system.py `/system/info` `has_hf_token` flag — UI showed "No HF token"
  even when `huggingface-cli login` had populated the file.
- model_manager.get_diarization_pipeline — pyannote diarization silently
  returned None when only the canonical file was set. This is the bug
  behind issue #35 (speaker diarization setup failure).

Both fixes use the same pattern: env var > huggingface_hub.get_token()
(which reads the canonical file). Adds a local _has_hf_token() helper
to system.py with a comment marking it as prelude to the AUTH-01..06
cascade (Phase 1 token_resolver.py will layer SQLite app-store on top).

Closes #35 sub-issue (canonical token invisible to diarization).
Cross-cuts AUTH-02 + AUTH-06 design for Phase 1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(dictation): make pill-widget mode reachable from GUI + scripts (INST-13)

The dictation widget infrastructure shipped in PR #40 but was only reachable
via the undocumented --pill CLI flag. Adds three discovery paths:

1. Tray menu: "Switch to Dictation Widget" (studio mode) — saves
   launch_as_widget=true to config, relaunches with --pill, exits current.
   Mirrors the existing "Open Studio" path in pill-mode tray.

2. Persistent config: AppConfig.launch_as_widget (bool, default false). Read
   at startup via load_config_pre_app() (uses dirs-next, no AppHandle
   required). CLI --pill still takes precedence when explicitly passed.

3. Tauri commands: get_launch_as_widget / set_launch_as_widget for the
   Phase 2 Settings UI to bind a checkbox to.

4. Scripts: bun desktop-prod:pill / desktop-prod:run:pill — forward --pill
   to the bundled app launch. macOS uses `open -n --args` to spawn fresh
   instance with the flag.

Closes the GUI half of INST-13. Phase 2 closes the Settings UI half.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(dictation): show widget unconditionally on pill-mode launch + visible Suspense fallback

Before: pill mode set up correctly but the widget window stayed hidden
until ⌘⇧Space was pressed. New users saw absolutely nothing on launch
(no main window, no dock icon, hidden widget) and assumed the app
failed. If global-shortcut Accessibility permission wasn't granted,
they had no path to discover the widget at all.

Two changes:

1. lib.rs: in pill_mode_setup, explicitly show + position + focus the
   widget window after hiding main. With per-call error logging so we
   can diagnose failures (and a clear error log if widget window
   wasn't created at all — points at tauri.conf.json regression).

2. main-app.jsx: Suspense fallback was `null`, which combined with
   widget's transparent+decorations:false config made any lazy-import
   delay or failure invisible. Now renders a dark pill saying
   "Loading dictation…" so even if CaptureWidget lazy-import stalls,
   the user sees the window exists.

Studio mode behavior unchanged — widget stays hidden until hotkey
or tray click triggers it (existing show() call in the shortcut/
menu handlers is preserved).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(dictation): create widget window programmatically; Tauri 2 silently dropped config-array creation

Root cause: declaring the widget window in tauri.conf.json's app.windows[]
silently failed in Tauri 2 — get_webview_window("widget") returned None
even though the config was syntactically valid. Probable culprit was the
transparent + decorations:false + visible:false combo, but Tauri offered
no error message either at startup or via webview_windows() enumeration.

Diagnosed by adding webview_windows() enumeration logging at setup start
(only ["main"] ever appeared) and a programmatic WebviewWindowBuilder
fallback that surfaces real Result errors.

Fix:
- tauri.conf.json: widget entry now has `create: false` to make the
  config-vs-programmatic handoff explicit.
- lib.rs setup(): call WebviewWindowBuilder::new(app, "widget", ...).build()
  with the exact same surface attributes the config used to declare.
- capabilities/default.json: include "widget" in windows array so the new
  window inherits the same Tauri permissions as main.
- tauri.conf.json: remove the invalid `"url": "/?window=widget"` field —
  WebviewUrl::App takes a path only, query strings aren't supported.
  Both windows now load index.html.
- main-app.jsx: replace URL-query-based widget detection with
  getCurrentWindow().label === 'widget' via @tauri-apps/api/window. This
  is the Tauri 2-recommended pattern for multi-window apps and works
  regardless of URL routing.

Closes the immediate UX bug behind the dictation widget being invisible.
Builds cleanly + manually verified: pill widget visible on screen at
top-center after `bun desktop-prod:pill`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@debpalash
Copy link
Copy Markdown
Owner

Really like the shape of this PR @4shil — scoped to one feature (per-segment DSP presets), schema + API + frontend + tests all in proportion, and the fingerprint update in incremental.py to invalidate cache on preset changes is exactly right. CodeRabbit pass; tests look solid (12 cases with pytest.importorskip("pedalboard") graceful skip).

One blocker before merge — at backend/api/routers/dub_generate.py:163:

except (torch.cuda.OutOfMemoryError, torch.mps.MPSError) as e:

torch.mps.MPSError doesn't exist in PyTorch — hasattr(torch.mps, 'MPSError') is False. This will AttributeError at the except clause evaluation on every dub generation, not just MPS hosts. The previous code at this line was except Exception (broad), so narrowing here is correct intent, just the wrong type.

One-line fix — change to:

except (torch.cuda.OutOfMemoryError, RuntimeError) as e:

RuntimeError is the actual exception PyTorch raises for OOM on MPS (per the PyTorch MPS docs). Combined with the "out of memory" in str(e).lower() substring check you already have, this catches both CUDA + MPS OOM correctly without missing them.

If you'd rather I push the fix to your branch directly, just say the word — happy to. Otherwise, push the fix and I'll squash-merge as soon as CI is green.

Thanks for the work — looking forward to having presets in the dub pipeline.

@4shil
Copy link
Copy Markdown
Author

4shil commented May 18, 2026

Thanks for the heads up — fixed! Swapped for in the OOM catch clause (commit 9ea6879). PyTorch MPS raises for OOM, so this now correctly handles both CUDA and MPS hosts without the attribute error. CI should pick it up automatically.

@4shil
Copy link
Copy Markdown
Author

4shil commented May 18, 2026

Merged upstream/main (commit 027f9b2) and resolved the conflict in dub_generate.py. The upstream OOM retry logic with the full exception-based is_oom check is preserved. Per-segment DSP effect presets are applied in both the normal generation path and the OOM retry path. seg_instruct assignment also restored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants