Add per-segment audio effects DSP preset selector to dub pipeline#68
Add per-segment audio effects DSP preset selector to dub pipeline#684shil wants to merge 8 commits into
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds per-segment DSP effect-preset support: schemas and validation, discovery endpoint, frontend types/store, and runtime plumbing to apply presets (including raw bypass) during generation and batched processing; fingerprints now include effect_preset; DSP tests added. ChangesPer-Segment Audio Effect Presets
Sequence DiagramsequenceDiagram
participant Client as DubClient
participant Store as ZustandStore
participant API as Generation/Dub API
participant DSP as audio_dsp
Client->>Store: setSegmentEffectPreset(segId, presetId)
Client->>API: POST /generate or /dub_generate (effect_preset)
API->>DSP: list_effect_presets()
DSP-->>API: presets[]
API->>DSP: get_effect_chain(effect_preset)
DSP-->>API: effect_chain (or [])
API->>DSP: apply_mastering(raw_audio)
DSP-->>API: mastered_audio
alt effect_preset != "raw"
API->>DSP: apply_effects_chain(mastered_audio, chain)
DSP-->>API: processed_audio
else raw
API-->>API: skip effects
end
API->>DSP: normalize_audio(processed_or_mastered)
DSP-->>API: final_audio
API-->>Client: generated audio
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related issues
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/test_effects_chain.py`:
- Around line 97-114: Add shape assertions to the three tests so they verify the
output preserves input shape like the other tests: in
test_podcast_preset_returns_tensor, test_warm_preset_returns_tensor, and
test_bright_preset_returns_tensor, after calling _make_test_audio(),
get_effect_chain(...), and result = apply_effects_chain(...), add an assertion
that result.shape == audio.shape; reference the test function names and the
helpers _make_test_audio, get_effect_chain, and apply_effects_chain to locate
and update the tests.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 0d41aa1b-e270-4c18-acb1-625829ad4166
📒 Files selected for processing (11)
backend/api/routers/dub_generate.pybackend/api/routers/engines.pybackend/api/routers/generation.pybackend/api/schemas.pybackend/schemas/requests.pybackend/services/batched_tts.pybackend/services/incremental.pyfrontend/src/api/engines.tsfrontend/src/api/types.tsfrontend/src/store/dubSlice.tstests/test_effects_chain.py
There was a problem hiding this comment.
Pull request overview
Adds end-to-end plumbing for selecting an audio DSP “effect preset” per dub segment (and for standalone generation), exposing available presets via a new backend API route and storing them in frontend state.
Changes:
- Backend: expose effect presets via
GET /engines/effects/presets, validateDubSegment.effect_preset, and includeeffect_presetin incremental regen fingerprints. - Backend: apply
apply_effects_chain()in both dub generation and batched TTS paths; accepteffect_preseton standalone/generate. - Frontend: add types + API client for effect presets and store preset state in
dubSlice; add unit tests for effects chain helpers.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
backend/api/routers/engines.py |
Adds GET /engines/effects/presets endpoint. |
backend/api/schemas.py |
Introduces response schemas for effect preset listing. |
backend/schemas/requests.py |
Adds effect_preset field + validator to DubSegment. |
backend/api/routers/dub_generate.py |
Applies per-segment effect chain during dub TTS generation and includes preset in fingerprint input. |
backend/services/batched_tts.py |
Adds effect_preset to batched segment spec and applies effect chain per segment. |
backend/api/routers/generation.py |
Accepts effect_preset form field and applies effect chain in standalone generation. |
backend/services/incremental.py |
Adds effect_preset to segment fingerprint inputs. |
frontend/src/api/engines.ts |
Adds EffectPreset types and fetchEffectPresets() API call. |
frontend/src/api/types.ts |
Adds DubSegment interface including optional effect_preset. |
frontend/src/store/dubSlice.ts |
Stores per-segment preset selections + available presets in Zustand slice. |
tests/test_effects_chain.py |
Adds unit tests for preset listing, chain retrieval, and effects application. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| mastered_audio = apply_mastering(audio_out, sample_rate=sr) | ||
|
|
||
| # Apply per-segment DSP effect preset (default: broadcast) | ||
| seg_effect_preset = getattr(seg, "effect_preset", None) or "broadcast" | ||
| if seg_effect_preset != "raw": | ||
| effect_chain = get_effect_chain(seg_effect_preset) | ||
| if effect_chain: | ||
| mastered_audio = apply_effects_chain( | ||
| mastered_audio, | ||
| sample_rate=sr, | ||
| chain=effect_chain, | ||
| ) | ||
|
|
| mastered = apply_mastering(audio_out, sample_rate=sr) | ||
|
|
||
| # Apply per-segment DSP effect preset (default: broadcast) | ||
| seg_effect_preset = getattr(s, "effect_preset", None) or "broadcast" | ||
| if seg_effect_preset != "raw": | ||
| effect_chain = get_effect_chain(seg_effect_preset) | ||
| if effect_chain: | ||
| mastered = apply_effects_chain( | ||
| mastered, sample_rate=sr, chain=effect_chain | ||
| ) | ||
|
|
| effect_preset: str = Form("broadcast"), | ||
| ): | ||
| _model = await get_model() |
| _GEN_INPUT_FIELDS = ("text", "target_lang", "profile_id", "instruct", "speed", "direction", "effect_preset") | ||
|
|
| import sys | ||
| import os | ||
|
|
||
| # Ensure the backend package is importable from tests/ | ||
| sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "backend")) | ||
|
|
| # Skip all tests if pedalboard is not installed | ||
| pedalboard = pytest.importorskip("pedalboard") | ||
|
|
| mastered_audio = apply_mastering(audio_out, sample_rate=sr) | ||
|
|
||
| # Apply DSP effect preset | ||
| _effect_preset = effect_preset or "broadcast" | ||
| if _effect_preset != "raw": | ||
| _chain = get_effect_chain(_effect_preset) | ||
| if _chain: | ||
| mastered_audio = apply_effects_chain( | ||
| mastered_audio, sample_rate=sr, chain=_chain, | ||
| ) |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/api/routers/dub_generate.py (1)
154-193:⚠️ Potential issue | 🟠 Major | ⚡ Quick winDon’t remap all
_genfailures to “GPU out of memory.”The new DSP calls (
get_effect_chain/apply_effects_chain) are now inside the broadexcept Exception, so non-OOM failures are surfaced as OOM and become hard to diagnose.Suggested fix
- try: - audios = _model.generate( + try: + audios = _model.generate( text=text, language=lang if lang != "Auto" else None, ref_audio=ref_audio, ref_text=ref_text, instruct=instruct_str if instruct_str else None, duration=dur_s, num_step=nstep, guidance_scale=cfg, speed=spd, denoise=True, postprocess_output=True, ) - audio_out = audios[0] + except Exception as e: + import gc + gc.collect() + if torch.backends.mps.is_available(): + torch.mps.empty_cache() + elif torch.cuda.is_available(): + torch.cuda.empty_cache() + raise RuntimeError( + "Ran out of GPU memory generating this segment. " + "Try the Flush button in the header to free VRAM, or switch to CPU in Settings. " + f"Underlying error: {e}" + ) + + try: + audio_out = audios[0] sr = _model.sampling_rate if hasattr(_model, 'sampling_rate') else 24000 # ... DSP path ... return normalize_audio(mastered_audio, target_dBFS=-2.0) except Exception: - import gc - gc.collect() - if torch.backends.mps.is_available(): - torch.mps.empty_cache() - elif torch.cuda.is_available(): - torch.cuda.empty_cache() - raise RuntimeError( - f"Ran out of GPU memory generating this segment. " - f"Try the Flush button in the header to free VRAM, or switch to CPU in Settings. " - f"Underlying error: {e}" - ) + raise🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/api/routers/dub_generate.py` around lines 154 - 193, The catch-all except around the entire block remaps any failure in DSP (get_effect_chain, apply_effects_chain, apply_mastering, normalize_audio) to a GPU OOM error; fix by narrowing the try/except to only wrap the model generation step (_model.generate) and only translate genuine OOM errors (e.g., torch.cuda.OutOfMemoryError or RuntimeError with "out of memory") into the user-facing RuntimeError after clearing caches; perform DSP (apply_mastering, get_effect_chain, apply_effects_chain, normalize_audio) outside that try so their exceptions propagate normally, or if you must keep a catch, re-raise non-OOM exceptions unchanged.
🧹 Nitpick comments (1)
backend/api/routers/dub_generate.py (1)
110-110: ⚡ Quick winAvoid closing over loop variable
segin executor worker.
_gencapturessegfrom the outer loop at line 166 (seg.effect_preset), which triggers Ruff B023 and creates fragility if execution patterns change. Passeffect_presetexplicitly as a parameter instead.Suggested fix
- def _gen(text, lang, instruct_str, dur_s, nstep, cfg, spd, profile_id=None): + def _gen(text, lang, instruct_str, dur_s, nstep, cfg, spd, profile_id=None, effect_preset="broadcast"): @@ - seg_effect_preset = getattr(seg, "effect_preset", None) or "broadcast" + seg_effect_preset = effect_preset or "broadcast" @@ audio_tensor = await loop.run_in_executor( _gpu_pool, _gen, seg.text, seg_lang, seg_instruct, seg_duration, - _num_step, req.guidance_scale, seg_speed, seg_profile, + _num_step, req.guidance_scale, seg_speed, seg_profile, getattr(seg, "effect_preset", None), )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/api/routers/dub_generate.py` at line 110, The helper _gen function is closing over the loop variable seg (used via seg.effect_preset), which causes a late-binding bug (Ruff B023); modify _gen to accept effect_preset (or preset) as an explicit parameter and pass seg.effect_preset from the loop when submitting to the executor, update all calls to _gen to supply this new argument, and remove any direct references to seg inside _gen so it no longer captures the outer loop variable.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@backend/api/routers/dub_generate.py`:
- Around line 154-193: The catch-all except around the entire block remaps any
failure in DSP (get_effect_chain, apply_effects_chain, apply_mastering,
normalize_audio) to a GPU OOM error; fix by narrowing the try/except to only
wrap the model generation step (_model.generate) and only translate genuine OOM
errors (e.g., torch.cuda.OutOfMemoryError or RuntimeError with "out of memory")
into the user-facing RuntimeError after clearing caches; perform DSP
(apply_mastering, get_effect_chain, apply_effects_chain, normalize_audio)
outside that try so their exceptions propagate normally, or if you must keep a
catch, re-raise non-OOM exceptions unchanged.
---
Nitpick comments:
In `@backend/api/routers/dub_generate.py`:
- Line 110: The helper _gen function is closing over the loop variable seg (used
via seg.effect_preset), which causes a late-binding bug (Ruff B023); modify _gen
to accept effect_preset (or preset) as an explicit parameter and pass
seg.effect_preset from the loop when submitting to the executor, update all
calls to _gen to supply this new argument, and remove any direct references to
seg inside _gen so it no longer captures the outer loop variable.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: e3be0b46-db8c-46ce-a2c6-8883f8cc3163
📒 Files selected for processing (5)
backend/api/routers/dub_generate.pybackend/api/routers/generation.pybackend/services/batched_tts.pybackend/services/incremental.pytests/test_effects_chain.py
🚧 Files skipped from review as they are similar to previous changes (4)
- backend/services/batched_tts.py
- backend/services/incremental.py
- backend/api/routers/generation.py
- tests/test_effects_chain.py
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@backend/api/routers/dub_generate.py`:
- Around line 162-173: The except block that catches torch.cuda.OutOfMemoryError
and torch.mps.MPSError should preserve the original exception context when
rethrowing; change the RuntimeError raise in that except handler (the block
catching torch.cuda.OutOfMemoryError, torch.mps.MPSError in dub_generate.py) to
use exception chaining (raise RuntimeError(... ) from e) so the original
traceback and error details are retained for diagnostics while keeping the same
GC and backend cache clearing logic.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 1869ab5d-c3a9-4a50-83db-0a39567d8d84
📒 Files selected for processing (1)
backend/api/routers/dub_generate.py
There was a problem hiding this comment.
🧹 Nitpick comments (1)
backend/api/routers/dub_generate.py (1)
110-110: ⚡ Quick winBind
effect_presetexplicitly instead of capturing loop variableseg.Line 179 closes over
seginside_gen(defined in the loop), which is exactly the B023 footgun and can apply the wrong preset if execution semantics change. Pass the preset as an explicit_genargument.Proposed fix
- def _gen(text, lang, instruct_str, dur_s, nstep, cfg, spd, profile_id=None): + def _gen(text, lang, instruct_str, dur_s, nstep, cfg, spd, profile_id=None, effect_preset="broadcast"): @@ - seg_effect_preset = getattr(seg, "effect_preset", None) or "broadcast" - if seg_effect_preset == "raw": + if effect_preset == "raw": # Raw: skip all DSP — return raw model output return audio_out mastered_audio = apply_mastering(audio_out, sample_rate=sr) - effect_chain = get_effect_chain(seg_effect_preset) + effect_chain = get_effect_chain(effect_preset) @@ audio_tensor = await loop.run_in_executor( _gpu_pool, _gen, seg.text, seg_lang, seg_instruct, seg_duration, - _num_step, req.guidance_scale, seg_speed, seg_profile, + _num_step, req.guidance_scale, seg_speed, seg_profile, + (getattr(seg, "effect_preset", None) or "broadcast"), )Also applies to: 179-190, 229-233
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/api/routers/dub_generate.py` at line 110, The closure _gen (defined as def _gen(text, lang, instruct_str, dur_s, nstep, cfg, spd, profile_id=None)) captures the loop variable seg and its effect_preset, which risks the B023 loop-variable capture bug; change _gen signature to accept effect_preset (e.g., add a parameter effect_preset) and pass the current seg.effect_preset when calling _gen from inside the loop so the preset is bound explicitly rather than captured from seg; update all call sites in the loop (and the similar blocks around lines referenced) to supply the new effect_preset argument.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@backend/api/routers/dub_generate.py`:
- Line 110: The closure _gen (defined as def _gen(text, lang, instruct_str,
dur_s, nstep, cfg, spd, profile_id=None)) captures the loop variable seg and its
effect_preset, which risks the B023 loop-variable capture bug; change _gen
signature to accept effect_preset (e.g., add a parameter effect_preset) and pass
the current seg.effect_preset when calling _gen from inside the loop so the
preset is bound explicitly rather than captured from seg; update all call sites
in the loop (and the similar blocks around lines referenced) to supply the new
effect_preset argument.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 413004e7-8dc1-4ce9-b974-40653772f2b0
📒 Files selected for processing (1)
backend/api/routers/dub_generate.py
… OOS deferrals - GATE-06: mark #53 + #61 merged (2026-05-16); add #62 (Wave 1 quick wins) to gate set - INST-01: note PR #62 implements setuptools pin (closes #58) - INST-04: note PR #62 lands README docs for #56 workaround - INST-12: new requirement for #65 Windows Triton/torch.compile OOM (filed post-planning) - Out of Scope: defer #67/PR #68 (audio effects), #64 (custom model dir), PR #66 zh-CN (i18n milestone), #63 (empty-template bug) PR #62 is the user's own Wave 1 work landed as a separate PR while GSD planning ran in parallel. Merging it eliminates duplicate work in Phase 1. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ase smoke (#71) * docs: initialize OmniVoice stabilization milestone project * chore: add project config (yolo + balanced) * docs: domain research for stabilization milestone * docs: define v1 requirements for stabilization milestone * docs: add GGUF + singing engine spike requirements (Phase 4 new) * docs: roadmap revision + CLAUDE.md (7 phases, 62 reqs, +GGUF/SING spikes) * docs(phase-0): add Gates phase RESEARCH.md Phase 0 research synthesizes the cross-platform CI matrix, frozen omnivoice_data fixture, installer post-build smoke, SHA-256 checksum publishing, and PR-template extension into copy-paste-ready YAML and Python snippets composed entirely from existing in-repo patterns. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(phase-0): add Gates phase CONTEXT, PATTERNS, and PLAN Phase 0 — Gates is the hard pre-condition for v0.3.x stabilization. Lays cross-platform CI matrix (macos-14/windows-2022/ubuntu-22.04), regression fixture (≤200 KB), installer smoke on tag push, SHA-256 checksums in release body + per-OS SHA256SUMS-*.txt assets, PR template with RC cadence + fixture line, and the open-PR landing for #51. Plan covers GATE-01..06; structured into 7 slices (A–G) with explicit Slice C → Slice G dependency reordering so the new smoke-matrix lands on main before PR #51 (CONTEXT.md L86 interleave decision). Plan-checker iteration 2: APPROVED — all 3 BLOCKERs + 3 MAJORs from iteration 1 resolved (file truncation/Slice-G missing, GATE-06 sibling PR verification, Slice C ordering, Truth #5 wording, macOS Tauri WebView avoidance per Pitfall #5, Windows taskkill per Pitfall #2). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(00-gates): seed regression fixture (GATE-01) - scripts/seed-test-fixture.py — deterministic builder for tests/fixtures/omnivoice_data/ - wipes + rebuilds; fixed created_at=1700000000.0; all-zero PCM for byte-deterministic diffs - calls backend.core.db.init_db() directly (alembic versions/ is empty — see CONTEXT.md) - checkpoints WAL → DELETE on close so no -shm/-wal sidecars pollute git status - exits non-zero if fixture > 200 KB - tests/fixtures/omnivoice_data/{omnivoice.db, README.md} — 8-table empty DB + 1 voice_profiles row - tests/fixtures/omnivoice_data/voices/test-voice/{profile.json, sample.wav} — 1-sec 24 kHz mono silence - .gitignore — explicit allow-list (!tests/fixtures/omnivoice_data/**) so the existing omnivoice_data/, *.db, *.wav patterns don't hide the fixture from git Verifies: du = 144 KB on disk; sqlite_master lists 8 init_db tables + sqlite_sequence; voice_profiles has exactly 1 row id='test-voice'; 0 rows in generation_history. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(00-gates): add tests/smoke/test_boot_smoke.py (GATE-01) - tests/smoke/__init__.py — package marker so pytest treats tests/smoke/ as a module - tests/smoke/test_boot_smoke.py — 4 in-process FastAPI TestClient smoke tests: * test_health_returns_ok — /health returns 200 + {status:ok, device:...} * test_profiles_endpoint_lists_fixture_voice — /profiles surfaces the seeded test-voice row (validates OMNIVOICE_DATA_DIR wiring → DB_PATH → init_db schema) * test_system_info_includes_data_dir — /system/info resolves data_dir * test_history_endpoint_empty — /history reaches DB and returns [] Test isolation env vars (OMNIVOICE_MODEL=test, OMNIVOICE_DISABLE_FILE_LOG=1) set at module top BEFORE any backend import — pattern from tests/test_router_smoke.py. Fixture is copied to a per-session temp dir so the test never mutates the checked-in artifact (SQLite file-change counter + runtime subdirs like dub_jobs/ would otherwise dirty `git status` after every run). Failure mode: if tests/fixtures/omnivoice_data/ is missing, pytest.fail at import time with the regenerate command. - .gitignore — tighten the GATE-01 allow-list to ONLY the seed-produced files (README.md, omnivoice.db, voices/test-voice/profile.json, sample.wav). Prevents future runtime subdirs the backend may create under the fixture from being accidentally committed. Verifies: `uv run pytest tests/smoke/ -q --tb=short` → 4 passed in 1.31 s (target was < 30 s). `git status` clean after a test run. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(triage): record post-planning GitHub state — PR #62, new issues, OOS deferrals - GATE-06: mark #53 + #61 merged (2026-05-16); add #62 (Wave 1 quick wins) to gate set - INST-01: note PR #62 implements setuptools pin (closes #58) - INST-04: note PR #62 lands README docs for #56 workaround - INST-12: new requirement for #65 Windows Triton/torch.compile OOM (filed post-planning) - Out of Scope: defer #67/PR #68 (audio effects), #64 (custom model dir), PR #66 zh-CN (i18n milestone), #63 (empty-template bug) PR #62 is the user's own Wave 1 work landed as a separate PR while GSD planning ran in parallel. Merging it eliminates duplicate work in Phase 1. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * ci(00-gates): add cross-platform smoke matrix (GATE-02) - New smoke-matrix job on macos-14, windows-2022, ubuntu-22.04 - needs: test, fail-fast: false, timeout-minutes: 10 - Pinned actions: checkout@v4, setup-python@v5, setup-uv@v3 (cache enabled) - Per-OS ffmpeg + libsndfile install (brew/choco/apt via awalsh128 cache) - UV_HTTP_TIMEOUT=120, UV_HTTP_RETRIES=5 for restricted-network resilience - Narrow scope: uv run pytest tests/smoke/ -q --tb=short - Existing `test` and `tauri-cross-platform` jobs untouched Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * ci: add workflow_dispatch to ci.yml so smoke-matrix can run on feature branches * feat(00-gates): add --health-check CLI flag to backend entrypoint (GATE-03) - argparse on __main__ block; --health-check boots uvicorn in a daemon thread and polls http://127.0.0.1:3900/health every 5s for up to 60s. - Prints 'OK — /health responded 200 after Ns' and exits 0 on first 200. - Prints 'FAIL — /health did not respond 200 within 60s' to stderr and exits 1 on timeout. Default invocation behavior unchanged. - No new deps (stdlib argparse/threading/time/urllib.request/sys + uvicorn). - Consumed by per-OS installer-smoke step in .github/workflows/release.yml. Verified locally: exits 0 in 5s against tests/fixtures/omnivoice_data/. * ci(00-gates): add per-OS installer smoke to release.yml (GATE-03) Adds three matrix-leg-specific steps after 'Build + release (Tauri)', each gated by runner.os with timeout-minutes: 5: - macOS (macos-14): hdiutil attach DMG → locate bundled Python backend inside *.app/Contents (NOT the Tauri WebView shell — RESEARCH Pitfall #5: WebView hangs on headless runners) → invoke --health-check → hdiutil detach. Falls back to *.app/Contents/Resources and hard-fails with a directory listing if no backend binary found. - Windows (windows-2022): msiexec /quiet install → find backend.exe under 'C:/Program Files/OmniVoice Studio' → invoke --health-check in background, wait, then taskkill //F //T //PID to cleanup orphaned PyInstaller child processes on port 3900 (RESEARCH Pitfall #2). - Linux (ubuntu-22.04): --appimage-extract (no FUSE on GH runners), locate binary or AppRun, run under xvfb-run -a. Bundle-only regressions (PyInstaller missing-module, Tauri sidecar path mismatch) are invisible to ci.yml's in-process smoke matrix — this step closes that gap before any release is published. Verified: YAML parses; all three steps present; gating + timeout correct; Pitfall #2/#5 mitigations preserved. * ci(00-gates): publish SHA-256 checksums in release body + as asset (GATE-05) - Add 'Compute SHA-256 checksums' step writing SHA256SUMS-<label>.txt per matrix leg using native shasum/sha256sum (Git Bash on Windows). - Add 'Append checksums to release + attach SHA256SUMS file' step using softprops/action-gh-release@v2 with append_body: true so the hashes land in the release body alongside tauri-action's content (not replacing it) and the file is uploaded as a release asset for 'shasum -c SHA256SUMS-<label>.txt' verification. - Both steps gated by 'github.event_name == push && refs/tags/v*' so workflow_dispatch dry-runs do not attempt to attach to a non-existent release (per CONTEXT.md L70 + RESEARCH Pitfall #7 deferral of any aggregate cross-leg SHA256SUMS job). - fail_on_unmatched_files: true to surface path-resolution errors loudly. * docs(00-gates): document RC cadence + regression-fixture check in PR template (GATE-04) * docs(setup): add HF token persistence guide for macOS/Windows/Linux (DOCS-05) Covers two persistent paths: - Method A — canonical ~/.cache/huggingface/token via huggingface-cli login - Method B — shell env var (~/.zshrc / ~/.bashrc / Windows User scope) Documents the v0.2.7 "session only" in-app behavior + notes that Phase 1 AUTH-03 will make in-app pastes write to the canonical file. Bundled with Phase 0 PR per user request. Strictly DOCS-05 scope — zero code changes, no engine touches. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * spec(auth): redesign HF token resolution as 3-source cascade with fallback (AUTH-01..06) Replaces the env_store.py file-based design with a SQLite-backed app store + cascade resolver that checks app → env var → ~/.cache/huggingface/token in priority order, with automatic fallback to next source on HTTP 401. User-explicit design decision: - App-stored token (SQLite settings table, AES-GCM encrypted) wins - Env var ($HF_TOKEN) second - Global huggingface-cli login file third - All three sources visible in Settings → API Keys with "Active" badge - Save action populates BOTH app store AND canonical HF file (defense in depth) New requirement: - AUTH-06 — on 401, auto-retry next source in cascade before erroring Also: traceability count corrected (62 → 74 — undercount at planning + INST-12 + AUTH-06 added post-planning). All 74 v1 reqs mapped. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(auth): backend recognizes HF token from canonical file, not just env var Two call sites were only checking $HF_TOKEN env var, missing the canonical ~/.cache/huggingface/token file written by `huggingface-cli login` (or the app's future Save action): - system.py `/system/info` `has_hf_token` flag — UI showed "No HF token" even when `huggingface-cli login` had populated the file. - model_manager.get_diarization_pipeline — pyannote diarization silently returned None when only the canonical file was set. This is the bug behind issue #35 (speaker diarization setup failure). Both fixes use the same pattern: env var > huggingface_hub.get_token() (which reads the canonical file). Adds a local _has_hf_token() helper to system.py with a comment marking it as prelude to the AUTH-01..06 cascade (Phase 1 token_resolver.py will layer SQLite app-store on top). Closes #35 sub-issue (canonical token invisible to diarization). Cross-cuts AUTH-02 + AUTH-06 design for Phase 1. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(dictation): make pill-widget mode reachable from GUI + scripts (INST-13) The dictation widget infrastructure shipped in PR #40 but was only reachable via the undocumented --pill CLI flag. Adds three discovery paths: 1. Tray menu: "Switch to Dictation Widget" (studio mode) — saves launch_as_widget=true to config, relaunches with --pill, exits current. Mirrors the existing "Open Studio" path in pill-mode tray. 2. Persistent config: AppConfig.launch_as_widget (bool, default false). Read at startup via load_config_pre_app() (uses dirs-next, no AppHandle required). CLI --pill still takes precedence when explicitly passed. 3. Tauri commands: get_launch_as_widget / set_launch_as_widget for the Phase 2 Settings UI to bind a checkbox to. 4. Scripts: bun desktop-prod:pill / desktop-prod:run:pill — forward --pill to the bundled app launch. macOS uses `open -n --args` to spawn fresh instance with the flag. Closes the GUI half of INST-13. Phase 2 closes the Settings UI half. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(dictation): show widget unconditionally on pill-mode launch + visible Suspense fallback Before: pill mode set up correctly but the widget window stayed hidden until ⌘⇧Space was pressed. New users saw absolutely nothing on launch (no main window, no dock icon, hidden widget) and assumed the app failed. If global-shortcut Accessibility permission wasn't granted, they had no path to discover the widget at all. Two changes: 1. lib.rs: in pill_mode_setup, explicitly show + position + focus the widget window after hiding main. With per-call error logging so we can diagnose failures (and a clear error log if widget window wasn't created at all — points at tauri.conf.json regression). 2. main-app.jsx: Suspense fallback was `null`, which combined with widget's transparent+decorations:false config made any lazy-import delay or failure invisible. Now renders a dark pill saying "Loading dictation…" so even if CaptureWidget lazy-import stalls, the user sees the window exists. Studio mode behavior unchanged — widget stays hidden until hotkey or tray click triggers it (existing show() call in the shortcut/ menu handlers is preserved). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(dictation): create widget window programmatically; Tauri 2 silently dropped config-array creation Root cause: declaring the widget window in tauri.conf.json's app.windows[] silently failed in Tauri 2 — get_webview_window("widget") returned None even though the config was syntactically valid. Probable culprit was the transparent + decorations:false + visible:false combo, but Tauri offered no error message either at startup or via webview_windows() enumeration. Diagnosed by adding webview_windows() enumeration logging at setup start (only ["main"] ever appeared) and a programmatic WebviewWindowBuilder fallback that surfaces real Result errors. Fix: - tauri.conf.json: widget entry now has `create: false` to make the config-vs-programmatic handoff explicit. - lib.rs setup(): call WebviewWindowBuilder::new(app, "widget", ...).build() with the exact same surface attributes the config used to declare. - capabilities/default.json: include "widget" in windows array so the new window inherits the same Tauri permissions as main. - tauri.conf.json: remove the invalid `"url": "/?window=widget"` field — WebviewUrl::App takes a path only, query strings aren't supported. Both windows now load index.html. - main-app.jsx: replace URL-query-based widget detection with getCurrentWindow().label === 'widget' via @tauri-apps/api/window. This is the Tauri 2-recommended pattern for multi-window apps and works regardless of URL routing. Closes the immediate UX bug behind the dictation widget being invisible. Builds cleanly + manually verified: pill widget visible on screen at top-center after `bun desktop-prod:pill`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Really like the shape of this PR @4shil — scoped to one feature (per-segment DSP presets), schema + API + frontend + tests all in proportion, and the fingerprint update in One blocker before merge — at except (torch.cuda.OutOfMemoryError, torch.mps.MPSError) as e:
One-line fix — change to: except (torch.cuda.OutOfMemoryError, RuntimeError) as e:
If you'd rather I push the fix to your branch directly, just say the word — happy to. Otherwise, push the fix and I'll squash-merge as soon as CI is green. Thanks for the work — looking forward to having presets in the dub pipeline. |
|
Thanks for the heads up — fixed! Swapped for in the OOM catch clause (commit 9ea6879). PyTorch MPS raises for OOM, so this now correctly handles both CUDA and MPS hosts without the attribute error. CI should pick it up automatically. |
|
Merged upstream/main (commit 027f9b2) and resolved the conflict in dub_generate.py. The upstream OOM retry logic with the full exception-based is_oom check is preserved. Per-segment DSP effect presets are applied in both the normal generation path and the OOM retry path. seg_instruct assignment also restored. |
Summary
Adds a per-segment audio effects DSP preset selector to the dub pipeline. Users can now choose from 6 broadcast-grade DSP presets (broadcast, cinematic, podcast, warm, bright, raw) that are applied to TTS-generated audio before mixing.
Changes
GET /engines/effects/presetsendpoint exposing existingEFFECT_PRESETSDubSegmentschema now acceptseffect_presetfield (default: "broadcast")apply_effects_chain()per segment/generateendpoint acceptseffect_presetform fieldeffect_presetin_GEN_INPUT_FIELDS) updated so preset changes trigger regenType
Testing
tests/test_effects_chain.py, 12 test cases)tests/+backend/tests/)Edge Cases
apply_effects_chain()returns audio unmodified (existing graceful degradation)DubSegmentreturns 422broadcast, existing clients unaffectedSummary by CodeRabbit
New Features
Behavior
Tests