
Per-segment audio effects DSP preset selector (closes #67, rebased from #68) #109

Merged: debpalash merged 9 commits into main from pr-68-fresh on May 20, 2026
@debpalash debpalash commented May 20, 2026

Owner

Rebases PR #68 by @4shil onto current main and resolves merge conflicts in backend/api/routers/{dub_generate,generation}.py (Phase 2's audio_io helpers + PR #68's audio_dsp chain helpers were both added to the same import lines).

All original work is @4shil's. This PR exists because PR #68's branch lives on a fork I can't force-push to.

Summary (original)

Adds a per-segment audio effects DSP preset selector to the dub pipeline. Users can choose from 6 broadcast-grade DSP presets (broadcast, cinematic, podcast, warm, bright, raw) that apply to TTS-generated audio before mixing.

Changes (original)

  • New GET /engines/effects/presets endpoint returning the 6 preset definitions
  • Per-segment effect_preset parameter on the dub generation pipeline (_gen closure in dub_generate.py)
  • raw preset bypasses mastering entirely (returns the model output unchanged)
  • All other presets layer effect chains on top of the existing apply_mastering + normalize_audio step
  • tests/test_effects_chain.py — 15 unit tests covering preset surface, chain composition, and audio invariants
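
The branching described in the list above (raw bypasses everything; every other preset layers its chain between mastering and normalization) can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `apply_mastering` and `normalize_audio` are simplified stand-ins for the project helpers of the same name, `EFFECT_CHAINS` and `apply_effect_preset` are hypothetical, and audio is modeled as a plain list of floats.

```python
import math
from typing import Callable, Dict, List, Optional

Audio = List[float]
Effect = Callable[[Audio], Audio]


def apply_mastering(audio: Audio) -> Audio:
    # Stand-in for the project's mastering step (assumption: a simple gain trim).
    return [s * 0.9 for s in audio]


def normalize_audio(audio: Audio, peak: float = 0.8) -> Audio:
    # Stand-in for normalization: scale so the loudest sample sits at `peak`.
    top = max((abs(s) for s in audio), default=0.0)
    return list(audio) if top == 0.0 else [s * peak / top for s in audio]


# Illustrative chains only; the real preset definitions are not shown in this thread.
EFFECT_CHAINS: Dict[str, List[Effect]] = {
    "broadcast": [],
    "warm": [lambda a: [math.tanh(2.0 * s) for s in a]],  # soft saturation
    "raw": [],
}


def apply_effect_preset(audio: Audio, preset: Optional[str]) -> Audio:
    preset = preset or "broadcast"      # a missing preset falls back to the default
    if preset not in EFFECT_CHAINS:
        raise ValueError(f"Unknown effect preset: {preset!r}")
    if preset == "raw":
        return audio                    # bypass: model output unchanged
    out = apply_mastering(audio)
    for effect in EFFECT_CHAINS[preset]:
        out = effect(out)               # layer the preset's chain on top of mastering
    return normalize_audio(out)
```

Keeping the branch in one function is also what would let a preview path and the export path share identical processing, should the project choose to structure it that way.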

Conflict resolution

Both dub_generate.py and generation.py had identical-shaped conflicts: PR #68 added DSP helper imports while Phase 2 (PR #96) added audio_io safe-helper imports on the adjacent line. Resolved both as a union — keep both sets of imports.

Test plan

  • pytest tests/test_effects_chain.py: 15/15 pass post-rebase
  • pytest tests/smoke/ — green
  • Manual: open dub job, switch preset selector, verify audible difference between broadcast / cinematic / raw

🤖 Rebased with Claude Code — original PR #68 should be closed once this merges

Summary by CodeRabbit

Release Notes

  • New Features

    • Audio effect presets now available for generated audio with multiple preset options to choose from
    • New API endpoint to list available effect presets and their metadata
    • Per-segment effect preset selection and configuration capability
    • Effect presets consistently applied during both initial and retry generation attempts
  • Tests

    • Added comprehensive test suite for effect presets validation and audio processing pipeline

coderabbitai Bot commented May 20, 2026

Contributor
Walkthrough

This PR adds per-segment audio DSP effect presets to the generation pipelines. Segments can now select from available presets (defaulting to "broadcast"); the choice is applied during TTS synthesis via mastering, effect chain lookup, and normalization, with a "raw" bypass option. Effect preset changes trigger segment regeneration via updated fingerprints. Backend, frontend types, and state management are extended to support the feature, with comprehensive test coverage.

Changes

Effect Preset DSP Pipeline

| Layer | File(s) | Summary |
|---|---|---|
| Data contracts and effect preset validation | backend/api/schemas.py, backend/schemas/requests.py | EffectPresetEntry and EffectPresetsResponse schemas are defined for the presets API; DubSegment gains a validated effect_preset field (default "broadcast") with a validator that enforces membership in the available preset registry. |
| Effect presets listing endpoint | backend/api/routers/engines.py | GET /engines/effects/presets endpoint is added, importing list_effect_presets and returning the available presets via EffectPresetsResponse. |
| Generation endpoint effect preset integration | backend/api/routers/generation.py | /generate endpoint gains effect_preset form parameter (default "broadcast"); _run_inference accepts the preset, validates it, computes sample rate, bypasses DSP for "raw", otherwise applies mastering → effect chain → normalization, then passes the preset through to background execution. |
| Dub generation segment DSP pipeline | backend/api/routers/dub_generate.py | _gen worker accepts effect_preset; both first-attempt and OOM-retry generation paths apply preset-driven DSP (mastering, effect chain lookup, normalization) with "raw" bypass; segment loop derives seg_effect_preset and passes it to _gen; segment fingerprint is extended to include effect_preset for change tracking. |
| Batched TTS effect preset support | backend/services/batched_tts.py | SegmentSpec data container gains effect_preset slot; per-segment processing defaults preset to "broadcast", bypasses DSP for "raw", otherwise applies mastering → effect chain → normalization. |
| Incremental redubbing fingerprint update | backend/services/incremental.py | effect_preset is added to _GEN_INPUT_FIELDS so preset changes trigger segment regeneration; segment_fingerprint docstring is updated to document the new hash input. |
| Frontend API contracts and client | frontend/src/api/engines.ts, frontend/src/api/types.ts | TypeScript EffectPreset and EffectPresetsResponse types are defined; fetchEffectPresets() async function calls /engines/effects/presets; new DubSegment type includes effect_preset as optional string field. |
| Frontend effect preset state in dub slice | frontend/src/store/dubSlice.ts | DubSlice Zustand store is extended with segmentEffectPresets (per-segment preset map) and availableEffectPresets (loaded presets list); typed setters setSegmentEffectPreset and setAvailableEffectPresets are implemented; state is initialized with empty defaults. |
| Effects chain DSP unit test suite | tests/test_effects_chain.py | Comprehensive pytest module validates preset listing, chain retrieval for known/raw/unknown preset IDs, audio shape preservation, smoke coverage across all presets, clipping bounds for limiter-based presets, and output differentiation between raw and processed audio. |
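
The DubSegment validation described above (a default of "broadcast" plus a membership check against the preset registry) might look something like the sketch below. It uses a standard-library dataclass as a stand-in, since the project's actual schema classes and validator are not shown in this thread; `KNOWN_PRESETS` and `DubSegmentSketch` are hypothetical names, and only the six preset IDs come from the PR description.

```python
from dataclasses import dataclass

# Preset IDs as listed in the PR description; the registry's real shape is an assumption.
KNOWN_PRESETS = ("broadcast", "cinematic", "podcast", "warm", "bright", "raw")


@dataclass
class DubSegmentSketch:
    """Hypothetical stand-in for the DubSegment request schema."""

    text: str
    effect_preset: str = "broadcast"  # default named in the walkthrough

    def __post_init__(self) -> None:
        # Reject unknown preset IDs at construction time, before any audio work starts.
        if self.effect_preset not in KNOWN_PRESETS:
            raise ValueError(
                f"Unknown effect preset: {self.effect_preset!r}. "
                f"Valid: {list(KNOWN_PRESETS)}"
            )
```

Validating at the request boundary means a bad preset ID fails fast with a clear message rather than surfacing later inside a generation worker.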

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Possibly related issues

  • debpalash/OmniVoice-Studio#67: This PR directly implements the per-segment effect_preset feature specified in the issue, including field additions to DubSegment, DSP chain integration in multiple generation paths, and test coverage.

Possibly related PRs

  • debpalash/OmniVoice-Studio#75: Both PRs modify the _gen worker and OOM-retry logic in backend/api/routers/dub_generate.py; this PR adds effect_preset DSP processing to both first-attempt and retry paths, while the other PR modifies OOM detection and retry behavior, creating a direct coupling.

Poem

🎧 A rabbit hops through audio streams,
Effect presets in generation dreams,
From broadcast glow to raw and clean,
Mastering chains in between,
Each segment sings with style supreme! 🐰✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 29.03% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The PR title clearly and concisely summarizes the main change: adding a per-segment audio effects DSP preset selector to the dub pipeline, with appropriate issue references. |
| Description check | ✅ Passed | The PR description is comprehensive and well-structured. It includes a clear summary of the feature, detailed changes, conflict resolution explanation, test plan with results, and proper attribution to the original author. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
backend/services/incremental.py (1)

23-38: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Hash the canonical preset value, not the raw field.

The generation paths treat a missing effect_preset as "broadcast", but segment_fingerprint() hashes None as "". That makes omitted and explicit default presets produce identical audio while getting different fingerprints, so incremental regen will miss cache hits and regenerate unchanged segments.

💡 Suggested fix
 _GEN_INPUT_FIELDS = ("text", "target_lang", "profile_id", "instruct", "speed", "direction", "effect_preset")
@@
 def segment_fingerprint(seg: dict) -> str:
@@
-    payload = {k: (seg.get(k) if seg.get(k) is not None else "") for k in _GEN_INPUT_FIELDS}
+    payload = {}
+    for k in _GEN_INPUT_FIELDS:
+        if k == "effect_preset":
+            payload[k] = seg.get(k) or "broadcast"
+        else:
+            payload[k] = seg.get(k) if seg.get(k) is not None else ""
     blob = json.dumps(payload, sort_keys=True, ensure_ascii=False)
     return hashlib.sha1(blob.encode("utf-8")).hexdigest()[:16]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/services/incremental.py` around lines 23 - 38, segment_fingerprint is
hashing raw seg fields (from _GEN_INPUT_FIELDS) so a missing effect_preset
(None) gets turned into "" and produces a different fingerprint than an explicit
"broadcast" default; change segment_fingerprint to canonicalize the
effect_preset before hashing (i.e., when k == "effect_preset" map None/empty to
the generation-default "broadcast" or call the same canonicalization used in the
generation path) so omitted and explicit default presets yield the same hash.
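
To make the cache-miss concrete, here is a small runnable reconstruction of the fingerprint with and without the suggested canonicalization. The field list, JSON serialization, and SHA-1 truncation are taken from the diff above; the `canonicalize` flag is added purely for illustration and is not part of the suggested fix, and any parts of the real function hidden behind the `@@` hunk markers are assumed not to affect the hash.

```python
import hashlib
import json

_GEN_INPUT_FIELDS = (
    "text", "target_lang", "profile_id", "instruct",
    "speed", "direction", "effect_preset",
)


def segment_fingerprint(seg: dict, canonicalize: bool = True) -> str:
    payload = {}
    for k in _GEN_INPUT_FIELDS:
        if canonicalize and k == "effect_preset":
            # Map a missing or empty preset to the generation default.
            payload[k] = seg.get(k) or "broadcast"
        else:
            # Pre-fix behavior: None is hashed as the empty string.
            payload[k] = seg.get(k) if seg.get(k) is not None else ""
    blob = json.dumps(payload, sort_keys=True, ensure_ascii=False)
    return hashlib.sha1(blob.encode("utf-8")).hexdigest()[:16]


omitted = {"text": "hola", "target_lang": "es"}
explicit = {"text": "hola", "target_lang": "es", "effect_preset": "broadcast"}

# Without canonicalization the two segments hash differently despite identical audio.
print(segment_fingerprint(omitted, canonicalize=False)
      == segment_fingerprint(explicit, canonicalize=False))
# With canonicalization they share a fingerprint, so the cached segment is reused.
print(segment_fingerprint(omitted) == segment_fingerprint(explicit))
```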
backend/api/routers/dub_generate.py (1)

111-179: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preview audio will drift from exported audio for non-default presets.

This wires effect_preset through /dub/generate, but preview_segment() in the same file still renders with the legacy mastering-only path. A segment set to cinematic, podcast, warm, bright, or raw can sound different in preview than in the final dub. Thread the preset through the preview request and reuse the same DSP branch there.

Also applies to: 268-272

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/api/routers/dub_generate.py` around lines 111 - 179, The preview path
still uses the legacy mastering-only flow, so preview_segment() must accept and
use the effect_preset and apply the exact DSP branch used in _gen: thread
effect_preset into preview_segment (and any callers in this file), compute
seg_effect_preset = effect_preset or "broadcast", return raw if
seg_effect_preset == "raw", otherwise run apply_mastering -> get_effect_chain ->
apply_effects_chain -> normalize_audio (same parameters/target_dBFS as _gen),
and preserve the existing fallback/default behaviors for missing presets or None
effect_preset.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: c1bcbe1a-1aa0-445d-97c1-b7b790debc62

📥 Commits

Reviewing files that changed from the base of the PR and between 424ad76 and 4be4095.

📒 Files selected for processing (11)
  • backend/api/routers/dub_generate.py
  • backend/api/routers/engines.py
  • backend/api/routers/generation.py
  • backend/api/schemas.py
  • backend/schemas/requests.py
  • backend/services/batched_tts.py
  • backend/services/incremental.py
  • frontend/src/api/engines.ts
  • frontend/src/api/types.ts
  • frontend/src/store/dubSlice.ts
  • tests/test_effects_chain.py

Comment on lines +51 to 75
sr = model.sampling_rate if hasattr(model, 'sampling_rate') else 24000

# Apply DSP effect preset
_effect_preset = effect_preset or "broadcast"

# Validate preset ID
from services.audio_dsp import EFFECT_PRESETS
if _effect_preset not in EFFECT_PRESETS:
    raise ValueError(
        f"Unknown effect preset: {_effect_preset!r}. "
        f"Valid: {list(EFFECT_PRESETS.keys())}"
    )

if _effect_preset == "raw":
    # Raw: skip all DSP, return raw model output
    return audio_out

mastered_audio = apply_mastering(audio_out, sample_rate=sr)
_chain = get_effect_chain(_effect_preset)
if _chain:
    mastered_audio = apply_effects_chain(
        mastered_audio, sample_rate=sr, chain=_chain,
    )

return normalize_audio(mastered_audio, target_dBFS=-2.0)


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Keep DSP failures out of the OOM wrapper.

These new post-processing calls still run inside the broad except Exception below, so a bug in the effect chain will now come back as a bogus “out of memory” error. Scope the OOM translation to model.generate(...) only, and let mastering/effects/normalize failures propagate with their real cause.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/api/routers/generation.py` around lines 51 - 75, The post-processing
(apply_mastering, get_effect_chain, apply_effects_chain, normalize_audio) is
currently executed inside the broad OOM translation/except block so DSP bugs
surface as fake "out of memory" errors; refactor so only the call to
model.generate(...) is wrapped by the OOM-specific try/except and
re-raise/translate OOMs there, then perform effect_preset validation and all
post-processing (apply_mastering, get_effect_chain, apply_effects_chain,
normalize_audio) after that try/except so their real exceptions propagate
normally.
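
One possible shape for the refactor requested above is sketched here: only the generation call sits inside the OOM-translating try/except, and post-processing runs afterwards so its exceptions keep their real type. Everything in the sketch is hypothetical; the router's actual OOM detection and error type are not shown in this thread, so `looks_like_oom`, `GenerationOOMError`, and `run_inference` are illustrative names only.

```python
from typing import Callable, List

Audio = List[float]


class GenerationOOMError(RuntimeError):
    """Hypothetical error type standing in for the router's OOM translation."""


def looks_like_oom(exc: Exception) -> bool:
    # Assumption: OOMs are recognized by message text; the real check may differ.
    return "out of memory" in str(exc).lower()


def run_inference(
    generate: Callable[[str], Audio],
    post_process: Callable[[Audio], Audio],
    text: str,
) -> Audio:
    try:
        audio = generate(text)  # the only call covered by OOM translation
    except Exception as exc:
        if looks_like_oom(exc):
            raise GenerationOOMError("out of memory during generation") from exc
        raise  # anything else keeps its original type
    # DSP runs outside the try block, so a bug in mastering, the effect chain,
    # or normalization is reported as what it is rather than as an OOM.
    return post_process(audio)
```

With this layout, a `ValueError` raised by an effect chain reaches the caller as a `ValueError`, while a genuine allocator failure during generation is still translated.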
