
feat: Add emoji filter before TTS #788

Open
manas-narra wants to merge 1 commit into juspay:release from manas-narra:emoji-filter

Conversation

manas-narra (Collaborator) commented May 26, 2026
  • Added an emoji filter between the LLM and TTS so that emojis in the LLM's transcript are stripped before the text reaches TTS.

Summary by CodeRabbit

  • New Features
    • Added automatic emoji filtering to text-to-speech services. Text input is now sanitized to remove emoji characters before audio synthesis, improving voice output quality across all supported voice providers.

  - Added emoji filter between LLM and TTS to filter emojis being passed in the transcript to LLM.
Copilot AI review requested due to automatic review settings May 26, 2026 06:41

coderabbitai Bot commented May 26, 2026


Walkthrough

This PR adds emoji text filtering to the text-to-speech pipeline by introducing a new emoji filter module with regex-based character removal, extending Cartesia and Sarvam TTS configs to support configurable text filters, and integrating emoji stripping into Breeze Buddy's TTS generation across all four supported providers.

Changes

Emoji Filtering for TTS Synthesis

| Layer | File(s) | Summary |
| --- | --- | --- |
| Emoji filter implementation | `app/ai/voice/agents/breeze_buddy/processors/emoji_text_filter.py` | Introduces a precompiled Unicode regex pattern to match emoji and pictographic characters. Implements strip_emoji(text) to remove matched patterns and normalize whitespace, and provides EmojiTextFilter as a BaseTextFilter subclass with async filter() method for TTS text preprocessing. |
| TTS provider config schema for text filters | `app/ai/voice/tts/cartesia.py`, `app/ai/voice/tts/sarvam.py` | CartesiaConfig and SarvamTTSConfig each add optional text_filters: Optional[Sequence] field. Their builder functions (build_cartesia_tts, build_sarvam_tts) wire this field into the underlying TTS service settings by converting to list or passing None. |
| Breeze Buddy TTS emoji filtering integration | `app/ai/voice/agents/breeze_buddy/tts/__init__.py` | Imports EmojiTextFilter and applies it to all four TTS providers (elevenlabs, cartesia, sarvam, gemini) via text_filters=[EmojiTextFilter()]. Additionally, generate_audio() strips emojis upfront via strip_emoji(text) before provider selection to ensure consistent removal regardless of provider-specific filtering. |
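
For readers following along without the diff open, the filter described in the first row might look roughly like the sketch below. The code-point ranges and the `TextFilterBase` stand-in are assumptions made for illustration; the PR's actual pattern and pipecat's real `BaseTextFilter` interface are not shown in this thread.

```python
import asyncio
import re

# Assumed code-point ranges for illustration only; the PR's real pattern may differ.
_EMOJI_PATTERN = re.compile(
    "["
    "\U0001F1E6-\U0001F1FF"  # regional indicator symbols (flag pairs)
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, transport and related symbol blocks
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "\u200D"                 # zero-width joiner used inside emoji sequences
    "\uFE0F"                 # variation selector-16 (emoji presentation)
    "]+"
)


def strip_emoji(text: str) -> str:
    """Remove emoji characters from *text* and trim leading/trailing whitespace."""
    return _EMOJI_PATTERN.sub("", text).strip()


class TextFilterBase:
    """Hypothetical stand-in for the text filter base class used by the TTS services."""

    async def filter(self, text: str) -> str:
        raise NotImplementedError


class EmojiTextFilter(TextFilterBase):
    """Text filter that strips emoji before the text is synthesized."""

    async def filter(self, text: str) -> str:
        return strip_emoji(text)


if __name__ == "__main__":
    print(asyncio.run(EmojiTextFilter().filter("Your order is on the way 🚚")))
```

Precompiling the pattern once at import time keeps the per-utterance cost to a single substitution, which is presumably why the PR takes that route.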

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 Hops of joy! The emojis must go,
When speaking out loud, they steal the show.
Filter it quick with regex so keen,
From all TTS voices, now clean and serene!
✨🎤 thump-thump

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat: Add emoji filter before TTS' clearly and concisely summarizes the main change: adding an emoji filtering capability to the TTS pipeline. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 85.71% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


Copilot AI left a comment


Pull request overview

This PR adds an emoji-stripping text filter path for Breeze Buddy TTS so that emoji/pictographic characters don’t reach TTS providers (which can cause spoken emoji names or garbled audio).

Changes:

  • Added text_filters support to Sarvam and Cartesia TTS config/builders.
  • Introduced EmojiTextFilter + strip_emoji() utility and wired it into Breeze Buddy’s TTS service builders.
  • Added a pre-provider strip_emoji() pass in generate_audio() for non-pipecat direct API synthesis paths.
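
The third bullet, a sanitizing pass ahead of provider dispatch, could be sketched as follows. Everything here is hypothetical: the real `generate_audio()` signature is not shown in this thread, `_echo_provider` merely stands in for a direct provider API call, and the emoji ranges are an assumed subset.

```python
import asyncio
import re
from typing import Awaitable, Callable, Dict

# Assumed subset of emoji ranges, for illustration only.
_EMOJI_PATTERN = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF\uFE0F\u200D]+")


def strip_emoji(text: str) -> str:
    """Remove emoji characters and trim the ends."""
    return _EMOJI_PATTERN.sub("", text).strip()


Synthesizer = Callable[[str], Awaitable[bytes]]


async def _echo_provider(text: str) -> bytes:
    # Stand-in for a direct provider call: returns the text it received as
    # bytes so the caller can see exactly what would have been synthesized.
    return text.encode("utf-8")


# Provider names come from the PR walkthrough; the callables are placeholders.
PROVIDERS: Dict[str, Synthesizer] = {
    "elevenlabs": _echo_provider,
    "cartesia": _echo_provider,
    "sarvam": _echo_provider,
    "gemini": _echo_provider,
}


async def generate_audio_sketch(text: str, provider: str) -> bytes:
    """Strip emoji once, before provider selection, then dispatch."""
    clean_text = strip_emoji(text)
    return await PROVIDERS[provider](clean_text)


if __name__ == "__main__":
    print(asyncio.run(generate_audio_sketch("Payment received 😀", "sarvam")))
```

Sanitizing before the lookup means paths that bypass the pipeline-level text filters still receive clean input, whichever provider is chosen.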

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| `app/ai/voice/tts/sarvam.py` | Adds text_filters to Sarvam TTS config and forwards to SarvamTTSService. |
| `app/ai/voice/tts/cartesia.py` | Adds text_filters to Cartesia TTS config and forwards to CartesiaTTSService. |
| `app/ai/voice/agents/breeze_buddy/tts/__init__.py` | Wires emoji filtering into Breeze Buddy TTS service creation and direct audio generation. |
| `app/ai/voice/agents/breeze_buddy/processors/emoji_text_filter.py` | Implements emoji stripping utility and a pipecat BaseTextFilter. |

Comment on lines +34 to +36
def strip_emoji(text: str) -> str:
    """Remove all emoji characters from *text* and collapse extra whitespace."""
    return _EMOJI_PATTERN.sub("", text).strip()

coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
app/ai/voice/tts/cartesia.py (1)

40-40: ⚡ Quick win

Use a concrete element type for text_filters.

Optional[Sequence] loses type-safety. Please type this as Optional[Sequence[...]] (e.g., BaseTextFilter) so static checks can validate filter objects.

As per coding guidelines, "Add type hints on all function signatures" and "Use Optional[T], List[T], Dict[str, Any], Union for type hints."
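
A minimal sketch of what the tightened annotation could look like, assuming the config is a dataclass (the thread does not say whether it is a dataclass or a pydantic model). `TextFilterStandIn`, `CartesiaConfigSketch`, and `resolve_text_filters` are hypothetical names used only for this illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


class TextFilterStandIn:
    """Hypothetical stand-in for the filter base type (e.g. BaseTextFilter)."""

    async def filter(self, text: str) -> str:
        return text


@dataclass
class CartesiaConfigSketch:
    # Parameterizing Sequence lets a static checker reject non-filter
    # elements; a bare Optional[Sequence] would accept anything.
    text_filters: Optional[Sequence[TextFilterStandIn]] = None


def resolve_text_filters(
    config: CartesiaConfigSketch,
) -> Optional[List[TextFilterStandIn]]:
    """Convert the configured filters to a list, or pass None through."""
    if config.text_filters is None:
        return None
    return list(config.text_filters)
```

With the element type in place, a tool such as mypy or pyright can flag a call like `CartesiaConfigSketch(text_filters=["not a filter"])`, which the untyped `Sequence` would let through.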

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/ai/voice/tts/cartesia.py` at line 40, The parameter/attribute declaration
text_filters: Optional[Sequence] = None should be made concrete by specifying
the element type (e.g., text_filters: Optional[Sequence[BaseTextFilter]] = None)
so static typing can validate filter objects; update the annotation in
cartesia.py where text_filters is declared (and add or import the appropriate
BaseTextFilter type/class into that module) and run type checks to ensure all
usages of text_filters accept the specific filter type instead of an untyped
Sequence.
app/ai/voice/tts/sarvam.py (1)

39-39: ⚡ Quick win

Add the element type to text_filters annotation.

Optional[Sequence] is too loose for static validation. Prefer Optional[Sequence[...]] (matching the filter base type used by TTS services).

As per coding guidelines, "Use Optional[T], List[T], Dict[str, Any], Union for type hints."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/ai/voice/tts/sarvam.py` at line 39, The annotation for text_filters is
too broad; change text_filters: Optional[Sequence] = None to specify the element
type used by TTS filters (e.g., text_filters: Optional[Sequence[TextFilter]] =
None), import the TextFilter (or the actual filter base type used by your TTS
services) and keep Optional/Sequence from typing so static validators can check
item types; update any references to match the concrete filter base name.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 52c3853e-a7a8-4b92-adf2-b2caba7673a7

📥 Commits

Reviewing files that changed from the base of the PR and between 9fbe4fb and 3482a0f.

📒 Files selected for processing (4)
  • app/ai/voice/agents/breeze_buddy/processors/emoji_text_filter.py
  • app/ai/voice/agents/breeze_buddy/tts/__init__.py
  • app/ai/voice/tts/cartesia.py
  • app/ai/voice/tts/sarvam.py

Comment on lines +34 to +36
def strip_emoji(text: str) -> str:
    """Remove all emoji characters from *text* and collapse extra whitespace."""
    return _EMOJI_PATTERN.sub("", text).strip()


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

strip_emoji does not currently collapse internal whitespace.

Line 35 promises whitespace normalization, but Line 36 only strips ends. This can leave doubled spaces after emoji removal in spoken text.

💡 Suggested fix
 def strip_emoji(text: str) -> str:
     """Remove all emoji characters from *text* and collapse extra whitespace."""
-    return _EMOJI_PATTERN.sub("", text).strip()
+    text_without_emoji = _EMOJI_PATTERN.sub("", text)
+    return re.sub(r"\s+", " ", text_without_emoji).strip()
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/ai/voice/agents/breeze_buddy/processors/emoji_text_filter.py` around
lines 34 - 36, strip_emoji currently removes emoji via _EMOJI_PATTERN but only
calls .strip(), so internal multiple whitespace left by removals isn't
collapsed; update strip_emoji to replace any run of whitespace (e.g., using a
regex like r"\s+") with a single space after removing emojis and then .strip()
so internal spaces, tabs, and newlines collapse to single spaces—modify the body
of strip_emoji (the function named strip_emoji and use of _EMOJI_PATTERN)
accordingly.
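
To make the difference concrete, here is a small side-by-side sketch of the trim-only behaviour and the suggested whitespace-collapsing variant. The emoji ranges are an assumed subset, and both function names are hypothetical; only the two `return` expressions mirror the diff above.

```python
import re

# Assumed subset of emoji ranges, for illustration only.
_EMOJI_PATTERN = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]+")


def strip_emoji_trim_only(text: str) -> str:
    """Behaviour as submitted: remove emoji, then trim only the ends."""
    return _EMOJI_PATTERN.sub("", text).strip()


def strip_emoji_collapsed(text: str) -> str:
    """Suggested behaviour: also collapse internal whitespace runs to one space."""
    text_without_emoji = _EMOJI_PATTERN.sub("", text)
    return re.sub(r"\s+", " ", text_without_emoji).strip()


if __name__ == "__main__":
    sample = "Great 🎉 job"
    print(repr(strip_emoji_trim_only(sample)))  # 'Great  job' (two spaces remain)
    print(repr(strip_emoji_collapsed(sample)))  # 'Great job'
```

Note that `\s+` also folds tabs and newlines into single spaces, so any line breaks in the input are lost; whether that matters depends on how the TTS providers treat them.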

