feat: Add emoji filter before TTS by manas-narra · Pull Request #788 · juspay/clairvoyance

manas-narra · 2026-05-26T06:41:32Z

Added an emoji filter between the LLM and TTS to filter out emojis being passed in the transcript to LLM.

Summary by CodeRabbit

New Features
- Added automatic emoji filtering to text-to-speech services. Text input is now sanitized to remove emoji characters before audio synthesis, improving voice output quality across all supported voice providers.


coderabbitai · 2026-05-26T06:41:47Z

Walkthrough

This PR adds emoji text filtering to the text-to-speech pipeline by introducing a new emoji filter module with regex-based character removal, extending Cartesia and Sarvam TTS configs to support configurable text filters, and integrating emoji stripping into Breeze Buddy's TTS generation across all four supported providers.

Changes

Emoji Filtering for TTS Synthesis

Layer / File(s)	Summary
Emoji filter implementation `app/ai/voice/agents/breeze_buddy/processors/emoji_text_filter.py`	Introduces a precompiled Unicode regex pattern to match emoji and pictographic characters. Implements `strip_emoji(text)` to remove matched patterns and normalize whitespace, and provides `EmojiTextFilter` as a `BaseTextFilter` subclass with async `filter()` method for TTS text preprocessing.
TTS provider config schema for text filters `app/ai/voice/tts/cartesia.py`, `app/ai/voice/tts/sarvam.py`	`CartesiaConfig` and `SarvamTTSConfig` each add optional `text_filters: Optional[Sequence]` field. Their builder functions (`build_cartesia_tts`, `build_sarvam_tts`) wire this field into the underlying TTS service settings by converting to list or passing `None`.
Breeze Buddy TTS emoji filtering integration `app/ai/voice/agents/breeze_buddy/tts/__init__.py`	Imports `EmojiTextFilter` and applies it to all four TTS providers (elevenlabs, cartesia, sarvam, gemini) via `text_filters=[EmojiTextFilter()]`. Additionally, `generate_audio()` strips emojis upfront via `strip_emoji(text)` before provider selection to ensure consistent removal regardless of provider-specific filtering.
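
The filter module described in the table above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the Unicode ranges in `_EMOJI_PATTERN` are assumed (the real pattern is not shown on this page), and `EmojiTextFilterSketch` is a hypothetical stand-in that does not subclass pipecat's `BaseTextFilter`, so the snippet runs without pipecat installed. It collapses internal whitespace, as the summary describes.

```python
import asyncio
import re

# Assumed, illustrative ranges; the PR's actual pattern may differ.
_EMOJI_PATTERN = re.compile(
    "["
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, transport, extended symbols
    "\U0001F1E6-\U0001F1FF"  # regional indicator (flag) letters
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "\uFE0F"                 # variation selector-16
    "\u200D"                 # zero-width joiner
    "]+"
)


def strip_emoji(text: str) -> str:
    """Remove emoji from *text* and collapse runs of whitespace to one space."""
    without_emoji = _EMOJI_PATTERN.sub("", text)
    return re.sub(r"\s+", " ", without_emoji).strip()


class EmojiTextFilterSketch:
    """Hypothetical stand-in for a text filter exposing an async filter()."""

    async def filter(self, text: str) -> str:
        return strip_emoji(text)


if __name__ == "__main__":
    print(asyncio.run(EmojiTextFilterSketch().filter("Hello 👋 world ✨")))
```

Keeping `strip_emoji` as a plain function lets both the filter class and a direct synthesis path reuse the same logic.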

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 Hops of joy! The emojis must go,
When speaking out loud, they steal the show.
Filter it quick with regex so keen,
From all TTS voices, now clean and serene!
✨🎤 thump-thump

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'feat: Add emoji filter before TTS' clearly and concisely summarizes the main change: adding an emoji filtering capability to the TTS pipeline.
Docstring Coverage	✅ Passed	Docstring coverage is 85.71% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

Copilot

Pull request overview

This PR adds an emoji-stripping text filter path for Breeze Buddy TTS so that emoji/pictographic characters don’t reach TTS providers (which can cause spoken emoji names or garbled audio).

Changes:

Added text_filters support to Sarvam and Cartesia TTS config/builders.
Introduced EmojiTextFilter + strip_emoji() utility and wired it into Breeze Buddy’s TTS service builders.
Added a pre-provider strip_emoji() pass in generate_audio() for non-pipecat direct API synthesis paths.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File	Description
app/ai/voice/tts/sarvam.py	Adds `text_filters` to Sarvam TTS config and forwards to `SarvamTTSService`.
app/ai/voice/tts/cartesia.py	Adds `text_filters` to Cartesia TTS config and forwards to `CartesiaTTSService`.
app/ai/voice/agents/breeze_buddy/tts/__init__.py	Wires emoji filtering into Breeze Buddy TTS service creation and direct audio generation.
app/ai/voice/agents/breeze_buddy/processors/emoji_text_filter.py	Implements emoji stripping utility and a pipecat `BaseTextFilter`.
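
To illustrate the "forwards `text_filters`" rows above, here is a rough sketch of the config-to-builder wiring. `ExampleTTSConfig` and `build_service_kwargs` are hypothetical names standing in for `CartesiaConfig` / `SarvamTTSConfig` and their builders; the real builders construct provider services rather than returning a dict, and may treat an empty sequence differently.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Sequence


@dataclass
class ExampleTTSConfig:
    """Hypothetical config with an optional sequence of text filters."""

    voice_id: str
    text_filters: Optional[Sequence[Any]] = None


def build_service_kwargs(config: ExampleTTSConfig) -> Dict[str, Any]:
    """Convert the optional sequence to a list, or pass None through."""
    filters = list(config.text_filters) if config.text_filters is not None else None
    return {"voice_id": config.voice_id, "text_filters": filters}
```

Defaulting the field to `None` keeps existing callers unchanged, since only agents that opt in pass filters.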

+def strip_emoji(text: str) -> str:
+    """Remove all emoji characters from *text* and collapse extra whitespace."""
+    return _EMOJI_PATTERN.sub("", text).strip()


coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

app/ai/voice/tts/cartesia.py (1)
40-40: ⚡ Quick win

Use a concrete element type for text_filters.

Optional[Sequence] loses type-safety. Please type this as Optional[Sequence[...]] (e.g., BaseTextFilter) so static checks can validate filter objects.

As per coding guidelines, "Add type hints on all function signatures" and "Use Optional[T], List[T], Dict[str, Any], Union for type hints."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/ai/voice/tts/cartesia.py` at line 40, The parameter/attribute declaration
text_filters: Optional[Sequence] = None should be made concrete by specifying
the element type (e.g., text_filters: Optional[Sequence[BaseTextFilter]] = None)
so static typing can validate filter objects; update the annotation in
cartesia.py where text_filters is declared (and add or import the appropriate
BaseTextFilter type/class into that module) and run type checks to ensure all
usages of text_filters accept the specific filter type instead of an untyped
Sequence.
app/ai/voice/tts/sarvam.py (1)
39-39: ⚡ Quick win

Add the element type to text_filters annotation.

Optional[Sequence] is too loose for static validation. Prefer Optional[Sequence[...]] (matching the filter base type used by TTS services).

As per coding guidelines, "Use Optional[T], List[T], Dict[str, Any], Union for type hints."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/ai/voice/tts/sarvam.py` at line 39, The annotation for text_filters is
too broad; change text_filters: Optional[Sequence] = None to specify the element
type used by TTS filters (e.g., text_filters: Optional[Sequence[TextFilter]] =
None), import the TextFilter (or the actual filter base type used by your TTS
services) and keep Optional/Sequence from typing so static validators can check
item types; update any references to match the concrete filter base name.
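
A sketch of what a concrete element type could look like for both nitpicks above. The review suggests the filter base class used by the TTS services; to keep this snippet runnable without pipecat, it uses a hypothetical `TextFilterLike` protocol instead, and `TypedTTSConfig` and `UppercaseFilter` are illustrative names only.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional, Protocol, Sequence


class TextFilterLike(Protocol):
    """Hypothetical structural type: anything with an async filter()."""

    async def filter(self, text: str) -> str: ...


@dataclass
class TypedTTSConfig:
    # The element type lets a static checker reject non-filter items.
    text_filters: Optional[Sequence[TextFilterLike]] = None


class UppercaseFilter:
    """Toy filter used only to exercise the annotation."""

    async def filter(self, text: str) -> str:
        return text.upper()


config = TypedTTSConfig(text_filters=[UppercaseFilter()])
```

With this annotation, a checker such as mypy would be expected to flag `TypedTTSConfig(text_filters=["not a filter"])`, which a bare `Sequence` accepts silently.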

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 52c3853e-a7a8-4b92-adf2-b2caba7673a7

📥 Commits

Reviewing files that changed from the base of the PR and between 9fbe4fb and 3482a0f.

📒 Files selected for processing (4)

app/ai/voice/agents/breeze_buddy/processors/emoji_text_filter.py
app/ai/voice/agents/breeze_buddy/tts/__init__.py
app/ai/voice/tts/cartesia.py
app/ai/voice/tts/sarvam.py

coderabbitai · 2026-05-26T06:47:22Z

+def strip_emoji(text: str) -> str:
+    """Remove all emoji characters from *text* and collapse extra whitespace."""
+    return _EMOJI_PATTERN.sub("", text).strip()


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

strip_emoji does not currently collapse internal whitespace.

Line 35 promises whitespace normalization, but Line 36 only strips ends. This can leave doubled spaces after emoji removal in spoken text.

💡 Suggested fix

 def strip_emoji(text: str) -> str:
     """Remove all emoji characters from *text* and collapse extra whitespace."""
-    return _EMOJI_PATTERN.sub("", text).strip()
+    text_without_emoji = _EMOJI_PATTERN.sub("", text)
+    return re.sub(r"\s+", " ", text_without_emoji).strip()

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/ai/voice/agents/breeze_buddy/processors/emoji_text_filter.py` around lines 34 - 36, strip_emoji currently removes emoji via _EMOJI_PATTERN but only calls .strip(), so internal multiple whitespace left by removals isn't collapsed; update strip_emoji to replace any run of whitespace (e.g., using a regex like r"\s+") with a single space after removing emojis and then .strip() so internal spaces, tabs, and newlines collapse to single spaces—modify the body of strip_emoji (the function named strip_emoji and use of _EMOJI_PATTERN) accordingly.
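
The difference this comment points at can be reproduced in isolation. The snippet below is a small sketch that uses a single assumed emoji range rather than the PR's `_EMOJI_PATTERN`:

```python
import re

sample = "Great 🎉 job"

# Assumed single range for illustration; not the PR's actual pattern.
no_emoji = re.sub("[\U0001F300-\U0001FAFF]", "", sample)

# Trimming only the ends leaves the doubled space where the emoji was.
ends_only = no_emoji.strip()

# Collapsing whitespace runs first yields a single space between words.
collapsed = re.sub(r"\s+", " ", no_emoji).strip()

print(repr(ends_only))
print(repr(collapsed))
```

Note that `\s+` also folds tabs and newlines into single spaces, which may or may not be desirable if a TTS provider uses line breaks for pacing.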

feat: Add emoji filter before TTS

3482a0f

- Added an emoji filter between the LLM and TTS to filter out emojis being passed in the transcript to LLM.

Copilot AI review requested due to automatic review settings May 26, 2026 06:41

Copilot started reviewing on behalf of manas-narra May 26, 2026 06:41

Copilot AI reviewed May 26, 2026

Comment thread app/ai/voice/agents/breeze_buddy/processors/emoji_text_filter.py

Comment on lines +34 to +36

def strip_emoji(text: str) -> str:
    """Remove all emoji characters from *text* and collapse extra whitespace."""
    return _EMOJI_PATTERN.sub("", text).strip()

manas-narra wants to merge 1 commit into juspay:release from manas-narra:emoji-filter

2 participants
