
FEAT: Smuggling arbitrary data through an emoji #835

Open
@KutalVolkan


Is your feature request related to a problem? Please describe

We currently support ASCII smuggling via Unicode Tags and bit-level encoding via Sneaky Bits. However, we don’t yet support a high-density, byte-level encoding method that relies on invisible Unicode characters.

Describe the solution you'd like

Add a new variation_selector encoding mode to AsciiSmugglerConverter (note that, despite its name, the class now handles full UTF-8 smuggling):

  • Encodes any UTF-8 input at the byte level
  • Uses 256 invisible Unicode variation selectors (U+FE00–U+FE0F and U+E0100–U+E01EF, i.e. 16 + 240), one per possible byte value
  • Appends the encoded selectors to a visible base character (e.g., an emoji)
  • Supports decoding back to the original text
  • Keeps unicode_tags as the default
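For reference, a minimal standalone sketch of the proposed byte-level scheme, assuming the natural split where bytes 0–15 map to U+FE00–U+FE0F and bytes 16–255 map to U+E0100–U+E01EF. The function names and the default base emoji are hypothetical and are not the AsciiSmugglerConverter API.

```python
from typing import Optional

# Hypothetical standalone sketch; the names here are not the PyRIT API.
BASIC_START = 0xFE00        # U+FE00..U+FE0F: 16 selectors, assumed for bytes 0-15
SUPPLEMENT_START = 0xE0100  # U+E0100..U+E01EF: 240 selectors, assumed for bytes 16-255


def byte_to_selector(b: int) -> str:
    """Map one byte value (0-255) to one invisible variation selector."""
    if not 0 <= b <= 255:
        raise ValueError(f"not a byte value: {b}")
    return chr(BASIC_START + b) if b < 16 else chr(SUPPLEMENT_START + b - 16)


def selector_to_byte(ch: str) -> Optional[int]:
    """Inverse of byte_to_selector; returns None for any other character."""
    cp = ord(ch)
    if BASIC_START <= cp <= BASIC_START + 15:
        return cp - BASIC_START
    if SUPPLEMENT_START <= cp <= SUPPLEMENT_START + 239:
        return cp - SUPPLEMENT_START + 16
    return None


def encode(text: str, base: str = "\U0001F600") -> str:
    """Append one selector per UTF-8 byte of `text` to a visible base character."""
    return base + "".join(byte_to_selector(b) for b in text.encode("utf-8"))


def decode(encoded: str) -> str:
    """Collect the first run of selectors and decode it back to text."""
    data = bytearray()
    for ch in encoded:
        b = selector_to_byte(ch)
        if b is None:
            if data:
                break  # the hidden payload has ended
            continue  # still before the payload (e.g., the base emoji)
        data.append(b)
    return data.decode("utf-8")


smuggled = encode("hello")
print(len(smuggled))     # 6: the emoji plus 5 invisible selectors
print(decode(smuggled))  # hello
```

The result renders as a single emoji. Because decoding stops at the first non-selector character after the payload, any text that follows the smuggled emoji does not interfere.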

Describe alternatives you've considered

We previously added sneaky_bits, which encodes at the bit level using two invisible characters (one per bit value). While simple and effective, it needs eight invisible characters per byte; variation_selector offers higher data density (one invisible character per byte) and therefore fits full payloads into much shorter strings.
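To put the density difference in numbers, the helper below (hypothetical, not part of PyRIT) counts the invisible characters each mode needs for a payload, assuming sneaky_bits spends one invisible character per bit and variation_selector one per UTF-8 byte, as described above.

```python
def invisible_char_count(payload: str) -> dict:
    """Invisible characters needed per mode (hypothetical helper, not the PyRIT API)."""
    n_bytes = len(payload.encode("utf-8"))  # both modes work on the UTF-8 bytes
    return {
        "sneaky_bits": n_bytes * 8,     # one invisible character per bit
        "variation_selector": n_bytes,  # one invisible character per byte
    }


print(invisible_char_count("hello"))  # {'sneaky_bits': 40, 'variation_selector': 5}
print(invisible_char_count("café"))   # {'sneaky_bits': 40, 'variation_selector': 5}
```

Counted in code points, the selector form is 8x shorter. Note that U+E0100–U+E01EF lie outside the Basic Multilingual Plane, so each of those selectors takes 4 bytes in UTF-8 and two UTF-16 code units, which matters if a target enforces byte- or UTF-16-based length limits.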

Additional context

This is based on Paul Butler’s post, which shows that:

  • Each variation selector can invisibly encode 1 byte
  • The selectors persist through copy/paste
  • The technique is useful for simulating data smuggling, prompt injection, and text watermarking
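Because the selectors survive copy/paste, a hidden payload can be recovered from any pasted text, which is the property that smuggling and watermarking tests rely on. A minimal scanner sketch follows, assuming bytes 0–15 map to U+FE00–U+FE0F and bytes 16–255 map to U+E0100–U+E01EF; find_hidden_payloads and min_bytes are hypothetical names. Ordinary emoji such as "❤️" carry a single U+FE0F (the emoji presentation selector), so runs shorter than min_bytes are skipped to avoid flagging them.

```python
import re
from typing import List

# Runs of consecutive variation selectors from either range.
_SELECTOR_RUN = re.compile("[\uFE00-\uFE0F\U000E0100-\U000E01EF]+")


def _selector_to_byte(ch: str) -> int:
    """Byte value for a variation selector (assumed mapping: 0-15 basic, 16-255 supplement)."""
    cp = ord(ch)
    return cp - 0xFE00 if cp <= 0xFE0F else cp - 0xE0100 + 16


def find_hidden_payloads(text: str, min_bytes: int = 2) -> List[str]:
    """Decode every run of at least `min_bytes` selectors found in `text` (hypothetical helper)."""
    payloads = []
    for match in _SELECTOR_RUN.finditer(text):
        run = match.group()
        if len(run) < min_bytes:
            continue  # likely an ordinary emoji presentation selector, e.g. U+FE0F
        data = bytes(_selector_to_byte(ch) for ch in run)
        payloads.append(data.decode("utf-8", errors="replace"))
    return payloads


# Simulated pasted text: an emoji carrying the selectors for "hi", plus an ordinary "❤️".
hidden = "\U0001F600" + "".join(chr(0xE0100 + b - 16) for b in "hi".encode("utf-8"))
pasted = f"Please summarize this {hidden} I \u2764\ufe0f it"
print(find_hidden_payloads(pasted))  # ['hi']
```

Runs are decoded with errors="replace", so a selector sequence that is not valid UTF-8 still yields a (garbled) result instead of raising.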

Note

@romanlutz good to move forward from your side?
@paulinek13 would it be okay to slip this in while you're working on generalization (PR #818)? I’m also happy to wait if you’d prefer! Should be done by Sunday.
