Description
Is your feature request related to a problem? Please describe
We currently support ASCII smuggling via Unicode Tags and bit-level encoding via Sneaky Bits. However, we don’t yet support a high-density, byte-level encoding method that relies on invisible Unicode characters.
Describe the solution you'd like
Add a new `variation_selector` encoding mode to `AsciiSmugglerConverter` (noting that the class now handles full UTF-8 smuggling):
- Encodes any UTF-8 input at the byte level
- Uses 256 invisible Unicode variation selectors (U+FE00–U+FE0F, U+E0100–U+E01EF)
- Appends encoded selectors to a base character (e.g., emoji)
- Supports decoding
- Keeps `unicode_tags` as the default
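To make the proposed mapping concrete, here is a minimal standalone sketch of the byte-to-selector scheme described in the list above. It is not the `AsciiSmugglerConverter` implementation; the function names (`byte_to_selector`, `selector_to_byte`, `encode`, `decode`) and the default base emoji are assumptions chosen for illustration. Bytes 0–15 map to U+FE00–U+FE0F and bytes 16–255 map to U+E0100–U+E01EF.

```python
from typing import Optional

# Hypothetical helper names; this is an illustrative sketch, not PyRIT's API.

def byte_to_selector(b: int) -> str:
    """Map one byte (0-255) to one of the 256 variation selectors."""
    if not 0 <= b <= 255:
        raise ValueError("byte out of range")
    if b < 16:
        return chr(0xFE00 + b)          # VS1-VS16
    return chr(0xE0100 + (b - 16))      # VS17-VS256

def selector_to_byte(ch: str) -> Optional[int]:
    """Inverse mapping; returns None for any non-selector character."""
    cp = ord(ch)
    if 0xFE00 <= cp <= 0xFE0F:
        return cp - 0xFE00
    if 0xE0100 <= cp <= 0xE01EF:
        return cp - 0xE0100 + 16
    return None

def encode(text: str, base: str = "😊") -> str:
    """Append one selector per UTF-8 byte of `text` to a visible base character."""
    return base + "".join(byte_to_selector(b) for b in text.encode("utf-8"))

def decode(carrier: str) -> str:
    """Collect selector characters from `carrier` and decode them as UTF-8."""
    data = bytes(b for b in map(selector_to_byte, carrier) if b is not None)
    return data.decode("utf-8")
```

For example, `decode(encode("héllo"))` round-trips to `"héllo"`, and the carrier string contains the base character followed by six selectors (one per UTF-8 byte).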
Describe alternatives you've considered
We previously added `sneaky_bits`, which encodes at the bit level using two invisible characters. While simple and effective, it spends one invisible character per bit; `variation_selector` offers higher data density (1 byte per character) and enables encoding full payloads in shorter strings.
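The density difference can be sketched with a quick count. This assumes `sneaky_bits` emits exactly one invisible character per bit with no extra framing, which may not match the actual implementation; `invisible_chars_needed` is a hypothetical helper used only for this comparison.

```python
def invisible_chars_needed(text: str) -> dict:
    """Estimate invisible characters required to carry `text` under each mode."""
    n_bytes = len(text.encode("utf-8"))
    return {
        # Assumption: one invisible character per bit, no framing overhead.
        "sneaky_bits": n_bytes * 8,
        # One variation selector per byte (base character not counted).
        "variation_selector": n_bytes,
    }
```

Under these assumptions, a 2-byte payload such as `"hi"` needs 16 invisible characters with `sneaky_bits` and 2 with `variation_selector`.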
Additional context
This is based on Paul Butler’s post, which shows that variation selectors:
- Encode 1 byte each, invisibly
- Persist through copy/paste
- Are useful for simulating data smuggling, prompt injection, and text watermarking
Note
@romanlutz good to move forward from your side?
@paulinek13 would it be okay to slip this in while you're working on generalization (PR #818)? I’m also happy to wait if you’d prefer! Should be done by Sunday.