It provides a comprehensive list of special characters that could potentially be used as hidden watermarks in text. These characters are often invisible or difficult to notice in regular text.

dawid-ai/ai-text-watermark

Comprehensive Watermark Characters Reference

This document provides a detailed breakdown of special characters that can be used for text watermarking purposes. Each category includes the actual characters, their Unicode code points, and information on how to use them.

Clean text file with all the characters: characters.txt

Zero-Width Characters and Invisible Separators

These characters have no visible width and can be inserted between visible characters without affecting appearance.

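| Name | Unicode | HTML Entity | Description |
|---|---|---|---|
| Zero Width Space | U+200B | `&#8203;` | Invisible space with no width |
| Zero Width Non-Joiner | U+200C | `&#8204;` | Prevents adjacent characters from joining |
| Zero Width Joiner | U+200D | `&#8205;` | Forces adjacent characters to join |
| Left-to-Right Mark | U+200E | `&#8206;` | Invisible left-to-right character; affects how neighboring neutral characters are ordered in bidirectional text |
| Right-to-Left Mark | U+200F | `&#8207;` | Invisible right-to-left character; affects how neighboring neutral characters are ordered in bidirectional text |
| Word Joiner | U+2060 | `&#8288;` | Like ZWSP, but prevents a line break |
| Function Application | U+2061 | `&#8289;` | Mathematical notation, invisible |
| Invisible Times | U+2062 | `&#8290;` | Mathematical notation, invisible |
| Invisible Separator | U+2063 | `&#8291;` | Mathematical notation, invisible |
| Invisible Plus | U+2064 | `&#8292;` | Mathematical notation, invisible |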

Usage: These characters can be inserted between normal characters or words to create unique patterns. They are invisible when rendered but show up when the text is inspected at the code-point level.
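Because these characters cannot be seen, it helps to print their code points when inspecting text. The following is a minimal sketch using only Python's standard `unicodedata` module; it relies on the fact that every character in the table above has the general category `Cf` (format). The function name `reveal_invisible` and the `<U+XXXX>` output format are illustrative choices, not part of this repository.

```python
import unicodedata

def reveal_invisible(text: str) -> str:
    """Replace invisible format characters (category Cf) with a visible <U+XXXX> tag."""
    out = []
    for ch in text:
        if unicodedata.category(ch) == "Cf":
            out.append(f"<U+{ord(ch):04X}>")
        else:
            out.append(ch)
    return "".join(out)

print(reveal_invisible("Hel\u200blo\u200d"))  # Hel<U+200B>lo<U+200D>
```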

Various Space Characters

These look like ordinary spaces but differ in width or line-breaking behavior.

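| Name | Unicode | HTML Entity | Description |
|---|---|---|---|
| En Space | U+2002 | `&#8194;` | 1/2 em (traditionally the width of 'N') |
| Em Space | U+2003 | `&#8195;` | 1 em (traditionally the width of 'M') |
| Three-Per-Em Space | U+2004 | `&#8196;` | 1/3 em |
| Four-Per-Em Space | U+2005 | `&#8197;` | 1/4 em |
| Six-Per-Em Space | U+2006 | `&#8198;` | 1/6 em |
| Figure Space | U+2007 | `&#8199;` | Width of a digit |
| Punctuation Space | U+2008 | `&#8200;` | Width of a period |
| Thin Space | U+2009 | `&#8201;` | 1/5 em (sometimes 1/6) |
| Hair Space | U+200A | `&#8202;` | Thinner than a thin space |
| Medium Mathematical Space | U+205F | `&#8287;` | 4/18 em |
| Narrow No-Break Space | U+202F | `&#8239;` | Narrow space that prevents a line break |
| Ideographic Space | U+3000 | `&#12288;` | Width of a CJK ideograph |
| No-Break Space | U+00A0 | `&#160;` | Same width as a regular space but prevents a line break |
| Ogham Space Mark | U+1680 | `&#5760;` | Word separator in Ogham script; usually rendered as a visible horizontal line |
| Mongolian Vowel Separator | U+180E | `&#6158;` | Used in Mongolian script; classified as a format character, not a space, since Unicode 6.3 |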

Usage: Replace normal spaces with these alternative spaces to create patterns. Each has a slightly different width, which may be imperceptible to a reader but is easy to detect programmatically.
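A minimal sketch of space replacement, assuming a hypothetical scheme in which every space in the cover text carries one bit: a regular space (U+0020) means 0 and a thin space (U+2009) means 1. The choice of thin space and the function names are assumptions for illustration only.

```python
# Hypothetical scheme: every space in the cover text carries one bit.
ZERO = " "       # U+0020 regular space -> bit 0
ONE = "\u2009"   # U+2009 thin space    -> bit 1

def embed_bits(text: str, bits: str) -> str:
    """Encode a string of '0'/'1' characters into the spaces of `text`."""
    if len(bits) > text.count(ZERO):
        raise ValueError("not enough spaces in the text for the payload")
    out, i = [], 0
    for ch in text:
        if ch == ZERO and i < len(bits):
            out.append(ONE if bits[i] == "1" else ZERO)
            i += 1
        else:
            out.append(ch)
    return "".join(out)

def extract_bits(text: str) -> str:
    """Read one bit from every regular or thin space, in order."""
    return "".join("1" if ch == ONE else "0" for ch in text if ch in (ZERO, ONE))

marked = embed_bits("the quick brown fox jumps", "1011")
print(extract_bits(marked))  # 1011
```

Spaces beyond the payload read back as 0, so a practical scheme would also need a length field or terminator. How visible the substitution is depends on the font.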

Combining Diacritical Marks

These characters combine with preceding characters and can be stacked.

| Character Range | Unicode Range | Description |
|---|---|---|
| ̀ ́ ̂ ̃ ̄ ̅ ̆ ̇ ̈ ̉ ̊ ̋ ̌ ̍ ̎ ̏ | U+0300-U+030F | Combining diacritical marks (accents) |
| ̐ ̑ ̒ ̓ ̔ ̕ ̖ ̗ ̘ ̙ ̚ ̛ ̜ ̝ ̞ ̟ | U+0310-U+031F | More combining marks |
| ̠ ̡ ̢ ̣ ̤ ̥ ̦ ̧ ̨ ̩ ̪ ̫ ̬ ̭ ̮ ̯ | U+0320-U+032F | More combining marks |
| ̰ ̱ ̲ ̳ ̴ ̵ ̶ ̷ ̸ ̹ ̺ ̻ ̼ ̽ ̾ ̿ | U+0330-U+033F | More combining marks |
| ͂ ͅ ͆ ͇ ͈ ͉ ͊ ͋ ͌ ͍ ͎ ͏ ͐ ͑ ͒ ͓ | U+0340-U+034F | More combining marks |
| ͔ ͕ ͖ ͗ ͘ ͙ ͚ ͛ ͜ ͝ ͞ ͟ ͠ ͡ ͢ ͣ | U+0350-U+035F | More combining marks |
| ͤ ͥ ͦ ͧ ͨ ͩ ͪ ͫ ͬ ͭ ͮ ͯ | U+0360-U+036F | More combining marks |
| ҈ ҉ | U+0488-U+0489 | Combining Cyrillic marks |

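Usage: These attach to the preceding character and can be stacked in multiple layers. Most are visible accents, so they work best when they are small enough to overlook (such as a dot below) or have no glyph at all (U+034F COMBINING GRAPHEME JOINER). For example, an 'a' followed by an invisible combining mark still looks like a plain 'a' but contains an extra code point.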
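The sketch below, using Python's standard `unicodedata` module, shows why combining marks are detectable: a base letter plus a mark is two code points even when it renders as one glyph, and marks can be stripped by their general category (Mn, Mc, Me). The helpers `add_mark` and `strip_marks` are hypothetical names chosen for illustration.

```python
import unicodedata

def add_mark(text: str, mark: str, every: int) -> str:
    """Append the combining `mark` after every `every`-th character (illustrative scheme)."""
    out = []
    for i, ch in enumerate(text, start=1):
        out.append(ch)
        if i % every == 0:
            out.append(mark)
    return "".join(out)

def strip_marks(text: str) -> str:
    """Remove every combining mark (general categories Mn, Mc and Me)."""
    return "".join(ch for ch in text if not unicodedata.category(ch).startswith("M"))

marked = add_mark("banana", "\u0323", 2)   # U+0323 combining dot below
print(len("banana"), len(marked))          # 6 9
print(strip_marks(marked) == "banana")     # True

# One glyph, two code points; NFC fuses this pair into the precomposed U+00E1
print(len("a\u0301"), len(unicodedata.normalize("NFC", "a\u0301")))  # 2 1
```

NFC also composes a + U+0323 into the precomposed U+1EA1, so a scheme that relies on separate code points should expect normalization to change it.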

Special Punctuation

Alternative versions of common punctuation marks.

| Character(s) | Name | Unicode | Description |
|---|---|---|---|
| • ‣ ․ ‥ … ‧ | Various Dots | U+2022, U+2023, U+2024, U+2025, U+2026, U+2027 | Alternative bullet points and ellipses |
| ‹ › « » | Angle Quotes | U+2039, U+203A, U+00AB, U+00BB | Alternative quotation marks |
| ‘ ’ ‚ ‛ “ ” „ ‟ | Quotation Marks | U+2018-U+201F | Various styles of quotation marks |
| ‐ ‑ ‒ – — ― ⁃ | Hyphens and Dashes | U+2010-U+2015, U+2043 | Various lengths of dashes |
| ⁄ | Fraction Slash | U+2044 | Different from the regular slash |
| ⁎ ⁑ ⁂ | Unusual Asterisks | U+204E, U+2051, U+2042 | Alternative asterisk-like symbols |
| ⁅ ⁆ | Square Bracket with Quill | U+2045, U+2046 | Unusual brackets |
| ⁇ ⁈ ⁉ | Multiple Question/Exclamation | U+2047, U+2048, U+2049 | Combined punctuation |
| ⁋ ⁌ ⁍ | Paragraph Marks | U+204B, U+204C, U+204D | Reversed pilcrow and black leftwards/rightwards bullets |
| ⁏ | Reversed Semicolon | U+204F | Semicolon facing the opposite direction |
| ⁓ | Swung Dash | U+2053 | Wavy dash |
| ⁕ | Flower Punctuation Mark | U+2055 | Flower-shaped punctuation |
| ⁗ | Quadruple Prime | U+2057 | Four prime marks |
| ⁘ ⁙ ⁚ ⁛ ⁜ ⁝ ⁞ | Various Dot Punctuation | U+2058-U+205E | Various unusual dot arrangements |

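Usage: Some of these can replace standard punctuation while looking very similar, for example the one dot leader (U+2024) in place of a period or the non-breaking hyphen (U+2011) in place of a hyphen. Others, such as the reversed semicolon, look noticeably different from the standard mark.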
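Whether a punctuation substitution survives depends on how the text is processed later. As a quick check, this sketch uses Python's `unicodedata.normalize` to show that NFKC normalization folds some of these characters back to ASCII (the one dot leader and the ellipsis have compatibility decompositions), while others, such as curly quotes, dashes and the reversed semicolon, pass through unchanged. The sample characters and the `nfkc_report` helper are illustrative.

```python
import unicodedata

def nfkc_report(chars: str) -> dict[str, str]:
    """Map each character to its NFKC form (equal to itself if it survives)."""
    return {ch: unicodedata.normalize("NFKC", ch) for ch in chars}

# One dot leader, ellipsis, right single quote, en dash, reversed semicolon
for ch, folded in nfkc_report("\u2024\u2026\u2019\u2013\u204F").items():
    status = "unchanged" if folded == ch else f"becomes {folded!r}"
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {status}")
```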

Special Symbols

Distinctive symbols that can be hidden in text.

| Character(s) | Name | Unicode | Description |
|---|---|---|---|
| † ‡ ※ ⁁ ⁊ ⁋ ⁏ ⁒ | Various Marks | U+2020, U+2021, U+203B, U+2041, U+204A, U+204B, U+204F, U+2052 | Various special marks |
| ℠ ℡ ™ ℀ ℁ ℂ ℃ ℄ ℅ ℆ | Various Symbols | U+2120, U+2121, U+2122, U+2100-U+2106 | Service mark, telephone, trademark, etc. |
| ⅓ ⅔ ⅕ ⅖ ⅗ ⅘ ⅙ ⅚ | Fractions | U+2153-U+215A | Various fraction symbols |
| ← ↑ → ↓ ↔ ↕ ↖ ↗ ↘ ↙ | Arrows | U+2190-U+2199 | Various directional arrows |
| ∀ ∁ ∂ ∃ ∄ ∅ ∆ ∇ ∈ ∉ ∊ ∋ | Mathematical Symbols | U+2200-U+220B | Various mathematical symbols |
| ≈ ≉ ≠ ≡ ≢ ≣ ≤ ≥ ≦ ≧ | More Math Symbols | U+2248, U+2249, U+2260-U+2267 | Comparison and equality symbols |
| ① ② ③ ④ ⑤ ⑥ ⑦ ⑧ ⑨ ⑩ | Circled Numbers | U+2460-U+2469 | Numbers in circles |
| ⏎ ⏏ ⏐ ⏑ ⏒ ⏓ ⏔ ⏕ ⏖ ⏗ | Technical Symbols | U+23CE-U+23D7 | Return and eject symbols, vertical line extension, and metrical notation marks |

Usage: Unlike the invisible characters above, these symbols are visible. They can stand in for similar-looking text (for example, ™ for 'TM' or ① for '(1)'), or be placed where a reader is unlikely to notice them.

Homoglyphs (Look-Alike Characters)

Characters from other alphabets that resemble Latin letters.

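| Character Set | Script | Description |
|---|---|---|
| Α α Β β Ε ε Ζ ζ Η η Ι ι Κ κ Μ μ Ν ν Ο ο Ρ ρ Τ τ Χ χ | Greek | Uppercase forms look like Latin A, B, E, Z, H, I, K, M, N, O, P, T, X; among the lowercase forms, ο closely matches Latin o |
| А а В в Е е К к М м Н н О о Р р С с Т т У у Х х | Cyrillic | Uppercase forms look like Latin A, B, E, K, M, H, O, P, C, T, Y, X; lowercase а, е, о, р, с, у, х match Latin a, e, o, p, c, y, x |
| ᴀ ʙ ᴄ ᴅ ᴇ ғ ɢ ʜ ɪ ᴊ ᴋ ʟ ᴍ ɴ ᴏ ᴘ ǫ ʀ s ᴛ ᴜ ᴠ ᴡ x ʏ ᴢ | Small Caps | Smaller versions of capital letters |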

Usage: These can replace regular Latin characters while looking almost identical. For example, using Cyrillic 'о' instead of Latin 'o'.

Variation Selectors

Characters that modify the appearance of preceding characters.

| Character(s) | Unicode Range | Description |
|---|---|---|
| ︀ ︁ ︂ ︃ ︄ ︅ ︆ ︇ ︈ ︉ ︊ ︋ ︌ ︍ ︎ ️ | U+FE00-U+FE0F | Variation selectors 1-16 |

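Usage: Variation selectors are invisible. After a base character that has a defined variation sequence they select a glyph variant; for example, U+FE0E requests text-style and U+FE0F emoji-style presentation for some emoji. After other characters they normally have no visible effect, so they can be appended as hidden markers.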
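Because a selector after a character with no defined variation sequence is normally not rendered, arbitrary bytes can be hidden as a run of selectors after one visible character. This sketch assumes the following mapping: byte values 0 to 15 map to U+FE00 to U+FE0F (VS1 to VS16), and 16 to 255 map to the supplementary selectors U+E0100 to U+E01EF (VS17 to VS256), which lie outside the range listed above. The function names are illustrative.

```python
from typing import Optional

def byte_to_vs(b: int) -> str:
    """Map one byte (0-255) to a variation selector, using the assumed mapping."""
    if b < 16:
        return chr(0xFE00 + b)        # VS1-VS16
    return chr(0xE0100 + b - 16)      # VS17-VS256

def vs_to_byte(ch: str) -> Optional[int]:
    """Inverse of byte_to_vs; returns None for characters that are not selectors."""
    cp = ord(ch)
    if 0xFE00 <= cp <= 0xFE0F:
        return cp - 0xFE00
    if 0xE0100 <= cp <= 0xE01EF:
        return cp - 0xE0100 + 16
    return None

def hide(carrier: str, payload: bytes) -> str:
    """Append the payload, one selector per byte, after a visible carrier character."""
    return carrier + "".join(byte_to_vs(b) for b in payload)

def reveal(text: str) -> bytes:
    """Collect every variation selector in `text` back into bytes."""
    values = (vs_to_byte(ch) for ch in text)
    return bytes(v for v in values if v is not None)

marked = hide("A", b"hi")
print(len(marked), reveal(marked))  # 3 b'hi'
```

Decoding ignores the carrier, so any visible character works. Software that strips or filters variation selectors may remove the payload.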

Special Hyphen Characters

Different types of hyphens with special behaviors.

| Name | Unicode | HTML Entity | Description |
|---|---|---|---|
| Soft Hyphen | U+00AD | `&#173;` | Only visible when a word breaks at the end of a line |
| Non-Breaking Hyphen | U+2011 | `&#8209;` | Hyphen that doesn't allow a line break |

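Usage: The non-breaking hyphen can replace a normal hyphen with little or no visible change. The soft hyphen can be inserted inside a word and stays invisible unless the line wraps at that point.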
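A sketch of a word-level soft-hyphen scheme, assuming a hypothetical convention: each space-separated word of four or more characters carries one bit, with a soft hyphen inserted after its second character for 1 and nothing for 0. Since the soft hyphen shows only when a line breaks at that point, the marked text normally looks unchanged. The function names are illustrative.

```python
SHY = "\u00ad"  # U+00AD soft hyphen

def embed_shy(text: str, bits: str) -> str:
    """Insert a soft hyphen after the 2nd character of each eligible word whose bit is '1'."""
    words, i = text.split(" "), 0
    for n, word in enumerate(words):
        if len(word) >= 4 and i < len(bits):
            if bits[i] == "1":
                words[n] = word[:2] + SHY + word[2:]
            i += 1
    if i < len(bits):
        raise ValueError("not enough eligible words for the payload")
    return " ".join(words)

def extract_shy(text: str) -> str:
    """Read one bit per eligible word: soft hyphen present = 1."""
    bits = []
    for word in text.split(" "):
        if len(word.replace(SHY, "")) >= 4:
            bits.append("1" if SHY in word else "0")
    return "".join(bits)

marked = embed_shy("some words carry hidden bits", "0110")
print(extract_shy(marked)[:4])  # 0110
```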

Special Modifier Letters

Small letters used for phonetic notation or modification.

| Character Range | Unicode Range | Description |
|---|---|---|
| ʰ ʱ ʲ ʳ ʴ ʵ ʶ ʷ ʸ ʹ ʺ ʻ ʼ ʽ ʾ ʿ | U+02B0-U+02BF | Modifier letters |
| ˀ ˁ ˂ ˃ ˄ ˅ ˆ ˇ ˈ ˉ ˊ ˋ ˌ ˍ ˎ ˏ | U+02C0-U+02CF | More modifier letters |
| ː ˑ ˒ ˓ ˔ ˕ ˖ ˗ ˘ ˙ ˚ ˛ ˜ ˝ ˞ ˟ | U+02D0-U+02DF | Various modifiers and tone marks |
| ˠ ˡ ˢ ˣ ˤ ˥ ˦ ˧ ˨ ˩ ˪ ˫ | U+02E0-U+02EB | More modifier letters and tone marks |

Usage: These can be used as superscript-like characters or added to text in unexpected places.

Miscellaneous Technical Symbols

Various technical symbols that could be hidden in text.

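| Character Range | Unicode Range | Description |
|---|---|---|
| ⌐ ⌑ ⌒ ⌓ ⌔ ⌕ ⌖ ⌗ ⌘ ⌙ ⌚ ⌛ ⌜ ⌝ ⌞ ⌟ | U+2310-U+231F | Miscellaneous technical symbols |
| ⌠ ⌡ ⌢ ⌣ ⌤ ⌥ ⌦ ⌧ ⌨ | U+2320-U+2328 | More technical symbols |
| 〈 〉 ⦅ ⦆ | U+2329, U+232A, U+2985, U+2986 | Various brackets; U+2329 and U+232A are canonically equivalent to U+3008 and U+3009, so Unicode normalization (even NFC) replaces them |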

Usage: These can be used to replace certain characters or be hidden in text where they might not be noticed.
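Normalization can silently replace some of these characters, which matters if watermarked text passes through software that normalizes its input. The sketch below uses Python's standard `unicodedata` module to list which of the four normalization forms change a given character; the helper name `normalization_changes` and the sample characters are illustrative.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")

def normalization_changes(chars: str) -> dict[str, list[str]]:
    """For each character, list the normalization forms that would change it."""
    return {ch: [f for f in FORMS if unicodedata.normalize(f, ch) != ch] for ch in chars}

for ch, forms in normalization_changes("\u2329\u2122\u2318").items():
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {', '.join(forms) or 'stable'}")
```

Here U+2329 is changed by all four forms, the trade mark sign only by the compatibility forms (NFKC and NFKD), and ⌘ by none, so ⌘ is the safer choice of the three.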

How to Use These Characters for Watermarking

  1. Basic Pattern Watermarking:

    • Insert zero-width characters between normal characters in a specific pattern
    • Example: Inserting ZWJ after every third character
  2. Space Replacement:

    • Replace regular spaces with different Unicode spaces
    • Example: Alternating between regular spaces and hair spaces
  3. Homoglyph Substitution:

    • Replace certain letters with identical-looking characters from other scripts
    • Example: Replacing 'o' with Cyrillic 'о' (U+043E) in specific positions
  4. Combining Mark Addition:

    • Add small or invisible combining marks to certain characters
    • Example: Adding a combining dot below (U+0323), which is visible but easy to overlook, to vowels
  5. Invisible Sequence Patterns:

    • Add sequences of invisible characters at specific locations in text
    • Example: Adding [ZWSP, ZWJ, ZWNJ] after periods
  6. Punctuation Substitution:

    • Replace standard punctuation with alternative Unicode versions
    • Example: Using alternative quotes or dashes
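The invisible-sequence technique (item 5) can be sketched as follows, assuming the marker is exactly the sequence ZWSP, ZWJ, ZWNJ from the example above and that it goes after every period. Detection then reduces to counting occurrences of the marker. The function names are hypothetical.

```python
MARKER = "\u200B\u200D\u200C"  # ZWSP, ZWJ, ZWNJ as in the example above

def mark_periods(text: str) -> str:
    """Insert the invisible marker after every period."""
    return text.replace(".", "." + MARKER)

def count_markers(text: str) -> int:
    """Count how many complete marker sequences the text contains."""
    return text.count(MARKER)

def unmark(text: str) -> str:
    """Remove the marker sequences, restoring the original text."""
    return text.replace(MARKER, "")

doc = "First sentence. Second one. Done."
marked = mark_periods(doc)
print(count_markers(marked), unmark(marked) == doc)  # 3 True
```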

How to Create These Characters in Code

JavaScript

// Using Unicode escape sequences
const zwsp = '\u200B';       // Zero Width Space
const zwj = '\u200D';        // Zero Width Joiner
const cyrillicA = '\u0430';  // Cyrillic 'а'

// Adding watermark with zero-width characters:
// insert `pattern` after every `interval`-th character of `text`
function addWatermark(text, pattern, interval = 5) {
    let result = '';
    let count = 0;
    // for...of iterates by code point, so astral characters (e.g. emoji) are never split
    for (const ch of text) {
        result += ch;
        count++;
        if (count % interval === 0) {
            result += pattern;
        }
    }
    return result;
}

// Example: Add a pattern of invisible characters after every 5th character
const watermarkedText = addWatermark("Hello world", "\u200B\u200D\u200C");

Python

# Using Unicode escape sequences
zwsp = '\u200B'       # Zero Width Space
zwj = '\u200D'        # Zero Width Joiner
cyrillicA = '\u0430'  # Cyrillic 'а'

# Adding watermark with zero-width characters:
# insert `pattern` after every `interval`-th character of `text`
def add_watermark(text, pattern, interval=5):
    result = []
    for i, char in enumerate(text, start=1):
        result.append(char)
        if i % interval == 0:
            result.append(pattern)
    return ''.join(result)

# Example: Add a pattern of invisible characters after every 5th character
watermarked_text = add_watermark("Hello world", "\u200B\u200D\u200C")

HTML/CSS

<!-- Using HTML entities -->
<p>This text contains a zero-width space &#8203; here.</p>
<p>This text uses a combining mark: a&#768;</p>

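<!-- Using CSS to add invisible characters. Note: ::after content is not part of the DOM text, so browsers generally do not include it when the text is copied or read via textContent. -->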
<style>
    .watermarked::after {
        content: '\200B\200D\200C';
        display: inline;
    }
</style>
<p class="watermarked">This text has an invisible watermark after it.</p>

Detecting Watermarks

JavaScript

// Detect zero-width characters
function detectInvisibleWatermarks(text) {
    const invisibleChars = ['\u200B', '\u200C', '\u200D', '\u2060', '\u2061', '\u2062', '\u2063', '\u2064'];
    let pattern = '';
    
    for (let i = 0; i < text.length; i++) {
        if (invisibleChars.includes(text[i])) {
            pattern += text[i];
        }
    }
    
    return pattern;
}

// Detect homoglyphs
function detectHomoglyphs(text) {
    const cyrillicMap = {
        '\u0430': 'a', '\u0435': 'e', '\u043E': 'o', 
        '\u0440': 'p', '\u0441': 'c', '\u0445': 'x'
    };
    
    let found = [];
    for (let i = 0; i < text.length; i++) {
        if (text[i] in cyrillicMap) {
            found.push({pos: i, char: text[i], latinEquiv: cyrillicMap[text[i]]});
        }
    }
    
    return found;
}

Python

# Detect zero-width characters
def detect_invisible_watermarks(text):
    invisible_chars = ['\u200B', '\u200C', '\u200D', '\u2060', '\u2061', '\u2062', '\u2063', '\u2064']
    pattern = ''
    
    for char in text:
        if char in invisible_chars:
            pattern += char
    
    return pattern

# Detect homoglyphs
def detect_homoglyphs(text):
    cyrillic_map = {
        '\u0430': 'a', '\u0435': 'e', '\u043E': 'o', 
        '\u0440': 'p', '\u0441': 'c', '\u0445': 'x'
    }
    
    found = []
    for i, char in enumerate(text):
        if char in cyrillic_map:
            found.append({'pos': i, 'char': char, 'latin_equiv': cyrillic_map[char]})
    
    return found
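The detectors above check fixed lists of characters. A broader sketch, assuming that anything outside printable ASCII is worth flagging, classifies each character by its Unicode general category and name from Python's `unicodedata` module, so it also catches the space variants, combining marks, variation selectors and Greek or Cyrillic homoglyphs from the other sections. The function name and reason labels are illustrative.

```python
import unicodedata

def find_suspicious(text: str) -> list[tuple[int, str, str]]:
    """Return (index, 'U+XXXX', reason) for each character that could carry a watermark."""
    hits = []
    for i, ch in enumerate(text):
        if " " <= ch <= "~" or ch in "\t\n\r":
            continue  # printable ASCII and ordinary whitespace are ignored
        cp, cat = ord(ch), unicodedata.category(ch)
        script = unicodedata.name(ch, "").split(" ")[0]
        if 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF:
            reason = "variation selector"
        elif cat == "Cf":
            reason = "invisible format character"
        elif cat == "Zs":
            reason = "non-ASCII space"
        elif cat.startswith("M"):
            reason = "combining mark"
        elif cat.startswith("L") and script in ("GREEK", "CYRILLIC"):
            reason = "possible homoglyph"
        else:
            reason = "other non-ASCII character"
        hits.append((i, f"U+{cp:04X}", reason))
    return hits

print(find_suspicious("p\u0430y\u200b"))
# [(1, 'U+0430', 'possible homoglyph'), (3, 'U+200B', 'invisible format character')]
```

Legitimate Greek or Cyrillic text, accented letters and typographic spaces are flagged as well, so the results need human review rather than being treated as proof of a watermark.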

Conclusion

Text watermarking with Unicode characters provides a wide range of techniques for invisibly marking text. The most effective watermarks typically use a combination of these techniques to create unique, detectable patterns while maintaining the visible appearance of the text.
