This document provides a detailed breakdown of special characters that can be used for text watermarking purposes. Each category includes the actual characters, their Unicode code points, and information on how to use them.
Clean text file with all the characters: characters.txt
These characters have no visible width and can be inserted between visible characters without affecting appearance.

| Character | Name | Unicode | HTML Entity | Description |
|---|---|---|---|---|
| &#x200B; | Zero Width Space | U+200B | `&#x200B;` | Invisible space with no width; permits a line break |
| &#x200C; | Zero Width Non-Joiner | U+200C | `&#x200C;` | Prevents adjacent characters from joining |
| &#x200D; | Zero Width Joiner | U+200D | `&#x200D;` | Requests that adjacent characters join |
| &#x200E; | Left-to-Right Mark | U+200E | `&#x200E;` | Invisible mark with strong left-to-right direction |
| &#x200F; | Right-to-Left Mark | U+200F | `&#x200F;` | Invisible mark with strong right-to-left direction |
| &#x2060; | Word Joiner | U+2060 | `&#x2060;` | Like ZWSP, but prohibits a line break at its position |
| &#x2061; | Function Application | U+2061 | `&#x2061;` | Mathematical notation, invisible |
| &#x2062; | Invisible Times | U+2062 | `&#x2062;` | Mathematical notation, invisible |
| &#x2063; | Invisible Separator | U+2063 | `&#x2063;` | Mathematical notation, invisible |
| &#x2064; | Invisible Plus | U+2064 | `&#x2064;` | Mathematical notation, invisible |

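Usage: These characters can be inserted between normal characters or words to create unique patterns. They have no visible glyph, but they remain in the underlying text and can be detected by inspecting its code points. Note that ZWJ and ZWNJ can change shaping in scripts and emoji sequences that use joining, and the directional marks can affect the layout of bidirectional text.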
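
One way to turn such insertions into a recoverable pattern is to let two zero-width characters stand for the bits 0 and 1. The sketch below is illustrative only: the mapping (ZWSP for 0, ZWNJ for 1) and the function names `embed_bits`, `extract_bits`, and `strip_marks` are assumptions, not part of any standard scheme.

```python
ZERO = "\u200B"  # Zero Width Space stands for bit 0 (assumed mapping)
ONE = "\u200C"   # Zero Width Non-Joiner stands for bit 1 (assumed mapping)


def embed_bits(text: str, bits: str) -> str:
    """Insert one zero-width character after each of the first len(bits) characters."""
    if len(bits) > len(text):
        raise ValueError("text is too short to carry all bits")
    out = []
    for i, ch in enumerate(text):
        out.append(ch)
        if i < len(bits):
            # Any bit other than "1" is treated as 0.
            out.append(ONE if bits[i] == "1" else ZERO)
    return "".join(out)


def extract_bits(text: str) -> str:
    """Recover the bit string by reading the zero-width characters in order."""
    return "".join("1" if ch == ONE else "0" for ch in text if ch in (ZERO, ONE))


def strip_marks(text: str) -> str:
    """Remove the two carrier characters, restoring the visible text."""
    return text.replace(ZERO, "").replace(ONE, "")


marked = embed_bits("Hello world", "1011")
print(extract_bits(marked))  # 1011
print(strip_marks(marked))   # Hello world
```

The marked string renders like the original but is four code points longer, which is what a detector looks for.
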
Unlike regular spaces, these have different widths and behaviors but appear visually similar.
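
| Character | Name | Unicode | HTML Entity | Description |
|---|---|---|---|---|
| &#x2002; | En Space | U+2002 | `&#x2002;` | Half an em wide (nominally the width of 'N') |
| &#x2003; | Em Space | U+2003 | `&#x2003;` | One em wide (nominally the width of 'M') |
| &#x2004; | Three-Per-Em Space | U+2004 | `&#x2004;` | 1/3 of Em width |
| &#x2005; | Four-Per-Em Space | U+2005 | `&#x2005;` | 1/4 of Em width |
| &#x2006; | Six-Per-Em Space | U+2006 | `&#x2006;` | 1/6 of Em width |
| &#x2007; | Figure Space | U+2007 | `&#x2007;` | Width of a digit |
| &#x2008; | Punctuation Space | U+2008 | `&#x2008;` | Width of a period |
| &#x2009; | Thin Space | U+2009 | `&#x2009;` | 1/5 of Em width |
| &#x200A; | Hair Space | U+200A | `&#x200A;` | Thinner than thin space |
| &#x205F; | Medium Mathematical Space | U+205F | `&#x205F;` | 4/18 of Em width |
| &#x202F; | Narrow No-Break Space | U+202F | `&#x202F;` | Non-breaking narrow space |
| &#x3000; | Ideographic Space | U+3000 | `&#x3000;` | Width of ideographic character |
| &nbsp; | No-Break Space | U+00A0 | `&nbsp;` | Regular space that doesn't break |
| &#x1680; | Ogham Space Mark | U+1680 | `&#x1680;` | Space used in Ogham script; may render as a visible stroke in some fonts |
| &#x180E; | Mongolian Vowel Separator | U+180E | `&#x180E;` | Used in Mongolian script; classified as a format character rather than a space since Unicode 6.3 |
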
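Usage: Replace normal spaces with these alternative spaces to create patterns. Each has a slightly different width, which may be imperceptible visually but can be detected programmatically. The narrow variants are the least noticeable; em and ideographic spaces are visibly wider than a regular space.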
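
As a rough illustration of a space-based pattern, the sketch below treats each space in the text as one carrier: a regular space stands for bit 0 and a thin space (U+2009) for bit 1. The choice of thin space and the names `encode_in_spaces` and `decode_from_spaces` are assumptions made for this example.

```python
THIN_SPACE = "\u2009"  # stands for bit 1 (assumed); a regular space stands for bit 0


def encode_in_spaces(text: str, bits: str) -> str:
    """Replace the n-th space with a thin space when the n-th bit is '1'."""
    out = []
    remaining = iter(bits)
    for ch in text:
        if ch == " ":
            bit = next(remaining, "0")  # spaces beyond the payload stay regular
            out.append(THIN_SPACE if bit == "1" else " ")
        else:
            out.append(ch)
    return "".join(out)


def decode_from_spaces(text: str) -> str:
    """Read one bit per space: thin space is 1, regular space is 0."""
    return "".join(
        "1" if ch == THIN_SPACE else "0" for ch in text if ch in (" ", THIN_SPACE)
    )


marked = encode_in_spaces("the quick brown fox jumps", "101")
print(decode_from_spaces(marked))  # 1010
```

The decoded string has one bit per space, so the fourth space contributes a trailing 0. Bits that exceed the number of spaces are silently dropped in this sketch.
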
These characters combine with preceding characters and can be stacked.

| Character Range | Unicode Range | Description |
|---|---|---|
| ̀ ́ ̂ ̃ ̄ ̅ ̆ ̇ ̈ ̉ ̊ ̋ ̌ ̍ ̎ ̏ | U+0300 - U+030F | Combining diacritical marks (accents) |
| ̐ ̑ ̒ ̓ ̔ ̕ ̖ ̗ ̘ ̙ ̚ ̛ ̜ ̝ ̞ ̟ | U+0310 - U+031F | More combining marks |
| ̠ ̡ ̢ ̣ ̤ ̥ ̦ ̧ ̨ ̩ ̪ ̫ ̬ ̭ ̮ ̯ | U+0320 - U+032F | More combining marks |
| ̰ ̱ ̲ ̳ ̴ ̵ ̶ ̷ ̸ ̹ ̺ ̻ ̼ ̽ ̾ ̿ | U+0330 - U+033F | More combining marks |
| ͂ ͅ ͆ ͇ ͈ ͉ ͊ ͋ ͌ ͍ ͎ ͏ | U+0340 - U+034F | More combining marks (U+0340, U+0341, U+0343, and U+0344 are not shown; they normalize to other marks) |
| ͐ ͑ ͒ ͓ ͔ ͕ ͖ ͗ ͘ ͙ ͚ ͛ ͜ ͝ ͞ ͟ | U+0350 - U+035F | More combining marks |
| ͠ ͡ ͢ ͣ ͤ ͥ ͦ ͧ ͨ ͩ ͪ ͫ ͬ ͭ ͮ ͯ | U+0360 - U+036F | More combining marks |
| ҈ ҉ | U+0488 - U+0489 | Combining Cyrillic marks |

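Usage: These attach to the preceding character and can be stacked in multiple layers. Most are visible: a̸ is 'a' followed by U+0338 (Combining Long Solidus Overlay), which draws a slash through the letter. Subtle marks such as a dot below are easy to overlook at small sizes, and U+034F (Combining Grapheme Joiner) has no visible glyph at all.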
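
To see which base characters carry marks, and how many are stacked on each, the text can be grouped into a base character plus the marks that follow it. This is a minimal sketch using Python's standard `unicodedata` module; the helper names `split_clusters` and `stacked_marks` are hypothetical, and treating the general categories `Mn` and `Me` as "combining" is a simplifying assumption.

```python
import unicodedata


def is_combining(ch: str) -> bool:
    """True for nonspacing (Mn) and enclosing (Me) marks."""
    return unicodedata.category(ch) in ("Mn", "Me")


def split_clusters(text: str) -> list:
    """Group each base character with the combining marks that follow it."""
    clusters = []
    for ch in text:
        if is_combining(ch) and clusters:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


def stacked_marks(text: str) -> list:
    """Return (base, [mark code points]) for every base that carries marks."""
    return [
        (cluster[0], [f"U+{ord(mark):04X}" for mark in cluster[1:]])
        for cluster in split_clusters(text)
        if len(cluster) > 1
    ]


sample = "wa\u0323te\u0301\u0338r"
print(stacked_marks(sample))
# [('a', ['U+0323']), ('e', ['U+0301', 'U+0338'])]
```

Text that legitimately uses decomposed accents will also show up here, so the result is a list of candidates rather than proof of a watermark.
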
Alternative versions of common punctuation marks.

| Character(s) | Name | Unicode | Description |
|---|---|---|---|
| • ‣ ․ ‥ … ‧ | Various Dots | U+2022, U+2023, U+2024, U+2025, U+2026, U+2027 | Alternative bullet points and ellipses |
| ‹ › « » | Angle Quotes | U+2039, U+203A, U+00AB, U+00BB | Alternative quotation marks |
| ‘ ’ ‚ ‛ “ ” „ ‟ | Quotation Marks | U+2018-U+201F | Various styles of quotation marks |
| ‐ ‑ ‒ – — ― ⁃ | Hyphens and Dashes | U+2010-U+2015, U+2043 | Various lengths of dashes |
| ⁄ | Fraction Slash | U+2044 | Different from regular slash |
| ⁎ ⁑ ⁂ | Unusual Asterisks | U+204E, U+2051, U+2042 | Alternative asterisk-like symbols |
| ⁅ ⁆ | Square Bracket with Quill | U+2045, U+2046 | Unusual brackets |
| ⁇ ⁈ ⁉ | Multiple Question/Exclamation | U+2047, U+2048, U+2049 | Combined punctuation |
| ⁋ ⁌ ⁍ | Paragraph Marks | U+204B, U+204C, U+204D | Unusual paragraph markers |
| ⁏ | Reversed Semicolon | U+204F | Semicolon facing opposite direction |
| ⁓ | Swung Dash | U+2053 | Wavy dash |
| ⁕ | Flower Punctuation Mark | U+2055 | Flower-shaped punctuation |
| ⁗ | Quadruple Prime | U+2057 | Four prime marks |
| ⁘ ⁙ ⁚ ⁛ ⁜ ⁝ ⁞ | Various Dot Punctuation | U+2058-U+205E | Various unusual dot arrangements |

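Usage: These can replace standard punctuation. Some substitutes are nearly indistinguishable from the originals (for example, the non-breaking hyphen U+2011 in place of a regular hyphen), while others, such as the reversed semicolon, are visibly different and only suit places where a reader is unlikely to look closely.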
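
Punctuation substitutions can be located, and undone, with a lookup table from alternative characters to their ASCII counterparts. The table below is a small hand-picked subset chosen for illustration, not a complete or authoritative mapping, and the function names are hypothetical.

```python
# Hand-picked subset of alternative punctuation and the ASCII character
# each one most resembles (an illustrative assumption, not a full table).
ASCII_EQUIVALENTS = {
    "\u2018": "'", "\u2019": "'", "\u201C": '"', "\u201D": '"',
    "\u2010": "-", "\u2011": "-", "\u2013": "-", "\u2014": "-",
    "\u2024": ".", "\u2044": "/", "\u204F": ";",
}
TRANSLATION_TABLE = str.maketrans(ASCII_EQUIVALENTS)


def find_alternative_punctuation(text: str) -> list:
    """Return (index, code point) for each listed alternative character."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if ch in ASCII_EQUIVALENTS
    ]


def normalize_punctuation(text: str) -> str:
    """Map every listed alternative character back to ASCII."""
    return text.translate(TRANSLATION_TABLE)


sample = "well\u2011known \u201Cquote\u201D"
print(find_alternative_punctuation(sample))
# [(4, 'U+2011'), (11, 'U+201C'), (17, 'U+201D')]
print(normalize_punctuation(sample))
# well-known "quote"
```

Curly quotes and dashes are common in ordinary typeset text, so a hit only matters when it deviates from the style used in the rest of the document.
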
Distinctive symbols that can be hidden in text.

| Character(s) | Name | Unicode | Description |
|---|---|---|---|
| † ‡ ※ ⁁ ⁊ ⁋ ⁏ ⁒ | Various Marks | U+2020, U+2021, U+203B, U+2041, U+204A, U+204B, U+204F, U+2052 | Various special marks |
| ℠ ℡ ™ ℀ ℁ ℂ ℃ ℄ ℅ ℆ | Various Symbols | U+2120, U+2121, U+2122, U+2100-U+2106 | Service marks, telephone, trademark, etc. |
| ⅓ ⅔ ⅕ ⅖ ⅗ ⅘ ⅙ ⅚ | Fractions | U+2153-U+215A | Various fraction symbols |
| ← ↑ → ↓ ↔ ↕ ↖ ↗ ↘ ↙ | Arrows | U+2190-U+2199 | Various directional arrows |
| ∀ ∁ ∂ ∃ ∄ ∅ ∆ ∇ ∈ ∉ ∊ ∋ | Mathematical Symbols | U+2200-U+220B | Various mathematical symbols |
| ≈ ≉ ≠ ≡ ≢ ≣ ≤ ≥ ≦ ≧ | More Math Symbols | Selected from U+2248-U+2267 | Comparison and equality symbols |
| ① ② ③ ④ ⑤ ⑥ ⑦ ⑧ ⑨ ⑩ | Circled Numbers | U+2460-U+2469 | Numbers in circles |
| ⏎ ⏏ ⏐ ⏑ ⏒ ⏓ ⏔ ⏕ ⏖ ⏗ | Control Symbols | U+23CE-U+23D7 | Return and eject symbols, a line extension, and metrical symbols |

Usage: These symbols can be used to replace letters or words in text while maintaining a similar appearance, or can be hidden in places where they might not be noticed.
Characters from other alphabets that resemble Latin letters.
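
| Character Set | Script | Description |
|---|---|---|
| Α α Β β Ε ε Ζ ζ Η η Ι ι Κ κ Μ μ Ν ν Ο ο Ρ ρ Τ τ Χ χ | Greek | Uppercase forms look similar to Latin A, B, E, Z, H, I, K, M, N, O, P, T, X; most lowercase forms are visibly different (ο is a close match for o) |
| А а В в Е е К к М м Н н О о Р р С с Т т У у Х х | Cyrillic | Uppercase forms look similar to Latin A, B, E, K, M, H, O, P, C, T, Y, X; lowercase а, е, о, р, с, у, х are close matches for a, e, o, p, c, y, x |
| ᴀ ʙ ᴄ ᴅ ᴇ ғ ɢ ʜ ɪ ᴊ ᴋ ʟ ᴍ ɴ ᴏ ᴘ ǫ ʀ s ᴛ ᴜ ᴠ ᴡ x ʏ ᴢ | Small Caps | Smaller versions of capital letters (s and x are the ordinary Latin letters) |
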
Usage: These can replace regular Latin characters while looking almost identical. For example, using Cyrillic 'о' instead of Latin 'o'.
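
A simple check for this kind of substitution is to flag words that mix scripts. Python's standard library has no script property, so the sketch below guesses the script from the first word of each character's Unicode name; that heuristic, and the function names `script_of` and `mixed_script_words`, are assumptions made for illustration.

```python
import unicodedata


def script_of(ch: str):
    """Rough script guess from the first word of the Unicode name (a heuristic)."""
    if not ch.isalpha():
        return None
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def mixed_script_words(text: str) -> list:
    """Return (word, scripts) for every word whose letters span several scripts."""
    flagged = []
    for word in text.split():
        scripts = {s for s in map(script_of, word) if s}
        if len(scripts) > 1:
            flagged.append((word, sorted(scripts)))
    return flagged


sample = "The w\u043Erd looks normal"  # the 'o' in the second word is Cyrillic
for word, scripts in mixed_script_words(sample):
    print(scripts)  # ['CYRILLIC', 'LATIN']
```

The small-caps letters in the table above are mostly named as Latin letters, so this heuristic does not flag them. A word written entirely in lookalikes from one script also passes unnoticed.
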
Characters that modify the appearance of preceding characters.

| Character(s) | Unicode Range | Description |
|---|---|---|
| ︀ ︁ ︂ ︃ ︄ ︅ ︆ ︇ ︈ ︉ ︊ ︋ ︌ ︍ ︎ ️ | U+FE00-U+FE0F | Variation selectors 1-16 |

Usage: These modify the appearance of the preceding character. For example, some emoji have different appearances when followed by variation selectors.
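
Because a selector that follows a character with no defined variation sequence usually has no visible effect, stray selectors are worth listing together with the character they follow. This is a small sketch; `find_variation_selectors` is a hypothetical helper name.

```python
def find_variation_selectors(text: str) -> list:
    """Return (index, selector number, preceding character) for each VS1-VS16."""
    hits = []
    for i, ch in enumerate(text):
        if 0xFE00 <= ord(ch) <= 0xFE0F:
            base = text[i - 1] if i > 0 else ""
            hits.append((i, ord(ch) - 0xFE00 + 1, base))
    return hits


# 'A' followed by VS2, then a heart (U+2764) followed by VS16.
sample = "A\uFE01B\u2764\uFE0F"
for index, number, base in find_variation_selectors(sample):
    print(index, number, f"U+{ord(base):04X}")
# 1 2 U+0041
# 4 16 U+2764
```

A selector after an emoji such as U+2764 is common in ordinary text, while one after a plain Latin letter is more likely to be deliberate.
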
Different types of hyphens with special behaviors.

| Character | Name | Unicode | HTML Entity | Description |
|---|---|---|---|---|
| &shy; | Soft Hyphen | U+00AD | `&shy;` | Only visible when breaking a word at end of line |
| ‑ | Non-Breaking Hyphen | U+2011 | `&#x2011;` | Hyphen that doesn't allow line breaks |

Usage: These can replace normal hyphens in text while having special properties.
Small letters used for phonetic notation or modification.

| Character Range | Unicode Range | Description |
|---|---|---|
| ʰ ʱ ʲ ʳ ʴ ʵ ʶ ʷ ʸ ʹ ʺ ʻ ʼ ʽ ʾ ʿ | U+02B0-U+02BF | Modifier letters |
| ˀ ˁ ˂ ˃ ˄ ˅ ˆ ˇ ˈ ˉ ˊ ˋ ˌ ˍ ˎ ˏ | U+02C0-U+02CF | More modifier letters |
| ː ˑ ˒ ˓ ˔ ˕ ˖ ˗ ˘ ˙ ˚ ˛ ˜ ˝ ˞ ˟ | U+02D0-U+02DF | Various modifiers and tone marks |
| ˠ ˡ ˢ ˣ ˤ ˥ ˦ ˧ ˨ ˩ ˪ ˫ | U+02E0-U+02EB | More modifier letters and tone marks |

Usage: These can be used as superscript-like characters or added to text in unexpected places.
Various technical symbols that could be hidden in text.

| Character Range | Unicode Range | Description |
|---|---|---|
| ⌐ ⌑ ⌒ ⌓ ⌔ ⌕ ⌖ ⌗ ⌘ ⌙ ⌚ ⌛ ⌜ ⌝ ⌞ ⌟ | U+2310-U+231F | Miscellaneous technical symbols |
| ⌠ ⌡ ⌢ ⌣ ⌤ ⌥ ⌦ ⌧ ⌨ | U+2320-U+2328 | More technical symbols |
| 〈 〉 ⦅ ⦆ | U+2329, U+232A, U+2985, U+2986 | Various brackets |

Usage: These can be used to replace certain characters or be hidden in text where they might not be noticed.

- Basic Pattern Watermarking:
  - Insert zero-width characters between normal characters in a specific pattern
  - Example: Inserting ZWJ after every third character
- Space Replacement:
  - Replace regular spaces with different Unicode spaces
  - Example: Alternating between regular spaces and hair spaces
- Homoglyph Substitution:
  - Replace certain letters with identical-looking characters from other scripts
  - Example: Replacing 'o' with Cyrillic 'о' (U+043E) in specific positions
- Combining Mark Addition:
  - Add subtle combining marks to certain characters (most marks are visible, so the choice of mark and font matters)
  - Example: Adding a combining dot below (U+0323) to vowels
- Invisible Sequence Patterns:
  - Add sequences of invisible characters at specific locations in text
  - Example: Adding [ZWSP, ZWJ, ZWNJ] after periods
- Punctuation Substitution:
  - Replace standard punctuation with alternative Unicode versions
  - Example: Using alternative quotes or dashes
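
The homoglyph technique in the list above can be sketched as a bit carrier: every Latin letter that has a Cyrillic lookalike is one slot, left unchanged for 0 and swapped for 1. The six-letter mapping mirrors the one used in the detection code in this document; the function names and the bit convention are assumptions made for this example.

```python
LATIN_TO_CYRILLIC = {
    "a": "\u0430", "e": "\u0435", "o": "\u043E",
    "p": "\u0440", "c": "\u0441", "x": "\u0445",
}
CYRILLIC_TO_LATIN = {cyr: lat for lat, cyr in LATIN_TO_CYRILLIC.items()}


def embed_homoglyph_bits(text: str, bits: str) -> str:
    """Swap the n-th substitutable letter for its lookalike when bit n is '1'."""
    out = []
    remaining = iter(bits)
    for ch in text:
        if ch in LATIN_TO_CYRILLIC:
            bit = next(remaining, "0")  # slots beyond the payload stay Latin
            out.append(LATIN_TO_CYRILLIC[ch] if bit == "1" else ch)
        else:
            out.append(ch)
    return "".join(out)


def extract_homoglyph_bits(text: str) -> str:
    """Read one bit per slot: Cyrillic lookalike is 1, Latin original is 0."""
    bits = []
    for ch in text:
        if ch in CYRILLIC_TO_LATIN:
            bits.append("1")
        elif ch in LATIN_TO_CYRILLIC:
            bits.append("0")
    return "".join(bits)


def restore_latin(text: str) -> str:
    """Undo every substitution."""
    return "".join(CYRILLIC_TO_LATIN.get(ch, ch) for ch in text)


marked = embed_homoglyph_bits("open access policy", "1001")
print(extract_homoglyph_bits(marked))  # 1001000000
print(restore_latin(marked))           # open access policy
```

The sample text has ten substitutable letters, so the four payload bits are followed by six zeros. This sketch only handles lowercase letters and assumes the cover text contains no Cyrillic of its own.
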

```javascript
// Using Unicode escape sequences
const zwsp = '\u200B';      // Zero Width Space
const zwj = '\u200D';       // Zero Width Joiner
const cyrillicA = '\u0430'; // Cyrillic 'а'

// Adding watermark with zero-width characters:
// append `pattern` after every `interval`-th character of `text`
function addWatermark(text, pattern, interval = 5) {
  let result = '';
  for (let i = 0; i < text.length; i++) {
    result += text[i];
    if ((i + 1) % interval === 0) {
      result += pattern;
    }
  }
  return result;
}

// Example: Add a pattern of invisible characters after every 5th character
const watermarkedText = addWatermark("Hello world", "\u200B\u200D\u200C");
```


```python
# Using Unicode escape sequences
zwsp = '\u200B'       # Zero Width Space
zwj = '\u200D'        # Zero Width Joiner
cyrillicA = '\u0430'  # Cyrillic 'а'

# Adding watermark with zero-width characters:
# append `pattern` after every `interval`-th character of `text`
def add_watermark(text, pattern, interval=5):
    result = ''
    for i, char in enumerate(text, start=1):
        result += char
        if i % interval == 0:
            result += pattern
    return result

# Example: Add a pattern of invisible characters after every 5th character
watermarked_text = add_watermark("Hello world", "\u200B\u200D\u200C")
```


```html
<!-- Using HTML entities -->
<p>This text contains a zero-width space &#8203; here.</p>
<p>This text uses a combining mark: a&#768;</p>
<!-- Using CSS to create custom invisible watermarks -->
<!-- Note: generated content is not part of the DOM text, so browsers
     generally do not include it when the paragraph is copied -->
<style>
  .watermarked::after {
    content: '\200B\200D\200C';
    display: inline;
  }
</style>
<p class="watermarked">This text has an invisible watermark after it.</p>
```


```javascript
// Detect zero-width characters
function detectInvisibleWatermarks(text) {
  const invisibleChars = ['\u200B', '\u200C', '\u200D', '\u2060', '\u2061', '\u2062', '\u2063', '\u2064'];
  let pattern = '';
  for (let i = 0; i < text.length; i++) {
    if (invisibleChars.includes(text[i])) {
      pattern += text[i];
    }
  }
  return pattern;
}

// Detect homoglyphs
function detectHomoglyphs(text) {
  const cyrillicMap = {
    '\u0430': 'a', '\u0435': 'e', '\u043E': 'o',
    '\u0440': 'p', '\u0441': 'c', '\u0445': 'x'
  };
  let found = [];
  for (let i = 0; i < text.length; i++) {
    if (text[i] in cyrillicMap) {
      found.push({pos: i, char: text[i], latinEquiv: cyrillicMap[text[i]]});
    }
  }
  return found;
}
```


```python
# Detect zero-width characters
def detect_invisible_watermarks(text):
    invisible_chars = ['\u200B', '\u200C', '\u200D', '\u2060', '\u2061', '\u2062', '\u2063', '\u2064']
    pattern = ''
    for char in text:
        if char in invisible_chars:
            pattern += char
    return pattern

# Detect homoglyphs
def detect_homoglyphs(text):
    cyrillic_map = {
        '\u0430': 'a', '\u0435': 'e', '\u043E': 'o',
        '\u0440': 'p', '\u0441': 'c', '\u0445': 'x'
    }
    found = []
    for i, char in enumerate(text):
        if char in cyrillic_map:
            found.append({'pos': i, 'char': char, 'latin_equiv': cyrillic_map[char]})
    return found
```

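
The fixed character lists above miss anything they do not name. A broader, rougher check is to classify characters by Unicode general category, as in the sketch below. The helper name `scan_suspicious` and the three-way labelling are assumptions for illustration; `unicodedata.category` is the standard-library call doing the work.

```python
import unicodedata


def scan_suspicious(text: str) -> list:
    """Return (index, code point, kind) for characters often used as watermarks."""
    findings = []
    for i, ch in enumerate(text):
        category = unicodedata.category(ch)
        if category == "Cf":
            kind = "format character"    # zero-width characters, soft hyphen
        elif category == "Zs" and ch != " ":
            kind = "non-standard space"  # thin space, no-break space, etc.
        elif category in ("Mn", "Me"):
            kind = "combining mark"      # also covers variation selectors
        else:
            continue
        findings.append((i, f"U+{ord(ch):04X}", kind))
    return findings


sample = "in\u200Bvisible\u2009text a\u0323"
for finding in scan_suspicious(sample):
    print(finding)
# (2, 'U+200B', 'format character')
# (10, 'U+2009', 'non-standard space')
# (17, 'U+0323', 'combining mark')
```

This casts a wide net: legitimate accents, emoji sequences, and bidirectional text all produce hits, so the output is a starting point for inspection rather than a verdict.
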
Text watermarking with Unicode characters provides a wide range of techniques for invisibly marking text. The most effective watermarks typically use a combination of these techniques to create unique, detectable patterns while maintaining the visible appearance of the text.