Is your feature request related to a problem?
The current implementation doesn't properly handle:
- Non-Latin scripts (Arabic, Devanagari, CJK, etc.)
- Internationalized Domain Names (IDNs) like भारत.भारत
- Email Address Internationalization (EAI) like user@عربي.السعودية
- Unicode normalization (NFC/NFD/NFKC/NFKD)
- Bidirectional text (RTL/LTR mixing)
- Zero-width characters (ZWNJ/ZWJ) needed for Indic scripts
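To make the normalization and counting gaps above concrete, here is a minimal Python sketch using only the standard library. The helper names `describe` and `nfc_equal` are hypothetical and are not part of any-guardrail; they only illustrate why raw string comparison and byte lengths mislead on non-Latin input.

```python
import unicodedata


def describe(text: str) -> dict:
    """Report size and normalization facts for a string (illustrative helper)."""
    return {
        "utf8_bytes": len(text.encode("utf-8")),   # storage size, not visible length
        "code_points": len(text),                  # Python str length counts code points
        "is_nfc": unicodedata.is_normalized("NFC", text),
    }


def nfc_equal(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


composed = "\u00e9"       # "é" as a single precomposed code point
decomposed = "e\u0301"    # "e" followed by COMBINING ACUTE ACCENT

# The two strings render identically but differ code point by code point.
print(composed == decomposed)           # raw comparison
print(nfc_equal(composed, decomposed))  # comparison after NFC normalization
print(describe("भारत"))
```

A validator that compares or measures input before normalizing will treat the two forms of "é" as different strings, and one that counts bytes will overstate the length of Devanagari text.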
Describe the solution you'd like
Full Unicode and Universal Acceptance compliance:
# Should work correctly
guardrail.validate("प्रयोक्ता@भारत.भारत") # EAI email
guardrail.validate("https://भारत.भारत") # IDN domain
guardrail.validate("مرحبا שלום") # Mixed RTL scripts
guardrail.validate("हिन्दी") # ZWNJ in Hindi
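As a rough illustration of what the IDN and EAI cases above involve, the sketch below splits an address and converts the domain to its ASCII (A-label) form. It assumes Python's built-in `idna` codec, which implements the older IDNA 2003 rules rather than the IDNA 2008 rules of RFC 5891, so it is a stand-in only. The function names `domain_to_ascii` and `split_eai_address` are hypothetical and not part of any-guardrail.

```python
from typing import Tuple


def domain_to_ascii(domain: str) -> str:
    """Convert a Unicode domain to its ASCII-compatible form.

    Assumption: uses the stdlib "idna" codec (IDNA 2003) for illustration;
    an IDNA 2008 implementation would be needed for RFC 5891 compliance.
    """
    return domain.encode("idna").decode("ascii")


def split_eai_address(address: str) -> Tuple[str, str]:
    """Split an internationalized address into (local part, ASCII domain).

    The local part is left as Unicode, as EAI (RFC 6531) permits UTF-8 there.
    """
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError("address must have the form local@domain")
    return local, domain_to_ascii(domain)


local_part, ascii_domain = split_eai_address("प्रयोक्ता@भारत.भारत")
print(local_part)
print(ascii_domain)
```

Keeping the local part in Unicode while converting only the domain mirrors how EAI-aware mail systems treat the two halves of an address differently.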
Technical Requirements:
- Unicode 15.0 support
- Grapheme cluster counting (not byte counting)
- Confusable character detection (homograph attacks)
- Zero-width character handling for Indic scripts
- IDN validation per RFC 5891 (IDNA 2008) and EAI validation per RFC 6531 (SMTPUTF8)
- Consistent normalization across all four forms (NFC, NFD, NFKC, NFKD)
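Two of the requirements above, zero-width character handling and confusable detection, can be sketched with the standard library alone. This is a heuristic under stated assumptions: it guesses a character's script from the first word of its Unicode name, whereas a full implementation would rely on the Unicode confusables data (UTS #39). The names `find_zero_width`, `scripts_used`, and `looks_mixed_script` are hypothetical and not part of any-guardrail.

```python
import unicodedata

# Zero-width code points worth reporting; ZWJ and ZWNJ are legitimate in Indic
# scripts, so this sketch reports them instead of stripping them.
ZERO_WIDTH = {
    "\u200b": "ZWSP",
    "\u200c": "ZWNJ",
    "\u200d": "ZWJ",
    "\ufeff": "ZWNBSP",
}


def find_zero_width(text: str) -> list:
    """Return (index, label) pairs for each zero-width character in text."""
    return [(i, ZERO_WIDTH[ch]) for i, ch in enumerate(text) if ch in ZERO_WIDTH]


def scripts_used(text: str) -> set:
    """Approximate the set of scripts in text from Unicode character names."""
    scripts = set()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                scripts.add(name.split(" ")[0])  # e.g. "LATIN", "CYRILLIC"
    return scripts


def looks_mixed_script(label: str) -> bool:
    """Flag labels that mix scripts, a common sign of a homograph attempt."""
    return len(scripts_used(label)) > 1


print(find_zero_width("क्\u200dष"))       # ZWJ between virama and the next consonant
print(looks_mixed_script("p\u0430ypal"))  # Cyrillic "а" inside a Latin word
print(looks_mixed_script("भारत"))
```

Mixed-script checks produce false positives on legitimately multilingual text, so a result like this is better treated as a signal for review than as an automatic rejection.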
Additional context
- Required for global internet standards compliance
- Critical for preventing Unicode-based security attacks (homographs)
- Necessary for Indian language support (22 scheduled languages under the Eighth Schedule of the Constitution)
- Enables proper handling of emoji and complex scripts
Implementation Impact
This would make any-guardrail compliant with international standards and usable globally.