[Feature] Add Unicode 15.0 and Universal Acceptance (UA) Compliance #107

@anivar

Is your feature request related to a problem?

The current implementation doesn't properly handle:

  • Non-Latin scripts (Arabic, Devanagari, CJK, etc.)
  • Internationalized Domain Names (IDNs) like भारत.भारत
  • Email Address Internationalization (EAI) like user@عربي.السعودية
  • Unicode normalization (NFC/NFD/NFKC/NFKD)
  • Bidirectional text (RTL/LTR mixing)
  • Zero-width characters (ZWNJ/ZWJ) needed for Indic scripts
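
A minimal sketch of why normalization and length measurement matter, using only Python's standard `unicodedata` module. The sample strings and variable names are illustrative assumptions and are not taken from any-guardrail.

```python
import unicodedata

# Two spellings of "é" that render identically.
composed = "\u00e9"      # single code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

# A raw comparison treats them as different strings.
same_before = composed == decomposed

# After normalizing both sides to NFC they compare equal.
same_after = (
    unicodedata.normalize("NFC", composed)
    == unicodedata.normalize("NFC", decomposed)
)

# Compatibility forms (NFKC/NFKD) also fold presentation variants,
# for example the "fi" ligature U+FB01 becomes the two letters "fi".
folded = unicodedata.normalize("NFKC", "\ufb01")

# The Hindi word for "Hindi", written with escapes so the code points are explicit.
hindi = "\u0939\u093f\u0928\u094d\u0926\u0940"
byte_len = len(hindi.encode("utf-8"))   # UTF-8 bytes
code_point_len = len(hindi)             # Unicode code points

print(same_before, same_after, folded, byte_len, code_point_len)
```

Neither the byte count nor the code point count matches what a reader perceives as characters. Counting grapheme clusters needs a segmentation implementation of UAX #29, which the Python standard library does not provide.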

Describe the solution you'd like

Full Unicode and Universal Acceptance compliance:

# Should work correctly
guardrail.validate("प्रयोक्ता@भारत.भारत")  # EAI email
guardrail.validate("https://भारत.भारत")    # IDN domain
guardrail.validate("مرحبا שלום")             # Mixed RTL scripts
guardrail.validate("हिन्‌दी")                # ZWNJ in Hindi
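
To illustrate what the EAI and IDN cases above involve, here is a rough sketch that splits an address and converts the domain to its ASCII (A-label) form. The helper names `to_a_label` and `split_eai_address` are hypothetical and not part of any-guardrail. It relies on Python's built-in `idna` codec, which implements the older IDNA 2003 rules rather than the IDNA2008 rules of RFC 5891, so it is a stand-in and not a compliant validator.

```python
def to_a_label(domain: str) -> str:
    """Convert a Unicode domain name to its ASCII-compatible form.

    Assumption: the built-in "idna" codec (IDNA 2003) is good enough for
    illustration; full RFC 5891 compliance would need an IDNA2008 implementation.
    """
    return domain.encode("idna").decode("ascii")


def split_eai_address(address: str) -> tuple[str, str]:
    """Split an internationalized address into (local part, A-label domain).

    The local part is left as Unicode, since RFC 6531 allows UTF-8 there.
    """
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {address!r}")
    return local, to_a_label(domain)


ascii_domain = to_a_label("भारत.भारत")
local_part, mail_domain = split_eai_address("प्रयोक्ता@भारत.भारत")

# Every label of the converted domain carries the "xn--" ACE prefix.
print(ascii_domain)
print(local_part, mail_domain)
```

Keeping the local part in Unicode while converting only the domain mirrors how EAI-aware mail systems treat the two halves differently.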

Technical Requirements:

  • Unicode 15.0 support
  • Grapheme cluster counting (not byte counting)
  • Confusable character detection (homograph attacks)
  • Zero-width character handling for Indic scripts
  • IDN validation per RFC 5891 (IDNA2008) and EAI validation per RFC 6531 (SMTPUTF8)
  • Correct normalization under all four forms (NFC, NFD, NFKC, NFKD)
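
Two of the requirements above, zero-width handling and confusable detection, can be approximated with the standard library. This is a sketch under stated assumptions, not a proposed implementation. `strip_stray_joiners` keeps ZWNJ/ZWJ only directly after a virama (canonical combining class 9), a simplification of the contextual rules in RFC 5892. `scripts_used` guesses a script from the first word of each letter's Unicode name, a crude heuristic and not the UTS #39 confusables data. Both function names are hypothetical.

```python
import unicodedata

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200d"   # ZERO WIDTH JOINER
VIRAMA_CLASS = 9  # canonical combining class shared by Indic viramas


def strip_stray_joiners(text: str) -> str:
    """Drop ZWNJ/ZWJ unless the preceding character is a virama."""
    kept = []
    for i, ch in enumerate(text):
        if ch in (ZWNJ, ZWJ):
            prev = text[i - 1] if i > 0 else ""
            if not prev or unicodedata.combining(prev) != VIRAMA_CLASS:
                continue  # joiner has no script-level purpose here
        kept.append(ch)
    return "".join(kept)


def scripts_used(text: str) -> set[str]:
    """Heuristic: first word of each letter's Unicode name, e.g. LATIN."""
    scripts = set()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                scripts.add(name.split()[0])
    return scripts


# Hindi with a ZWNJ after the virama: the joiner is meaningful and is kept.
hindi_zwnj = "\u0939\u093f\u0928\u094d\u200c\u0926\u0940"
# Latin text with a hidden ZWNJ: the joiner is removed.
hidden = "pay\u200cpal"
# "paypal" with a Cyrillic "а" (U+0430) in place of the Latin "a".
spoofed = "p\u0430ypal"

print(strip_stray_joiners(hindi_zwnj) == hindi_zwnj)
print(strip_stray_joiners(hidden))
print(sorted(scripts_used(spoofed)))
```

A string that mixes scripts is not necessarily malicious, so a result with more than one script is better treated as a signal for closer inspection than as a rejection on its own.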

Additional context

  • Required for global internet standards compliance
  • Critical for preventing Unicode-based security attacks (homographs)
  • Necessary for Indian language support (22 scheduled languages)
  • Enables proper handling of emoji and complex scripts

Implementation Impact

This would make any-guardrail compliant with international standards and usable globally.
