State of Unicode in Rust #7274

@robertbastian

This issue tracks the implementation of Unicode algorithms in Rust, specifically with respect to compatibility with ICU4X.

The Unicode Technical Reports are listed at https://www.unicode.org/reports/. Some are designated as a Unicode Standard Annex (UAX) or a Unicode Technical Standard (UTS); the rest are plain Unicode Technical Reports (UTR).

| Report | Name | Status |
| --- | --- | --- |
| UAX 9 | Unicode Bidirectional Algorithm | ✅ implemented in unicode-bidi, can use icu::properties data |
| UTS 10 | Unicode Collation Algorithm | ✅ implemented in icu::collator |
| UAX 11 | East Asian Width | ⚠️ implemented in unicode-width, cannot use icu::properties data |
| UAX 14 | Unicode Line Breaking Algorithm | ⚠️ implemented in icu::segmenter, outdated |
| UAX 15 | Unicode Normalization Forms | ✅ implemented in icu::normalizer |
| UTS 18 | Unicode Regular Expressions | ⚠️ level 1 implemented in regex, cannot use icu::properties data |
| UAX 24 | Unicode Script Property | ✅ implemented in icu::properties<br>⚠️ implemented in unicode-script, cannot use icu::properties data, limited interop |
| UAX 29 | Unicode Text Segmentation | ⚠️ implemented in icu::segmenter, outdated |
| UAX 31 | Unicode Identifiers and Syntax | ❌ partial implementation in unicode-xid and unicode-script, cannot use icu::properties data |
| UAX 34 | Unicode Named Character Sequences | ❌ not implemented, data not in icu::properties |
| UTS 35 | Unicode Locale Data Markup Language (LDML) | ⚠️ partially implemented in icu::calendar, icu::datetime, icu::decimal, icu::list, icu::locale, icu::pattern, icu::plurals, icu::time |
| UTR 36 | Unicode Security Considerations | probably superseded by UAX 31, UTS 39, UTS 55 |
| UTS 37 | Unicode Ideographic Variation Database | does not specify algorithms |
| UAX 38 | Unicode Han Database (Unihan) | does not specify algorithms |
| UTS 39 | Unicode Security Mechanisms | ⚠️ partially implemented in unicode-security, cannot use icu::properties data |
| UAX 41 | Common References for Unicode Standard Annexes | does not specify algorithms |
| UAX 42 | Unicode Character Database in XML | does not specify algorithms |
| UAX 44 | Unicode Character Database | does not specify algorithms |
| UAX 45 | U-Source Ideographs | does not specify algorithms |
| UTS 46 | Unicode IDNA Compatibility Processing | ✅ implemented in idna, uses icu |
| UAX 50 | Unicode Vertical Text Layout | ⚠️ implemented in harfbuzz, however the relevant properties are not used through harfbuzz-traits (e.g. icu)<br>⚠️ implemented in harfrust, does not support external Unicode data sources at all |
| UTS 51 | Unicode Emoji | ⚠️ partially implemented in icu::properties |
| UAX 53 | Unicode Arabic Mark Rendering | ⚠️ implemented in harfbuzz, however the relevant properties are not used through harfbuzz-traits (e.g. icu)<br>⚠️ implemented in harfrust, does not support external Unicode data sources at all |
| UTS 55 | Unicode Source Code Handling | ❌ not implemented |
| UAX 57 | Unicode Egyptian Hieroglyph Database (Unikemet) | does not specify algorithms |
| UTS 58 | Draft Unicode Link Detection and Serialization | ❌ not implemented |
| UTR 59 | Proposed Draft East Asian Spacing | ❌ not implemented |
| UAX 60 | Draft Data for Non Han Ideographic Scripts | does not specify algorithms |
| UTS 61 | Proposed Draft Unicode Set Notation | ⚠️ partial experimental support, missing some data such as blocks and character names |
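
For context on why rows such as UAX 15 and UAX 29 need dedicated crates: Rust's standard library compares strings by their encoded bytes and iterates over Unicode scalar values, and it does not normalize text or segment it into grapheme clusters. The following is a minimal std-only sketch of that gap. The helper names `scalar_count` and `naive_eq` are hypothetical and exist only for this illustration; no ICU4X API is used or implied.

```rust
// Std-only sketch: `str` equality is byte equality, and `chars()` yields
// Unicode scalar values, not user-perceived characters.

/// Number of Unicode scalar values in `s` (hypothetical helper).
fn scalar_count(s: &str) -> usize {
    s.chars().count()
}

/// Plain `==` on string slices, with no normalization applied (hypothetical helper).
fn naive_eq(a: &str, b: &str) -> bool {
    a == b
}

fn main() {
    let precomposed = "\u{00E9}"; // "é" as one code point (its NFC form)
    let decomposed = "e\u{0301}"; // "e" + combining acute accent (its NFD form)

    // Both strings render as the same character, yet std treats them as different.
    println!("equal without normalization: {}", naive_eq(precomposed, decomposed));
    println!(
        "scalar values: {} vs {}",
        scalar_count(precomposed),
        scalar_count(decomposed)
    );
    println!("UTF-8 bytes: {} vs {}", precomposed.len(), decomposed.len());
}
```

Treating the two strings as equivalent requires normalizing both to the same form first (UAX 15), and counting the decomposed string as one user-perceived character requires grapheme cluster segmentation (UAX 29). Both are the job of the crates listed in the table rather than of `std`.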
