
Provide a trie-based alternative to UnicodeSet #2220

Open
@hsivonen

Description


The ICU4X composing normalizer uses a UnicodeSet for its fast-path pass-through check, while the ICU4C composing normalizer uses a code point trie lookup. ICU4C ends up faster even after other aspects were optimized on the ICU4X side, including special-casing the lowest range of the set (the Latin range below the Combining Diacritical Marks block).
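To make the comparison concrete, here is a minimal sketch of a set-based pass-through check with the lowest range special-cased. It assumes an inversion-list representation (sorted range boundaries, the representation UnicodeSet uses) and answers the set's first range with a single comparison before falling back to a binary search. `InversionList`, `PassThroughSet`, `passes_through`, and `pass_through_prefix_len` are hypothetical names, and the range data in `main` is made up for illustration; this is not the ICU4X normalizer's code.

```rust
/// Hypothetical inversion-list set: `boundaries` alternates inclusive range
/// starts and exclusive range ends, sorted ascending. An odd-length list means
/// the last range runs to the end of the code point space.
struct InversionList {
    boundaries: Vec<u32>,
}

impl InversionList {
    /// Binary search: an odd number of boundaries <= c means c is inside a range.
    fn contains(&self, c: u32) -> bool {
        self.boundaries.partition_point(|&b| b <= c) % 2 == 1
    }
}

/// Hypothetical pass-through checker that answers the set's lowest range
/// (when it starts at U+0000) with one comparison instead of a search.
struct PassThroughSet {
    set: InversionList,
    low_bound: u32, // exclusive end of the lowest range, or 0 if there is none
}

impl PassThroughSet {
    fn new(boundaries: Vec<u32>) -> Self {
        let low_bound = if boundaries.first() == Some(&0) {
            boundaries.get(1).copied().unwrap_or(0x11_0000)
        } else {
            0
        };
        PassThroughSet { set: InversionList { boundaries }, low_bound }
    }

    fn passes_through(&self, c: u32) -> bool {
        // Fast path for the lowest range; everything else pays for a binary search.
        c < self.low_bound || self.set.contains(c)
    }

    /// Byte length of the longest prefix of `s` whose characters all pass through.
    fn pass_through_prefix_len(&self, s: &str) -> usize {
        s.char_indices()
            .find(|&(_, ch)| !self.passes_through(ch as u32))
            .map_or(s.len(), |(i, _)| i)
    }
}

fn main() {
    // Made-up data: U+0000..U+02FF and U+0370..U+0482 pass through.
    let set = PassThroughSet::new(vec![0x0000, 0x0300, 0x0370, 0x0483]);
    let text = "cafe\u{301} au lait";
    println!("pass-through prefix: {} bytes", set.pass_through_prefix_len(text));
}
```

Any code point outside the lowest range still costs a binary search whose depth grows with the number of ranges, which is the cost a fragmented set makes noticeable and a trie lookup avoids.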

For a set that is known at compile time and known to be fragmented, we should provide an alternative to UnicodeSet that reuses the structure of CodePointTrie but, instead of wasting 7 bits of each one-byte value, divides the length of the value array by 8 and packs 8 logical bits into each byte.
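A minimal sketch of the proposed layout follows, under simplifying assumptions: each data block covers 64 code points, so at one byte per value it would take 64 bytes, while bit-packed it takes 8. Identical blocks are stored once and shared through the index, as in a code point trie. The real CodePointTrie uses a more elaborate multi-level index; this sketch flattens it to one `u16` entry per 64-code-point block to keep the lookup easy to read. `BitTrie`, `from_predicate`, `contains`, `heap_bytes`, and the constants are hypothetical, not ICU4X API.

```rust
use std::collections::HashMap;

// Hypothetical parameters for this sketch (not CodePointTrie's real constants).
const BLOCK_BITS: u32 = 6; // each data block covers 64 code points
const BLOCK_LEN: u32 = 1 << BLOCK_BITS;
const BYTES_PER_BLOCK: usize = (BLOCK_LEN / 8) as usize; // 64 bits packed into 8 bytes
const CODE_POINT_LIMIT: u32 = 0x11_0000;

/// Hypothetical bit-packed trie: one index entry per 64-code-point block,
/// pointing at a deduplicated 8-byte data block holding 1 bit per code point.
pub struct BitTrie {
    index: Vec<u16>,
    data: Vec<u8>,
}

impl BitTrie {
    /// Builds the trie by querying `in_set` for every code point (build time only).
    pub fn from_predicate(in_set: impl Fn(u32) -> bool) -> Self {
        let mut index = Vec::with_capacity((CODE_POINT_LIMIT >> BLOCK_BITS) as usize);
        let mut data = Vec::new();
        // Maps block contents to a block number so identical blocks are stored once.
        let mut block_ids: HashMap<[u8; BYTES_PER_BLOCK], u16> = HashMap::new();
        for start in (0..CODE_POINT_LIMIT).step_by(BLOCK_LEN as usize) {
            let mut block = [0u8; BYTES_PER_BLOCK];
            for offset in 0..BLOCK_LEN {
                if in_set(start + offset) {
                    block[(offset >> 3) as usize] |= 1u8 << (offset & 7);
                }
            }
            let next_id = block_ids.len() as u16;
            let id = *block_ids.entry(block).or_insert_with(|| {
                data.extend_from_slice(&block);
                next_id
            });
            index.push(id);
        }
        BitTrie { index, data }
    }

    /// Membership test: one index load, one data load, then a shift and mask.
    pub fn contains(&self, c: u32) -> bool {
        if c >= CODE_POINT_LIMIT {
            return false;
        }
        let block = self.index[(c >> BLOCK_BITS) as usize] as usize;
        let offset = c & (BLOCK_LEN - 1);
        let byte = self.data[block * BYTES_PER_BLOCK + (offset >> 3) as usize];
        (byte >> (offset & 7)) & 1 == 1
    }

    /// Heap size of the index and data arrays in bytes.
    pub fn heap_bytes(&self) -> usize {
        self.index.len() * std::mem::size_of::<u16>() + self.data.len()
    }
}

fn main() {
    // Made-up fragmented set: U+0300..U+036F plus U+1F600..U+1F64F.
    let in_set = |c: u32| (0x300..0x370).contains(&c) || (0x1F600..0x1F650).contains(&c);
    let trie = BitTrie::from_predicate(in_set);
    println!("U+0301 in set: {}", trie.contains(0x301));
    println!("heap bytes: {}", trie.heap_bytes());
}
```

Lookup cost is constant regardless of how fragmented the set is, and each distinct block costs 8 bytes rather than 64. In this flattened sketch the index dominates the size, which is why reusing CodePointTrie's multi-level index structure, as the issue suggests, would matter in a real implementation.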

Metadata


Assignees

No one assigned

    Labels

    A-performance (Area: Performance (CPU, Memory))
    C-unicode (Component: Props, sets, tries)
    T-enhancement (Type: Nice-to-have but not required)
    help wanted (Issue needs an assignee)

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests
