Make the composing normalizer stay on the fast path across virama or nukta between starters #7665

@hsivonen

Based on the result of #7517:

The normalizer design doesn't properly optimize the case of non-starters that never combine backwards, appear often, and often appear without adjacent non-starters. The composing normalizer slice mode fast path should be able to skip over at least one of these at a time: when we have only one of these between two starters, reordering is not an issue.
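The single-non-starter case above can be illustrated with a toy scan. This is only a sketch under stated assumptions, not the ICU4X implementation: `toy_ccc`, `toy_never_combines_backwards`, and `fast_path_len` are hypothetical names, the combining-class table covers just the characters used here, and every starter is assumed to be already normalized and not to combine backwards.

```rust
/// Toy canonical combining class lookup (hypothetical). Real code would read
/// this from normalization data; only the characters used here are covered.
fn toy_ccc(c: char) -> u8 {
    match c {
        '\u{094D}' => 9,   // DEVANAGARI SIGN VIRAMA
        '\u{0301}' => 230, // COMBINING ACUTE ACCENT
        _ => 0,            // everything else is treated as a starter
    }
}

/// Hypothetical flag for a non-starter that never combines backwards.
/// In this toy only the Devanagari virama is flagged.
fn toy_never_combines_backwards(c: char) -> bool {
    c == '\u{094D}'
}

/// Returns the index of the first character that forces the toy fast path
/// to stop, or `text.len()` if the whole slice can be passed through.
fn fast_path_len(text: &[char]) -> usize {
    for (i, &c) in text.iter().enumerate() {
        if toy_ccc(c) == 0 {
            // Starters stay on the fast path (toy assumption).
            continue;
        }
        // A lone flagged non-starter between two starters (or before the end
        // of the slice) has nothing to be reordered against, so skip it.
        let after_starter = i > 0 && toy_ccc(text[i - 1]) == 0;
        let before_starter_or_end =
            text.get(i + 1).map_or(true, |&next| toy_ccc(next) == 0);
        if toy_never_combines_backwards(c) && after_starter && before_starter_or_end {
            continue;
        }
        // Any other non-starter (or a run of them) needs the slow path.
        return i;
    }
    text.len()
}
```

For the sequence U+0915 U+0930 U+094D U+092E this returns 4, so the whole slice stays on the fast path. For U+0915 U+094D U+0301 it returns 1, because the virama is followed by another non-starter and reordering can no longer be ruled out.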

Ideally, the normalization data would distinguish these, but doing so now would be a data semver break.

If we aren't ready to change the data now to flag this case explicitly, we should at least have a hack that special-cases ccc=Nukta and ccc=Virama (and possibly some values of interest for Arabic). The hack should apply at least when the serde feature isn't active, since we can then assume we don't have data from the future.
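A minimal sketch of how such a hack might be gated at compile time, assuming the crate feature is named `serde` as in the text. The function and constant names are hypothetical and not taken from ICU4X. The numeric values are the Unicode canonical combining classes for Nukta (7) and Virama (9); the Arabic values are left out because the issue does not name them.

```rust
/// Unicode Canonical_Combining_Class value for Nukta.
const CCC_NUKTA: u8 = 7;
/// Unicode Canonical_Combining_Class value for Virama.
const CCC_VIRAMA: u8 = 9;

/// Hypothetical gate: only rely on hard-coded ccc knowledge when data cannot
/// come "from the future", that is, when the `serde` feature is off.
fn ccc_hack_enabled() -> bool {
    !cfg!(feature = "serde")
}

/// Hypothetical check used by the fast path: may a lone non-starter with this
/// combining class be skipped when it sits between two starters?
fn is_skippable_single_non_starter(ccc: u8) -> bool {
    ccc_hack_enabled() && matches!(ccc, CCC_NUKTA | CCC_VIRAMA)
}
```

Keeping the gate in one function means the hard-coded list could later be replaced by an explicit data flag without touching the callers.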

Conceptually similar to #7555.

Labels: A-performance (Area: Performance (CPU, Memory)); C-collator (Component: Collation, normalization); discuss-priority (Discuss at the next ICU4X meeting)
