Move normalization data scalar value validation from GIGO to deserialization #2458

Open
@hsivonen

(A meeting previously concluded that switching from bogus data causing GIGO to bogus data causing a constructor to return an error is not a post-1.0 breaking change.)

In the normalization data:

ZeroVec<'data, u16> should use a BmpChar type (that has the same bit representation as u16) instead of using u16. The BmpChar type should be validated at deserialization time not to be in the surrogate range. There should be a way to get a char out of BmpChar without the caller having to use an unsafe block.
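A minimal sketch of what such a type could look like, using only the standard library. The names `BmpChar` (taken from the proposal above) and `SurrogateError` are hypothetical here, and the `TryFrom<u16>` impl stands in for whatever validation hook the deserializer would actually call; the ZeroVec integration is not shown.

```rust
use std::convert::TryFrom;

/// Hypothetical `BmpChar`: same bit representation as `u16`,
/// but never holds a value in the surrogate range U+D800..=U+DFFF.
#[repr(transparent)]
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct BmpChar(u16);

/// Hypothetical error carrying the rejected code unit.
#[derive(Debug, PartialEq, Eq)]
pub struct SurrogateError(pub u16);

impl TryFrom<u16> for BmpChar {
    type Error = SurrogateError;

    /// The check that would run at deserialization time.
    fn try_from(unit: u16) -> Result<Self, Self::Error> {
        if (0xD800..=0xDFFF).contains(&unit) {
            Err(SurrogateError(unit))
        } else {
            Ok(BmpChar(unit))
        }
    }
}

impl From<BmpChar> for char {
    /// Safe conversion: no `unsafe` block needed by the caller or here.
    /// The fallback cannot be reached for values built via `try_from`.
    fn from(c: BmpChar) -> char {
        char::from_u32(u32::from(c.0)).unwrap_or(char::REPLACEMENT_CHARACTER)
    }
}
```

With this shape, a caller would write `char::from(bmp)` on an already-validated value, and the surrogate check is paid once per value when the data is loaded rather than on every lookup.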

ZeroVec<'data, U24> should use char instead of U24.
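To illustrate the kind of check this moves to deserialization time, here is a small sketch that decodes three bytes into a `char`. The helper name `char_from_u24_le` is hypothetical, and the three-byte little-endian layout is an assumption made for the example, not a statement about the actual serialized format.

```rust
/// Hypothetical helper: decode three little-endian bytes into a `char`.
/// Returns `None` for surrogates and for values above U+10FFFF,
/// which is the validation a `char`-typed vector would perform up front.
pub fn char_from_u24_le(bytes: [u8; 3]) -> Option<char> {
    // Widen to 32 bits by padding the most significant byte with zero.
    let bits = u32::from_le_bytes([bytes[0], bytes[1], bytes[2], 0]);
    char::from_u32(bits)
}
```

Once every element has passed a check like this at load time, the normalizer no longer needs to handle invalid scalar values on the hot path.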

The u32 trie value should be replaced with some kind of special type that has the same bit representation as u32 and that has these constraints:

If the low half is 1 or 0: No further constraints.
If the high half is 0: The low half is either a BMP non-surrogate value or a value between 0xD801 and 0xD8FE, inclusive.
Otherwise: Both halves are BMP non-surrogate values.
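The three constraints above can be sketched as a validating constructor. This is one possible reading, assuming the rules are checked in the order listed (so the first matching rule wins) and that "low half" means the least significant 16 bits. The names `TrieValue` and `InvalidTrieValue` are hypothetical.

```rust
use std::convert::TryFrom;

/// Hypothetical wrapper with the same bit representation as `u32`.
#[repr(transparent)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct TrieValue(u32);

/// Hypothetical error carrying the rejected bits.
#[derive(Debug, PartialEq, Eq)]
pub struct InvalidTrieValue(pub u32);

/// True if the 16-bit unit is outside the surrogate range.
fn is_bmp_non_surrogate(unit: u16) -> bool {
    !(0xD800..=0xDFFF).contains(&unit)
}

impl TryFrom<u32> for TrieValue {
    type Error = InvalidTrieValue;

    fn try_from(bits: u32) -> Result<Self, Self::Error> {
        let low = bits as u16; // least significant 16 bits
        let high = (bits >> 16) as u16; // most significant 16 bits
        let valid = if low <= 1 {
            // Low half is 0 or 1: no further constraints.
            true
        } else if high == 0 {
            // Low half is a BMP non-surrogate or lies in 0xD801..=0xD8FE.
            is_bmp_non_surrogate(low) || (0xD801..=0xD8FE).contains(&low)
        } else {
            // Both halves must be BMP non-surrogates.
            is_bmp_non_surrogate(low) && is_bmp_non_surrogate(high)
        };
        if valid {
            Ok(TrieValue(bits))
        } else {
            Err(InvalidTrieValue(bits))
        }
    }
}
```

Under this reading, `0x0000_D801` is accepted while `0x0000_D800` and `0x0000_D8FF` are rejected, and any value whose low half is 0 or 1 passes regardless of the high half.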

Labels

C-collator (Component: Collation, normalization)
S-medium (Size: Less than a week (larger bug fix or enhancement))
T-enhancement (Type: Nice-to-have but not required)
help wanted (Issue needs an assignee)