Filter low-frequency suffixes from Dense ZeroTrie #7302

@sffc

Description

Suffixes that occur in a low percentage of rows should not be added to the dense matrix.

Docs for background: https://unicode-org.github.io/icu4x/rustdoc/zerotrie/dense/struct.ZeroAsciiDenseSparse2dTrieOwned.html

In theory, we could calculate how much each suffix adds to the size of the structure, but doing so would require recomputing the whole structure, which might be costly. A good first step would be to add a test case that we want to optimize, then pick a heuristic that minimizes the size of that test case.
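
As a starting point for discussion, here is a minimal sketch of one possible heuristic: count the fraction of distinct prefixes (rows) in which each suffix occurs, and keep only the suffixes at or above a threshold as candidates for the dense matrix. The function name `dense_suffix_candidates`, the `(prefix, suffix)` input shape, and the `min_fraction` parameter are all hypothetical and not part of the zerotrie API. The right threshold would need to be tuned against the test case.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical helper (not part of the zerotrie crate): returns the suffixes
/// that occur in at least `min_fraction` of the distinct prefixes (rows).
/// Suffixes below the threshold would stay out of the dense matrix.
fn dense_suffix_candidates<'a>(
    entries: &[(&'a str, &'a str)],
    min_fraction: f64,
) -> BTreeSet<&'a str> {
    // Track the distinct rows per suffix so repeated pairs are not double counted.
    let mut rows_by_suffix: BTreeMap<&'a str, BTreeSet<&'a str>> = BTreeMap::new();
    let mut all_rows: BTreeSet<&'a str> = BTreeSet::new();
    for &(prefix, suffix) in entries {
        all_rows.insert(prefix);
        rows_by_suffix.entry(suffix).or_default().insert(prefix);
    }

    let total_rows = all_rows.len() as f64;
    rows_by_suffix
        .into_iter()
        // Keep a suffix only if it appears in a large enough share of rows.
        .filter(|(_, rows)| total_rows > 0.0 && rows.len() as f64 / total_rows >= min_fraction)
        .map(|(suffix, _)| suffix)
        .collect()
}
```

A frequency cutoff like this avoids rebuilding the structure per suffix, at the cost of being only a proxy for the actual size contribution.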

Metadata

    Labels

    C-zerovec (Component: Yoke, ZeroVec, DataBake), good first issue (Good for newcomers)
