@IshaanXCoder

This PR addresses #7302

Implements suffix-frequency pruning to exclude rare suffixes from the dense matrix representation of ZeroAsciiDenseSparse2dTrie.

@IshaanXCoder IshaanXCoder requested a review from a team as a code owner January 16, 2026 23:07
@CLAassistant

CLAassistant commented Jan 16, 2026

CLA assistant check
All committers have signed the CLA.

Member

@sffc sffc left a comment

Thanks for your interest in icu4x!

This MIN_DENSE_PERCENT approach seems like a decent heuristic to get us going. As usual, please make sure to add test coverage.

Comment on lines 174 to 176
if cursor.is_empty() {
return row_index;
}
Member

Issue: This is an unnecessary change that I believe is wrong.

Author

Alright, I'll remove this.

// The row and column indexes should be in-range
debug_assert!(false);
return None;
let index = row_index
Member

Nit: Using checked_add and checked_mul is beneficial, but not related to the subject of this PR. Please split those changes to a separate PR.
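For context, a minimal sketch of what overflow-checked index arithmetic for a row-major dense matrix could look like. The function name `dense_index` and its parameters are hypothetical and are not taken from the PR or from the icu4x source; only `checked_mul` and `checked_add` are standard library methods.

```rust
/// Hypothetical helper (not from the PR): computes the flat offset of
/// (row_index, column_index) in a row-major matrix with `num_columns` columns.
/// Returns `None` if the column is out of range or the arithmetic overflows,
/// instead of wrapping or panicking.
fn dense_index(row_index: usize, column_index: usize, num_columns: usize) -> Option<usize> {
    if column_index >= num_columns {
        return None;
    }
    // Each step yields `None` on overflow; `?` propagates it to the caller.
    row_index
        .checked_mul(num_columns)?
        .checked_add(column_index)
}
```

With this shape the caller decides how to handle an out-of-range lookup, which is why such a change can be reviewed on its own, separately from the pruning logic.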

Author

Sure, I've created #7442 and am working on this there.

.map(|(&suffix, &count)| (suffix, count))
.collect();

// If none meet the threshold, fallback to picking top-K by frequency.
Member

If none meet the threshold, then I think we should just leave the dense matrix empty.
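A rough sketch of threshold-based suffix selection with no top-K fallback, to illustrate the behavior suggested here. Everything in it is an assumption made for illustration: the function name `select_dense_suffixes`, the input shape (one list of suffixes per row, each suffix at most once per row), and the value given to `MIN_DENSE_PERCENT`. None of these are taken from the PR's actual code.

```rust
use std::collections::BTreeMap;

/// Hypothetical value; the threshold used in the PR is not shown in this thread.
const MIN_DENSE_PERCENT: usize = 50;

/// Hypothetical helper: returns the suffixes that occur in at least
/// MIN_DENSE_PERCENT percent of the rows, in sorted order.
/// If no suffix meets the threshold, the result is empty, so the
/// dense matrix would simply have no columns.
fn select_dense_suffixes<'a>(rows: &[Vec<&'a str>]) -> Vec<&'a str> {
    // Count how many rows contain each suffix.
    let mut counts: BTreeMap<&'a str, usize> = BTreeMap::new();
    for row in rows {
        for &suffix in row {
            *counts.entry(suffix).or_insert(0) += 1;
        }
    }
    let total = rows.len();
    counts
        .into_iter()
        // Integer form of: count / total >= MIN_DENSE_PERCENT / 100.
        .filter(|&(_, count)| count * 100 >= total * MIN_DENSE_PERCENT)
        .map(|(suffix, _)| suffix)
        .collect()
}
```

Rare suffixes left out of the result would then be stored only in the sparse part of the trie.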

@IshaanXCoder
Author

Hey @sffc, thanks for the review. I've tested the code (didn't push the tests).

@IshaanXCoder
Author

Hey @sffc, I've committed the changes, PTAL.
