Skip to content

CNDB-17012: Improve TrieMemoryIndex row count estimation#2262

Merged
pkolaczk merged 1 commit intomainfrom
c17012-fix-estimate-num-rows
Apr 24, 2026
Merged

CNDB-17012: Improve TrieMemoryIndex row count estimation#2262
pkolaczk merged 1 commit intomainfrom
c17012-fix-estimate-num-rows

Conversation

@pkolaczk
Copy link
Copy Markdown

@pkolaczk pkolaczk commented Mar 9, 2026

Fixes a bug in TermsDistribution#toBigDecimal which didn't preserve
order for some types and broke the assertion
in TrieMemoryIndex#estimateNumRowsMatchingRange.

Additionally, replaces the row count estimation algorithm with a better
one:

  • simpler
  • more efficient (no need to search and iterate the trie)
  • more accurate (especially for wider ranges)

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Mar 9, 2026

Checklist before you submit for review

  • This PR adheres to the Definition of Done
  • Make sure there is a PR in the CNDB project updating the Converged Cassandra version
  • Use NoSpamLogger for log lines that may appear frequently in the logs
  • Verify test results on Butler
  • Test coverage for new/modified code is > 80%
  • Proper code formatting
  • Proper title for each commit staring with the project-issue number, like CNDB-1234
  • Each commit has a meaningful description
  • Each commit is not very long and contains related changes
  • Renames, moves and reformatting are in distinct commits
  • All new files should contain the DataStax copyright header instead of the Apache License one

@pkolaczk pkolaczk requested a review from adelapena March 9, 2026 13:54
@pkolaczk pkolaczk force-pushed the c17012-fix-estimate-num-rows branch from 58dc2a8 to 3ea235f Compare March 9, 2026 16:18
Comment thread src/java/org/apache/cassandra/index/sai/memory/TrieMemoryIndex.java Outdated
@pkolaczk pkolaczk force-pushed the c17012-fix-estimate-num-rows branch 2 times, most recently from a238f83 to d1532b7 Compare March 10, 2026 14:26
@sonarqubecloud
Copy link
Copy Markdown

@pkolaczk pkolaczk requested a review from adelapena April 23, 2026 15:17
@pkolaczk pkolaczk force-pushed the c17012-fix-estimate-num-rows branch from d1532b7 to 5584ab0 Compare April 23, 2026 15:19
Copy link
Copy Markdown

@adelapena adelapena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I would only add a comment on the new test about the limitations of the algorithm, basically what is said by this comment: #2262 (comment)

Fixes a bug in TermsDistribution#toBigDecimal which didn't preserve
order for some types and broke the assertion
in TrieMemoryIndex#estimateNumRowsMatchingRange.

Additionally, replaces the row count estimation algorithm with a better
one:
- simpler
- more efficient (no need to search and iterate the trie)
- more accurate (especially for wider ranges)
@pkolaczk pkolaczk force-pushed the c17012-fix-estimate-num-rows branch from 5584ab0 to 802773b Compare April 24, 2026 14:55
@sonarqubecloud
Copy link
Copy Markdown

@cassci-bot
Copy link
Copy Markdown

❌ Build ds-cassandra-pr-gate/PR-2262 rejected by Butler


4 regressions found
See build details here


Found 4 new test failures

Test Explanation Runs Upstream
o.a.c.distributed.test.AbortedQueryLoggerTest.testLogsReadMetrics REGRESSION 🔴🔴 0 / 30
o.a.c.index.sai.cql.VectorCompaction100dTest.testCompactionWithEnoughRowsForPQAndDeleteARow[version=ec enableNVQ=true] REGRESSION 🔴 0 / 30
o.a.c.index.sai.cql.VectorKeyRestrictedOnPartitionTest.partitionRestrictedWidePartitionPqCompressedTest[version=ca enableNVQ=false enableFused=false] REGRESSION 🔴 0 / 30
o.a.c.index.sai.cql.VectorSiftSmallTest.testMultiSegmentBuild[version=ed enableNVQ=true enableFused=false] REGRESSION 🔴🔴 0 / 30

Found 7 known test failures

@pkolaczk pkolaczk merged commit 81c68b4 into main Apr 24, 2026
2 of 4 checks passed
@pkolaczk pkolaczk deleted the c17012-fix-estimate-num-rows branch April 24, 2026 20:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants