@chenjian2664 chenjian2664 commented Jan 16, 2026

Description

Some leftover review comments from #27788.

Additional context and related issues

Release notes

(x) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

## Section
* Fix some things. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label Jan 16, 2026
@chenjian2664 chenjian2664 requested a review from dain January 16, 2026 09:16
@github-actions github-actions bot added the iceberg Iceberg connector label Jan 16, 2026
@dain dain left a comment

I did not "miss" the changes from the first two commits; I rejected the suggestion. Neither of these matters, because the number of roaring bitmaps will be tiny: each holds 4 billion entries, and a file is unlikely to have more than 4B entries. Instead, I think these changes hurt readability. The last commit looks good.

If you guys feel strongly that you want these changes, then, fine, make them, but I think it is the wrong direction.

private Builder addAll(RoaringBitmap[] deletedRows)
{
    // reverse order to minimize resizing the array
    for (int key = deletedRows.length - 1; key >= 0; key--) {
Member

I don't think we should make this change. It is more complicated and isn't going to help any real-world use cases. Each roaring bitmap holds 4B entries; we do not need to optimize for files with more than 4B entries, and instead we should be focusing on simplicity.
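
To make the 4B figure concrete, here is a minimal sketch, assuming the usual scheme in which a 64-bit row position is split into a high 32-bit key (the index into the `RoaringBitmap[]` array) and a low 32-bit value stored inside that bitmap. The class and method names are hypothetical and are not taken from the PR.

```java
// Hypothetical sketch; names are illustrative and not taken from the PR.
final class PositionSplitSketch
{
    private PositionSplitSketch() {}

    // High 32 bits select which bitmap in the array holds the position.
    static int key(long position)
    {
        return (int) (position >>> 32);
    }

    // Low 32 bits are the value stored inside that bitmap (read as unsigned).
    static int low(long position)
    {
        return (int) position;
    }

    // Number of bitmaps needed to cover row positions 0..rowCount-1.
    static int bitmapsNeeded(long rowCount)
    {
        if (rowCount <= 0) {
            return 0;
        }
        return key(rowCount - 1) + 1;
    }

    public static void main(String[] args)
    {
        long perBitmap = 1L << 32; // 4,294,967,296 positions per bitmap
        System.out.println(bitmapsNeeded(perBitmap));     // still fits in one bitmap
        System.out.println(bitmapsNeeded(perBitmap + 1)); // needs a second bitmap
    }
}
```

Under that assumption, the array only grows past one element once a single file exceeds 2^32 rows, which is the case the comment above argues is not worth optimizing for.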

Contributor Author

That makes sense for this one. I have reverted the reverse-iteration logic, but kept the extracted method so it can be reused.

However, adding an explicit position count is more readable to me. I mistakenly removed the comment earlier and have restored it as well.
I agree that in practice a real use case should not have 4B entries, but we cannot strictly guarantee that. We are still using an array and dynamically extending it based on length checks, rather than fixing the array size to exactly one bitmap. Using the count makes the state of the vector more obvious, and it reduces the code: we no longer need to calculate it when serializing.
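
The trade-off described above can be sketched roughly as follows. This is an illustration only, not the code from the PR: `TreeSet<Integer>` stands in for `RoaringBitmap`, and the class, field, and method names are hypothetical. It shows an array that is extended on demand, with a counter maintained on insert, next to the scan that serialization would otherwise need.

```java
import java.util.Arrays;
import java.util.TreeSet;

// Hypothetical sketch; TreeSet<Integer> stands in for RoaringBitmap and all names are illustrative.
final class BitmapCountSketch
{
    private TreeSet<Integer>[] bitmaps = newArray(1);
    private int bitmapCount; // number of non-empty bitmaps, maintained on insert

    @SuppressWarnings("unchecked")
    private static TreeSet<Integer>[] newArray(int length)
    {
        return (TreeSet<Integer>[]) new TreeSet[length];
    }

    void add(long position)
    {
        if (position < 0) {
            throw new IllegalArgumentException("position is negative");
        }
        int key = (int) (position >>> 32);
        if (key >= bitmaps.length) {
            // extend the array only when a higher key shows up
            bitmaps = Arrays.copyOf(bitmaps, key + 1);
        }
        if (bitmaps[key] == null) {
            bitmaps[key] = new TreeSet<>();
            bitmapCount++; // a value is added right below, so the bitmap is non-empty
        }
        bitmaps[key].add((int) position);
    }

    int bitmapCount()
    {
        return bitmapCount;
    }

    // The alternative: derive the count by scanning the array at serialization time.
    int countByScanning()
    {
        int count = 0;
        for (TreeSet<Integer> bitmap : bitmaps) {
            if (bitmap != null && !bitmap.isEmpty()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args)
    {
        BitmapCountSketch vector = new BitmapCountSketch();
        vector.add(1L);
        vector.add((2L << 32) + 5); // lands in key 2, leaving key 1 unused
        System.out.println(vector.bitmapCount() == vector.countByScanning());
    }
}
```

Both approaches yield the same number; the difference is whether the count is part of the object's state or recomputed when needed.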

Introduces a `bitmapCount` field in `DeletionVector` and its
`Builder` to explicitly track the number of non-empty bitmaps.
Adds more test cases to `TestDeletionVector` that exercise the
serialize/deserialize logic when some of the `RoaringBitmap`
instances contain more than one position.
@chenjian2664 chenjian2664 force-pushed the jack/op-deletion-vector branch from bdda010 to 7ec481e Compare January 17, 2026 15:05
@raunaqmorarka raunaqmorarka merged commit 0d83e10 into trinodb:master Jan 19, 2026
46 checks passed
@github-actions github-actions bot added this to the 480 milestone Jan 19, 2026
@chenjian2664 chenjian2664 deleted the jack/op-deletion-vector branch January 19, 2026 13:56