-
Notifications
You must be signed in to change notification settings - Fork 3.4k
Minor updates for DeleteVector
#27938
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Minor updates for DeleteVector
#27938
Conversation
810eb61 to
bdda010
Compare
dain
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not "miss" the changes from the first two commits, I rejected the suggestion. Neither of these matter because the number of roaring bitmaps will be tiny because each holds 4 billion entries, and it is unlikely to have a file with > 4B entries. Instead I think these changes hurt readability. The last commit looks good.
If you guys feel strongly that you want these changes, then, fine, make them, but I think it is the wrong direction.
| private Builder addAll(RoaringBitmap[] deletedRows) | ||
| { | ||
| // reverse order to minimize resizing the array | ||
| for (int key = deletedRows.length - 1; key >= 0; key--) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should make this change. It is more complicated and isn't going to help any real world usecases. Each roaring bitmap holds 4B entires, we do not need to optimzie for files with > 4b entries, and instead we should be focusing on simplicity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this one, make sense, I revert this one(reverse visiting logic), but keep the logic that extracting the method and reuse it.
However, adding an explicit position count is more readable to me. I mistakenly removed the comment earlier, also revert that.
I agree that in practice the real use case should not have 4B entries, but we cannot strictly guarantee that. we are still using an array and dynamically extending it based on length checks, rather than fixing the array size to exactly one bitmap, using the count is more obvious to understand the state of the vector, and it reduce the code - we don't need to calculate it when serializing
Introduces a `bitmapCount` field to DeletionVector and its Builder to explicitly track the number of non-empty bitmaps.
Added more test cases to `TestDeletionVector` to exercise the logic in serialize/deserialize on some of the `RoaringBitmap` contains more than one position.
bdda010 to
7ec481e
Compare
Description
some left or reviews from #27788
Additional context and related issues
Release notes
(x) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text: