refactor: Remove raw pointer indexing and add unit tests for RowIndexBuilder #1334
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1334      +/-   ##
==========================================
+ Coverage   84.66%   84.73%   +0.07%
==========================================
  Files         113      113
  Lines       28303    28396      +93
  Branches    28303    28396      +93
==========================================
+ Hits        23963    24062      +99
+ Misses       3205     3198       -7
- Partials     1135     1136       +1
d9b401d to d08e062
pub(crate) use prim_array_cmp;

type FieldIndex = usize;
type FlattenedRangeIterator<T> = std::iter::Flatten<std::vec::IntoIter<Range<T>>>;
See #1272 (comment)
The various call sites like fixup_parquet_read and reorder_struct_array should start using this instead of <RowIndexBuilder as IntoIterator>::IntoIter
The various call sites like fixup_parquet_read and reorder_struct_array should start using this instead of <RowIndexBuilder as IntoIterator>::IntoIter
I also did this as part of this PR. I searched for all occurrences of <RowIndexBuilder as IntoIterator> and replaced them.
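For readers skimming the thread, here is a minimal sketch of why the alias helps at call sites. Only the `FlattenedRangeIterator` alias comes from the diff; `flatten_ranges` is a hypothetical helper invented for illustration and is not part of the kernel code.

```rust
use std::ops::Range;

/// Same alias as in the diff: an iterator over every value of every range.
type FlattenedRangeIterator<T> = std::iter::Flatten<std::vec::IntoIter<Range<T>>>;

/// Hypothetical helper (not from the PR). The return type names the alias
/// directly, so callers do not need to spell out
/// `<SomeBuilder as IntoIterator>::IntoIter`.
fn flatten_ranges(ranges: Vec<Range<i64>>) -> FlattenedRangeIterator<i64> {
    // Each `Range<i64>` is itself an iterator, so `flatten` walks them in order.
    ranges.into_iter().flatten()
}
```

A signature that takes or returns `FlattenedRangeIterator<i64>` stays readable even if the builder stops implementing `IntoIterator`, which is presumably the motivation behind the suggestion.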
kernel/src/engine/arrow_utils.rs
Outdated
// We have to clone here to avoid modifying the original vector in each iteration
ordinals
    .iter()
    .filter_map(|&i| self.row_group_row_index_ranges.get(i).cloned())
I think clone is a reasonable trade-off here, since (1) we are only cloning Ranges (i.e., two i64s) and (2) preventing the clone would require us to modify the underlying vector as we iterate (costly) or to design something like a FilteredRowIndexIterator that does not collect intermediate results.
Yeah, IMO Range<i64> should be Copy -- it's only 16 bytes, the same size as &[T] which is Copy.
While working on this, I learned that Range<T> is not Copy because copyable iterators can be confusing.
Thus, we are only left with cloning as far as I'm aware.
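To make the trade-off concrete, here is a small standalone sketch of the same pattern. `select_ranges` and its parameters are hypothetical names, not the code in `arrow_utils.rs`; the only claim taken from the thread is that `Range<T>` implements `Clone` but not `Copy`.

```rust
use std::ops::Range;

/// Hypothetical stand-in for the selection step discussed above: pick ranges
/// by ordinal and skip ordinals that are out of bounds.
fn select_ranges(all: &[Range<i64>], ordinals: &[usize]) -> Vec<Range<i64>> {
    ordinals
        .iter()
        // `get` yields `Option<&Range<i64>>`. `.cloned()` is required because
        // `Range<T>` is `Clone` but not `Copy`, so `.copied()` would not compile.
        .filter_map(|&i| all.get(i).cloned())
        .collect()
}
```

Each clone copies two `i64` values, and the source slice is left untouched, so the same ranges can be selected again on a later call.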
fn extract_record_batch(
    scan_result: ScanResult,
) -> Result<RecordBatch, Box<dyn std::error::Error>> {
LGTM, one nit
LGTM, thank you!
What changes are proposed in this pull request?
This PR follows up on #1272 and addresses open review comments.
- Changes impl IntoIter to a fallible RowIndexBuilder::build method that verifies that row group ordinals are unique and within bounds.
- Adds unit tests for RowIndexBuilder.

Note: We cannot add integration tests for row index reads with filter predicates before addressing #860 and implementing predicate pushdown.
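As a rough illustration of what a fallible build step with these checks could look like, here is a minimal sketch. It is not the implementation in this PR: `build_row_indexes`, `BuildError`, and its variants are hypothetical names, and the real `RowIndexBuilder::build` may differ in signature and error type.

```rust
use std::collections::HashSet;
use std::ops::Range;

/// Hypothetical error type for this sketch only.
#[derive(Debug, PartialEq)]
enum BuildError {
    OutOfBounds { ordinal: usize, len: usize },
    Duplicate { ordinal: usize },
}

/// Hypothetical free function standing in for a fallible `build` method.
/// It returns the flattened row indexes for the selected row groups, or an
/// error if an ordinal is out of bounds or appears more than once.
fn build_row_indexes(
    row_group_ranges: &[Range<i64>],
    ordinals: &[usize],
) -> Result<std::iter::Flatten<std::vec::IntoIter<Range<i64>>>, BuildError> {
    let mut seen = HashSet::with_capacity(ordinals.len());
    let mut selected = Vec::with_capacity(ordinals.len());
    for &ordinal in ordinals {
        // Bounds check via `get`, so no unchecked indexing is needed.
        let range = row_group_ranges.get(ordinal).ok_or(BuildError::OutOfBounds {
            ordinal,
            len: row_group_ranges.len(),
        })?;
        // `insert` returns false when the ordinal was already seen.
        if !seen.insert(ordinal) {
            return Err(BuildError::Duplicate { ordinal });
        }
        selected.push(range.clone());
    }
    Ok(selected.into_iter().flatten())
}
```

Validating up front means the caller gets one error before any rows are produced, rather than an iterator that silently skips bad ordinals.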
How was this change tested?
New unit tests.