-
Notifications
You must be signed in to change notification settings - Fork 3.2k
[WIP] Support data skipping based on column stats in Hudi connector #25184
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
74846a0
to
30b1e6d
Compare
@@ -188,6 +188,16 @@ public static Map<String, List<FileSlice>> lookupCandidateFilesInMetadataTable( | |||
.stream() | |||
.filter(fileSlice -> pruneFiles(fileSlice, statsByFileName, regularColumnPredicates, regularColumns)) | |||
.collect(Collectors.toList()))); | |||
if (log.isInfoEnabled()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we check debug level instead?
public long getPos() | ||
throws IOException | ||
{ | ||
return super.getPos() - startOffset; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can return negative value? Or is it guaranteed that the underlying stream is positioned at or after startOffset
at initialization?
throws IOException | ||
{ | ||
if (pos > length) { | ||
throw new IOException(String.format( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wondering if we should also add a safety check just in case pos < 0
then throw exception in this case as well?
"Attempting to seek past inline content: position to seek to is %s but the length is %s", | ||
pos, length)); | ||
} | ||
super.seek(startOffset + pos); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we seek till pos == length. I am assuming that length
corresponds to a single hfile data block. Is there any chance to hit EOF?
throws IOException | ||
{ | ||
return new InlineSeekableDataInputStream( | ||
(TrinoInputStream) storage.open(getFilePathFromInlinePath(path)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did we test with colstats having log files in latest file slices? Just want t ensure that we don't hit any ClassCastException while casting here?
} | ||
|
||
@Override | ||
public SeekableDataInputStream openSeekable(StoragePath path, int bufferSize, boolean wrapStream) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why we ignore bufferSize
and wrapStream
?
This pull request has gone a while without any activity. Ask for help on #core-dev on Trino slack. |
Description
Additional context and related issues
Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text: