-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support scan filter for ORC decimal reader #11067
base: main
Are you sure you want to change the base?
Conversation
✅ Deploy Preview for meta-velox canceled.
|
@@ -512,6 +512,13 @@ std::unique_ptr<common::Filter> leafCallToSubfieldFilter( | |||
} | |||
return isNull(); | |||
} | |||
} else if (call.name() == "isnotnull") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't IS NOT NULL
parsed into not(is_null(...))
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In Spark there is an expression 'IsNotNull' and it is frequently used in filter pushdown.This issue adds the background for this change: #11093. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You also need to add some tests to E2EFilterTest
for decimal types
// Fill decimals before applying filter. | ||
fillDecimals(); | ||
|
||
const auto rawNulls = nullsInReadRange_ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extract this logic to a static function or trait class so that it can be later reused in parquet as well
if constexpr (std::is_same_v<DataT, int64_t>) { | ||
processFilter(filter, rows, rawNulls); | ||
} else { | ||
VELOX_UNSUPPORTED("Unsupported filter: {}.", (int)filterKind); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be the case for schema evolution. Just throw VELOX_FAIL
here or VELOX_NYI
for these and add the requested type and file type information to the error message.
@Yuhta I find the E2EFilterTest relies on DWRF writer, but for now bigint-decimal will be written as int64_t, and hugeint-decimal is not supported. Do we need to support them first? velox/velox/dwio/dwrf/writer/ColumnWriter.cpp Lines 2039 to 2041 in 3a1a60a
velox/velox/dwio/dwrf/writer/ColumnWriter.cpp Line 2090 in 3a1a60a
|
@rui-mo Yes it would be ideal to support writing decimals first. Otherwise the tests are very limited and we have no confidence the new code would work correctly. |
'leafCallToSubfieldFilter' converts decimal filter as Subfield filter, but
'SelectiveDecimalColumnReader::read' rejects scan spec filter. This PR supports
scan filter for ORC decimal reader.