Support scan filter for ORC decimal reader #11067

Draft: wants to merge 1 commit into main

rui-mo (Collaborator) commented Sep 23, 2024

'leafCallToSubfieldFilter' converts a decimal filter into a subfield filter, but
'SelectiveDecimalColumnReader::read' rejects any filter set on the scan spec. This PR adds
scan filter support to the ORC decimal reader.
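To make the description concrete, here is a minimal C++ sketch of what a decimal subfield filter might evaluate. It assumes that a short decimal (precision at most 18) is stored as an unscaled int64 and that the pushed-down predicate becomes an inclusive range over those unscaled values. `UnscaledDecimalRange`, `rescale`, and `betweenToFilter` are hypothetical names, not Velox's API; the real conversion in 'leafCallToSubfieldFilter' may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for a subfield filter on a short decimal column:
// an inclusive range over unscaled int64 values, e.g. DECIMAL(10, 2)
// stores 12.34 as 1234.
struct UnscaledDecimalRange {
  int64_t lower;
  int64_t upper;
  bool nullAllowed;

  bool testInt64(int64_t unscaled) const {
    return unscaled >= lower && unscaled <= upper;
  }
  bool testNull() const { return nullAllowed; }
};

// Rescales an unscaled literal from `fromScale` to `toScale` fractional digits.
// Returns std::nullopt when scaling down would drop non-zero digits; a real
// converter would round the bound instead. Overflow is not checked.
std::optional<int64_t> rescale(int64_t unscaled, int fromScale, int toScale) {
  assert(fromScale >= 0 && toScale >= 0);
  int64_t value = unscaled;
  for (int s = fromScale; s < toScale; ++s) {
    value *= 10;
  }
  for (int s = toScale; s < fromScale; ++s) {
    if (value % 10 != 0) {
      return std::nullopt;
    }
    value /= 10;
  }
  return value;
}

// Converts `column BETWEEN lo AND hi` (bounds given as unscaled literals with
// their own scales) into a range on the column's unscaled representation.
std::optional<UnscaledDecimalRange> betweenToFilter(
    int64_t lo, int loScale, int64_t hi, int hiScale, int columnScale) {
  const auto lower = rescale(lo, loScale, columnScale);
  const auto upper = rescale(hi, hiScale, columnScale);
  if (!lower || !upper) {
    return std::nullopt;
  }
  return UnscaledDecimalRange{*lower, *upper, /*nullAllowed=*/false};
}
```

For a DECIMAL(10, 2) column, `c BETWEEN 1.5 AND 2.00` becomes the unscaled range [150, 200], so once the reader has values at the column's scale it only compares raw integers.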

facebook-github-bot added the CLA Signed label on Sep 23, 2024. (This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.)

netlify bot commented Sep 23, 2024:

Deploy Preview for meta-velox canceled.

🔨 Latest commit: 2937bac
🔍 Latest deploy log: https://app.netlify.com/sites/meta-velox/deploys/66f117f579e91b00088d2345

Yuhta self-requested a review on September 23, 2024 at 14:41.
@@ -512,6 +512,13 @@ std::unique_ptr<common::Filter> leafCallToSubfieldFilter(
}
return isNull();
}
} else if (call.name() == "isnotnull") {
Contributor

Isn't IS NOT NULL parsed into not(is_null(...))?

Collaborator Author

In Spark, 'IsNotNull' is a separate expression, and it is frequently used in filter pushdown. Issue #11093 gives the background for this change. Thanks.
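The exchange above is about how a not-null check reaches the filter converter. Below is a minimal sketch of a converter that accepts both the Spark-style `isnotnull(c)` call and `not(isnull(c))` and maps them to the same filter. It uses a simplified expression node; `Expr`, `NullFilterKind`, and `toNullFilter` are hypothetical and not Velox's API, and the function name "isnull" is assumed.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class NullFilterKind { kNone, kIsNull, kIsNotNull };

// Hypothetical, simplified expression node: a function name plus arguments.
// A column reference is modelled as a node without arguments.
struct Expr {
  std::string name;
  std::vector<Expr> args;
};

// Maps a null check on a single column to a filter kind. Spark pushes down
// isnotnull(c) directly; other frontends may produce not(isnull(c)).
// Both spellings yield kIsNotNull.
NullFilterKind toNullFilter(const Expr& call) {
  if (call.args.size() != 1) {
    return NullFilterKind::kNone;
  }
  if (call.name == "isnull") {
    return NullFilterKind::kIsNull;
  }
  if (call.name == "isnotnull") {
    return NullFilterKind::kIsNotNull;
  }
  if (call.name == "not" &&
      toNullFilter(call.args[0]) == NullFilterKind::kIsNull) {
    return NullFilterKind::kIsNotNull;
  }
  return NullFilterKind::kNone;
}
```

Accepting both spellings keeps the pushed-down filter independent of which frontend produced the plan.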

Yuhta (Contributor) left a comment:

You also need to add some tests to E2EFilterTest for decimal types.
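The requested coverage amounts to checking the reader's filtered output against a straightforward reference. The thread does not show E2EFilterTest's harness, so here is a generic randomized differential test in C++. `referenceFilter`, `denseFilter`, and `runRandomizedCheck` are hypothetical, and the dense layout (a null flag per row plus a stream holding only the non-null values) is an assumption modelled on how ORC stores nullable columns.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <random>
#include <vector>

// Reference: one optional per row; returns rows whose value is in [lower, upper].
std::vector<int32_t> referenceFilter(
    const std::vector<std::optional<int64_t>>& rows,
    int64_t lower,
    int64_t upper) {
  std::vector<int32_t> hits;
  for (int32_t i = 0; i < static_cast<int32_t>(rows.size()); ++i) {
    if (rows[i] && *rows[i] >= lower && *rows[i] <= upper) {
      hits.push_back(i);
    }
  }
  return hits;
}

// Code under test: a null flag per row plus a dense stream that holds only
// the non-null values, so the value index advances only on non-null rows.
std::vector<int32_t> denseFilter(
    const std::vector<bool>& isNull,
    const std::vector<int64_t>& dense,
    int64_t lower,
    int64_t upper) {
  std::vector<int32_t> hits;
  std::size_t valueIndex = 0;
  for (int32_t row = 0; row < static_cast<int32_t>(isNull.size()); ++row) {
    if (isNull[row]) {
      continue;
    }
    const int64_t value = dense[valueIndex++];
    if (value >= lower && value <= upper) {
      hits.push_back(row);
    }
  }
  return hits;
}

// Builds random rows (about 10% null) in both layouts and checks that the two
// filters agree on 20 random ranges. Returns false on the first mismatch.
bool runRandomizedCheck(uint32_t seed, int numRows) {
  std::mt19937 gen(seed);
  std::uniform_int_distribution<int64_t> valueDist(-99999, 99999);
  std::bernoulli_distribution nullDist(0.1);
  std::vector<std::optional<int64_t>> rows;
  std::vector<bool> isNull;
  std::vector<int64_t> dense;
  for (int i = 0; i < numRows; ++i) {
    const bool rowIsNull = nullDist(gen);
    isNull.push_back(rowIsNull);
    if (rowIsNull) {
      rows.push_back(std::nullopt);
    } else {
      const int64_t value = valueDist(gen);
      rows.push_back(value);
      dense.push_back(value);
    }
  }
  for (int trial = 0; trial < 20; ++trial) {
    const int64_t a = valueDist(gen);
    const int64_t b = valueDist(gen);
    const int64_t lower = std::min(a, b);
    const int64_t upper = std::max(a, b);
    if (referenceFilter(rows, lower, upper) !=
        denseFilter(isNull, dense, lower, upper)) {
      return false;
    }
  }
  return true;
}
```

In the real test, the reference side would presumably be the in-memory batches that were written, and the code under test the ORC decimal reader with the filter set in its scan spec.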

// Fill decimals before applying filter.
fillDecimals();

const auto rawNulls = nullsInReadRange_
Contributor

Extract this logic into a static function or trait class so that it can later be reused in Parquet as well.
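One way to follow this suggestion, sketched under assumptions: a format-independent function parameterized by a trait class, so that both an ORC reader and a Parquet reader could call it after filling their values. `DecimalRangeFilter`, `DecimalFilterTraits`, and `applyDecimalFilter` are hypothetical. Only the int64_t specialization exists, so instantiating the function for another physical type fails at compile time rather than reaching a runtime VELOX_UNSUPPORTED.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical filter on unscaled short-decimal values.
struct DecimalRangeFilter {
  int64_t lower;
  int64_t upper;
  bool nullAllowed;
  bool testInt64(int64_t v) const { return v >= lower && v <= upper; }
  bool testNull() const { return nullAllowed; }
};

// Trait class: maps a physical storage type to the filter test it uses.
// Deliberately left undefined for types without a specialization.
template <typename DataT>
struct DecimalFilterTraits;

template <>
struct DecimalFilterTraits<int64_t> {
  static bool test(const DecimalRangeFilter& filter, int64_t value) {
    return filter.testInt64(value);
  }
};

// Format-independent filtering step. `values` and `isNull` are indexed by row
// number; `rows` lists the candidate rows. Returns the rows that pass.
template <typename DataT>
std::vector<int32_t> applyDecimalFilter(
    const DecimalRangeFilter& filter,
    const std::vector<int32_t>& rows,
    const std::vector<DataT>& values,
    const std::vector<bool>& isNull) {
  std::vector<int32_t> passed;
  for (const int32_t row : rows) {
    const bool pass = isNull[row]
        ? filter.testNull()
        : DecimalFilterTraits<DataT>::test(filter, values[row]);
    if (pass) {
      passed.push_back(row);
    }
  }
  return passed;
}
```

Adding long-decimal support would then mean adding a specialization for the 128-bit type rather than another runtime branch.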

if constexpr (std::is_same_v<DataT, int64_t>) {
processFilter(filter, rows, rawNulls);
} else {
VELOX_UNSUPPORTED("Unsupported filter: {}.", (int)filterKind);
Contributor

This case can arise under schema evolution. Throw VELOX_FAIL or VELOX_NYI here instead, and include the requested type and the file type in the error message.
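A minimal sketch of the requested error message, using std::runtime_error as a stand-in for VELOX_FAIL or VELOX_NYI (the real macros are not reproduced here). `throwUnsupportedDecimalFilter` is a hypothetical helper. Because the message names both the requested type and the file type, a schema-evolution mismatch (for example, a file written as DECIMAL(10, 2) and read as DECIMAL(20, 2)) is easy to recognize from the error alone.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper; stands in for VELOX_NYI or VELOX_FAIL in the reader.
// Includes both types so a schema-evolution mismatch is visible in the error.
[[noreturn]] void throwUnsupportedDecimalFilter(
    const std::string& filterKind,
    const std::string& requestedType,
    const std::string& fileType) {
  throw std::runtime_error(
      "Unsupported filter " + filterKind +
      " on decimal column. Requested type: " + requestedType +
      ", file type: " + fileType + ".");
}
```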

rui-mo (Collaborator, Author) commented Sep 25, 2024

> You also need to add some tests to E2EFilterTest for decimal types

@Yuhta I found that E2EFilterTest relies on the DWRF writer, but currently a BIGINT-backed decimal is written as a plain int64_t, and a HUGEINT-backed decimal is not supported at all. Do we need to add writer support for them first?

case TypeKind::BIGINT:
return std::make_unique<IntegerColumnWriter<int64_t>>(
context, type, sequence, onRecordPosition);

DWIO_RAISE("not supported yet ", mapTypeKindToName(type.type()->kind()));
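Background for the writer limitation above: in Velox, DECIMAL(p, s) with precision p of at most 18 is backed by a 64-bit integer (BIGINT), and precision 19 to 38 by a 128-bit integer (HUGEINT). A small C++ sketch of that mapping and of why 18 is the cutoff; `decimalPhysicalKind` and `maxUnscaled` are hypothetical helpers, not writer code.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>

enum class PhysicalKind { kBigint, kHugeint };

// DECIMAL(p, s): p <= 18 fits in int64_t, 19 <= p <= 38 needs a 128-bit integer.
PhysicalKind decimalPhysicalKind(int precision) {
  if (precision < 1 || precision > 38) {
    throw std::invalid_argument("decimal precision must be in [1, 38]");
  }
  return precision <= 18 ? PhysicalKind::kBigint : PhysicalKind::kHugeint;
}

// Largest unscaled value for a short decimal: 10^precision - 1.
// For precision 18 this is 999999999999999999, below INT64_MAX
// (about 9.22e18); 10^19 - 1 would not fit, hence the cutoff.
int64_t maxUnscaled(int precision) {
  if (decimalPhysicalKind(precision) != PhysicalKind::kBigint) {
    throw std::invalid_argument("not a short decimal");
  }
  int64_t result = 1;
  for (int i = 0; i < precision; ++i) {
    result *= 10;
  }
  return result - 1;
}
```

This matches the discussion: a short decimal shares its physical kind with BIGINT, so it reaches the BIGINT writer case, while a long decimal has no writer case yet.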

Yuhta (Contributor) commented Sep 25, 2024

@rui-mo Yes, it would be ideal to support writing decimals first. Otherwise the tests are very limited, and we have no confidence that the new code works correctly.
