-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-53793][SQL] Use DSv2 predicate to evaluate InternalRow #52510
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
* @return Catalyst Expression representing the converted predicate, or empty if the predicate is | ||
* unsupported or references unknown columns | ||
*/ | ||
public static Optional<Expression> dsv2PredicateToCatalystExpression( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about convertV2PredicateToCatalyst
* @return Catalyst Expression representing the resolved expression, or empty if the expression is | ||
* unsupported or references unknown columns | ||
*/ | ||
public static Optional<Expression> dsv2ExpressionToCatalystExpression( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about convertV2ExpressionToCatalyst
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, we can make it private for now
* predicate could not be converted | ||
*/ | ||
public static Optional<Boolean> evaluateInternalRowOnDsv2Predicate( | ||
org.apache.spark.sql.connector.expressions.filter.Predicate predicate, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why having the full class name here?
* @param predicate the DSV2 Predicate to evaluate | ||
* @param internalRow the InternalRow to evaluate the predicate against | ||
* @param schema the schema used for resolving column references in the predicate | ||
* @return Optional containing the result of the evaluation (true or false), or empty if the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Returning an optional boolean is a bit confusing. How about having a input parameter failOnError=false
which controls the behavior of predicate conversion failure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By introducing failOnError
, do you mean the first behavior or the second?
failOnError = True | failOnError = False | |
---|---|---|
converted, satisfied | True | True |
converted, unsatisfied | False | False |
unconverted | False | True |
failOnError = True | failOnError = False | |
---|---|---|
converted, satisfied | True | True |
converted, unsatisfied | False | False |
unconverted | throw error | True? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The second.
This is just a suggestion. IMO it is easier to use.
What changes were proposed in this pull request?
This PR proposes to add a utility class to enable the evaluation of an InternalRow using a DSv2 predicate. In particular, it includes
Why are the changes needed?
This would be helpful for partition pruning, where the runtime filters are DSv2 predicates and the partitionValue are internalRows (for partitionFiles in Spark). In this way, partitionFiles can be pruned directly with DSv2 predicates at the scan level.
Does this PR introduce any user-facing change?
No
How was this patch tested?
New unit tests.
Was this patch authored or co-authored using generative AI tooling?
No