[SPARK-53793][SQL] Use DSv2 predicate to evaluate InternalRow #52510

yhuang-db · 2025-10-02T20:28:56Z

What changes were proposed in this pull request?

This PR proposes adding a utility class that evaluates an InternalRow against a DSv2 predicate. In particular, it includes:

- Converting DSv2 predicates to Catalyst expressions
- Converting DSv2 expressions to Catalyst expressions
  - NamedReference -> BoundReference
  - LiteralValue -> Catalyst Literal
- Creating an InterpretedPredicate and evaluating the InternalRow with it
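The conversion steps above can be illustrated with a minimal sketch, assuming Java 17. To stay runnable with only the JDK, it uses hypothetical stand-in types (V2Expr, NamedRef, LiteralValue, V2Predicate, Bound) rather than Spark's real DSv2 and Catalyst classes. A column reference is resolved to an ordinal in the schema (the role BoundReference plays), a literal becomes a constant (the role of a Catalyst Literal), and an unknown column or unsupported predicate yields an empty result. The PR itself builds Catalyst expressions and evaluates them with InterpretedPredicate; the lambdas here only imitate that shape.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in types; this is NOT Spark's V2ExpressionEvaluator or its API.
public class V2ToBoundSketch {

  // Hypothetical mini model of DSv2 expressions.
  interface V2Expr {}
  record NamedRef(String fieldName) implements V2Expr {}
  record LiteralValue(Object value) implements V2Expr {}
  record V2Predicate(String name, List<V2Expr> children) implements V2Expr {}

  // Stand-in for a bound Catalyst expression: evaluates against a row of values.
  interface Bound { Object eval(Object[] row); }

  static Optional<Bound> some(Bound b) { return Optional.of(b); }

  // NamedRef -> ordinal lookup (like BoundReference); LiteralValue -> constant (like Literal).
  static Optional<Bound> convert(V2Expr e, List<String> schema) {
    if (e instanceof NamedRef ref) {
      int ordinal = schema.indexOf(ref.fieldName());
      if (ordinal < 0) return Optional.empty();        // unknown column
      return some(row -> row[ordinal]);
    }
    if (e instanceof LiteralValue lit) {
      Object v = lit.value();
      return some(row -> v);
    }
    if (e instanceof V2Predicate p) return convertPredicate(p, schema);
    return Optional.empty();
  }

  static Optional<Bound> convertPredicate(V2Predicate p, List<String> schema) {
    List<Bound> kids = new ArrayList<>();
    for (V2Expr child : p.children()) {
      Optional<Bound> c = convert(child, schema);
      if (c.isEmpty()) return Optional.empty();        // one unconvertible child fails the whole predicate
      kids.add(c.get());
    }
    if (kids.size() != 2) return Optional.empty();     // this sketch only handles binary predicates
    Bound l = kids.get(0), r = kids.get(1);
    switch (p.name()) {
      case "=":   return some(row -> cmp(l.eval(row), r.eval(row), 0));
      case ">":   return some(row -> cmp(l.eval(row), r.eval(row), 1));
      case "<":   return some(row -> cmp(l.eval(row), r.eval(row), -1));
      case "AND": return some(row -> Boolean.TRUE.equals(l.eval(row)) && Boolean.TRUE.equals(r.eval(row)));
      default:    return Optional.empty();             // unsupported predicate name
    }
  }

  // Simplification: any comparison involving null is treated as "no match".
  @SuppressWarnings("unchecked")
  static boolean cmp(Object a, Object b, int sign) {
    if (a == null || b == null) return false;
    return Integer.signum(((Comparable<Object>) a).compareTo(b)) == sign;
  }

  // Empty result: the predicate could not be converted.
  static Optional<Boolean> evaluate(V2Predicate pred, Object[] row, List<String> schema) {
    return convert(pred, schema).map(b -> Boolean.TRUE.equals(b.eval(row)));
  }

  public static void main(String[] args) {
    List<String> schema = List.of("year", "region");
    Object[] row = {2025, "us"};
    V2Predicate pred = new V2Predicate("AND", List.of(
        new V2Predicate(">", List.of(new NamedRef("year"), new LiteralValue(2024))),
        new V2Predicate("=", List.of(new NamedRef("region"), new LiteralValue("us")))));
    System.out.println(evaluate(pred, row, schema));    // Optional[true]
    V2Predicate unknown = new V2Predicate("=", List.of(new NamedRef("month"), new LiteralValue(1)));
    System.out.println(evaluate(unknown, row, schema)); // Optional.empty
  }
}
```

The null handling and the lack of type coercion (an Integer compared with a Long would fail) are deliberate simplifications; Catalyst's actual semantics for both are richer.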

Why are the changes needed?

This would be helpful for partition pruning, where the runtime filters are DSv2 predicates and the partition values are InternalRows (as they are for partition files in Spark). This way, partition files can be pruned directly with DSv2 predicates at the scan level.
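The partition-pruning use case can be illustrated with a hypothetical sketch (Java 17). Each partition's values are modeled as an Object[] standing in for the partition-value InternalRow, and each runtime filter is a function that returns an empty result when its DSv2 predicate cannot be converted. The Partition record and the filter functions are illustrative assumptions, not Spark types. The sketch also assumes pruning must be conservative: a partition is dropped only when some filter definitely evaluates to false.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical sketch; Partition and the filters are stand-ins, not Spark classes.
public class PartitionPruningSketch {

  // A partition's values (one entry per partition column) and the files under it.
  record Partition(Object[] values, List<String> files) {}

  // Each filter returns Optional.of(result), or Optional.empty() if its predicate
  // could not be converted. A partition is dropped only if some filter is definitely false.
  static List<Partition> prune(List<Partition> partitions,
                               List<Function<Object[], Optional<Boolean>>> filters) {
    List<Partition> kept = new ArrayList<>();
    for (Partition p : partitions) {
      boolean mayMatch = true;
      for (Function<Object[], Optional<Boolean>> f : filters) {
        if (!f.apply(p.values()).orElse(true)) {   // unconverted -> keep (conservative)
          mayMatch = false;
          break;
        }
      }
      if (mayMatch) kept.add(p);
    }
    return kept;
  }

  public static void main(String[] args) {
    List<Partition> parts = List.of(
        new Partition(new Object[] {2023}, List.of("year=2023/a.parquet")),
        new Partition(new Object[] {2024}, List.of("year=2024/b.parquet")),
        new Partition(new Object[] {2025}, List.of("year=2025/c.parquet")));
    Function<Object[], Optional<Boolean>> yearAtLeast2024 = v -> Optional.of((Integer) v[0] >= 2024);
    Function<Object[], Optional<Boolean>> unconverted = v -> Optional.empty();
    for (Partition p : prune(parts, List.of(yearAtLeast2024, unconverted))) {
      System.out.println(p.files());
    }
  }
}
```

Here the year=2023 partition is pruned and the other two are kept; the unconvertible filter never removes anything, so it cannot cause correct data to be skipped.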

Does this PR introduce any user-facing change?

No

How was this patch tested?

New unit tests.

Was this patch authored or co-authored using generative AI tooling?

No

gengliangwang · 2025-10-09T19:57:42Z

sql/catalyst/src/main/java/org/apache/spark/sql/connector/util/V2ExpressionEvaluator.java

+   * @return Catalyst Expression representing the converted predicate, or empty if the predicate is
+   * unsupported or references unknown columns
+   */
+  public static Optional<Expression> dsv2PredicateToCatalystExpression(


How about convertV2PredicateToCatalyst?

gengliangwang · 2025-10-09T19:58:02Z

sql/catalyst/src/main/java/org/apache/spark/sql/connector/util/V2ExpressionEvaluator.java

+   * @return Catalyst Expression representing the resolved expression, or empty if the expression is
+   * unsupported or references unknown columns
+   */
+  public static Optional<Expression> dsv2ExpressionToCatalystExpression(


How about convertV2ExpressionToCatalyst?

Also, we can make it private for now.

gengliangwang · 2025-10-09T19:59:26Z

sql/catalyst/src/main/java/org/apache/spark/sql/connector/util/V2ExpressionEvaluator.java

+   * predicate could not be converted
+   */
+  public static Optional<Boolean> evaluateInternalRowOnDsv2Predicate(
+      org.apache.spark.sql.connector.expressions.filter.Predicate predicate,


Why use the fully qualified class name here?

gengliangwang · 2025-10-09T20:02:17Z

sql/catalyst/src/main/java/org/apache/spark/sql/connector/util/V2ExpressionEvaluator.java

+   * @param predicate   the DSV2 Predicate to evaluate
+   * @param internalRow the InternalRow to evaluate the predicate against
+   * @param schema      the schema used for resolving column references in the predicate
+   * @return Optional containing the result of the evaluation (true or false), or empty if the


Returning an Optional<Boolean> is a bit confusing. How about adding an input parameter, failOnError (default false), that controls the behavior when predicate conversion fails?

By introducing failOnError, do you mean the first behavior or the second?

First behavior:

                          failOnError = True    failOnError = False
converted, satisfied      True                  True
converted, unsatisfied    False                 False
unconverted               False                 True

Second behavior:

                          failOnError = True    failOnError = False
converted, satisfied      True                  True
converted, unsatisfied    False                 False
unconverted               throw error           True?

The second.
This is just a suggestion. IMO it is easier to use.
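The second behavior in the table above can be sketched as follows (Java 17). The method name, the generic converter parameter, and the toy converter are all hypothetical; the sketch only shows how failOnError decides between throwing and returning true when conversion fails, while converted predicates simply return their evaluation result.

```java
import java.util.Optional;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical sketch of the "second behavior" table; names are illustrative, not the PR's API.
public class FailOnErrorSketch {

  static <P, R> boolean evaluate(P v2Predicate, R row,
                                 Function<P, Optional<Predicate<R>>> converter,
                                 boolean failOnError) {
    Optional<Predicate<R>> converted = converter.apply(v2Predicate);
    if (converted.isPresent()) {
      return converted.get().test(row);   // converted: satisfied -> true, unsatisfied -> false
    }
    if (failOnError) {
      throw new IllegalArgumentException("Cannot convert predicate: " + v2Predicate);
    }
    return true;                          // unconverted and failOnError = false
  }

  // Toy converter (hypothetical): only the predicate named "positive" is supported.
  static Optional<Predicate<Integer>> toyConvert(String name) {
    if (name.equals("positive")) {
      Predicate<Integer> p = r -> r > 0;
      return Optional.of(p);
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    System.out.println(evaluate("positive", 3, FailOnErrorSketch::toyConvert, false));  // true
    System.out.println(evaluate("positive", -1, FailOnErrorSketch::toyConvert, false)); // false
    System.out.println(evaluate("isEven", 4, FailOnErrorSketch::toyConvert, false));    // true
    try {
      evaluate("isEven", 4, FailOnErrorSketch::toyConvert, true);
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage()); // Cannot convert predicate: isEven
    }
  }
}
```

Returning true for an unconverted predicate keeps the row (or partition) instead of filtering it out, which is the safe default for pruning. The commit "rename functions, add exception when alwaysTrueOnUnconverted=false" suggests the flag may have ended up named alwaysTrueOnUnconverted, apparently with the inverse meaning of failOnError.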

add v2 expression evaluator and tests

a4c3980

github-actions bot added the SQL label Oct 2, 2025

use InterpretedPredicate to avoid codegen

4f1a092

gengliangwang reviewed Oct 9, 2025


yhuang-db added 4 commits October 10, 2025 15:15

rename functions, add exception when alwaysTrueOnUnconverted=false

c32cd36

remove error

008dbd3

simplify assert exception

59f6c0d

add license

d183590
