Optimize BinaryExpr Evaluation with Short-Circuiting for AND/OR Operators #15648

kosiew · 2025-04-09T07:43:08Z

Which issue does this PR close?

Closes Add more short-circuit optimization scenarios for OR and AND #15636.

Rationale for this change

This PR improves the short-circuit evaluation logic for logical binary expressions (AND, OR) in DataFusion's physical expression layer, as discussed in #15636.

What changes are included in this PR?

- Introduced an explicit `ShortCircuitStrategy` enum to differentiate between short-circuit return cases (`ReturnLeft`, `ReturnRight`, `None`)
- Rewrote the `check_short_circuit` function to:
  - Consider scalar vs. array inputs
  - Handle both AND and OR semantics correctly
  - Account for nulls and return the `None` strategy when data is ambiguous
- Updated `BinaryExpr::evaluate` to leverage the new short-circuit strategy cleanly
- Added a new benchmark case (`all_false` OR scenario) to test and observe non-short-circuit behavior
- Expanded and refactored test coverage for:
  - Scalar true/false/null cases
  - All true/false array scenarios
  - Nullable arrays and null-only arrays
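
As a rough illustration of the strategy described in the list above, here is a minimal sketch in plain Rust. It models a boolean column as a slice of `Option<bool>` rather than an Arrow array. The `Op` enum, the function signature, and the reading of what each variant means are assumptions made for illustration, not the code in this PR.

```rust
/// Hypothetical stand-in for the logical operators handled by the shortcut.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Op {
    And,
    Or,
}

/// The three cases named in the PR description (semantics assumed here).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ShortCircuitStrategy {
    /// The LHS alone is already the result; the RHS can be skipped.
    ReturnLeft,
    /// The result is exactly the RHS; no combining kernel is needed.
    ReturnRight,
    /// No shortcut applies; evaluate both sides and combine them.
    None,
}

/// Decide a strategy from the LHS only.
/// `lhs` is a simplified nullable boolean column (`None` means NULL).
fn check_short_circuit(op: Op, lhs: &[Option<bool>]) -> ShortCircuitStrategy {
    // Any NULL makes the outcome depend on the RHS, so no shortcut.
    if lhs.iter().any(|v| v.is_none()) {
        return ShortCircuitStrategy::None;
    }
    let all_true = lhs.iter().all(|v| *v == Some(true));
    let all_false = lhs.iter().all(|v| *v == Some(false));
    match op {
        // false AND x = false for every row
        Op::And if all_false => ShortCircuitStrategy::ReturnLeft,
        // true AND x = x for every row
        Op::And if all_true => ShortCircuitStrategy::ReturnRight,
        // true OR x = true for every row
        Op::Or if all_true => ShortCircuitStrategy::ReturnLeft,
        // false OR x = x for every row
        Op::Or if all_false => ShortCircuitStrategy::ReturnRight,
        _ => ShortCircuitStrategy::None,
    }
}
```

Under this reading, `ReturnLeft` means the LHS is already the answer (all false for AND, all true for OR), and `ReturnRight` means the RHS is the answer unchanged.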

Are these changes tested?

✅ Yes, comprehensive unit tests were added for the new short-circuit logic, including scalar and array edge cases, null handling, and all valid operator configurations.

Are there any user-facing changes?

No direct user-facing API changes, but performance and correctness will improve for queries involving logical expressions. This is an internal enhancement to the expression evaluation engine.

- Delay evaluation of the right-hand side (RHS) unless necessary.
- Optimize short-circuiting for `Operator::And` and `Operator::Or` by checking LHS alone first.
- Introduce `get_short_circuit_result` function to determine short-circuit conditions based on LHS and RHS.
- Update tests to cover various short-circuit scenarios for both `AND` and `OR` operations.
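
Delaying RHS evaluation, as the first bullet describes, could look roughly like the sketch below. It uses non-nullable `bool` rows and a closure standing in for the RHS expression; `eval_and_lazy` is a hypothetical name, not a DataFusion function.

```rust
/// Evaluate `lhs AND rhs` row by row, calling `eval_rhs` only when the
/// LHS does not already determine the result.
/// Assumption: rows are plain `bool`s (no NULLs) to keep the sketch small.
fn eval_and_lazy<F>(lhs: &[bool], eval_rhs: F) -> Vec<bool>
where
    F: FnOnce() -> Vec<bool>,
{
    // If every LHS row is false, every output row is false,
    // so the RHS is never evaluated.
    if lhs.iter().all(|&v| !v) {
        return lhs.to_vec();
    }
    // Otherwise pay for the RHS and combine element-wise.
    let rhs = eval_rhs();
    lhs.iter().zip(rhs.iter()).map(|(&l, &r)| l && r).collect()
}
```

The OR case is symmetric: skip the RHS when every LHS row is true.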

…esult and update assertions
- Renamed the test function for clarity.
- Updated assertions to use `get_short_circuit_result` instead of `check_short_circuit`.
- Added additional test cases for AND and OR operations with expected results.

…lt function for null
- Updated AND and OR short-circuit conditions to only trigger when all values are either false or true, respectively, and there are no nulls in the array.
- Adjusted test case to reflect the change in expected output.
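
To see why nulls have to block the shortcut, a small sketch of SQL three-valued (Kleene) logic helps. The helper names `kleene_and` and `kleene_or` are hypothetical; rows are modelled as `Option<bool>` with `None` standing for NULL.

```rust
/// SQL-style AND over nullable booleans (`None` is NULL).
fn kleene_and(l: Option<bool>, r: Option<bool>) -> Option<bool> {
    match (l, r) {
        // A single false decides the result, even next to NULL.
        (Some(false), _) | (_, Some(false)) => Some(false),
        (Some(true), Some(true)) => Some(true),
        // Remaining cases involve a NULL and no false: result is NULL.
        _ => None,
    }
}

/// SQL-style OR over nullable booleans (`None` is NULL).
fn kleene_or(l: Option<bool>, r: Option<bool>) -> Option<bool> {
    match (l, r) {
        // A single true decides the result, even next to NULL.
        (Some(true), _) | (_, Some(true)) => Some(true),
        (Some(false), Some(false)) => Some(false),
        // Remaining cases involve a NULL and no true: result is NULL.
        _ => None,
    }
}
```

A NULL on the left gives `false` when the right is `false` but NULL when the right is `true`, so an LHS that contains nulls cannot decide an AND without looking at the RHS. The same holds for OR with the roles of `true` and `false` swapped.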

alamb · 2025-04-09T10:34:54Z

Thanks @kosiew -- have you run the benchmarks and do you have any results to share?

…luation in BinaryExpr
- Replaced the lazy evaluation of the right-hand side (RHS) with immediate evaluation based on short-circuiting logic.
- Introduced a new function `check_short_circuit` to determine if short-circuiting can be applied for logical operators.
- Updated the logic to return early for `Operator::And` and `Operator::Or` based on the evaluation of the left-hand side (LHS) and the conditions of the RHS.
- Improved clarity and efficiency of the short-circuit evaluation process by eliminating unnecessary evaluations.

…_short_circuit logic
- Introduced a new helper function `count_boolean_values` to count true and false values in a BooleanArray, improving readability and performance.
- Updated `check_short_circuit` to utilize the new helper function for counting, reducing redundant operations and enhancing clarity in the evaluation logic for AND/OR operations.
- Adjusted comments for better understanding of the short-circuiting conditions based on the new counting mechanism.
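
A counting helper along these lines might look like the following sketch. Only the name `count_boolean_values` comes from the commit message; the `BoolCounts` struct, the slice-based input, and the signature are assumptions, and the real helper operated on a `BooleanArray`.

```rust
/// Hypothetical result type: how many rows are true, false, or NULL.
#[derive(Debug, Default, PartialEq, Eq)]
struct BoolCounts {
    trues: usize,
    falses: usize,
    nulls: usize,
}

/// Count true, false, and NULL rows in a single pass.
/// `values` is a simplified nullable boolean column (`None` means NULL).
fn count_boolean_values(values: &[Option<bool>]) -> BoolCounts {
    let mut counts = BoolCounts::default();
    for v in values {
        match v {
            Some(true) => counts.trues += 1,
            Some(false) => counts.falses += 1,
            None => counts.nulls += 1,
        }
    }
    counts
}
```

With the counts in hand, an AND can short-circuit when `falses == values.len()` and an OR when `trues == values.len()`. Since the three counts sum to the length, either condition also implies there are no nulls.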

…tors
- Renamed `arg` to `lhs` for clarity in the `get_short_circuit_result` function.
- Updated handling of Boolean data types to return `None` for null values.
- Simplified short-circuit checks for AND/OR operations by consolidating logic.
- Enhanced readability and maintainability of the code by restructuring match statements.

kosiew · 2025-04-10T11:07:12Z

Hi @alamb

Here are the benchmark results after incorporating @acking-you's enum suggestion

alamb · 2025-04-10T18:22:54Z

Is there any way to get performance back for the benchmarks where it slowed down?

kosiew · 2025-04-11T09:41:29Z

Findings after many optimisation attempts:

- Benchmark results are not stable. At the µs scale, many factors (besides the code itself) can swing the results.
- Only some changes are statistically significant (p < 0.05), so I modified my script to retrieve only significant results.

Below are some of the significant (unstable) changes, comparing this branch against main:

As an illustration of the unstable benchmark results, I ran `cargo bench` on main to compare against a saved main baseline and obtained these significant changes:

The above seems to indicate that the main branch, without any changes, has also regressed.

kosiew · 2025-04-11T10:15:31Z

The mean regressions affect idealized (all true, all false) benchmarks, not the complex mixed-null or partial-match cases real queries often encounter.

The gains appear in the more practical, less extreme scenarios.

The code is now semantically correct with the ShortCircuitStrategy enum and more maintainable.

The unit tests cover edge cases better than before, including nulls and scalars.

Should we merge this?

kosiew · 2025-04-11T10:23:31Z

datafusion/physical-expr/src/expressions/binary.rs

+        // If the left-hand side is an array and the right-hand side is a non-null scalar, try the optimized kernel.
+        if let (ColumnarValue::Array(array), ColumnarValue::Scalar(ref scalar)) =
+            (&lhs, &rhs)
+        {
+            if !scalar.is_null() {
+                if let Some(result_array) =
+                    self.evaluate_array_scalar(array, scalar.clone())?
+                {
+                    let final_array = result_array
+                        .and_then(|a| to_result_type_array(&self.op, a, &result_type));
+                    return final_array.map(ColumnarValue::Array);


a rewrite while trying to optimize this function.

berkaysynnada · 2025-04-11T11:37:45Z

Thank you @kosiew. I'm a bit busy but I'll review this as soon as I find some time

alamb · 2025-04-11T20:01:21Z

@kosiew -- I wonder if you saw this post from @Dandandan : #15631 (comment)

It seems a simpler way to improve performance 🤔 (though I think this PR would still apply)

kosiew · 2025-04-14T04:28:50Z

Closing this.
@acking-you improves this significantly in #15694

alamb · 2025-04-16T18:29:03Z

Closing this. @acking-you improves this significantly in #15694

I think this sequence of PRs is quite exciting -- thank you very much for the collaboration leading to #15694

kosiew added 13 commits April 9, 2025 10:06

feat: add debug logging for binary expression evaluation and short-ci…

f5c83e2

…rcuit checks

fix: improve short-circuit evaluation logic in BinaryExpr to ensure R…

7971035

…HS is only evaluated when necessary

fix: restrict short-circuit evaluation to logical operators in get_sh…

4fa7711

…ort_circuit_result function

add more println!("==> ");

25394d8

fix: remove duplicate data type checks for left and right operands in…

f533532

… BinaryExpr evaluation

feat: add debug prints for dictionary values and keys in binary expre…

5b1641c

…ssion tests

Tests pass

d416c72

fix: remove redundant short-circuit evaluation check in BinaryExpr an…

ed6451f

…d enhance documentation for get_short_circuit_result

refactor: remove unnecessary debug prints and streamline short-circui…

ae25207

…t evaluation in BinaryExpr

test: enhance short-circuit evaluation tests for nullable and scalar …

308d7b4

…values in BinaryExpr

github-actions bot added the physical-expr Changes to the physical-expr crates label Apr 9, 2025

kosiew added 15 commits April 9, 2025 18:44

refactor: enhance short-circuit evaluation strategy in BinaryExpr to …

a62df47

…optimize logical operations

Revert "refactor: enhance short-circuit evaluation strategy in Binary…

17ca22b

…Expr to optimize logical operations" This reverts commit a62df47.

bench: add benchmark for OR operation with all false values in short-…

3fe3742

…circuit evaluation

Merge branch 'main' into short-and-enum

89fb9e3

refactor: simplify short-circuit evaluation logic in check_short_circ…

31fd5d8

…uit function

datafusion_expr::lit as expr_lit

aef4153

fix short_circuit/or/all_false benchmark

85c6b67

refactor: optimize short-circuit evaluation in check_short_circuit fu…

8b910e1

…nction
- Simplified logic for AND/OR operations by prioritizing false/true counts to enhance performance.
- Updated documentation to reflect changes in array handling techniques.

Revert "refactor: add count_boolean_values helper function and optimi…

b130b24

…ze check_short_circuit logic" This reverts commit e2b9f77.

add benchmark

9faaee3

Merge branch 'main' into short-and

3c4a5fd

Merge branch 'short-and-enum' into short-and

982e847

kosiew mentioned this pull request Apr 11, 2025

Add more short-circuit optimization scenarios for OR and AND #15636

Closed

kosiew added 4 commits April 11, 2025 17:54

check main binary_op.rs

534c5a3

optimise evaluate

d96b50d

optimise evaluate 2

583095e

refactor op:AND, lhs all false op:OR, lhs all true to be faster

19de99c

fix clippy warning

8f3ef97

acking-you mentioned this pull request Apr 12, 2025

Apply pre-selection and computation skipping to short-circuit optimization #15694

Merged

kosiew closed this Apr 14, 2025
