Optimize BinaryExpr Evaluation with Short-Circuiting for AND/OR Operators #15648
- Delay evaluation of the right-hand side (RHS) unless necessary. - Optimize short-circuiting for `Operator::And` and `Operator::Or` by checking LHS alone first. - Introduce `get_short_circuit_result` function to determine short-circuit conditions based on LHS and RHS. - Update tests to cover various short-circuit scenarios for both `AND` and `OR` operations.
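The "delay the RHS unless necessary" idea from this commit can be illustrated with a minimal, standard-library-only sketch. It is not the DataFusion code: `LogicalOp` and `eval_lazy` are hypothetical names, plain `bool` slices stand in for Arrow arrays, nulls are ignored, and the RHS is assumed to have the same length as the LHS.

```rust
/// Hypothetical stand-in for `Operator::And` / `Operator::Or`.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum LogicalOp {
    And,
    Or,
}

/// Evaluates `lhs <op> rhs` element-wise. `eval_rhs` is only called when
/// the LHS alone does not already decide every row.
pub fn eval_lazy<F>(op: LogicalOp, lhs: &[bool], eval_rhs: F) -> Vec<bool>
where
    F: FnOnce() -> Vec<bool>,
{
    // AND is decided when every LHS value is false,
    // OR is decided when every LHS value is true.
    let decided = match op {
        LogicalOp::And => lhs.iter().all(|v| !*v),
        LogicalOp::Or => lhs.iter().all(|v| *v),
    };
    if decided {
        // The result equals the LHS; the RHS is never computed.
        return lhs.to_vec();
    }

    // Otherwise pay for the RHS and combine row by row.
    let rhs = eval_rhs();
    lhs.iter()
        .zip(rhs.iter())
        .map(|(l, r)| match op {
            LogicalOp::And => *l && *r,
            LogicalOp::Or => *l || *r,
        })
        .collect()
}
```

Passing the RHS as a closure is one way to make the laziness explicit: the cost of the RHS is only incurred on the path that actually needs it.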
…esult and update assertions - Renamed the test function for clarity. - Updated assertions to use get_short_circuit_result instead of check_short_circuit. - Added additional test cases for AND and OR operations with expected results.
…lt function for null - Updated AND and OR short-circuit conditions to only trigger when all values are either false or true, respectively, and there are no nulls in the array. - Adjusted test case to reflect the change in expected output.
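The null-aware condition described in this commit might look roughly like the sketch below. It assumes a simplified model where a nullable boolean array is a slice of `Option<bool>`; `LogicalOp` and `lhs_decides` are hypothetical names, not functions from the PR.

```rust
/// Hypothetical stand-in for `Operator::And` / `Operator::Or`.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum LogicalOp {
    And,
    Or,
}

/// Returns true only when the LHS alone decides the whole result:
/// every value is `false` for AND, or every value is `true` for OR.
/// A single null (`None`) disables the short circuit.
pub fn lhs_decides(op: LogicalOp, lhs: &[Option<bool>]) -> bool {
    let deciding_value = match op {
        LogicalOp::And => false,
        LogicalOp::Or => true,
    };
    lhs.iter().all(|v| *v == Some(deciding_value))
}
```

The null check matters because of SQL three-valued logic: `NULL AND false` is `false`, so a row whose LHS is null cannot simply be copied from the LHS into the result.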
…HS is only evaluated when necessary
…ort_circuit_result function
… BinaryExpr evaluation
…d enhance documentation for get_short_circuit_result
…t evaluation in BinaryExpr
…values in BinaryExpr
Thanks @kosiew -- have you run the benchmarks and do you have any results to share? |
…optimize logical operations
…Expr to optimize logical operations" This reverts commit a62df47.
…circuit evaluation
…luation in BinaryExpr - Replaced the lazy evaluation of the right-hand side (RHS) with immediate evaluation based on short-circuiting logic. - Introduced a new function `check_short_circuit` to determine if short-circuiting can be applied for logical operators. - Updated the logic to return early for `Operator::And` and `Operator::Or` based on the evaluation of the left-hand side (LHS) and the conditions of the RHS. - Improved clarity and efficiency of the short-circuit evaluation process by eliminating unnecessary evaluations.
…nction - Simplified logic for AND/OR operations by prioritizing false/true counts to enhance performance. - Updated documentation to reflect changes in array handling techniques.
…_short_circuit logic - Introduced a new helper function `count_boolean_values` to count true and false values in a BooleanArray, improving readability and performance. - Updated `check_short_circuit` to utilize the new helper function for counting, reducing redundant operations and enhancing clarity in the evaluation logic for AND/OR operations. - Adjusted comments for better understanding of the short-circuiting conditions based on the new counting mechanism.
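A counting helper of the kind this commit describes could be sketched as follows. Only the name `count_boolean_values` comes from the commit message; the signature is an assumption, and `Option<bool>` slices stand in for a `BooleanArray`.

```rust
/// Counts non-null `true` and `false` values in one pass.
/// Returns `(true_count, false_count)`; nulls are counted in neither.
pub fn count_boolean_values(values: &[Option<bool>]) -> (usize, usize) {
    let mut true_count = 0;
    let mut false_count = 0;
    for v in values {
        match v {
            Some(true) => true_count += 1,
            Some(false) => false_count += 1,
            None => {}
        }
    }
    (true_count, false_count)
}
```

With these counts, an AND short circuit would apply when `false_count == values.len()` and an OR short circuit when `true_count == values.len()`; any null makes both comparisons fail.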
…ze check_short_circuit logic" This reverts commit e2b9f77.
…tors - Renamed `arg` to `lhs` for clarity in the `get_short_circuit_result` function. - Updated handling of Boolean data types to return `None` for null values. - Simplified short-circuit checks for AND/OR operations by consolidating logic. - Enhanced readability and maintainability of the code by restructuring match statements.
hi @alamb Here are the benchmark results after incorporating @acking-you's enum suggestion: [benchmark screenshot not preserved]
Is there any way to get performance back for the benchmarks where it slowed down?
Findings after trying many optimisation attempts:
Below are some of the significant (unstable) changes, comparing this branch against [comparison target and benchmark screenshots not preserved]. As an illustration of the unstable benchmark results, I ran `cargo bench` on [target and screenshots not preserved]. The above seem to indicate that …
The mean regressions affect idealized (all true, all false) benchmarks, not the complex mixed-null or partial-match cases real queries often encounter. The gains appear in more practical, less extreme scenarios. The code is now semantically correct with the `ShortCircuitStrategy` enum and more maintainable. The unit tests cover edge cases better than before, including nulls and scalars. Should we merge this?
// If the left-hand side is an array and the right-hand side is a non-null scalar, try the optimized kernel.
if let (ColumnarValue::Array(array), ColumnarValue::Scalar(ref scalar)) =
    (&lhs, &rhs)
{
    if !scalar.is_null() {
        if let Some(result_array) =
            self.evaluate_array_scalar(array, scalar.clone())?
        {
            let final_array = result_array
                .and_then(|a| to_result_type_array(&self.op, a, &result_type));
            return final_array.map(ColumnarValue::Array);
a rewrite while trying to optimize this function.
Thank you @kosiew. I'm a bit busy but I'll review this as soon as I find some time.
@kosiew -- I wonder if you saw this post from @Dandandan: #15631 (comment). It seems a simpler way to improve performance 🤔 (though I think this PR would still apply)
Closing this.
Which issue does this PR close?
- … `OR` and `AND` #15636.

Rationale for this change
This PR improves the short-circuit evaluation logic for logical binary expressions (`AND`, `OR`) in DataFusion's physical expression layer mentioned in #15636.

What changes are included in this PR?
- … `ShortCircuitStrategy` enum to differentiate between short-circuit return cases (`ReturnLeft`, `ReturnRight`, `None`)
- … `check_short_circuit` function to:
  - … `AND` and `OR` semantics correctly
  - … `None` strategy when data is ambiguous
- … `BinaryExpr::evaluate` to leverage the new short-circuit strategy cleanly
- … `all_false` OR scenario) to test and observe non-short-circuit behavior

Are these changes tested?
✅ Yes, comprehensive unit tests were added for the new short-circuit logic, including scalar and array edge cases, null handling, and all valid operator configurations.
Are there any user-facing changes?
No direct user-facing API changes, but performance and correctness will improve for queries involving logical expressions. This is an internal enhancement to the expression evaluation engine.
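The three strategy cases named in the description can be illustrated with a simplified sketch. The enum and variant names come from the PR text; everything else is an assumption: `LogicalOp` is a hypothetical stand-in for the operator type, `Option<bool>` slices stand in for Arrow arrays, and the mapping of `ReturnRight` to "the LHS is neutral, so the result equals the RHS" is an interpretation rather than something the description states.

```rust
/// Hypothetical stand-in for `Operator::And` / `Operator::Or`.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum LogicalOp {
    And,
    Or,
}

/// Strategy names as listed in the PR description.
#[derive(Debug, PartialEq)]
pub enum ShortCircuitStrategy {
    /// The LHS already is the final result.
    ReturnLeft,
    /// Assumed meaning: the LHS is neutral, so the RHS is the final result.
    ReturnRight,
    /// No short circuit; evaluate the full binary expression.
    None,
}

/// Simplified sketch; the signature in the PR is not shown in the source.
pub fn check_short_circuit(op: LogicalOp, lhs: &[Option<bool>]) -> ShortCircuitStrategy {
    // Any null makes the data ambiguous, so fall back to full evaluation.
    if lhs.iter().any(|v| v.is_none()) {
        return ShortCircuitStrategy::None;
    }
    let all_true = lhs.iter().all(|v| *v == Some(true));
    let all_false = lhs.iter().all(|v| *v == Some(false));
    match (op, all_true, all_false) {
        (LogicalOp::And, _, true) => ShortCircuitStrategy::ReturnLeft,
        (LogicalOp::And, true, _) => ShortCircuitStrategy::ReturnRight,
        (LogicalOp::Or, true, _) => ShortCircuitStrategy::ReturnLeft,
        (LogicalOp::Or, _, true) => ShortCircuitStrategy::ReturnRight,
        _ => ShortCircuitStrategy::None,
    }
}
```

Returning an enum instead of a boolean lets the caller distinguish "reuse the LHS" from "reuse the RHS" in a single `match`, which is presumably what makes the `BinaryExpr::evaluate` integration cleaner.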