Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-51580][SQL] Throw proper user facing error message when lambda function is out of place in HigherOrderFunction #50345

Conversation

mihailotim-db
Copy link
Contributor

What changes were proposed in this pull request?

Throw proper user facing error message when lambda function is out of place in HigherOrderFunction.

Why are the changes needed?

Currently, the analysis fails with invalid call to dataType.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Added new test cases with expected error message.

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label Mar 21, 2025
@mihailotim-db mihailotim-db force-pushed the mihailotim-db/lambda_error_message branch 2 times, most recently from dff68ee to 1da8eac Compare March 21, 2025 09:33
@mihailotim-db mihailotim-db force-pushed the mihailotim-db/lambda_error_message branch 2 times, most recently from fe9a97b to 355566d Compare March 21, 2025 11:38
@mihailotim-db mihailotim-db force-pushed the mihailotim-db/lambda_error_message branch from 355566d to ea4ed3d Compare March 24, 2025 07:10
@@ -345,6 +345,16 @@ trait CheckAnalysis extends LookupCatalog with QueryErrorsBase with PlanToString

case operator: LogicalPlan =>
operator transformExpressionsDown {
case hof: HigherOrderFunction if hof.arguments.exists {
case LambdaFunction(_, _, _) => true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need a bit more context. So ideally the analyzer should rewrite LambdaFunction if it's in the right place?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is in the following query pattern: select transform(x -> x + 1, array(1,2,3)). By reversing the parameter order the user sees the error: call to dataType on unresolved object, because the lambda function will not be bound properly. This is because HigherOrderFunction expects a lambda in a certain place but we never enforce this check anywhere. We can't really place it in checkInputDataTypes because we are not going to hit that codepath before erroring out, so we do it in CheckAnalysis instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So after binding, LambdaFunction becomes something different?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we are never going to bind LambdaFunction because in order to bind it, we need to have arguments resolved, but because lambda is an argument we get into a recursive cycle of it never being resolved: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/higherOrderFunctions.scala#L116

@mihailotim-db mihailotim-db force-pushed the mihailotim-db/lambda_error_message branch 2 times, most recently from c96fa20 to f6cdd9a Compare March 24, 2025 14:15
@mihailotim-db mihailotim-db force-pushed the mihailotim-db/lambda_error_message branch from f6cdd9a to b82d5fa Compare March 24, 2025 16:28
@cloud-fan
Copy link
Contributor

The spark connect test failure is unrelated, thanks, merging to master/4.0!

@cloud-fan cloud-fan closed this in 90050f5 Mar 25, 2025
cloud-fan pushed a commit that referenced this pull request Mar 25, 2025
… function is out of place in `HigherOrderFunction`

### What changes were proposed in this pull request?
Throw proper user facing error message when lambda function is out of place in `HigherOrderFunction`.

### Why are the changes needed?
Currently, the analysis fails with `invalid call to dataType`.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Added new test cases with expected error message.

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #50345 from mihailotim-db/mihailotim-db/lambda_error_message.

Authored-by: Mihailo Timotic <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 90050f5)
Signed-off-by: Wenchen Fan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants