[SPARK-51552] [SQL] Disallow temporary variables in persisted views when under identifier #50325

Open: wants to merge 1 commit into master

mihailoale-db (Contributor)

What changes were proposed in this pull request?

While resolving the IDENTIFIER clause, we collect the temporary variables it references, so that an error can be thrown later if they end up in a persisted view.
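A minimal Scala sketch of the collect-then-throw idea, under loud assumptions: IdentArg, Lit, TempVarRef, Concat and IdentifierResolution are hypothetical toy types, not Spark's analyzer classes, and the error is a plain IllegalArgumentException rather than Spark's error class. Resolving an IDENTIFIER argument only records the temporary variables it reads; the check runs separately, once the statement is known to create a persisted view.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical, heavily simplified model of an IDENTIFIER(...) argument.
// These types are illustrations, not Spark's Catalyst expressions.
sealed trait IdentArg
case class Lit(value: String) extends IdentArg
case class TempVarRef(name: String, value: String) extends IdentArg
case class Concat(parts: List[IdentArg]) extends IdentArg

object IdentifierResolution {
  // Folds the IDENTIFIER argument into a name and records every temporary
  // variable read along the way, instead of failing immediately.
  def resolve(arg: IdentArg, referredTempVars: ArrayBuffer[String]): String =
    arg match {
      case Lit(v) => v
      case TempVarRef(name, v) =>
        referredTempVars += name
        v
      case Concat(parts) => parts.map(p => resolve(p, referredTempVars)).mkString
    }

  // Deferred check, run only once we know the statement defines a persisted view.
  def checkPersistedView(referredTempVars: Seq[String]): Unit =
    if (referredTempVars.nonEmpty)
      throw new IllegalArgumentException(
        "Persisted view references temporary variables: " +
          referredTempVars.mkString(", "))
}
```

For example, resolving Concat(List(Lit("db."), TempVarRef("tbl", "t1"))) yields "db.t1" and leaves "tbl" in the buffer, so a later checkPersistedView(buffer.toList) call fails, while the same resolution inside a non-view statement never triggers the check.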

Why are the changes needed?

This aligns the IDENTIFIER clause with the intended semantics: temporary variables are not allowed in persisted views.

Does this PR introduce any user-facing change?

Users will no longer be able to reference temporary variables in persisted views through IDENTIFIER (this combination was broken anyway).

How was this patch tested?

Added a test.

Was this patch authored or co-authored using generative AI tooling?

No.

mihailoale-db (Contributor Author)

@srielau, @cloud-fan PTAL at this fix. Thanks

mihailoale-db force-pushed the identifiervarview branch 2 times, most recently from 0d3545b to d062ddf (March 20, 2025 10:07)
case p: PlanWithUnresolvedIdentifier if p.identifierExpr.resolved && p.childrenResolved =>
  referredTempVars ++= collectTemporaryVariables(p)
Contributor

Shall we do the same for ExpressionWithUnresolvedIdentifier?

Contributor Author

Done!
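The reviewer's point is that IDENTIFIER can also appear in expression position, so collecting only from plan-level wrappers would miss variables. A small sketch of a single traversal that covers both kinds, assuming a hypothetical toy tree (Node, PlanWithIdent, ExprWithIdent, PlainNode) rather than Spark's actual PlanWithUnresolvedIdentifier and ExpressionWithUnresolvedIdentifier classes:

```scala
// Hypothetical toy tree, not Spark's TreeNode: IDENTIFIER(...) can wrap a
// plan (e.g. a table name) or an expression (e.g. a column or function name).
sealed trait Node { def children: Seq[Node] }
case class PlanWithIdent(tempVars: Seq[String], children: Seq[Node]) extends Node
case class ExprWithIdent(tempVars: Seq[String], children: Seq[Node]) extends Node
case class PlainNode(children: Seq[Node]) extends Node

object CollectFromBothWrappers {
  // Pre-order walk that gathers temp vars from *both* wrapper kinds;
  // handling only PlanWithIdent would miss expression-level IDENTIFIERs.
  def collect(node: Node): Seq[String] = {
    val here = node match {
      case PlanWithIdent(vars, _) => vars
      case ExprWithIdent(vars, _) => vars
      case PlainNode(_)           => Seq.empty[String]
    }
    here ++ node.children.flatMap(collect)
  }
}
```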

mihailoale-db force-pushed the identifiervarview branch 2 times, most recently from c56b5bc to 0582899 (March 20, 2025 13:57)
mihailoale-db force-pushed the identifiervarview branch 3 times, most recently from 02166a9 to 98bf403 (March 20, 2025 20:21)
cloud-fan (Contributor)

Can we clarify the Spark release that supports IDENTIFIER and session variables, so that we know the scope of this breaking change?

mihailoale-db force-pushed the identifiervarview branch 2 times, most recently from cdff9fe to 107819d (March 21, 2025 10:13)
mihailoale-db (Contributor Author)

@cloud-fan @srielau PTAL when you have time. Thanks

mihailoale-db (Contributor Author)

> Can we clarify the Spark release that supports IDENTIFIER and session variables, so that we know the scope of this breaking change?

What exactly do you mean by this?

cloud-fan (Contributor)

@mihailoale-db if it breaks something that only works in 4.0, then we should backport this commit to 4.0 as 4.0 is not released yet. Otherwise we should add a legacy config to restore the old behavior.

val analyzedChild = apply0(createView.child)
val analyzedQuery = apply0(createView.query, Some(referredTempVars))
val tableIdentifier = ResolvedIdentifierToTableIdentifier(analyzedChild)
if (referredTempVars.nonEmpty && tableIdentifier.isDefined) {
cloud-fan (Contributor), Mar 24, 2025

I think the implementation is overly complicated just to include the view name in the error message. We should make it simpler:

  1. Rename the previous def apply to def apply0 and add an extra ArrayBuffer parameter to collect the session variables visited by the IDENTIFIER clause.
  2. If the root node is CreateView, fail if the collected session variables are not empty. There is no need to mention the view name here, as this is a single CREATE VIEW command.
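The two suggested steps can be sketched in a toy model. Only the names apply0 and the ArrayBuffer parameter come from the comment; Plan, Query, CreateView (here meaning a persisted view) and ResolveIdentifierSketch are hypothetical stand-ins, and the exception type is illustrative rather than Spark's error class.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical toy plans standing in for Spark's logical plans.
sealed trait Plan
// A query whose IDENTIFIER(...) clauses read these temporary variables.
case class Query(identTempVars: Seq[String]) extends Plan
// Stands in for a persisted CREATE VIEW whose body is `query`.
case class CreateView(query: Plan) extends Plan

object ResolveIdentifierSketch {
  // Step 1: the former `apply`, now threading a buffer of visited variables.
  private def apply0(plan: Plan, visited: ArrayBuffer[String]): Plan =
    plan match {
      case q: Query =>
        visited ++= q.identTempVars
        q
      case CreateView(query) => CreateView(apply0(query, visited))
    }

  // Step 2: the public entry point fails only when the root is CREATE VIEW,
  // so the error does not need to carry the view name.
  def apply(plan: Plan): Plan = {
    val visited = ArrayBuffer.empty[String]
    val resolved = apply0(plan, visited)
    plan match {
      case _: CreateView if visited.nonEmpty =>
        throw new IllegalStateException(
          "Temporary variables are not allowed in a persisted view: " +
            visited.mkString(", "))
      case _ => resolved
    }
  }
}
```

Keeping the check in the top-level apply, keyed on the root node, is what removes the need to thread the view name into the error.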

Contributor Author

So should we use PERSISTED as objName in notAllowedToCreatePermanentViewByReferencingTempVarError? The message would then look like:

Cannot create the persistent object PERSISTED of the type VIEW ...

mihailoale-db (Contributor Author), Mar 24, 2025

Also, we still have to keep separate logic for collecting variables in the view child and in the view query, because they are allowed in the view child but not in the view query, right? @cloud-fan
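A sketch of keeping the two collections separate, assuming (as the question implies) that temporary variables may be used to build the view name but not inside the stored query. Part, PersistedView and SeparateCollection are hypothetical; in this toy the name variables are simply returned while query variables cause an error.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical toy: a persisted view has a child (its name, possibly built
// with IDENTIFIER(...)) and a query (the stored body). Each part records the
// temporary variables its IDENTIFIER clauses read.
case class Part(identTempVars: Seq[String])
case class PersistedView(child: Part, query: Part)

object SeparateCollection {
  // Returns the temp vars used in the view name; throws if the body uses any.
  def validate(view: PersistedView): List[String] = {
    val childVars = ArrayBuffer.empty[String]
    val queryVars = ArrayBuffer.empty[String]
    // Assumption: the name is resolved once at CREATE time, so temp vars are fine.
    childVars ++= view.child.identTempVars
    // The body is stored and re-analyzed later, where session temp vars may not exist.
    queryVars ++= view.query.identTempVars
    if (queryVars.nonEmpty)
      throw new IllegalArgumentException(
        "View query references temporary variables: " + queryVars.mkString(", "))
    childVars.toList
  }
}
```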

mihailoale-db (Contributor Author)

> @mihailoale-db if it breaks something that only works in 4.0, then we should backport this commit to 4.0 as 4.0 is not released yet. Otherwise we should add a legacy config to restore the old behavior.

IDENTIFIER was added in SPARK-43205 (3.5.0) and session variables were added in SPARK-42849 (3.5.0), and the two have worked together since day 0. I'll add a flag as a safeguard.
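A minimal sketch of gating the new check behind a legacy flag. The config key below is a made-up placeholder, since the thread does not name the actual flag or its default, and a plain Map stands in for Spark's SQL configuration.

```scala
// Hypothetical flag name, for illustration only; the PR's actual config key
// and default are not stated in this thread.
object LegacyFlagSketch {
  val AllowTempVarsInViewKey = "spark.sql.legacy.exampleAllowTempVarsInPersistedView"

  def checkPersistedView(conf: Map[String, String], referredTempVars: Seq[String]): Unit = {
    // When the legacy flag is on, keep the pre-fix behavior (no error).
    val legacyAllowed = conf.getOrElse(AllowTempVarsInViewKey, "false").toBoolean
    if (!legacyAllowed && referredTempVars.nonEmpty)
      throw new IllegalArgumentException(
        "Temporary variables are not allowed in persisted views: " +
          referredTempVars.mkString(", "))
  }
}
```

Defaulting the flag to false makes the corrected behavior the default, while users who relied on the old combination can opt back in.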

mihailoale-db force-pushed the identifiervarview branch 2 times, most recently from 338e8c5 to 0882d4f (March 24, 2025 20:20)