-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-51552] [SQL] Disallow temporary variables in persisted views when under identifier #50325
base: master
Are you sure you want to change the base?
Conversation
a74dd65
to
e9c9dac
Compare
@srielau, @cloud-fan PTAL at this fix. Thanks |
0d3545b
to
d062ddf
Compare
...catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveIdentifierClause.scala
Outdated
Show resolved
Hide resolved
case p: PlanWithUnresolvedIdentifier if p.identifierExpr.resolved && p.childrenResolved => | ||
|
||
referredTempVars ++= collectTemporaryVariables(p) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we do the same for ExpressionWithUnresolvedIdentifier
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
c56b5bc
to
0582899
Compare
02166a9
to
98bf403
Compare
...catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveIdentifierClause.scala
Outdated
Show resolved
Hide resolved
Can we clarify the Spark release that supports IDENTIFIER and session variable, so that we know the scope of this breaking change? |
cdff9fe
to
107819d
Compare
@cloud-fan @srielau PTAL when you have time. Thanks |
What do you exactly mean by this? |
@mihailoale-db if it breaks something that only works in 4.0, then we should backport this commit to 4.0 as 4.0 is not released yet. Otherwise we should add a legacy config to restore the old behavior. |
val analyzedChild = apply0(createView.child) | ||
val analyzedQuery = apply0(createView.query, Some(referredTempVars)) | ||
val tableIdentifier = ResolvedIdentifierToTableIdentifier(analyzedChild) | ||
if (referredTempVars.nonEmpty && tableIdentifier.isDefined) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the implementation is overly complicated to have view name in the error message. We should make it simpler:
- rename the previous
def apply
todef apply0
and add an extra ArrayBuffer parameter to collect visited session variables by the IDENTIFIER clause. - if the root node is
CreateView
, fail if the collected session variables is not empty. There is no need to mention the view name here as this is a single CREATE VIEW command.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So should we use PERSISTED
as objName
in notAllowedToCreatePermanentViewByReferencingTempVarError
? The message would look like
Cannot create the persistent object PERSISTED of the type VIEW ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And we still have to separate logic for collecting variables in view child and view query because we can have them in view child but cant have in view query, right? @cloud-fan
Identifiers are added here SPARK-43205 - 3.5.0 and variables are added here SPARK-42849 - 3.5.0 and they worked together since day 0. I'll add a flag as a tool for prevention. |
338e8c5
to
0882d4f
Compare
...catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveIdentifierClause.scala
Outdated
Show resolved
Hide resolved
...catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveIdentifierClause.scala
Outdated
Show resolved
Hide resolved
0882d4f
to
915eecc
Compare
...catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveIdentifierClause.scala
Outdated
Show resolved
Hide resolved
...catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveIdentifierClause.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
Outdated
Show resolved
Hide resolved
915eecc
to
61a4502
Compare
What changes were proposed in this pull request?
We collect temporary variables while resolving identifier clause to throw later if needed.
Why are the changes needed?
This is needed to align the use case with correct semantics (temporary variables are not allowed in persisted views).
Does this PR introduce any user-facing change?
Users won't be able to have temporary variables in persisted views (the feature was broken anyways).
How was this patch tested?
Added test.
Was this patch authored or co-authored using generative AI tooling?
No.