-
Notifications
You must be signed in to change notification settings - Fork 28.7k
[SPARK-51068][SQL] Canonicalized CTEs to avoid cached result not being used and recomputed #50360
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@cloud-fan can you check? |
@dongjoon-hyun can you review this ? |
@HyukjinKwon can you take a look ? |
Mind filing a JIRA, and linking it to the PR title? See also https://spark.apache.org/contributing.html |
My bad. Added Jira Link |
@HyukjinKwon can you check now ? |
@HyukjinKwon / @dongjoon-hyun can you check ? |
@HyukjinKwon/ @dongjoon-hyun can you review this? |
i am not super used to this code path. cc @peter-toth |
Thanks @HyukjinKwon !! @peter-toth can you help here? |
I'm travelling this week, but happy to check it Friday. |
I think this canonicalization makes sense. But why don't you just call a simple |
Thanks @peter-toth for your review !! Are you suggesting creating a separate function |
No, I believe it already exists, but I might be wrong. |
@peter-toth Thanks for the clarification. Couldn't find it in branch-3.3 |
Hi, @nimesh1601 and all. Sorry for the interruption, but I need to close this PR to prevent accidental merging because this is opened to the End-Of-Life Please open the PR to |
What changes were proposed in this pull request?
In this PR we plan to
To check whether the plan exists in the cache or not, CacheManager matches the canonicalized version of the plan. Currently, in canonicalized versions, CTEIds are not handled, which results in unnecessary cache misses in cases where queries using CTE are stored. This issue starts after the commit to Avoid inlining non-deterministic With-CTEs in which each CTERelationDef and CTERelationRef were introduced and their canonicalization was not handled. So to fix this we need to add Canonicalize logic for CTEs too.
Why are the changes needed?
To fix the bug mentioned above.
Does this PR introduce any user-facing change?
Yes, this will now let cache cte to be reused
Before plan
After this change
How was this patch tested?
Uts are added
Was this patch authored or co-authored using generative AI tooling?
No