Reuse alias if possible #14781

Draft
wants to merge 10 commits into
base: main

Contributor

@blaginin blaginin commented Feb 19, 2025

Which issue does this PR close?

Rationale for this change

Currently, applying an alias over an existing alias creates an extra expression layer, which is later merged in optimize_projections by an expensive recursive function.

What changes are included in this PR?

A small change to reuse an existing alias when possible. This affects two cases:

  • Removes unnecessary info when displaying logical plans
  • Simplifies expressions when optimize_projections isn't called (e.g., when there's only one projection and merge_consecutive_projections isn't run)
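
To illustrate the idea, here is a minimal sketch using a toy expression type. It is not DataFusion's `Expr`, and the names `alias_nested`, `alias_reuse` and `depth` are hypothetical. It also ignores whatever conditions the PR checks before deciding that reuse is "possible"; it only shows why reusing an alias avoids the extra layer.

```rust
// Toy expression type for illustration only; NOT DataFusion's `Expr`.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Alias { expr: Box<Expr>, name: String },
}

impl Expr {
    /// Naive aliasing: always wraps, so aliasing twice nests two Alias layers.
    fn alias_nested(self, name: &str) -> Expr {
        Expr::Alias {
            expr: Box::new(self),
            name: name.to_string(),
        }
    }

    /// Reusing variant: if `self` is already an alias, keep its inner
    /// expression and only replace the name instead of adding a layer.
    fn alias_reuse(self, name: &str) -> Expr {
        match self {
            Expr::Alias { expr, .. } => Expr::Alias {
                expr,
                name: name.to_string(),
            },
            other => Expr::Alias {
                expr: Box::new(other),
                name: name.to_string(),
            },
        }
    }

    /// Number of expression layers, counting the leaf column.
    fn depth(&self) -> usize {
        match self {
            Expr::Column(_) => 1,
            Expr::Alias { expr, .. } => 1 + expr.depth(),
        }
    }
}

fn main() {
    let nested = Expr::Column("a".into()).alias_nested("b").alias_nested("c");
    let reused = Expr::Column("a".into()).alias_reuse("b").alias_reuse("c");
    println!("nested depth = {}, {:?}", nested.depth(), nested);
    println!("reused depth = {}, {:?}", reused.depth(), reused);
}
```

In this sketch the nested form carries both names and needs a later pass to collapse them, while the reusing form never builds the intermediate layer in the first place.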

Are these changes tested?

Extended doctest

Are there any user-facing changes?

No.

@github-actions github-actions bot added logical-expr Logical plan and expressions sqllogictest SQL Logic Tests (.slt) labels Feb 19, 2025
@alamb
Contributor

alamb commented Mar 5, 2025

Interestingly, I was working on a very similar PR last night:

@blaginin
Contributor Author

blaginin commented Mar 5, 2025

yes, @alamb, I think we got on the same issue with unnest 😀 - I'm happy to keep working on mine unless you want to take over?

@alamb
Contributor

alamb commented Mar 5, 2025

> yes, @alamb, I think we got on the same issue with unnest 😀 - I'm happy to keep working on mine unless you want to take over?

Yes, indeed -- something is going on with unnest.

It would be great if you wanted to take over

Feel free to pull over the test case from https://github.com/apache/datafusion/pull/15008/files as well

@alamb alamb mentioned this pull request Mar 5, 2025
@github-actions github-actions bot added the sql SQL Planner label Mar 11, 2025
@github-actions github-actions bot added the core Core DataFusion crate label Mar 11, 2025
Comment on lines 883 to 885

unnest_relation.alias(Some(self.new_table_alias(alias.to_string(), vec![])));

Contributor Author

@goldmedal FYI, this is another change to unnest: I want to always apply an alias to make the behaviour more consistent.

Contributor

I don't see how this makes the behavior more consistent 🤔. Could you explain more?

Contributor

BTW, I'm considering how to fix the issue mentioned in (#15090 (comment)). I think adding an alias for the unnest may be a good idea.

I think it should look like this:

Suggested change
- unnest_relation.alias(Some(self.new_table_alias(alias.to_string(), vec![])));
+ unnest_relation.alias(Some(self.new_table_alias("unnset_table_1", vec![alias.to_string()])));

Then we get a result like

SELECT "UNNEST(make_array(Int64(1),Int64(2),Int64(3)))" FROM UNNEST([1, 2, 3]) AS unnest_table ("UNNEST(make_array(Int64(1),Int64(2),Int64(3)))")

This not only fits what you want to do but also generates valid SQL. WDYT?

Contributor

#15233
I created an issue to explain more.

Contributor Author

I think this is an edge case. For example,

SELECT * from UNNEST([1,2,3]), UNNEST([1,2,3,4])

is transformed into

SELECT UNNEST(make_array(Int64(1),Int64(2),Int64(3))), UNNEST(make_array(Int64(1),Int64(2),Int64(3),Int64(4))) FROM UNNEST([1, 2, 3]) AS unnset_table_1 (UNNEST(make_array(Int64(1),Int64(2),Int64(3)))) CROSS JOIN UNNEST([1, 2, 3, 4]) AS unnset_table_1 (UNNEST(make_array(Int64(1),Int64(2),Int64(3),Int64(4))))
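
To make the edge case concrete: in the output above both relations receive the same table alias, `unnset_table_1`. The thread does not settle on a fix; the sketch below only shows one hypothetical way to detect and avoid such a collision with a per-query counter. `TableAliasCounter` and `has_duplicate` are made-up names for this illustration and are not part of DataFusion or the unparser.

```rust
use std::collections::HashSet;

/// Hypothetical helper for illustration; not a DataFusion type.
/// Hands out a fresh table alias on every call.
struct TableAliasCounter {
    next: usize,
}

impl TableAliasCounter {
    fn new() -> Self {
        Self { next: 1 }
    }

    /// Returns `unnest_table_1`, `unnest_table_2`, ... in order.
    fn next_alias(&mut self) -> String {
        let alias = format!("unnest_table_{}", self.next);
        self.next += 1;
        alias
    }
}

/// Returns true if any alias appears more than once in the list.
fn has_duplicate(aliases: &[String]) -> bool {
    let mut seen = HashSet::new();
    // `insert` returns false when the value was already present.
    aliases.iter().any(|a| !seen.insert(a.as_str()))
}

fn main() {
    // Hard-coded alias, as in the suggested change: both relations collide.
    let fixed = vec!["unnset_table_1".to_string(), "unnset_table_1".to_string()];

    // Counter-based aliases: each relation gets its own name.
    let mut counter = TableAliasCounter::new();
    let unique = vec![counter.next_alias(), counter.next_alias()];

    println!("fixed aliases collide: {}", has_duplicate(&fixed));
    println!("counter aliases collide: {}", has_duplicate(&unique));
}
```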

@github-actions github-actions bot added documentation Improvements or additions to documentation development-process Related to development process of DataFusion physical-expr Changes to the physical-expr crates optimizer Optimizer rules substrait Changes to the substrait crate catalog Related to the catalog crate common Related to common crate execution Related to the execution crate proto Related to proto crate functions Changes to functions implementation labels Mar 24, 2025
@github-actions github-actions bot added datasource Changes to the datasource crate ffi Changes to the ffi crate labels Mar 24, 2025
# Conflicts:
#	datafusion/expr/src/expr.rs
#	datafusion/sql/src/unparser/plan.rs
#	datafusion/sql/tests/cases/plan_to_sql.rs
@github-actions github-actions bot removed documentation Improvements or additions to documentation development-process Related to development process of DataFusion physical-expr Changes to the physical-expr crates optimizer Optimizer rules core Core DataFusion crate substrait Changes to the substrait crate catalog Related to the catalog crate common Related to common crate execution Related to the execution crate proto Related to proto crate functions Changes to functions implementation datasource Changes to the datasource crate ffi Changes to the ffi crate labels Mar 24, 2025
@github-actions github-actions bot added the optimizer Optimizer rules label Mar 24, 2025
Labels
logical-expr Logical plan and expressions optimizer Optimizer rules sql SQL Planner sqllogictest SQL Logic Tests (.slt)
Development

Successfully merging this pull request may close these issues.

Remove duplicated alias in Sort
3 participants