[Data] Introduce RenameExpr to separate rename semantics from AliasExpr#60804
[Data] Introduce RenameExpr to separate rename semantics from AliasExpr#60804slfan1989 wants to merge 3 commits intoray-project:masterfrom
Conversation
This PR introduces a dedicated RenameExpr type to replace the _is_rename
flag in AliasExpr, providing clearer semantics and better type safety for
column renaming operations.
Changes:
- Add RenameExpr class for explicit column renaming semantics
- Remove _is_rename flag from AliasExpr (now purely for computed columns)
- Update all visitors to handle RenameExpr (9 visitor implementations)
- Update logical optimization rules for clearer rename vs alias handling
- Add comprehensive tests for RenameExpr behavior
Behavioral changes:
- col("a")._rename("b") now creates RenameExpr instead of AliasExpr
- col("a").alias("b") remains AliasExpr (semantically correct for aliasing)
- Projection pushdown now correctly distinguishes rename from alias operations
This resolves the TODO in _ColumnSubstitutionVisitor and enables future
optimizations by making rename semantics explicit in the type system.
Signed-off-by: slfan1989 <slfan1989@apache.org>
There was a problem hiding this comment.
Code Review
This pull request introduces a dedicated RenameExpr to separate column renaming from aliasing computed columns, which was previously handled by AliasExpr with an internal flag. This is a significant improvement for code clarity and type safety. The changes are well-executed, with updates to all necessary expression visitors and logical optimization rules. The addition of comprehensive tests for RenameExpr ensures the new functionality is robust. I have one suggestion to improve the consistency of expression substitution logic.
| visited = self.visit(expr.expr) | ||
| if isinstance(visited, ColumnExpr): | ||
| return RenameExpr( | ||
| data_type=visited.data_type, expr=visited, _name=expr.name | ||
| ) | ||
| return AliasExpr(data_type=visited.data_type, expr=visited, _name=expr.name) |
There was a problem hiding this comment.
For consistency with visit_alias, consider calling _unalias() on the visited expression. This would prevent creating nested aliases if the substitution itself results in an AliasExpr and would correctly preserve the RenameExpr type when possible.
For example, if col('a')._rename('b') is visited with a substitution {'a': col('c').alias('d')}:
- The current implementation produces a nested alias:
(col('c').alias('d')).alias('b'). - With
_unalias(), it would becomecol('c')._rename('b'), which seems more correct as the intermediate aliasdis being renamed tobanyway.
| visited = self.visit(expr.expr) | |
| if isinstance(visited, ColumnExpr): | |
| return RenameExpr( | |
| data_type=visited.data_type, expr=visited, _name=expr.name | |
| ) | |
| return AliasExpr(data_type=visited.data_type, expr=visited, _name=expr.name) | |
| visited = self.visit(expr.expr)._unalias() | |
| if isinstance(visited, ColumnExpr): | |
| return RenameExpr( | |
| data_type=visited.data_type, expr=visited, _name=expr.name | |
| ) | |
| return AliasExpr(data_type=visited.data_type, expr=visited, _name=expr.name) |
python/ray/data/_internal/planner/plan_expression/expression_visitors.py
Show resolved
Hide resolved
This PR introduces a dedicated RenameExpr type to replace the _is_rename
flag in AliasExpr, providing clearer semantics and better type safety for
column renaming operations.
Changes:
- Add RenameExpr class for explicit column renaming semantics
- Remove _is_rename flag from AliasExpr (now purely for computed columns)
- Update all visitors to handle RenameExpr (9 visitor implementations)
- Update logical optimization rules for clearer rename vs alias handling
- Add comprehensive tests for RenameExpr behavior
Behavioral changes:
- col("a")._rename("b") now creates RenameExpr instead of AliasExpr
- col("a").alias("b") remains AliasExpr (semantically correct for aliasing)
- Projection pushdown now correctly distinguishes rename from alias operations
This resolves the TODO in _ColumnSubstitutionVisitor and enables future
optimizations by making rename semantics explicit in the type system.
Signed-off-by: slfan1989 <slfan1989@apache.org>
This PR introduces a dedicated RenameExpr type to replace the _is_rename
flag in AliasExpr, providing clearer semantics and better type safety for
column renaming operations.
Changes:
- Add RenameExpr class for explicit column renaming semantics
- Remove _is_rename flag from AliasExpr (now purely for computed columns)
- Update all visitors to handle RenameExpr (9 visitor implementations)
- Update logical optimization rules for clearer rename vs alias handling
- Add comprehensive tests for RenameExpr behavior
Behavioral changes:
- col("a")._rename("b") now creates RenameExpr instead of AliasExpr
- col("a").alias("b") remains AliasExpr (semantically correct for aliasing)
- Projection pushdown now correctly distinguishes rename from alias operations
This resolves the TODO in _ColumnSubstitutionVisitor and enables future
optimizations by making rename semantics explicit in the type system.
Signed-off-by: slfan1989 <slfan1989@apache.org>
Description
This PR introduces a dedicated
RenameExprtype to replace the_is_renameflag inAliasExpr, providing clearer semantics and better type safety for column renaming operations in Ray Data expressions.Key Changes:
RenameExprclass for explicit column renaming semantics_is_renameflag fromAliasExpr(now purely for computed columns/aliasing)RenameExprRenameExprbehavior and structural equalityBehavioral Changes:
col("a")._rename("b")now createsRenameExprinstead ofAliasExprcol("a").alias("b")remainsAliasExpr(semantically correct for aliasing computed expressions)Benefits:
_ColumnSubstitutionVisitorRelated issues
N/A
Additional information
This refactoring maintains backward compatibility while improving the internal representation. All existing tests pass, and new tests validate the
RenameExprbehavior including structural equality, evaluation equivalence, and proper handling during column substitution.