-
Notifications
You must be signed in to change notification settings - Fork 1.4k
fix: union all by name #15603
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
fix: union all by name #15603
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this @chenkovsky (and all the other PRs recently -- very much appreciated)
@@ -362,6 +362,8 @@ pin_project! { | |||
|
|||
#[pin] | |||
stream: S, | |||
|
|||
transform_schema: bool, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems like this is fixing the symptom rather than the root cause
I think it would be better to have the correct schema reflected in the plan in the first place 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, correct nullability in schema is better. I tried to fix logical plan before.
But nullability in logical plan won't affect physical plan. it's ignored.
LogicalPlan::Projection(Projection { input, expr, .. }) => self |
in physical plan, it will recompute nullaibility from bottom to top.
e.nullable(&input_schema)?, |
but in this scenario, it seems that we need to pass nullability from top to bottom.
I need more suggestions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to learn some experience from spark.
for logical plan, I haven't found any logic to handle this problem.
for physical plan, spark is much easier, its InternalRow is schemaless. so it will use the schema of physical plan by default. but recordbatch contains schema.
I'm not 100% sure, I think current logical plan and physical plan schema is correct. the root cause is that recordbatch's schema doesn't match physical plan's. so adding an adapter is a proper way.
let ret = this.stream.poll_next(cx); | ||
if transform_schema { | ||
if let Poll::Ready(Some(Ok(batch))) = ret { | ||
return Poll::Ready(Some(batch.with_schema(schema).map_err(|e| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is one of the notorious problems when logical schema doesn't match the physical one on nullability/metadata. But this change might bring a performance impact, although the schema change is just reassigning the value but it also calls schema_contains
which may be expensive
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, there's performance concern now. if this approach is feasible, I can try to optimize it, maybe use RecordBatch::try_new_with_options
Thanks for looking into the nullable issue, it's been on my plate for a bit to look into some more. It's really the last blocker I know of for union by name to work correctly. |
Which issue does this PR close?
physical-expr
: Nullability ofLiteral
is not determined by surrounding context #15394.Rationale for this change
schema from inner physical plan is returned.
What changes are included in this PR?
update UnionExec and RecordBatchStreamAdapter to transform schema.
I found that logical plan's nullability for Projection is also not correct after optimization,
But this won't make the test fail. So I haven't included this part in this PR. Do we need to correct logical plan?
Are these changes tested?
UT
Are there any user-facing changes?
No