fix: miss output ordering during projection #15683
Conversation
I don't understand the failing sqllogictest:
Based on current code and doc, my understanding is that the So why |
One of the users as of now is
In the past we only added an
I do think the current behavior is a potential pain point, because sometimes users know their files are ordered and non-overlapping but can't get DataFusion to avoid the sort. However, this has been the behavior in DataFusion for a long time (at least the 2 years I have been using it), so if we were to change it, I would prefer that we do it in a way that makes the behavior change very obvious.
How about adding a new config, such as |
Also, I noticed the Maybe we can extract the check logic to a method and add a check for the |
Which issue does this PR close?
Rationale for this change
Currently, the `base_config.output_ordering` is specified by an external mechanism; users usually decide on the `output_ordering` themselves. See here: https://github.com/apache/datafusion/blob/main/datafusion/core/src/datasource/listing/table.rs#L273-L277
That is, users should ensure that the output ordering is correct.
Most users may not enable the `collect_statistics` config, so they will lose the output ordering here. Even if the config is enabled, the statistics may be inaccurate, and some file formats have no statistics at all. If users write the `output_ordering` to the config, they presumably believe their data is sorted, but the check in `get_projected_output_ordering` will drop the output ordering for them.
What changes are included in this PR?
The check still runs, but when it fails it only outputs a warning instead of dropping the output ordering.
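To make the difference between the two behaviors concrete, here is a minimal, self-contained sketch. It does not use DataFusion's types: `FileRange`, `check_group`, `projected_ordering`, and the `warn_only` flag are hypothetical names invented for illustration, and the overlap test is a simplified stand-in for the real check in `get_projected_output_ordering`.

```rust
/// Hypothetical min/max statistics of the sort column for one file.
/// `None` models a file whose statistics were not collected or do not exist.
#[derive(Debug, Clone, Copy)]
struct FileRange {
    min: Option<i64>,
    max: Option<i64>,
}

/// Outcome of the (simplified) statistics-based ordering check.
#[derive(Debug, PartialEq)]
enum CheckResult {
    Verified,
    Unverified(String),
}

/// Checks that the files, in their listed order, have statistics and
/// do not overlap on the sort column.
fn check_group(files: &[FileRange]) -> CheckResult {
    let mut prev_max: Option<i64> = None;
    for (i, f) in files.iter().enumerate() {
        let (min, max) = match (f.min, f.max) {
            (Some(min), Some(max)) => (min, max),
            _ => return CheckResult::Unverified(format!("file {i} has no statistics")),
        };
        if let Some(p) = prev_max {
            if min < p {
                return CheckResult::Unverified(format!("file {i} overlaps the previous file"));
            }
        }
        prev_max = Some(max);
    }
    CheckResult::Verified
}

/// Returns the ordering to advertise plus an optional warning message.
/// With `warn_only == false` a failed check silently drops the ordering;
/// with `warn_only == true` the declared ordering is kept and a warning is returned.
fn projected_ordering(
    declared: &[&str],
    files: &[FileRange],
    warn_only: bool,
) -> (Vec<String>, Option<String>) {
    let keep = || declared.iter().map(|c| c.to_string()).collect::<Vec<_>>();
    match check_group(files) {
        CheckResult::Verified => (keep(), None),
        CheckResult::Unverified(reason) if warn_only => {
            (keep(), Some(format!("output ordering not verified: {reason}")))
        }
        CheckResult::Unverified(_) => (Vec::new(), None),
    }
}
```

In this model, a file group without statistics loses its declared ordering when `warn_only` is `false`, which forces a later sort, while `warn_only = true` trusts the user's declaration and only reports the reason the check could not confirm it.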
Are these changes tested?
Are there any user-facing changes?