-
Notifications
You must be signed in to change notification settings - Fork 25.3k
[ES|QL] Date nanos implicit casting in union types #123678
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ES|QL] Date nanos implicit casting in union types #123678
Conversation
Hi @fang-xing-esql, I've created a changelog YAML for you. |
@elasticmachine update branch |
Fang, can you please double check the status of this PR? Is it ready for review or still working on it? |
I'm trying to make enrich work with union typed fields, and this is the last piece that I want to figure out before making this ready for review. |
Pinging @elastic/es-analytical-engine (Team:Analytics) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not done with the review yet, but left a first round of remarks - mostly questions.
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we emit a warning, given that the user may be unaware of the auto-cast, AND the auto-cast can lead to some dates being out of range of the cast?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did we consider adding an optional query parameter that would enable this lenient behavior, rather than making this the default?
Changing the output for the hire_date
field from all null
s to the auto-casted data is technically breaking, which would be mitigated by this approach; although it'd be admittedly very weird if a query relied on the null
s being there.
Update: Or a setting. That's how the changing of the default behavior on partial failures is being rolled out: there is a setting that controls the behavior, and the actual change of default behavior is then just a switch of the default value of the setting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also wonder:
For wide index pattern queries like the simple FROM logs-*
, currently the result on large clusters with many indices consists of maaaany null columns, because of mapping conflicts.
Loading/sending constant null blocks is comparably cheap.
If we start enabling auto-casting whenever we can, these queries will now read and return actual data; this might cause performance issues for this kind of query, no?
@fang-xing-esql , maybe we should check in with the others, ensure that this case is covered by our nightlies, and observe if this noticeably changes the performance in some cases, once this is merged?
x-pack/plugin/esql/qa/testFixtures/src/main/resources/union_types.csv-spec
Outdated
Show resolved
Hide resolved
|
||
FROM employees, employees_incompatible | ||
| DROP birth_date | ||
| EVAL x = emp_no + 1, y = hire_date + 1 year, z = hire_date::datetime |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens to compute hire_date::datetime
? Is all converted to nanos, first, when extracting the field initially - and then cast to datetimes?
If so, this would be losing datetimes that are far in the future/past. We should probably ensure that we skip the first cast to date_nanos if the first thing we do with the field attribute is casting it back to datetime. If we don't do that already. (Not deep enough into the review, yet, so apologies if that's already addressed.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should add a test case for this. We should probably stick to our current union type logic when encountering an explicit cast of a union type field, e.g. hire_date::datetime
- and therefore, dates outside of the nanos range should pose no problem, while nanos will have to undergo rounding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't double cast, if a (union typed)field has been cast to a type explicitly, it won't be casted to date_nanos implicitly again. ResolveUnionTypes
transforms the InvalidMappedField
s(resolved) with explicit casting into MultiTypeEsField
s, and ImplicitCastingForUnionTypedFields
happens after it. ImplicitCastingForUnionTypedFields
finds the InvalidMappedField
s(resolved, mixed with date
and date_nanos
) without explicit casting, and adds implicit casting for them, the AnalyzerTests.testImplicitCastingForDateAndDateNanosFields
is to check there is no double casting.
x-pack/plugin/esql/qa/testFixtures/src/main/resources/union_types.csv-spec
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, done now. Thanks @fang-xing-esql , this is a good start and I like the strong test coverage.
However, I see the following problems:
- This seems to go straight to production. Please, let's guard this behind SNAPSHOT for now. We could add a setting or pragma to control this, and flip the setting's default value later when we're confident that all is well.
- The scope is expanded to all numerical types. I think we should check back that this really doesn't have unintended side effects before moving forward.
- This could accidentally worsen the performance of wide index pattern queries, like
FROM log-s* | limit 10
. We'd have to confirm via benchmarks. - I think the scope is also accidentally expanded to allow arbitrary comparisons between date time and nanos in any expression, due to the change to the EsqlDataTypeConverter, but we don't have corresponding tests.
- The code needs to special case, in multiple places, for a white list of commands that are currently working and tested with this. This is brittle and will break when we add new commands. This also adds a lot of code to facilitate this separately from the ResolveUnionTypes analyzer rule by injecting Evals, which I think could be simplified. We also require changes to the type resolution of several commands to account for UNSUPPORTED, which we didn't need for union types, previously.
I think the injecting of EVALs also implies that in a query like
FROM idx* | eval x = multi_typed_date_and_nanos | eval y = multi_typed_date_and_nanos::datetime
the column y
will be computed by first transforming to nanos on block loading, and then to datetimes in the compute engine - rather than loading the blocks as datetimes directly, leading to a lossy conversion.
I'm not sure, but the same could happen between long and float/double, potentially leading to unintended lossy conversions if the long has more digits than float/double can hold?
I think there may be a simpler way to implement this, which would lean on the existing union type logic more and wouldn't have to special case so much, or require injecting Evals, or run into the problem of accidental lossy conversions.
Currently, in ResolveUnionTypes, we scan the query for expressions of the kind some_multi_typed_field::target_type
- and if we find them, we create a new field attribute in the EsRelation for the field loaded as the target type, and replace the expression by this field attribute. (We also deduplicate based on the target type if we've already come across this conversion).
I expect that we could get a simpler implementation if we expanded the scope of the rule to also look for multi-typed field attributes (except those that were already replaced because they were inside a convert function) and treat those field attributes as if they were of the form some_multi_typed_field::target_type
, and just apply the same substitution by the field attribute that's been cast to the target type.
If I'm correct, the changes may be reduced to just determining which target type to apply there - everything else should work as it always did with union types.
x-pack/plugin/esql/qa/testFixtures/src/main/resources/union_types.csv-spec
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/qa/testFixtures/src/main/resources/union_types.csv-spec
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/qa/testFixtures/src/main/resources/data/employees_incompatible.csv
Show resolved
Hide resolved
required_capability: implicit_casting_union_typed_numeric_and_date | ||
|
||
FROM employees, employees_incompatible | ||
| STATS c = count(*), b = BUCKET(hire_date, 1 year) + 1 year BY yr = BUCKET(hire_date, 1 year) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we could add a test where we use both hire_date::datetime
and hire_date
(auto-cast to nanos) in the same STATS
?
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/type/EsqlDataTypeConverter.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/Analyzer.java
Outdated
Show resolved
Hide resolved
if (p instanceof OrderBy | ||
|| p instanceof EsqlProject | ||
|| p instanceof Aggregate | ||
|| p instanceof MvExpand | ||
|| p instanceof Eval | ||
|| p instanceof LookupJoin | ||
|| p instanceof Enrich) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A static white list of commands will be brittle. Newly added commands will not work with this approach and will require that implementers go back and understand the implementation here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll keep a note here before I forget why I decided to do this outside of ResolveUnionTypes
, instead of adding more complexity into ResolveUnionTypes
. There are a couple of reasons that I can think of now:
No.1 is related to resolving LogicalPlan
, especially when a union typed field is referenced multiple times in the query.
For example there is a query like below:
from log | where union_typed_field < "2025-04-11" | sort union_typed_field;
If we do the implicit casting in ResolveUnionTypes
, without checking the parent/child plan, we may end up cast the union_typed_field
twice, which is unnecessary. We may need to add some complexity in this rule to check parent/child relationship, and reduce the amount the casting.
No.2 is that not all of the places where take an InvalidMappedField
, take an expression like a ConversionFunction
, an Eval
plan needs to be added before the plan that references an InvalidMappedField
, for example keep
/drop
/mv_expand
/enrich
they take identifier or patterns, but they don't take a functionExpression
.
To be able to tangle these situation, we need to know the LogicalPlan
, knowing it is an InvalidMappedField
is not enough. I think that's the main reason why that I decided not to add this complexity into ResolveUnionTypes
, and leave ResolveUnionTypes
do its own job to deal with conversion functions.
When adding a new LogicalPlan
, we will need to keep in mind that what kind of expressions are supported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we do the implicit casting in ResolveUnionTypes, without checking the parent/child plan, we may end up cast the union_typed_field twice, which is unnecessary. We may need to add some complexity in this rule to check parent/child relationship, and reduce the amount the casting.
This is a good thing to be paranoid about, but avoiding multiple casts of the same field attributes is a problem that ResolveUnionTypes
already solved.
For each target type that union_typed_field
is cast to in the query, ResolveUnionTypes
adds 1 field attribute $$union_typed_field$converted_to$target_type
to the EsRelation
, and all casts of this field to target_type
are simply replaced by this new field attribute in all expressions. Then later, once field extractions are getting inserted, only one extraction is inserted at the first point where ``$$union_typed_field$converted_to$target_type` is first referenced in the query plan.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I spent some time trying to solve this problem with a smaller change to ResolveUnionTypes
, only, and while multiple casts would be easily solved this way, I didn't manage to find a solution for the problem that you stated:
No.2 is that not all of the places where take an InvalidMappedField, take an expression like a ConversionFunction, an Eval plan needs to be added before the plan that references an InvalidMappedField, for example keep/drop/mv_expand/enrich they take identifier or patterns, but they don't take a functionExpression.
Replacing datetime_nanos_field
by $$datetime_nanos_field$converted_to$nanos
everywhere in the plan as part of ResolveUnionTypes
can't work because changing the name messes up the resolution, e.g. when there's a downstream EVAL
that shadows datetime_nanos
field.
// Add an implicit EsqlProject on top of the whole query if there isn't one yet, | ||
// without explicit or implicit casting, a union typed field is returned as a null. | ||
newPlan = addImplicitProjectForInvalidMappedFields(newPlan); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is already logic that does this in UnionTypesCleanup, which seems to be duplicated here at least to some extent.
case EsqlProject p -> { | ||
return esqlProjectForInvalidMappedField(p.source(), p.child(), aliases, newProjections); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Explicit special casing for all kinds of commands will be brittle. Also, it looks like we're not leveraging the existing code and logic in ResolveUnionTypes.
for (Expression e : newProjections) { // Add groupings | ||
newAggs.add(Expressions.attribute(e)); | ||
} | ||
return agg.with(evalForInvalidMappedField(agg.source(), agg.child(), aliases), newProjections, newAggs); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because we inject an EVAL
here, there's a chance that we'll accidentally inject EVAL
s multiple times. Which could lead to broken plans, when an upstream EVAL
already replaced the multi-typed field attribute by a reference attribute of the same name. Or, put the other way around, since we're replacing a field by another one with the same name, downstream plans may have invalid references now.
This could happen in case that the reference resolution manages to resolve the whole query plan, with multiple references to multi-typed fields, before we start adding Evals.
List<? extends NamedExpression> origAggs = agg.aggregates(); | ||
List<NamedExpression> newAggs = new ArrayList<>(origAggs.size()); | ||
for (int i = 0; i < origAggs.size() - newProjections.size(); i++) { // Add aggregate functions | ||
newAggs.add(origAggs.get(i)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Normally, the way to update the expressions of a plan is to transform the expressions and use AttributeMap.resolve
with an AttributeMap<Attribute>
that contains which attribute should be replaced by which other attribute.
A date in the valid date-nanos range should be acceptable to parse. This issue was showing up in a difference in behaviour between Elasticsearch ingest and the CsvTests. This fix makes CsvTests behave the same as Elasticearch itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ran out of time, so I'll post the comments so far.
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
Outdated
Show resolved
Hide resolved
...sql/qa/server/src/main/java/org/elasticsearch/xpack/esql/qa/rest/FieldExtractorTestCase.java
Outdated
Show resolved
Hide resolved
...sql/qa/server/src/main/java/org/elasticsearch/xpack/esql/qa/rest/FieldExtractorTestCase.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/qa/testFixtures/src/main/resources/data/employees_incompatible.csv
Show resolved
Hide resolved
x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/VerifierTests.java
Outdated
Show resolved
Hide resolved
Thanks for reviewing and brainstorming @craigtaverner @alex-spies ! Most of the comments have been addressed, here is a summary of the list of things that have been discussed:
There are some comments that haven't been addressed yet:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks much nicer to me! I have a few comments and questions, but otherwise I think it is good.
@@ -180,7 +187,9 @@ public class Analyzer extends ParameterizedRuleExecutor<LogicalPlan, AnalyzerCon | |||
*/ | |||
new ImplicitCasting(), | |||
new ResolveRefs(), | |||
new ResolveUnionTypes() // Must be after ResolveRefs, so union types can be found | |||
new ResolveUnionTypes(), // Must be after ResolveRefs, so union types can be found | |||
// Must be after ResolveUnionTypes, if there is explicit casting on the union typed fields, implicit casting won't be added |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice solution!
// MultiTypeEsField, and don't be double cast here. | ||
Map<FieldAttribute, Alias> invalidMappedFieldCasted = new HashMap<>(); | ||
LogicalPlan transformedPlan = plan.transformUp(LogicalPlan.class, p -> { | ||
if (p instanceof UnaryPlan == false && p instanceof LookupJoin == false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we allow through UnaryPlan and LookupJoin, but later we do nothing for LookupJoin. Is this a placeholder for a further update supporting LookupJoin?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add a comment here for LookupJoin
, and remove the instanceof
check, I was attempting to make date_nanos
fields work as join keys, however it didn't workout easily for me. Thank you for creating a separate issue for it!
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/Analyzer.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/Analyzer.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/Analyzer.java
Show resolved
Hide resolved
private static Expression castInvalidMappedField(DataType targetType, FieldAttribute fa) { | ||
Source source = fa.source(); | ||
return switch (targetType) { | ||
case DATETIME -> new ToDatetime(source, fa); // in case we decided to use DATE as a common type instead of DATE_NANOS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice as an example, but is effectively dead code. I wonder what are the chances we'll change our mind on this one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels like we're making a decision prioritizing the LogsDb plans. But could we be causing trouble for other users? Or is the thinking that other users need to add an explicit cast? That seems a reasonable assumption.
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/Analyzer.java
Outdated
Show resolved
Hide resolved
// if so add a project with eval, so that a not null value can be returned for a union typed field | ||
if (logicalPlan.resolved()) { | ||
List<Attribute> output = logicalPlan.output(); | ||
Map<FieldAttribute, Alias> newAliases = Maps.newHashMapWithExpectedSize(output.size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is very similar to code above, but that also took into account previous/existing field mappings. Why is that not needed here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The goal of castInvalidMappedFieldInFinalOutput
is to add implicit casting to mixed date
/date_nanos
fields, that are not referenced by the query(there was no chance to do implicit casting for them if they are not referenced in any commands), however they are in the returned resultsets. For example, from index*
. It only checks the top level plan's output, it doesn't traverse the entire tree, so it doesn't have a map to keep track of the casted fields map across all plans/subplans.
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/logical/Eval.java
Outdated
Show resolved
Hide resolved
Discussed offline with @fang-xing-esql and @craigtaverner. I think the proposed solution should work, it's just a little complex given that it needs to replicate a) some logic present in InsertFieldExtraction (inserting EVALs for implicitly cast fields just before they're used) and b) for passing on fields that users never explicitly dropped or kept into the final output correctly. I suggested one more approach which we agreed to try before going forward with the Eval-injection approach. The alternative would turn invalid mapped fields of date/nanos type into multi typed fields per default, before any resolution step (multi typed fields in field attributes trigger automatic conversion on field extraction). I threw together a draft as a commit on top of this PR, which seems to still run most tests fine (all existing union type tests + most tests from this PR). This approach still needs to special case for the situation where users specifically don't want to convert into nanos, but dates, i.e. when the query has |
The other approach that converts |
Resolves #110009
The goal of this PR is to support union typed fields that reference both
date
anddate_nanos
fields without explicit casting. Union typed(mixed withdate
anddate_nanos
) fields are casted to a common data type during Analyzer phase automatically.numeric
orstring
typed fields are not within the scope of this PR.Today, queries that reference union(mixed with
date
anddate_nanos
) typed fields like below either return nulls or fail, because they reference union typed fields without explicit casting. After this PR, they should not fail or return null.A field can be referenced in nearly all the processing commands, and this PR should cover all the possible places a (union typed)field can appear: