Skip to content

Add support for time, byte, and percent units in NumberFieldMapper #104037

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

felixbarny
Copy link
Member

@felixbarny felixbarny commented Jan 8, 2024

Adding units to a NumberFieldMapper also adds new query capabilities to the field. For example, when annotating a long field with unit: us, you can do term and range queries using notations like 1s, which will automatically convert that into microseconds.

In this PR, I've leveraged the existing meta field and the convention for the unit metadata. However, this metadata is not opaque to Elasticsearch anymore. So ideally, I think that the unit should be a proper parameter for numeric field types. However, this will have all sorts of implications and complications for the existing usages of the unit field that's already adopted in our integrations (although I couldn't figure out if Kibana is actually making use of meta.unit yet). As we'd want to expose the unit in field caps, this would also require some more work and boilerplate so I didn't want to implement that before discussing this further.

This PR is also aligned with the concept of units in OpenTelemetry metrics that defines the format of unit to be a UCUM symbol. Using a well-established standard makes a lot of sense to me and increases interoperability. For compatibility reasons, this PR also supports the previously documented unit identifiers.

I've leveraged existing types used for settings to parse time, byte, and percent values with unit suffixes. This works well for now but these classes impose certain restrictions that may be limiting for this use case. For example, TimeValue doesn't negative or fractional numbers and RatioValue doesn't support values above 100%. These limitations are probably fine for now but we might want to do something about that down the line, at which point the question is whether these should be completely separate classes dedicated to the settings and metric query use cases, respectively, or share common functionality.

Closes #65432 to support querying for us in addition to micros. As this inconsistency (micros vs ms) is now exposed to users via queries, I think the importance of consistency is higher now.
Closes #31244 even though this doesn't add dedicated field types but rather relies on the unit for numeric field types.

@felixbarny felixbarny added :Search Foundations/Mapping Index mappings, including merging and defining field types :StorageEngine/TSDB You know, for Metrics labels Jan 8, 2024
Copy link
Contributor

github-actions bot commented Jan 8, 2024

Documentation preview:

@elasticsearchmachine elasticsearchmachine added Team:Search Meta label for search team v8.13.0 external-contributor Pull request authored by a developer outside the Elasticsearch team labels Jan 8, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

@felixbarny
Copy link
Member Author

This probably also needs adjustments for ES|QL to be able to leverage the unit.
Another limitation is that runtime fields don't support queries with units currently. This can be added later if needed.

@mayya-sharipova
Copy link
Contributor

mayya-sharipova commented Jan 8, 2024

@felixbarny A smart approach to incorporate _meta.unit for units!
I have a couple of questions for your PR

  • What is your strategy for indices with existing _meta.unit field? Should we convert queries only for indices from the certain index version?
  • For new indices, if a user wants to keep existing behaviour (having_meta.unit but doesn't want to do the query conversion), what would be a way to achieve this?
  • Are units applicable only for Long type? It looks like you are applying them for queries on all numeric types.

@felixbarny
Copy link
Member Author

felixbarny commented Jan 8, 2024

Hey Mayya, thanks a lot for your comment.

What is your strategy for indices with existing _meta.unit field? Should we convert queries only for indices from the certain index version?

The meta.unit field is still completely free form and there's no harm to have, for example meta.unit: "foo". If we don't know the unit, this PR still has the same behavior as main. So I think we don't need to do something special for existing indices. Some existing indices may use well-known unit symbols so I think it would be confusing if the query unit conversion didn't work for existing indices.

For new indices, if a user wants to keep existing behaviour (having_meta.unit but doesn't want to do the query conversion), what would be a way to achieve this?

I'm not sure if we need to have a mode for that as what's added here is strictly additional functionality rather than altering existing behavior. You can still do queries without units. So if you specify the unit us, you can still do a completely numeric range query. Before the PR, typing in 1s would fail as it can't be parsed to a number. With these changes, we're still accepting numeric values and only attempt to parse strings that have a unit suffix (See UnitOfMeasurement$AbstractUnitOfMeasurement#tryConvert).

Are units applicable only for Long type? It looks like you are applying them for queries on all numeric types.

No, units can also be used for other types. For example, you might want to map a duration unit, such as ms to integer if you're certain that it's sufficient for your use case. For the percent unit, it's most useful to use double or float types. But I don't see a need to be restrictive about which combinations of numeric field types and units are seen as valid or invalid.

@elasticsearchmachine
Copy link
Collaborator

Hi @felixbarny, I've created a changelog YAML for you.

@felixbarny
Copy link
Member Author

@jpountz I'd be curious on your thoughts about using meta.unit vs using a dedicated unit parameter.

I think using meta.unit is a pragmatic solution that creates the least amount of friction. However, since it's not completely opaque to ES anymore it feels like we'd probably add a dedicated unit parameter if we were to do this from the ground up again.

@axw
Copy link
Member

axw commented Jan 10, 2024

@felixbarny just a reminder, in case you had forgotten: https://opentelemetry.io/docs/specs/otel/metrics/data-model/#timeseries-model states that unit is part of an OTel metric's time series identity. Two metrics with the same name but different units should be considered two different time series.

So should we also consider alternatives where unit is defined per document, rather than defined in the mappings? Alternatively, this could be another extension of passthrough, where unit is encoded in the field name.

@felixbarny
Copy link
Member Author

Oh, you're right, I didn't think of that. That'll make it tricky.

So should we also consider alternatives where unit is defined per document, rather than defined in the mappings?

If we don't have the information about the unit in the mappings we also can't use it for field caps which Kibana and ES|QL depend on. Also, the parsing and conversion for term and range queries of a particular field can't take into account field metadata but has to first load actual data or even do a conversion per document. Units may also be used outside of TSDB, for example for logs and traces, so we can't rely on some metadata store for time series, either.

Alternatively, this could be another extension of passthrough, where unit is encoded in the field name.

To check my understanding of what you're proposing - if you have a metric foo with the unit us, you're proposing we could store it as us.foo or foo.us but then make it so that you can still query for foo, right? I suppose that could maybe work. We'd then also be able to attach the unit metadata into the mappings. But we still need a way to define the unit as mapping metadata in the _bulk request. Maybe with something what you've been proposing here (#72536 (comment)), or directly from the object structure, like metrics.<unit>.<metric_name> where we take out the <unit> part and somehow shove it into meta.unit: <unit> of the leaf field.

Overall, it seems like we do need to have the unit information available in the mapping so there's no contradiction with the OTel spec that this PR introduces. But we'll need to find ways how to dynamically add unit metadata, and how to encode the unit in the object hierarchy, while making that transparent to users. That probably means to manipulate field caps responses in a way that hides the physical layout (metrics.<unit>.<metric_name>) and only exposes the logical layout (<metric_name>).

It still feels a little weird to allow the same (logical) field name to contain values in different units. Imagine an extreme case of a request_info metric where one time series contains a time duration, and another time series contains bytes 🤯.

@axw
Copy link
Member

axw commented Jan 10, 2024

To check my understanding of what you're proposing - if you have a metric foo with the unit us, you're proposing we could store it as us.foo or foo.us but then make it so that you can still query for foo, right?

Right.

I suppose that could maybe work. We'd then also be able to attach the unit metadata into the mappings. But we still need a way to define the unit as mapping metadata in the _bulk request. Maybe with something what you've been proposing here (#72536 (comment)), or directly from the object structure, like metrics..<metric_name> where we take out the part and somehow shove it into meta.unit: of the leaf field.

Yes, either way. Anyway, I think I'm also convinced that it should be part of the mapping, and we can work that out separately.

@jpountz
Copy link
Contributor

jpountz commented Jan 10, 2024

Intuitively I would expect support for units to work a bit more like date fields, where data is always stored with the same granularity internally, e.g. ms for date and ns for date_nanos, and then both the indexing side and the query side translate from whatever granularity is provided to the one that is used internally. It seems like an argument for the approach you took is bw compat, but I wonder if there are other ways we could make queries compatible across numeric fields and new byte-size or duration fields, maybe we can take inspiration from how we handle compatibility between date and date_nanos fields?

@felixbarny
Copy link
Member Author

data is always stored with the same granularity internally

I don't think that would be compatible with how units work in OpenTelemetry and also with how meta.unit works today. The unit describes both the unit in which numeric data is sent and stored. There's no option to send data, for example as 1ms but then internally store it as 1000 if the unit is us. Similarly, we'll just return the plain numeric value in search and aggregation responses rather than a formatted string. This is to be fully compatible with the numeric field type. The unit is most useful when doing queries or for the UI to know about the type automatically, so that it can format the numeric values using, for example a byte or time scale.

@jpountz
Copy link
Contributor

jpountz commented Jan 11, 2024

Thanks, that makes sense.

@jpountz I'd be curious on your thoughts about using meta.unit vs using a dedicated unit parameter.

This is a flag that is only used to parse queries, in a similar way to how we'd expect Kibana to take advantage of it. And if Kibana can trust the meta object, then I don't see why Elasticsearch couldn't. So I think that reusing the existing meta object is fine. FWIW I may have a different answer if we were considering making different index-time decisions, or formatting data differently in search responses, based on this parameter, so I'm curious to double check whether we envision this change to be the end of the road, or if there will be appetite for having more native support for byte sizes and durations in the future.

@felixbarny
Copy link
Member Author

FWIW I may have a different answer if we were considering making different index-time decisions, or formatting data differently in search responses, based on this parameter

I don't really see us ever wanting to do that.

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@elasticsearchmachine elasticsearchmachine added v8.16.0 Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch and removed v8.15.0 Team:Search Meta label for search team labels Jul 4, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search-foundations (Team:Search Foundations)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
external-contributor Pull request authored by a developer outside the Elasticsearch team >feature :Search Foundations/Mapping Index mappings, including merging and defining field types :StorageEngine/TSDB You know, for Metrics Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch Team:StorageEngine v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use SI units in TimeValue Add dedicated field types for durations and byte sizes
6 participants