Description
Is your feature request related to a problem? Please describe
- When users create indices, they may provide specific data types for fields through mappings (e.g. date, scaled_float), or the field types may be auto-inferred from ingested documents via OpenSearch's dynamic mapping feature.
- Users may associate analyzers and tokenizers with their fields, which are then applied to the field values during the indexing flow.
These features are helpful for tuning what kind of output the index provides in terms of query support, but they add extra processing to the cluster, which in turn may have unintended performance impact. It would help to get visibility into the execution of these flows, measure their overhead, and determine the resulting performance impact. It would also help users decide whether they can optimize their mappings (e.g. text vs. match_only_text vs. keyword) or analyzers (e.g. regex simplification, n-gram reduction).
By adding insights around mappings, analyzers, and tokenizers, we should be able to get a granular view of the time taken by the various operations involved in indexing a document. Request tracing and the newly added metrics framework provide a good interface to enable this, and adding observing decorators on top of them should solve for it.
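As a rough illustration of the "observing decorator" idea, the sketch below wraps an arbitrary analysis step, times it, and forwards the elapsed time to a metric sink. The `MetricSink` interface and all names here are assumptions for illustration only, not the actual OpenSearch Telemetry API:

```java
import java.util.function.Supplier;

// Hypothetical sketch of an observing decorator: it times a wrapped
// analysis/indexing step and reports the duration to a metric sink.
// MetricSink is an illustrative stand-in for a telemetry histogram.
class TimedAnalysisStep {
    /** Illustrative stand-in for a telemetry metric instrument. */
    interface MetricSink {
        void record(String metricName, long elapsedNanos);
    }

    private final String metricName;
    private final MetricSink sink;

    TimedAnalysisStep(String metricName, MetricSink sink) {
        this.metricName = metricName;
        this.sink = sink;
    }

    /** Runs the wrapped step, recording how long it took even on failure. */
    <T> T observe(Supplier<T> step) {
        long start = System.nanoTime();
        try {
            return step.get();
        } finally {
            sink.record(metricName, System.nanoTime() - start);
        }
    }
}
```

A decorator like this could sit around each analyzer/tokenizer invocation so the per-implementation time shows up as a metric without changing the analysis code itself.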
Describe the solution you'd like
Use the Telemetry framework to emit metrics. A few of the proposed metrics (not exhaustive):
- Mapping Count by Data Type
- Analyzer Count
- Time taken by an analyzer implementation
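To make the first proposed metric concrete, here is a minimal sketch of deriving "mapping count by data type" from a parsed mappings `properties` section (field name to field definition). The class and method names are hypothetical, not the actual metric-collection code:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: count fields by mapping data type, given the
// parsed "properties" section of an index mapping.
class MappingTypeCounter {
    static Map<String, Integer> countByType(Map<String, Map<String, Object>> properties) {
        Map<String, Integer> counts = new HashMap<>();
        for (Map<String, Object> field : properties.values()) {
            // Fields without an explicit "type" are treated as "object" here.
            String type = (String) field.getOrDefault("type", "object");
            counts.merge(type, 1, Integer::sum);
        }
        return counts;
    }
}
```

A per-type count like this, emitted as a gauge per index, would show skew toward expensive types (e.g. many `text` fields that could be `keyword` or `match_only_text`).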
Related component
Indexing:Performance
Describe alternatives you've considered
No response
Additional context
No response