Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Otel trace groups #2

Open
wants to merge 7 commits into
base: otel-trace-groups
Choose a base branch
from

Conversation

KarstenSchnitter
Copy link

Description

Adjustment of the mapping templates for the standard OTel format.

Check List

  • Commits are signed with a real name per the DCO

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

kkondaka and others added 4 commits March 21, 2025 21:17
Provide dynamic templates for all attributes. Adds `ignore_above`
to all key word fields. Use reasonable defaults for OTel data.

Signed-off-by: Karsten Schnitter <[email protected]>
Provide dynamic templates for all attributes. Adds `ignore_above`
to all key word fields. Use reasonable defaults for OTel data.

Signed-off-by: Karsten Schnitter <[email protected]>
Provide dynamic templates for all attributes. Adds `ignore_above`
to all key word fields. Use reasonable defaults for OTel data.

Signed-off-by: Karsten Schnitter <[email protected]>
"dynamic_templates": [
{
"resource_attributes_map": {
"integer_resource_attributes": {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be named as long_resource_attributes and similarly below as well

Comment on lines 96 to 97
"droppedEventsCount": {
"type": "integer"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Integer is at 32bits but signed. The proto definition specifies an unsigned 32bit integer. Do you think this difference is significant? I find it hard to imagine there would be a scenario where that number of attributes is ever attached and dropped to a log/metric/span.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are just referring to the naming, right? I am going to correct that.

@@ -6,58 +6,129 @@
"_source": {
"enabled": true
},
"dynamic_templates": [
{
"integer_resource_attributes": {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this be long_resource_attributes
I think the question is already raised in other comments. can we address that in every signal type.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WIll do.

"droppedEventsCount": {
"type": "integer"
}
"droppedEventsCount": {
Copy link

@vamsimanohar vamsimanohar Mar 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this be droppedEventsCount?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused. What do want to change? This is the name of the field. Only indentation seems to be off.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am going to change this to droppedAttributesCount.

"ignore_above": 256,
"type": "keyword"
"type": "keyword",
"ignore_above": 16

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we limiting to 16 chars here and 32 in traceId?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First, this is not a limitation of storage, just of the indexing. W3C Trace Context specifies the trace id to be an array of 16 bytes and the span id to be an array of 8 bytes. With hex-encoding this produces 32 and 16 characters respectively. Using B3 context would provide the same length. So it seems to be a reasonable choice.

}
"serviceName": {
"type": "alias",
"path": "resource.attributes.service.name"
},
"event": {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need event here?
https://github.com/open-telemetry/opentelemetry-proto/blob/main/opentelemetry/proto/logs/v1/logs.proto
We don't have anything in logs proto model.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a question I am wondering myself. This comes out of the Simple Schema for Observability. I kept it, since it was in the original file proposed by @kkondaka. I know this is not in the OpenTelemetry standard. The same is true for the data_stream. How do we want to go about this?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will remove this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I take it out in my PR to your branch.

@@ -6,58 +6,129 @@
"_source": {
"enabled": true

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will add the field. I am going to take a look at the analysis.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@KarstenSchnitter It looks like we need to bump up the version number to 3 (at least for spans). Otherwise new template mappings are not being sent to the OpenSearch server.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does this limitation come from?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have we added this?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, this could look much nicer, if I could use component templates. That way, I could use the same dynamic mappings on all three templates. Is there a way to do that with Data Prepper?

},
"flags": {
"type": "integer"
"type": "long"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is defined as fixed32. Since this is interpreted as a bit field, I decided to use long to not change representation, when the highest bit is set. Do you prefer using integer, since only the lowest bits are currently in use?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am going to check, that for very attribute group there is a dropped attributes count. Thanks for pointing this out.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed all the droppedEventsCount to droppedAttributesCount.

"type": "keyword",
"ignore_above": 2048
}
}
},
"stacktrace": {
"type": "text"
Copy link

@vamsimanohar vamsimanohar Mar 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  ```
    "@timestamp": {
      "type": "date_nanos"
    },
    "time": {
      "type": "date_nanos"
    },
    "observedTimestamp": {
      "type": "date_nanos"
    },
    "observedTime": {
      "type": "alias",
      "path": "observedTimestamp"
    },
    ```


    We have four fields related to timestamp. 
    We have two in otel proto files, can we consolidate? If they are required why?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was in the original file. The OTel specification only guarantees, that at least one of time and observedTime is set. It is not telling which one. I think, @timestamp is supposed to always contain a timestamp for each message. Furthermore, it provides a common timestamp field for all three signals. The alias seems to be there for compatibility.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think @timestamp is opensearch specific. Correct me if I am wrong. And observedTime is incorrect. I think observedTimestamp is correct. I do not think my new code generates "@timestamp" and "observedTime". The mappings have "observedTime" as alias for "observedTimestamp", which may be removed but I think it's ok to keep it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Time vs Timestamp is essentially OpenTelemetry proto-files vs Logs Data Model. I am fine with either naming as long as it is consistent.

"properties": {
"droppedEventsCount": {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dropperAttributesCount. Also can you validate the existing mapping with
https://github.com/open-telemetry/opentelemetry-proto/blob/main/opentelemetry/proto/trace/v1/trace.proto

I see a lot of missing pieces.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we did some analysis here: opensearch-project/opensearch-catalog#218

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am going to do that.

- renaming of droppedEventsCount to droppedAttributesCount
- streamlined timestamps to include OTel proto defined times and on @timestamp for index ordering in OS
- removed datastream attributes which are covered by dynamic template anyways
- removed schemaUrl from logs and metrics but kept in scope and resource
- added details on exemplars, links and events

Signed-off-by: Karsten Schnitter <[email protected]>
@KarstenSchnitter
Copy link
Author

I addressed the comments by @ps48 and @vamsimanohar. There are some open questions:

  • What do we do about the events in the logs model? This is not part of the OTel Logs Data Model but in sso4. I kept the mappings, since they do not hurt when there is no data.
  • Timestamp handling: I kept a common @timestamp field in logs, metrics and traces. The idea is to have a clear timestamp to align on and ease analysis across the signals. For logs @timestamp should contain the optional time and observedTime if it is not available. For metrics @timestamp should contain the time. We can make it an alias, but for index patterns it would be better to be at least a copy_to. Since it needs custom logic in the logging case, it is not an option there.

}
},
{
"default_resource_attributes": {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@KarstenSchnitter Is this correct? Default attributes can be objects also, right? It looks like, here you are assuming them to be strings. I think it is possible for attributes to be nested like

"resource": {
    "attributes" : {
          "data" : {
                 "key1": "value1"
                 "key2": 2
           },
           "key3": "value3"
      }
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Due to the path_match this is applied to all nested fields. With the three dynamic mappings your example would have the values of key1 indexed as keyword, of key2 as log, and of key3 as keyword.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@KarstenSchnitter I am seeing this error with the mappings you provided -

WARN  org.opensearch.dataprepper.plugins.sink.opensearch.BulkRetryStrategy - index = otel-v1-apm-span-000001, operation = Index, status = 400, error = object mapping for [resource.attributes.host.image.id] tried to parse field [host.image.id] as object, but found a concrete value

Any idea why this error is occurring?

The mappings from the server

dynamic_templates" : [
        {
          "long_resource_attributes" : {
            "path_match" : "resource.attributes.*",
            "mapping" : {
              "type" : "long"
            },
            "match_mapping_type" : "long"
          }
        },
        {
          "double_resource_attributes" : {
            "path_match" : "resource.attributes.*",
            "mapping" : {
              "type" : "double"
            },
            "match_mapping_type" : "double"
          }
        },
        {
          "default_resource_attributes" : {
            "path_match" : "resource.attributes.*",
            "mapping" : {
              "type" : "object"
            }
          }
        },
        

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have deleted the "default_resource_attributes", "default_scope_attributes" and "default_attributes" and then I was able to store the data and see the data using opensearch dashboards.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a typical type mismatch in the error message. Note, that the default_resource_attributes is not what I provided. It should have type keyword. Do you have a procedure to reproduce this issue?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@KarstenSchnitter sorry, that was copy-paste mistake. Even with the correct mappings with "keyword", I am still seeing the error. For now, I am going to take out the default attributes section in all three and updated the PR. These changes are working fine for traces,logs and metrics. Please submit another PR separately (on top of this) for any changes you want to make after I merge these change.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do not take it out. Mapping to keywords is the essential default we need. Can you try with an additional "match_mapping_type": "string" instead? I have not seen this problem during my testing. Can you provide the events you were testing with?

}
},
{
"default_attributes": {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment here. General attributes may also have nested attributes, right?

}
"serviceName": {
"type": "alias",
"path": "resource.attributes.service.name"
},
"event": {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will remove this.

"type": "keyword",
"ignore_above": 2048
}
}
},
"stacktrace": {
"type": "text"
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think @timestamp is opensearch specific. Correct me if I am wrong. And observedTime is incorrect. I think observedTimestamp is correct. I do not think my new code generates "@timestamp" and "observedTime". The mappings have "observedTime" as alias for "observedTimestamp", which may be removed but I think it's ok to keep it.

@@ -6,58 +6,129 @@
"_source": {
"enabled": true
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@KarstenSchnitter It looks like we need to bump up the version number to 3 (at least for spans). Otherwise new template mappings are not being sent to the OpenSearch server.

Events are not part of the OTel Logs Data Model [1]. The
mapping for the SS4O fields has been removed.

[1] https://opentelemetry.io/docs/specs/otel/logs/data-model/

Signed-off-by: Karsten Schnitter <[email protected]>
Copy link
Author

@KarstenSchnitter KarstenSchnitter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the event field from the logs mapping and answered some of the review comments. Timestamp names remain an open discussion.

}
},
{
"default_resource_attributes": {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Due to the path_match this is applied to all nested fields. With the three dynamic mappings your example would have the values of key1 indexed as keyword, of key2 as log, and of key3 as keyword.

}
"serviceName": {
"type": "alias",
"path": "resource.attributes.service.name"
},
"event": {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I take it out in my PR to your branch.

"type": "keyword",
"ignore_above": 2048
}
}
},
"stacktrace": {
"type": "text"
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Time vs Timestamp is essentially OpenTelemetry proto-files vs Logs Data Model. I am fine with either naming as long as it is consistent.

},
"flags": {
"type": "integer"
"type": "long"
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed all the droppedEventsCount to droppedAttributesCount.

@@ -6,58 +6,129 @@
"_source": {
"enabled": true
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does this limitation come from?

@kkondaka
Copy link
Owner

kkondaka commented Apr 5, 2025

@KarstenSchnitter ignore my comment about version being 3. Will keep it as 1. If there is an existing template with version 1 already present, it needs to be removed. Otherwise, we have to bump this version 2.

@kkondaka kkondaka force-pushed the otel-trace-groups branch from ecdd704 to 076047f Compare April 7, 2025 04:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants