diff --git a/docs/how-to-write-conventions/README.md b/docs/how-to-write-conventions/README.md
index 594da5f016..47686cf965 100644
--- a/docs/how-to-write-conventions/README.md
+++ b/docs/how-to-write-conventions/README.md
@@ -15,6 +15,11 @@ aliases: [/docs/specs/semconv/general/how-to-define-semantic-conventions]
   - [Defining attributes](#defining-attributes)
     - [Defining enum attribute members](#defining-enum-attribute-members)
   - [Defining spans](#defining-spans)
+    - [What operation does this span represent](#what-operation-does-this-span-represent)
+    - [Naming pattern](#naming-pattern)
+    - [Status](#status)
+    - [Kind](#kind)
+    - [Attributes](#attributes)
   - [Defining metrics](#defining-metrics)
   - [Defining entities](#defining-entities)
   - [Defining events](#defining-events)
@@ -186,7 +191,135 @@ database.
 
 #### Defining spans
 
-TBD
+Spans describe the individual execution of a certain operation.
+
+Define spans when:
+
+- The corresponding operation is important for observability.
+- The operation has duration.
+
+Don't define spans for point-in-time occurrences - use events instead.
+Don't define spans for local-only operations, such as serialization or deserialization.
+
+For example, define spans for operations that involve one or more network calls.
+
+> [!NOTE]
+>
+> Known exception: [messaging `create`](/docs/messaging/messaging-spans.md#operation-types) span
+> is defined for a local call. This is necessary when publishing batches of
+> messages to ensure each message has a unique context and can be traced
+> individually end-to-end.
+
+Don't define spans if there is an existing span definition that captures a very
+similar operation.
+
+For example, a DB client span represents DB query execution from ORM or DB
+driver perspectives. Both layers could be instrumented, but inner layers may be
+suppressed to reduce duplication.
+
+> [!IMPORTANT]
+>
+> It's a common practice to accompany a span definition with a metric that measures
+> the duration of the same operation. For example, the `http.client.request.duration`
+> metric is recorded alongside the corresponding HTTP client span.
+
+A span definition should describe the operation it represents, include span kind,
+naming pattern, considerations for setting span status, and the list of
+attributes refined for that span definition. Let's cover each aspect.
+
+##### What operation does this span represent
+
+Define the scope and boundaries of the operation:
+
+- When the span starts and ends.
+- If this span represents a client call, specify whether it captures the logical call
+  (as observed by the API caller) or the physical call (per-attempt).
+- Define a different span for different operations - e.g., when spans have different
+  kinds or a significantly different set of attributes.
+  For example, HTTP client and server spans are two independent definitions.
+  Messaging publishing and receiving are also different span types.
+
+##### Naming pattern
+
+- Span names must have low cardinality and should provide a reasonable grouping
+  for that operation. See [Span name guidelines](https://github.com/open-telemetry/opentelemetry-specification/blob/v1.52.0/specification/trace/api.md#span) for details.
+
+- Span names usually follow the `{action} {target}` pattern. For example, `send orders_queue`.
+
+- Span names should only include information that's available as span attributes.
+  I.e., `{action}` and `{target}` are usually also available as attributes and
+  are used on metrics describing that operation.
+
+- Static text should not be included but can be used as a fallback.
+  E.g., we use `GET /orders/{id}` instead of `HTTP GET /orders/{id}` for HTTP
+  server span names.
+
+- Provide fallback values in case some of the attributes used in the span name are not
+  available or could be problematic in edge cases (e.g., have high cardinality).
+
+- If a span name can become too long, recommend limits and truncation strategies
+  (e.g., DB conventions define a 255-character limit).
+
+##### Status
+
+Define what constitutes an error for that operation.
+
+If there are no special considerations, reference the [Recording errors](/docs/general/recording-errors.md)
+document.
+
+Certain conditions can't be clearly classified as errors or not-errors (cancellations,
+HTTP 404 and many others). Avoid using strict requirements - allow instrumentations
+to leverage context they might have to provide a more accurate status.
+
+##### Kind
+
+All span definitions MUST include a specific span kind. One span definition can
+only mention one span kind.
+
+##### Attributes
+
+Capture important details of the specific operation. Parent operations or sub-operations
+will have their own spans.
+
+For example, when recording a call to upload a file to an object store,
+include the endpoint, operation name (upload file), collection, and object identifier. Don't include details of
+the underlying HTTP/gRPC requests unless there is a strong reason to do so.
+
+Only include attributes that bring clear value - this allows keeping telemetry
+volume and performance overhead low. Don't try to capture all available details.
+When in doubt, don't reference additional attributes - they can be added incrementally
+based on feedback.
+
+Define which additional properties this span needs to be useful:
+
+- Include the `error.type` attribute. If the operation you're describing typically has a
+  domain-specific error code, include that as a separate attribute as well.
+  Document which error codes constitute an error.
+
+- Include `server.address` and `server.port` on client spans.
+
+- Include applicable `network.*` attributes on spans that describe network calls.
+
+- Include some form of operation name to describe the action being performed.
+
+  For example, in the case of HTTP, it's `http.request.method`; in the case of RPC,
+  it's `rpc.method`; for messaging, `messaging.operation.name`; and for GenAI, `gen_ai.operation.name`.
+  This attribute typically serves as the `{action}` in the span name and may be used
+  across multiple span definitions within the same domain.
+
+- Identify other important characteristics such as the operation target (DB collection,
+  messaging queue, GenAI model, object store collection), input parameters, and
+  result properties that should be recorded on the span.
+
+- When referencing an attribute:
+  - Specify if an attribute is relevant for head-sampling. Such attributes should be
+    provided at start time so that they will be passed to the sampler. Usually, these are
+    attributes that have low cardinality and are easy to obtain.
+  - Specify [requirement level](/docs/general/attribute-requirement-level.md).
+    Only absolutely essential (and always available) attributes can be `required`.
+    Attributes that may include sensitive information, are expensive to obtain,
+    or are verbose, should be `opt-in`.
+  - Update the brief and note to tailor the attribute definition to that operation.
 
 #### Defining metrics
 
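
Reading aid for the "Naming pattern" guidance this diff adds: the `{action} {target}` pattern, the fallback rules, and the truncation advice could be sketched roughly as below. This is only an illustration under stated assumptions. `build_span_name`, its parameters, and `MAX_SPAN_NAME_LENGTH` are hypothetical names that do not come from the semantic conventions or any OpenTelemetry SDK, and the 255-character default simply echoes the DB conventions limit mentioned in the diff.

```python
from typing import Optional

# Assumption: mirrors the 255-character limit the diff attributes to DB conventions.
MAX_SPAN_NAME_LENGTH = 255


def build_span_name(
    action: Optional[str],
    target: Optional[str],
    fallback: str,
    max_length: int = MAX_SPAN_NAME_LENGTH,
) -> str:
    """Hypothetical helper: build a span name following `{action} {target}`.

    The caller is expected to pass target=None when the target is unknown
    or would have high cardinality (for example, a raw URL path).
    """
    if action and target:
        name = f"{action} {target}"
    elif action:
        # Target missing or unsuitable: fall back to the action alone.
        name = action
    else:
        # Nothing usable: fall back to static text.
        name = fallback
    # Simple truncation strategy: keep the leading characters.
    return name[:max_length]


if __name__ == "__main__":
    print(build_span_name("send", "orders_queue", "messaging"))
    print(build_span_name("GET", None, "HTTP"))
    print(build_span_name(None, None, "HTTP"))
```

Static text such as `HTTP` appears only in the last branch, which matches the guidance that static text should not be part of the name but can serve as a fallback. A real convention would also state which attribute supplies `action` and `target`, and whether truncation should keep the prefix or use another strategy.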