Closed
Changes from 7 commits
21 commits
ed8f108
Refine recording errors documentation towards Span Event API deprecat…
pellared Dec 19, 2025
d96eedf
add chlog entry
pellared Dec 19, 2025
13f95d2
refine chlog
pellared Dec 19, 2025
0baed85
Fix grammar and clarify logging recommendations in recording errors d…
pellared Dec 19, 2025
6b5bc87
Update links to OpenTelemetry specification for version 1.52.0 in rec…
pellared Dec 19, 2025
6390bfd
Apply feedback
pellared Dec 19, 2025
b352235
Refine recording errors section
pellared Dec 19, 2025
b8ca828
Clarify wording on optional error event records in recording errors d…
pellared Dec 19, 2025
023348a
Update recording-errors.md
pellared Dec 19, 2025
00d8769
link to event records
pellared Dec 22, 2025
02744ca
logs to use error.type as other singals
pellared Jan 5, 2026
aa38670
example for operation failure in logging recommendations
pellared Jan 5, 2026
1be560d
refine handling of retried or handled errors in logging recommendations
pellared Jan 5, 2026
36405a0
simplify error recording guidelines
pellared Jan 7, 2026
060846c
clarify failed operation definition and update span status requirements
pellared Jan 7, 2026
6e232d4
refine definition of failed operations and update error handling guid…
pellared Jan 7, 2026
521e340
Update docs/general/recording-errors.md
pellared Jan 7, 2026
14f21c6
clarify error handling in span recording and adjust severity recommen…
pellared Jan 7, 2026
c07f05c
Merge branch 'recording-errors' of github.com:pellared/semantic-conve…
pellared Jan 7, 2026
266c084
refine error handling guidelines by removing the failed operation sec…
pellared Jan 7, 2026
c52e846
remove guidance on recording retried errors in metrics
pellared Jan 7, 2026
22 changes: 22 additions & 0 deletions .chloggen/3228.yaml
@@ -0,0 +1,22 @@
# Use this changelog template to create an entry for release notes.
#
# If your change doesn't affect end users you should instead start
# your pull request title with [chore] or use the "Skip Changelog" label.

# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: enhancement

# The name of the area of concern in the attributes-registry, (e.g. http, cloud, db)
component: general

# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: Refine recording errors documentation to include logs and avoid span events.

# Mandatory: One or more tracking issues related to the change. You can use the PR number here if no issue exists.
# The values here must be integers.
issues: [3228]

# (Optional) One or more lines of additional information to render under the primary note.
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext:
141 changes: 68 additions & 73 deletions docs/general/recording-errors.md
@@ -5,26 +5,26 @@
<!-- toc -->

- [What constitutes an error](#what-constitutes-an-error)
- [What constitutes a failed operation](#what-constitutes-a-failed-operation)
- [Recording errors](#recording-errors)
- [Recording errors on spans](#recording-errors-on-spans)
- [Recording errors on metrics](#recording-errors-on-metrics)
- [Recording exceptions](#recording-exceptions)
- [Recording errors on logs](#recording-errors-on-logs)

<!-- tocstop -->

This document provides recommendations to semantic convention and instrumentation authors
on how to record errors on spans and metrics.
This document provides recommendations to semantic convention
and instrumentation authors on how to record errors on spans, metrics, and logs.

Individual semantic conventions are encouraged to provide additional guidance.

## What constitutes an error

An operation SHOULD be considered as failed if any of the following is true:
In the scope of this document, an error occurs when:

- an exception is thrown by the instrumented method (API, block of code, or another instrumented unit)
- the instrumented method returns an error in another way, for example, via an error code

Semantic conventions that define domain-specific status codes SHOULD specify
which status codes should be reported as errors by a general-purpose instrumentation.
- an exception is thrown by an instrumented operation,
- the instrumented operation returns an error in another way,
for example, via an error object or status code.

> [!NOTE]
>
@@ -33,38 +33,43 @@ An operation SHOULD be considered as failed if any of the following is true:
> expected the resource to be available. However, it is not an error when the
> application is simply checking whether the resource exists.
>
> Instrumentations that have additional context about a specific request MAY use
> this context to set the span status more precisely.
> Instrumentations that have additional context about a specific request SHOULD
> use this context to classify whether the status code is an error.
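
A minimal sketch of such context-aware classification, assuming an HTTP client where the status code in question is `404`. The `StatusClassifier` class and the `existenceCheck` flag are hypothetical names introduced only for illustration, and treating every other 4xx/5xx code as an error is a simplifying assumption rather than a rule from this document:

```java
/** Hypothetical helper: decides whether a response status code should be classified as an error. */
final class StatusClassifier {
    private StatusClassifier() {}

    /**
     * @param statusCode     HTTP response status code
     * @param existenceCheck true when the caller only probes whether the resource exists
     *                       (assumed to be known from the request context)
     */
    static boolean isError(int statusCode, boolean existenceCheck) {
        if (statusCode < 400) {
            return false; // 1xx-3xx: not treated as an error in this sketch
        }
        if (statusCode == 404 && existenceCheck) {
            return false; // "not found" is an expected outcome of an existence probe
        }
        return true; // assumption: every remaining 4xx/5xx code counts as an error
    }
}
```

With this sketch, `StatusClassifier.isError(404, false)` classifies the response as an error, while `StatusClassifier.isError(404, true)` does not.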

Errors that were retried or handled (allowing an operation to complete gracefully) SHOULD NOT
be recorded on spans or metrics that describe this operation.
## What constitutes a failed operation
Member

nit: does it need a separate section?

Member Author

We have a similar section for defining "error". I think that "failed operation" is a very important term in this document that also deserves a definition. I can refactor it into a single section like "Definitions used in this document". However, I would prefer doing such structural changes in a separate PR.

Member Author

@pellared pellared Jan 7, 2026

Removing the section after changes.

I removed the whole section so that we can stabilize each section independently. Also, as noted in #3228 (comment), there are cases when we want to add an attribute to spans even if the operation has not (semantically) failed. The metrics section already has:

> Operations that complete successfully SHOULD NOT include the `error.type` attribute,
> allowing users to filter out errors.


## Recording errors on spans
An operation SHOULD be considered as failed when it ends with an error.

[Span Status Code][SpanStatus] MUST be left unset if the instrumented operation has
ended without any errors.
Errors that were retried or handled (allowing an operation to complete gracefully)
SHOULD NOT be recorded on spans or metrics that describe this operation.

When the operation ends with an error, instrumentation:
## Recording errors

- SHOULD set the span status code to `Error`
- SHOULD set the [`error.type`](/docs/registry/attributes/error.md#error-type) attribute
- SHOULD set the span status description when it has additional information
about the error which is not expected to contain sensitive details and aligns
with [Span Status Description][SpanStatus] definition.
Instrumentation SHOULD ensure that, for a given error, the same value is
used as the [`error.type`][ErrorType] attribute on spans and metrics, and as
[`EventName`][EventName] on logs.
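
One way to keep the value consistent is to compute it once per error and reuse it for every signal. The sketch below assumes the exception class name is used as the error identifier. `ErrorIdentity`, `signalsFor`, and the map keys are hypothetical stand-ins for calls to the telemetry APIs:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that derives one error identifier and reuses it on every signal. */
final class ErrorIdentity {
    private ErrorIdentity() {}

    /** Returns the canonical class name, falling back to the binary name (e.g. anonymous classes). */
    static String errorType(Throwable error) {
        String canonical = error.getClass().getCanonicalName();
        return canonical != null ? canonical : error.getClass().getName();
    }

    /** Illustrative only: shows the same value flowing to the span, the metric, and the log record. */
    static Map<String, String> signalsFor(Throwable error) {
        String type = errorType(error); // computed once, reused below
        Map<String, String> signals = new LinkedHashMap<>();
        signals.put("span error.type", type);
        signals.put("metric error.type", type);
        signals.put("log EventName", type);
        return signals;
    }
}
```

Deriving the value in one place avoids the drift that can appear when each signal formats the error independently.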

## Recording errors on spans

When the instrumented operation fails, the instrumentation:

It's NOT RECOMMENDED to duplicate status code or `error.type` in span status description.
- SHOULD set the span status code to `Error`,
Contributor

By the definitions above, this would happen any time the span ends with an exception thrown by the instrumented operation. But the instrumentation may know that the exception doesn't actually represent an error, e.g. that the exception is expected to be handled outside the span (see pydantic/logfire#1361 for example).

We still want to be able to record info about these kinds of exceptions, including a traceback. But they shouldn't be marked as errors. That means that there's also no place here to store the exception message, since the span status description can't be set.

Member Author

PTAL 6e232d4

- SHOULD set the [`error.type`][ErrorType] attribute,
- SHOULD set the span status description when it has additional information
  about the error that aligns with the [Span Status Description][SpanStatus]
  definition, for example, an exception message.

When the operation fails with an exception, the span status description SHOULD be set to
the exception message.
Note that [Span Status Code][SpanStatus] MUST be left unset if the instrumented
operation has ended without any errors.

Refer to the [recording exceptions](#recording-exceptions) on capturing exception
details.
It is NOT RECOMMENDED to record the error via a span event,
for example, by using [`Span.RecordException`][RecordException].
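
The span recommendations above can be sketched as follows. To stay self-contained, the example uses a hypothetical `SketchSpan` stand-in instead of the OpenTelemetry `Span` API, and `SpanErrorRecorder` is likewise an illustrative name. A real instrumentation would make the equivalent set-status and set-attribute calls on the actual span:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a span; it only stores what the instrumentation sets. */
final class SketchSpan {
    String statusCode = "Unset"; // stays unset when the operation ends without errors
    String statusDescription = "";
    final Map<String, String> attributes = new HashMap<>();
    int eventCount = 0; // remains 0: the error is not recorded as a span event
}

final class SpanErrorRecorder {
    private SpanErrorRecorder() {}

    /** Records a failed operation on the span: status code, error.type, and optional description. */
    static void recordFailure(SketchSpan span, Throwable error) {
        span.statusCode = "Error";
        span.attributes.put("error.type", error.getClass().getName());
        if (error.getMessage() != null) {
            // the exception message is used as the additional information about the error
            span.statusDescription = error.getMessage();
        }
    }
}
```

A span for a successful operation is never passed to `recordFailure`, so its status code stays unset and it carries no `error.type` attribute.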

## Recording errors on metrics

Semantic conventions for operations usually define an operation duration histogram
metric. This metric SHOULD include the `error.type` attribute. This enables users to derive
throughput and error rates.
metric. This metric SHOULD include the [`error.type`][ErrorType] attribute.
This enables users to derive throughput and error rates.

Operations that complete successfully SHOULD NOT include the `error.type` attribute,
allowing users to filter out errors.
@@ -76,50 +81,40 @@ messaging operation may involve sending multiple messages) and includes `error.t
It's RECOMMENDED to report one metric that includes successes and failures as opposed
to reporting two (or more) metrics depending on the operation status.
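
To illustrate why a single metric is sufficient, the sketch below models an operation duration metric in memory and derives throughput and error rate from the presence of `error.type`. `DurationMetricSketch` and its methods are hypothetical and do not correspond to any OpenTelemetry metrics API:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory model of one duration metric with an optional error.type attribute. */
final class DurationMetricSketch {
    /** One recorded measurement; errorType is null when the operation succeeded. */
    private static final class Point {
        final double seconds;
        final String errorType;

        Point(double seconds, String errorType) {
            this.seconds = seconds;
            this.errorType = errorType;
        }
    }

    private final List<Point> points = new ArrayList<>();

    /** Records one operation; pass null as errorType for a successful operation. */
    void record(double seconds, String errorType) {
        points.add(new Point(seconds, errorType));
    }

    /** Total number of operations, successful or failed. */
    long throughput() {
        return points.size();
    }

    /** Fraction of operations that carry error.type; 0.0 when nothing was recorded. */
    double errorRate() {
        if (points.isEmpty()) {
            return 0.0;
        }
        long failures = 0;
        for (Point point : points) {
            if (point.errorType != null) {
                failures++;
            }
        }
        return (double) failures / points.size();
    }
}
```

Because successes and failures share one metric, both figures come from the same set of measurements and cannot disagree about the total operation count.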

Instrumentation SHOULD ensure `error.type` is applied consistently across spans
and metrics when both are reported. A span and its corresponding metric for a single
operation SHOULD have the same `error.type` value if the operation failed and SHOULD NOT
include it if the operation succeeded.

## Recording exceptions

When an instrumented operation fails with an exception, instrumentation SHOULD record
this exception as a [span event](/docs/exceptions/exceptions-spans.md) or a [log record](/docs/exceptions/exceptions-logs.md).

It's RECOMMENDED to use the `Span.recordException` API or logging library API that takes exception instance
instead of providing individual attributes. This enables the OpenTelemetry SDK to
control what information is recorded based on application configuration.

It's NOT RECOMMENDED to record the same exception more than once.
It's NOT RECOMMENDED to record exceptions that are handled by the instrumented library.

For example, in this code-snippet, `ResourceAlreadyExistsException` is handled and the corresponding
native instrumentation should not record it. Exceptions which are propagated
to the caller should be recorded (or logged) once.

```java
public boolean createIfNotExists(String resourceId) throws IOException {
  Span span = startSpan();
  try {
    create(resourceId);
    return true;
  } catch (ResourceAlreadyExistsException e) {
    // not recording exception and not setting span status to error - exception is handled
    // but we can set attributes that capture additional details
    span.setAttribute(AttributeKey.stringKey("acme.resource.create.status"), "already_exists");
    return false;
  } catch (IOException e) {
    // recording exception here (assuming it was not recorded inside `create` method)
    span.recordException(e);
    // or
    // logger.warn(e);

    span.setAttribute(AttributeKey.stringKey("error.type"), e.getClass().getCanonicalName());
    span.setStatus(StatusCode.ERROR, e.getMessage());
    throw e;
  }
}
```
## Recording errors on logs

When recording an error using logs, the instrumentation:

- MUST set [`EventName`][EventName] to a value that would normally be
  used for the [`error.type`][ErrorType] attribute,
- SHOULD set the [`error.message`][ErrorMessage] attribute to provide additional
  information about the error, for example, an exception message.

When an error is retried or handled and the overall operation completes successfully,
it SHOULD still be recorded as an event record for diagnostic purposes.
In such a scenario, [`SeverityNumber`][SeverityNumber] MUST be below 17 (ERROR).

When an error occurs outside the context of any span
and it causes an operation to fail,
the instrumentation SHOULD record it as an event record.
In such a scenario, [`SeverityNumber`][SeverityNumber] MUST be greater than
or equal to 17 (ERROR).

When an error occurs inside the context of a span
and it causes an operation to fail,
the instrumentation SHOULD NOT additionally record it as an event record.
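
The three cases above can be condensed into one decision function. This is a sketch under stated assumptions: `LogErrorPolicy` is a hypothetical name, and choosing 13 (WARN) for handled errors is an illustrative choice, since the text only requires a value below 17:

```java
import java.util.OptionalInt;

/** Hypothetical policy: whether to emit an error event record, and with which SeverityNumber. */
final class LogErrorPolicy {
    static final int SEVERITY_WARN = 13;  // assumed choice for handled or retried errors
    static final int SEVERITY_ERROR = 17; // lowest SeverityNumber in the ERROR range

    private LogErrorPolicy() {}

    /**
     * @param operationFailed true when the error caused the operation to fail
     * @param insideSpan      true when the error occurred in the context of a span
     * @return the SeverityNumber to use, or empty when no event record should be emitted
     */
    static OptionalInt severityFor(boolean operationFailed, boolean insideSpan) {
        if (!operationFailed) {
            // retried or handled: still recorded for diagnostics, below ERROR
            return OptionalInt.of(SEVERITY_WARN);
        }
        if (insideSpan) {
            // the span already records the error, so no additional event record
            return OptionalInt.empty();
        }
        return OptionalInt.of(SEVERITY_ERROR);
    }
}
```

Returning an empty `OptionalInt` keeps the "do not record" case explicit instead of encoding it as a sentinel severity value.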

> [!NOTE]
>
> Applications that also want error event records corresponding to spans
> that already record errors can use a span processor (or equivalent component)
> that emits error logs for such spans. This is an optional, user-configured
> mechanism and is not required by these conventions.

[DocumentStatus]: https://opentelemetry.io/docs/specs/otel/document-status
[SpanStatus]: https://github.com/open-telemetry/opentelemetry-specification/blob/v1.52.0/specification/trace/api.md#set-status
[RecordException]: https://github.com/open-telemetry/opentelemetry-specification/blob/v1.52.0/specification/trace/api.md#record-exception
[ErrorType]: /docs/registry/attributes/error.md#error-type
[ErrorMessage]: /docs/registry/attributes/error.md#error-message
[EventName]: https://github.com/open-telemetry/opentelemetry-specification/blob/v1.52.0/specification/logs/data-model.md#field-eventname
[SeverityNumber]: https://github.com/open-telemetry/opentelemetry-specification/blob/v1.52.0/specification/logs/data-model.md#field-severitynumber