Merged
12 changes: 7 additions & 5 deletions rust/otap-dataflow/docs/telemetry/events-guide.md
@@ -132,6 +132,9 @@ guide for naming:
attributes.
- Avoid synonyms that fragment cardinality across names (`finish` vs `complete`,
`error` vs `fail`). Pick one verb set and stick to it.
+- Use **distinct event names** for different outcomes of the same operation
+  (e.g. `otlp.exporter.start.complete` and `otlp.exporter.start.fail`). Do not rely
+  solely on severity to distinguish success from failure.

More precisely, in this project, event names SHOULD follow this pattern:
`otelcol.<entity>[.<thing>].<verb>`
@@ -172,11 +175,10 @@ Optionally, add occurrence-specific attributes (dynamic context):

When events are exported as logs, set an appropriate severity.

-Regarding severity, some events may be logged at different levels depending on
-their severity or impact. For example, a `node.shutdown` event may be logged at
-INFO level during a graceful shutdown, but at ERROR level if the shutdown is due
-to a critical failure. When exporting events as logs, choose the log level that
-best reflects the significance of the event.
+Regarding severity, choose the log level that best reflects the significance of
+the event. For example, `node.shutdown.complete` at INFO for a graceful
+shutdown and `node.shutdown.fail` at ERROR for a critical failure -- these are
Comment on lines +179 to +180
Member

@lmolkova lmolkova Feb 12, 2026

this is not a formal guidance (at least not yet), but event naming follows the same principles as metric naming - one name for specific kind of event, outcome recorded as an attribute.

E.g. node.shutdown with attribute error.type = something_specific.

Caveat: assuming there are events along the way, during the process of shutdown, then there could be
node.shutdown.error event(s) and the process of shutdown can also be recorded as a span (or an event if spans don't work).
TL;DR: node.shutdown event represents the shutdown process with any outcome (success, failure, something in between). node.shutdown.error represents a specific occurrence of an error during shutdown process. Naming comes from what you want to record.

Severity considerations (in review): open-telemetry/semantic-conventions#3311
TL;DR: if it's the end of the world situation - FATAL, if it clearly affects user experience - ERROR, if it's retriable and effect on user experience is minimal/not known - WARN or below
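
The approach described in this comment (one event name, outcome carried in an attribute, severity chosen by impact) could be sketched roughly as follows. This is only an illustration: `Event`, `Impact`, `severity_for`, and `shutdown_event` are hypothetical names, not part of any OpenTelemetry or otap-dataflow API, and mapping "WARN or below" to `Warn` and a clean shutdown to `Info` are assumptions.

```rust
use std::collections::BTreeMap;

/// Log severities mentioned in the comment, ordered from least to most severe.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Severity {
    Info,
    Warn,
    Error,
    Fatal,
}

/// Hypothetical impact classification (an assumption for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Impact {
    /// No error occurred.
    None,
    /// Retriable, with minimal or unknown effect on users.
    Retriable,
    /// Clearly affects user experience.
    UserVisible,
    /// "End of the world": the process cannot continue.
    Unrecoverable,
}

/// Hypothetical event record: one name, a severity, and string attributes.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Event {
    pub name: &'static str,
    pub severity: Severity,
    pub attributes: BTreeMap<&'static str, String>,
}

/// Map impact to severity following the TL;DR heuristic.
/// "WARN or below" is pinned to `Warn` here, which is an assumption.
pub fn severity_for(impact: Impact) -> Severity {
    match impact {
        Impact::None => Severity::Info,
        Impact::Retriable => Severity::Warn,
        Impact::UserVisible => Severity::Error,
        Impact::Unrecoverable => Severity::Fatal,
    }
}

/// Build a single `node.shutdown` event for any outcome.
/// A failure is recorded through the `error.type` attribute, not the name.
pub fn shutdown_event(error_type: Option<&str>, impact: Impact) -> Event {
    let mut attributes = BTreeMap::new();
    if let Some(kind) = error_type {
        attributes.insert("error.type", kind.to_string());
    }
    Event {
        name: "node.shutdown",
        severity: severity_for(impact),
        attributes,
    }
}
```

With this shape, `shutdown_event(None, Impact::None)` and `shutdown_event(Some("timeout"), Impact::UserVisible)` share the name `node.shutdown` and differ only in severity and in the presence of `error.type` (the value `"timeout"` is made up for illustration).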

Member Author

Thanks for sharing this @lmolkova. When looking at the OTel Collector's internal metrics, I see they use separate metric names for different outcomes:
otelcol_receiver_accepted_spans and otelcol_receiver_refused_spans
Rather than:
otelcol_receiver_spans{status="accepted|refused"}

Of course, this is the Collector, and semantic conventions may not have existed while it was being built. Anyway, I can open an issue in the semantic-conventions repo to continue this discussion in a better forum.
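
To make the two shapes concrete, here is a small sketch that folds the Collector-style names quoted above into the single-name-plus-attribute form. `split_outcome` is a hypothetical helper written for this comparison, and it only recognizes the two outcomes mentioned here (`accepted` and `refused`).

```rust
/// Split a metric name with an embedded outcome, such as
/// `otelcol_receiver_accepted_spans`, into a base name and a status value.
/// Returns `None` when the name carries no recognized outcome infix.
pub fn split_outcome(name: &str) -> Option<(String, &'static str)> {
    for status in ["accepted", "refused"] {
        // Look for the outcome as an infix delimited by underscores.
        let infix = format!("_{status}_");
        if let Some((head, tail)) = name.split_once(infix.as_str()) {
            return Some((format!("{head}_{tail}"), status));
        }
    }
    None
}
```

Both `otelcol_receiver_accepted_spans` and `otelcol_receiver_refused_spans` reduce to the base name `otelcol_receiver_spans`, with the outcome moved into a `status` value.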

Member Author

> TL;DR: node.shutdown event represents the shutdown process with any outcome (success, failure, something in between). node.shutdown.error represents a specific occurrence of an error during shutdown process. Naming comes from what you want to record.

This is quite reasonable.

Member

I believe there is an issue on collector - open-telemetry/opentelemetry-collector#14350. Having arrow interested in a similar set of metrics is a good justification to move the issue to semconv (or create a new one).

We've gone through a round of internal OTel SDK metric renames recently - https://github.com/open-telemetry/semantic-conventions/blob/main/docs/otel/sdk-metrics.md and I imagine collector would do something similar at some point

Contributor

SDK and Collectors/pipelines can use the same metrics, but I've begun to see the SDK case as special compared with the pipeline case.

@lmolkova please see https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/rfcs/component-universal-telemetry.md which I have been working to emulate.

Contributor

Thank you, @lmolkova, for the clarification. We will incorporate this into our guidelines.

+distinct events, not the same event at different severity levels.
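
For comparison, the guidance added in this hunk (a distinct event name per outcome, each with its own level) might look like the sketch below. `ShutdownOutcome` and `shutdown_event_for` are hypothetical names used only for illustration; the event names and levels come from the revised text.

```rust
/// Hypothetical outcomes of a node shutdown.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ShutdownOutcome {
    Graceful,
    CriticalFailure,
}

/// Return the (event name, log level) pair for an outcome.
/// Each outcome gets its own event name, so severity is not the only
/// signal that separates success from failure.
pub fn shutdown_event_for(outcome: ShutdownOutcome) -> (&'static str, &'static str) {
    match outcome {
        ShutdownOutcome::Graceful => ("node.shutdown.complete", "INFO"),
        ShutdownOutcome::CriticalFailure => ("node.shutdown.fail", "ERROR"),
    }
}
```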
Contributor

@AaronRM AaronRM Feb 10, 2026

FWIW, I have a slight preference for the previous guidance (i.e. same event name, different level based on outcome) only because it can be easier to review in a downstream dashboard, etc...

When filtering by event name (e.g. node.shutdown) the user sees both the info and warning/error messages interleaved without needing to get into the internals of the event names.

Of course, the counter argument is that a fuzzy search on node.shutdown would yield the same results either way.
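
The filtering trade-off described here can be illustrated with a small sketch. `filter_exact` and `filter_prefix` are hypothetical helpers rather than any dashboard's real query API, and treating `.` as the segment separator for prefix matching is an assumption.

```rust
/// Keep only the event names that equal `wanted` exactly.
pub fn filter_exact<'a>(names: &[&'a str], wanted: &str) -> Vec<&'a str> {
    names.iter().copied().filter(|n| *n == wanted).collect()
}

/// Keep the event names that equal `prefix` or extend it by a dotted segment,
/// so `node.shutdown` matches `node.shutdown.fail` but not `node.shutdownx`.
pub fn filter_prefix<'a>(names: &[&'a str], prefix: &str) -> Vec<&'a str> {
    names
        .iter()
        .copied()
        .filter(|n| {
            *n == prefix
                || n.strip_prefix(prefix)
                    .map_or(false, |rest| rest.starts_with('.'))
        })
        .collect()
}
```

With distinct names per outcome, an exact filter on `node.shutdown` finds nothing, while the prefix filter returns both `node.shutdown.complete` and `node.shutdown.fail`. With a single shared name, the exact filter already returns every outcome.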

Member Author

https://opentelemetry.io/docs/specs/otel/logs/data-model/#field-eventname
I will ask if OTel semantic conventions has thoughts on this as well. From what I can remember from old discussions, each occurrence of an Event should have the same structure, but the spec does not explicitly say whether Severity is part of that structure (it does say attributes and body are).


## Stages
