Clarify event naming for different outcomes #2010
| Change | Line |
|---|---|
| Hunk | `@@ -132,6 +132,9 @@ guide for naming:` |
| Context | attributes. |
| Context | - Avoid synonyms that fragment cardinality across names (`finish` vs `complete`, |
| Context | `error` vs `fail`). Pick one verb set and stick to it. |
| Added | - Use **distinct event names** for different outcomes of the same operation |
| Added | (e.g. `otlp.exporter.start.complete` and `otlp.exporter.start.fail`). Do not rely |
| Added | solely on severity to distinguish success from failure. |
| Context | More precisely, in this project, event names SHOULD follow this pattern: |
| Context | `otelcol.<entity>[.<thing>].<verb>` |
| Hunk | `@@ -172,11 +175,10 @@ Optionally, add occurrence-specific attributes (dynamic context):` |
| Context | When events are exported as logs, set an appropriate severity. |
| Removed | Regarding severity, some events may be logged at different levels depending on |
| Removed | their severity or impact. For example, a `node.shutdown` event may be logged at |
| Removed | INFO level during a graceful shutdown, but at ERROR level if the shutdown is due |
| Removed | to a critical failure. When exporting events as logs, choose the log level that |
| Removed | best reflects the significance of the event. |
| Added | Regarding severity, choose the log level that best reflects the significance of |
| Added | the event. For example, `node.shutdown.complete` at INFO for a graceful |
| Added | shutdown and `node.shutdown.fail` at ERROR for a critical failure -- these are |
| Added | distinct events, not the same event at different severity levels. |
Contributor: FWIW, I have a slight preference for the previous guidance (i.e. same event name, different level based on outcome) only because it can be easier to review in a downstream dashboard, etc. When filtering by event name (e.g. … Of course, the counter argument is that a fuzzy search on …
Member (author): https://opentelemetry.io/docs/specs/otel/logs/data-model/#field-eventname
| Change | Line |
|---|---|
| Context | `## Stages` |
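The proposed rule can be sketched as a lookup from outcome to a distinct event name, with the log level chosen per event rather than serving as the only distinction. This is a minimal sketch using Python's standard `logging` module, not the Collector's own code; `SHUTDOWN_EVENTS`, `event_for_outcome`, `emit`, and the `event_name` record attribute are hypothetical.

```python
import logging

# Hypothetical mapping under the proposed guidance: each outcome gets its own
# event name, and the log level is picked per event.
SHUTDOWN_EVENTS = {
    "success": ("node.shutdown.complete", logging.INFO),
    "failure": ("node.shutdown.fail", logging.ERROR),
}


def event_for_outcome(outcome: str) -> tuple[str, int]:
    """Return the (event name, log level) pair for a shutdown outcome."""
    try:
        return SHUTDOWN_EVENTS[outcome]
    except KeyError:
        raise ValueError(f"unknown shutdown outcome: {outcome!r}") from None


def emit(logger: logging.Logger, outcome: str) -> None:
    """Log the outcome under its distinct event name."""
    name, level = event_for_outcome(outcome)
    # `extra` sets `event_name` as an attribute on the LogRecord; the key is
    # an assumption for this sketch, not an OpenTelemetry convention.
    logger.log(level, name, extra={"event_name": name})
```

For example, `emit(logging.getLogger("collector"), "failure")` logs `node.shutdown.fail` at ERROR. An unknown outcome raises instead of quietly reusing an existing name, which keeps the verb set closed in the spirit of the synonyms bullet in the diff.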
This is not formal guidance (at least not yet), but event naming follows the same principles as metric naming: one name for a specific kind of event, with the outcome recorded as an attribute.
E.g. `node.shutdown` with attribute `error.type = something_specific`. Caveat: assuming there are events along the way during the shutdown process, there could be `node.shutdown.error` event(s), and the shutdown process itself can also be recorded as a span (or as an event if spans don't work).
TL;DR: the `node.shutdown` event represents the shutdown process with any outcome (success, failure, something in between); `node.shutdown.error` represents a specific occurrence of an error during the shutdown process. Naming comes from what you want to record.
Severity considerations (in review): open-telemetry/semantic-conventions#3311
TL;DR: if it's an end-of-the-world situation, FATAL; if it clearly affects user experience, ERROR; if it's retriable and the effect on user experience is minimal or unknown, WARN or below.
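The alternative described in this comment, a single `node.shutdown` event whose outcome is carried in an `error.type` attribute with severity chosen by impact, might look like the following minimal Python sketch. The `Event` dataclass, the helper names, and the boolean impact flags are hypothetical. Logging a successful shutdown at INFO is an assumption the comment does not state, and anything that is neither end-of-the-world nor user-affecting falls back to WARN here, even though the comment allows "WARN or below".

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Event:
    """Hypothetical event shape; not an OpenTelemetry SDK type."""
    name: str
    severity: str
    attributes: dict = field(default_factory=dict)


def choose_severity(end_of_world: bool, affects_users: bool) -> str:
    """FATAL for end-of-the-world, ERROR for clear user impact, else WARN.

    WARN is an assumed fallback; the guidance allows "WARN or below".
    """
    if end_of_world:
        return "FATAL"
    if affects_users:
        return "ERROR"
    return "WARN"


def shutdown_event(error_type: Optional[str] = None,
                   end_of_world: bool = False,
                   affects_users: bool = False) -> Event:
    """One `node.shutdown` event for any outcome; failures set error.type."""
    if error_type is None:
        # Assumed: a successful shutdown is recorded at INFO.
        return Event("node.shutdown", "INFO")
    return Event(
        "node.shutdown",
        choose_severity(end_of_world, affects_users),
        {"error.type": error_type},
    )
```

For example, `shutdown_event("timeout", affects_users=True)` (with `timeout` as a made-up `error.type` value) yields a `node.shutdown` event at ERROR with `error.type` set to `timeout`; the event name stays the same for every outcome, and the attribute carries the difference.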
Thanks for sharing this, @lmolkova. Looking at the OTel Collector's internal metrics, I see they use separate metric names for different outcomes:
`otelcol_receiver_accepted_spans` and `otelcol_receiver_refused_spans`
rather than:
`otelcol_receiver_spans{status="accepted|refused"}`
Of course, this is the Collector, and semantic conventions may not have existed while it was being built. Anyway, I can open an issue in the semconv repo to continue this discussion in a better forum.
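The two metric shapes carry the same information; they differ in how a query reaches it. The following minimal Python sketch uses `collections.Counter` as a stand-in for real metric instruments: the tuple keys stand in for an attribute set, the span counts are made up, and `otelcol_receiver_spans` with a `status` attribute is the hypothetical alternative from this comment, not an existing Collector metric.

```python
from collections import Counter

# Style 1: one counter name per outcome, as in the Collector's metrics.
separate = Counter()
# Style 2: one counter name with the outcome as an attribute; the tuple key
# (metric name, status) stands in for a real attribute set.
labelled = Counter()


def record(outcome: str, spans: int) -> None:
    """Record the same observation in both styles."""
    separate[f"otelcol_receiver_{outcome}_spans"] += spans
    labelled[("otelcol_receiver_spans", outcome)] += spans


record("accepted", 90)
record("refused", 10)

# Style 1: a total must list every outcome-specific name.
total_separate = (separate["otelcol_receiver_accepted_spans"]
                  + separate["otelcol_receiver_refused_spans"])
# Style 2: a total is a sum over the status attribute of one name.
total_labelled = sum(v for (name, _status), v in labelled.items()
                     if name == "otelcol_receiver_spans")

print(total_separate, total_labelled)  # 100 100
```

With separate names, adding an outcome means adding a name that every "total" query must learn about; with one name, the new outcome is just another attribute value under the same sum.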
This is quite reasonable.
I believe there is already an issue in the Collector repo: open-telemetry/opentelemetry-collector#14350. Having Arrow interested in a similar set of metrics is good justification to move the issue to semconv (or create a new one).
We've recently gone through a round of internal OTel SDK metric renames (https://github.com/open-telemetry/semantic-conventions/blob/main/docs/otel/sdk-metrics.md), and I imagine the Collector will do something similar at some point.
The SDK and Collectors/pipelines can use the same metrics, but I've begun to see the SDK case as special compared with the pipeline case.
@lmolkova, please see https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/rfcs/component-universal-telemetry.md, which I have been working to emulate.
Thank you, @lmolkova, for the clarification. We will incorporate this into our guidelines.