Clarify event naming for different outcomes#2010

Merged
jmacd merged 6 commits into open-telemetry:main from cijothomas:cijothomas/tuneguidance
Feb 23, 2026

Conversation

@cijothomas
Member

@cijothomas cijothomas commented Feb 10, 2026

The "Severity and placement" section previously suggested using the same event name (node.shutdown) at different severity levels to distinguish a graceful shutdown (INFO) from a critical failure (ERROR).

This conflicts with the guidance in Event Naming and Verbs, which recommends distinct event names for different outcomes. Updated the example to use node.shutdown.complete (INFO) and node.shutdown.fail (ERROR) so the guide is internally consistent.

This is an intentional semantic change, not just a wording tweak. The guide now consistently says: different outcomes → different event names. Severity reflects significance, but should not be the sole way to distinguish success from failure.

Fixes #1972

@github-actions github-actions Bot added the rust label (Pull requests that update Rust code) Feb 10, 2026
@codecov

codecov Bot commented Feb 10, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.42%. Comparing base (70c62ad) to head (21f51b7).
⚠️ Report is 42 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2010      +/-   ##
==========================================
- Coverage   86.42%   86.42%   -0.01%     
==========================================
  Files         525      525              
  Lines      167144   167144              
==========================================
- Hits       144457   144450       -7     
- Misses      22153    22160       +7     
  Partials      534      534              
Components Coverage Δ
otap-dataflow 88.43% <ø> (-0.01%) ⬇️
query_abstraction 80.61% <ø> (ø)
query_engine 90.30% <ø> (ø)
syslog_cef_receivers ∅ <ø> (∅)
otel-arrow-go 53.50% <ø> (ø)
quiver 92.15% <ø> (ø)

@cijothomas cijothomas changed the title from "Tweak EventName guidance" to "Clarify event naming for different outcomes" Feb 10, 2026
@cijothomas cijothomas marked this pull request as ready for review February 10, 2026 17:43
@cijothomas cijothomas requested a review from a team as a code owner February 10, 2026 17:43
@cijothomas
Member Author

@lquerel @albertlockett Requesting more eyes for this.

Regarding severity, choose the log level that best reflects the significance of
the event. For example, `node.shutdown.complete` at INFO for a graceful
shutdown and `node.shutdown.fail` at ERROR for a critical failure -- these are
distinct events, not the same event at different severity levels.
Contributor

@AaronRM AaronRM Feb 10, 2026

FWIW, I have a slight preference for the previous guidance (i.e. same event name, different level based on outcome) only because it can be easier to review in a downstream dashboard, etc...

When filtering by event name (e.g. node.shutdown) the user sees both the info and warning/error messages interleaved without needing to get into the internals of the event names.

Of course, the counter argument is that a fuzzy search on node.shutdown would yield the same results either way.

Member Author

https://opentelemetry.io/docs/specs/otel/logs/data-model/#field-eventname
I will ask whether the OTel semantic conventions group has thoughts on this as well. From what I remember of earlier discussions, each occurrence of an Event should have the same structure, but the spec does not explicitly say whether Severity is part of that structure (it does say attributes and body are).

Member

@albertlockett albertlockett left a comment

I'll readily admit that I do not know what the best practices are here!

The OTel data model says the event name identifies the "class / type of event", and I suppose it's conceivable that events of the same type/class could be more/less severe depending on the context in which they occurred.

Also, does this example not still conflict with the guidance in event naming?

## Event naming
Event names MUST be low-cardinality and stable. Follow the semantic conventions
guide for naming:
- Lowercase and dot-separated. It identifies a class of event, not an instance.
- Keep the name stable and "type-like". Treat it as a schema identifier.
- Use verbs for actions (e.g. `pipeline.config.reload`).
- Avoid embedding IDs or dynamic values in the name. Encode variability as
attributes.
- Avoid synonyms that fragment cardinality across names (`finish` vs `complete`,
`error` vs `fail`). Pick one verb set and stick to it.
More precisely, in this project, event names SHOULD follow this pattern:
`otelcol.<entity>[.<thing>].<verb>`
Where:
- `otelcol.` is the project prefix/namespace used for events and other custom
telemetry.
- `<entity>` is the primary entity involved (e.g. `pipeline`, `node`,
`channel`). See the [entity model](entity-model.md) for the list of entities.
- `<thing>` is an optional sub-entity, subject, or stage (e.g. `build`, `run`,
`receiver`, `exporter`).
- `<verb>` is the action or occurrence (e.g. `start`, `complete`, `fail`,
`reload`, `shutdown`).

node.shutdown.complete is of the form <entity>.<verb>.<verb> isn't it? Or is shutdown now the "thing"?
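
To make the question concrete, here is a minimal sketch of a purely positional parser for the quoted pattern `otelcol.<entity>[.<thing>].<verb>`. The type `EventName` and the function `parse_event_name` are hypothetical names invented for illustration, not part of the project, and the allowed character set (lowercase ASCII letters and underscores) is an assumption.

```rust
/// Parsed form of `otelcol.<entity>[.<thing>].<verb>`.
/// Hypothetical type, used only for this illustration.
#[derive(Debug, PartialEq)]
pub struct EventName<'a> {
    pub entity: &'a str,
    pub thing: Option<&'a str>,
    pub verb: &'a str,
}

/// Splits a name by position according to the pattern quoted above.
/// Returns `None` when the `otelcol.` prefix is missing, when a segment is
/// empty or contains anything other than lowercase ASCII letters or `_`
/// (an assumed character set), or when the segment count does not fit.
pub fn parse_event_name(name: &str) -> Option<EventName<'_>> {
    let rest = name.strip_prefix("otelcol.")?;
    let parts: Vec<&str> = rest.split('.').collect();
    let valid = |s: &&str| {
        !s.is_empty() && s.chars().all(|c| c.is_ascii_lowercase() || c == '_')
    };
    if !parts.iter().all(valid) {
        return None;
    }
    match parts.as_slice() {
        // Two segments: <entity>.<verb>
        [entity, verb] => Some(EventName {
            entity: *entity,
            thing: None,
            verb: *verb,
        }),
        // Three segments: <entity>.<thing>.<verb>
        [entity, thing, verb] => Some(EventName {
            entity: *entity,
            thing: Some(*thing),
            verb: *verb,
        }),
        _ => None,
    }
}
```

Because the split is positional, `otelcol.node.shutdown.complete` parses with `shutdown` in the `<thing>` slot and `complete` as the verb. The parser cannot tell that `shutdown` is also listed as a verb, which is the ambiguity raised above.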

@jmacd
Contributor

jmacd commented Feb 10, 2026

I'm so, so very tempted to share my opinion, but then I realized it's wasteful of all of our time. I agree with Cijo we should follow OTel and there is a nuanced question about whether Level is considered independent of event identity to be addressed. I approve if @AaronRM and @cijothomas and others agree.

@AaronRM
Contributor

AaronRM commented Feb 10, 2026

I'm so, so very tempted to share my opinion, but then I realized it's wasteful of all of our time. I agree with Cijo we should follow OTel and there is a nuanced question about whether Level is considered independent of event identity to be addressed. I approve if @AaronRM and @cijothomas and others agree.

I agree. 👍 My main concern is that the guidelines be consistent, so we know how to address the comments in #1988 properly.

@cijothomas
Member Author

node.shutdown.complete is of the form <entity>.<verb>.<verb> isn't it? Or is shutdown now the "thing"?

I'll follow-up separately for this one.

Only goal of this PR is to clarify if we should use same EventName for different outcomes of an operation.

One event:
process.shutdown with attribute status=Success/Fail; severity depends on status

vs

process.shutdown.ok , INFO
process.shutdown.failed WARN/ERROR attributes reason, error_code, ....

We currently have the former, and this PR changes to the latter.

Both approaches are valid, but distinct event names better reflect that a successful shutdown and a failed shutdown are fundamentally different event types with different operational significance and mental models—success is a routine lifecycle transition while failure is an anomalous condition requiring investigation.
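
The two options can be sketched side by side. This is only an illustration: `Record`, `Severity`, and both functions are hypothetical, and ERROR is assumed for the failure case even though the comment above allows WARN or ERROR.

```rust
/// Severity levels needed for this sketch (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Severity {
    Info,
    Error,
}

/// A simplified log record: event name, severity, attributes.
#[derive(Debug, PartialEq)]
pub struct Record {
    pub name: &'static str,
    pub severity: Severity,
    pub attributes: Vec<(&'static str, String)>,
}

/// One event name for every outcome; the outcome travels in `status`
/// and the severity follows from it.
pub fn shutdown_single_name(error: Option<&str>) -> Record {
    match error {
        None => Record {
            name: "process.shutdown",
            severity: Severity::Info,
            attributes: vec![("status", "success".to_string())],
        },
        Some(reason) => Record {
            name: "process.shutdown",
            severity: Severity::Error, // assumption: could also be WARN
            attributes: vec![
                ("status", "fail".to_string()),
                ("reason", reason.to_string()),
            ],
        },
    }
}

/// One event name per outcome; no `status` attribute is needed.
pub fn shutdown_distinct_names(error: Option<&str>) -> Record {
    match error {
        None => Record {
            name: "process.shutdown.ok",
            severity: Severity::Info,
            attributes: Vec::new(),
        },
        Some(reason) => Record {
            name: "process.shutdown.failed",
            severity: Severity::Error, // assumption: could also be WARN
            attributes: vec![("reason", reason.to_string())],
        },
    }
}
```

With a single name, an exact-match filter on `process.shutdown` returns both outcomes. With distinct names, the same filter needs a prefix match, which is the dashboard trade-off raised earlier in the thread.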

This could be a time-consuming discussion without much return! Let's give it another day to see if anyone has a strong preference either way.

@lquerel
Contributor

lquerel commented Feb 12, 2026

There are indeed a few ambiguities in our guidelines around events that need clarification.

I believe the semantic conventions lean toward Approach 1, but I'm adding @lmolkova and @jsuereth (semconv and weaver maintainers) to get their perspective on this topic.

Concretely, my thinking is that the event name would be:

<ns>.node.shutdown

where node represents the thing, and the attributes would include:

<ns>.shutdown.outcome = "success" | "failure"
error.type            # when severity is ERROR or WARN
error.message         # when severity is ERROR or WARN

I've intentionally left the <ns> placeholder undefined for now, since there is still some discussion about which namespace we should use for this project (otelcol or something else).

@cijothomas
Member Author

https://github.com/open-telemetry/opentelemetry-dotnet/blob/main/src/OpenTelemetry.Exporter.OpenTelemetryProtocol/Implementation/OpenTelemetryProtocolExporterEventSource.cs#L98-L114
In OTel .NET's internal logging, we follow Approach 2: a normal export and a failed export are distinct events. I'm not saying that is the best approach, but it has certainly influenced my proposal. Here's why:
Separate EventNames would help when we eventually use Weaver-like tools to auto-generate code for reporting each event. With distinct event names, it is easier to enforce requirements: the generated code for the node.shutdown.fail event can require an error.type.
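
A hand-written stand-in for what such generated code might look like, to show the enforcement argument. Everything here is hypothetical (the `Event` trait, the struct names, the `outcome` attribute key); it is not Weaver output and not the project's API.

```rust
/// Minimal event interface, invented for this sketch.
pub trait Event {
    const NAME: &'static str;
    fn attributes(&self) -> Vec<(&'static str, String)>;
}

/// Single-name shape: one type covers every outcome, so `error_type`
/// has to be optional and a failure can be built without it.
pub struct NodeShutdown {
    pub succeeded: bool,
    pub error_type: Option<String>,
}

impl Event for NodeShutdown {
    const NAME: &'static str = "node.shutdown";
    fn attributes(&self) -> Vec<(&'static str, String)> {
        let outcome = if self.succeeded { "success" } else { "failure" };
        let mut attrs = vec![("outcome", outcome.to_string())];
        if let Some(error_type) = &self.error_type {
            attrs.push(("error.type", error_type.clone()));
        }
        attrs
    }
}

/// Distinct-name shape for the success case: nothing to report.
pub struct NodeShutdownComplete;

impl Event for NodeShutdownComplete {
    const NAME: &'static str = "node.shutdown.complete";
    fn attributes(&self) -> Vec<(&'static str, String)> {
        Vec::new()
    }
}

/// Distinct-name shape for the failure case: `error_type` is a required
/// field, so omitting it is a compile error rather than a review comment.
pub struct NodeShutdownFail {
    pub error_type: String,
}

impl Event for NodeShutdownFail {
    const NAME: &'static str = "node.shutdown.fail";
    fn attributes(&self) -> Vec<(&'static str, String)> {
        vec![("error.type", self.error_type.clone())]
    }
}
```

In this sketch, `NodeShutdown { succeeded: false, error_type: None }` compiles and emits a failure with no `error.type`, while `NodeShutdownFail` cannot be constructed without one.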

@lquerel
Contributor

lquerel commented Feb 12, 2026

@cijothomas

Here's why: Separate EventNames would help when we eventually use Weaver-like tools to auto-generate code for reporting each event. With distinct event names, it is easier to enforce requirements: the generated code for the node.shutdown.fail event can require an error.type.

An interesting argument to take into consideration.

@jmacd
Contributor

jmacd commented Feb 12, 2026

I think we should merge this. I do not think we should wait for the philosophical question or OTel to change its mind.
Mostly, it's because I want #1988 to merge!

Comment on lines +179 to +180
the event. For example, `node.shutdown.complete` at INFO for a graceful
shutdown and `node.shutdown.fail` at ERROR for a critical failure -- these are
Member

@lmolkova lmolkova Feb 12, 2026

This is not formal guidance (at least not yet), but event naming follows the same principles as metric naming: one name for a specific kind of event, with the outcome recorded as an attribute.

E.g. node.shutdown with attribute error.type = something_specific.

Caveat: if there are events along the way during the shutdown process, then there could be node.shutdown.error event(s), and the shutdown process itself can also be recorded as a span (or as an event if spans don't work).
TL;DR: the node.shutdown event represents the shutdown process with any outcome (success, failure, something in between). node.shutdown.error represents a specific occurrence of an error during the shutdown process. Naming comes from what you want to record.

Severity considerations (in review): open-telemetry/semantic-conventions#3311
TL;DR: if it's the end of the world situation - FATAL, if it clearly affects user experience - ERROR, if it's retriable and effect on user experience is minimal/not known - WARN or below
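
As a rough sketch, the severity rule of thumb above could be written as a function. The `Impact` struct and its two flags are hypothetical simplifications, INFO for the no-error case is an assumption taken from this PR's example, and WARN is picked for the "WARN or below" case. The linked guidance is still in review, so this is illustrative only.

```rust
/// Severity levels, ordered from least to most severe.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Severity {
    Info,
    Warn,
    Error,
    Fatal,
}

/// Hypothetical summary of a failure's impact.
pub struct Impact {
    /// "End of the world": the process cannot continue.
    pub unrecoverable: bool,
    /// The failure clearly affects user experience.
    pub affects_users: bool,
}

/// Maps an optional failure to a severity, checking the most severe
/// condition first.
pub fn severity_for(failure: Option<&Impact>) -> Severity {
    match failure {
        None => Severity::Info, // assumption: successful outcome logs at INFO
        Some(impact) if impact.unrecoverable => Severity::Fatal,
        Some(impact) if impact.affects_users => Severity::Error,
        // Retriable, or effect on users minimal or unknown: WARN or below.
        Some(_) => Severity::Warn,
    }
}
```

Under the single-name approach, a `node.shutdown` event would carry the result of `severity_for` alongside an `error.type` attribute when a failure is present.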

Member Author

Thanks for sharing this @lmolkova. When looking at the OTel Collector's internal metrics, I see they use separate metric names for different outcomes:
otelcol_receiver_accepted_spans and otelcol_receiver_refused_spans
Rather than:
otelcol_receiver_spans{status="accepted|refused"}

Of course, this is the Collector, and semantic conventions may not have existed when it was built. Anyway, I can open an issue in the semantic conventions repo to continue this discussion in a better forum.

Member Author

TL;DR: node.shutdown event represents the shutdown process with any outcome (success, failure, something in between). node.shutdown.error represents a specific occurrence of an error during shutdown process. Naming comes from what you want to record.

This is quite reasonable.

Member

I believe there is an issue on collector - open-telemetry/opentelemetry-collector#14350. Having arrow interested in a similar set of metrics is a good justification to move the issue to semconv (or create a new one).

We've gone through a round of internal OTel SDK metric renames recently - https://github.com/open-telemetry/semantic-conventions/blob/main/docs/otel/sdk-metrics.md and I imagine collector would do something similar at some point

Contributor

SDK and Collectors/pipelines can use the same metrics, but I've begun to see the SDK case as special compared with the pipeline case.

@lmolkova please see https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/rfcs/component-universal-telemetry.md which I have been working to emulate.

Contributor

Thank you, @lmolkova, for the clarification. We will incorporate this into our guidelines.

@jmacd jmacd added this pull request to the merge queue Feb 23, 2026
Merged via the queue into open-telemetry:main with commit 10c66b6 Feb 23, 2026
62 checks passed
cijothomas added a commit to cijothomas/otel-arrow that referenced this pull request Feb 23, 2026
@cijothomas cijothomas deleted the cijothomas/tuneguidance branch February 24, 2026 19:50

Labels

rust Pull requests that update Rust code

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Feature: Durable logging identifiers for internal log statements

6 participants