Clarify event naming for different outcomes by cijothomas · Pull Request #2010 · open-telemetry/otel-arrow

cijothomas · 2026-02-10T00:45:02Z

The "Severity and placement" section previously suggested using the same event name (node.shutdown) at different severity levels to distinguish a graceful shutdown (INFO) from a critical failure (ERROR).

This conflicts with the guidance in Event Naming and Verbs, which recommends distinct event names for different outcomes. Updated the example to use node.shutdown.complete (INFO) and node.shutdown.fail (ERROR) so the guide is internally consistent.

This is an intentional semantic change, not just a wording tweak. The guide now consistently says: different outcomes → different event names. Severity reflects significance, but should not be the sole way to distinguish success from failure.

Fixes #1972

codecov · 2026-02-10T00:47:40Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.42%. Comparing base (70c62ad) to head (21f51b7).
⚠️ Report is 42 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2010      +/-   ##
==========================================
- Coverage   86.42%   86.42%   -0.01%     
==========================================
  Files         525      525              
  Lines      167144   167144              
==========================================
- Hits       144457   144450       -7     
- Misses      22153    22160       +7     
  Partials      534      534

Components	Coverage Δ
otap-dataflow	`88.43% <ø> (-0.01%)`	⬇️
query_abstraction	`80.61% <ø> (ø)`
query_engine	`90.30% <ø> (ø)`
syslog_cef_receivers	`∅ <ø> (∅)`
otel-arrow-go	`53.50% <ø> (ø)`
quiver	`92.15% <ø> (ø)`


cijothomas · 2026-02-10T17:48:06Z

@lquerel @albertlockett Requesting more eyes for this.

AaronRM · 2026-02-10T18:56:36Z

+Regarding severity, choose the log level that best reflects the significance of
+the event. For example, `node.shutdown.complete` at INFO for a graceful
+shutdown and `node.shutdown.fail` at ERROR for a critical failure -- these are
+distinct events, not the same event at different severity levels.


FWIW, I have a slight preference for the previous guidance (i.e. same event name, different level based on outcome) only because it can be easier to review in a downstream dashboard, etc...

When filtering by event name (e.g. node.shutdown) the user sees both the info and warning/error messages interleaved without needing to get into the internals of the event names.

Of course, the counter argument is that a fuzzy search on node.shutdown would yield the same results either way.

cijothomas · 2026-02-10

https://opentelemetry.io/docs/specs/otel/logs/data-model/#field-eventname
I will ask whether OTel semantic conventions have thoughts on this as well. From what I remember of old discussions, each occurrence of an Event should have the same structure, but the spec does not explicitly say whether Severity is part of that structure (it does say attributes and body are).

albertlockett

I'll readily admit that I do not know what the best practices are here!

The OTel data model says the event name identifies the "class / type of event", and I suppose it's conceivable that events of the same type/class could be more/less severe depending on the context in which they occurred.

Also, does this example not still conflict with the guidance in event naming?

otel-arrow/rust/otap-dataflow/docs/telemetry/events-guide.md

Lines 123 to 148 in 1f22e63

    
    ## Event naming

    Event names MUST be low-cardinality and stable. Follow the semantic conventions
    guide for naming:

    - Lowercase and dot-separated. It identifies a class of event, not an instance.
    - Keep the name stable and "type-like". Treat it as a schema identifier.
    - Use verbs for actions (e.g. `pipeline.config.reload`).
    - Avoid embedding IDs or dynamic values in the name. Encode variability as
      attributes.
    - Avoid synonyms that fragment cardinality across names (`finish` vs `complete`,
      `error` vs `fail`). Pick one verb set and stick to it.

    More precisely, in this project, event names SHOULD follow this pattern:
    `otelcol.<entity>[.<thing>].<verb>`

    Where:

    - `otelcol.` is the project prefix/namespace used for events and other custom
      telemetry.
    - `<entity>` is the primary entity involved (e.g. `pipeline`, `node`,
      `channel`). See the [entity model](entity-model.md) for the list of entities.
    - `<thing>` is an optional sub-entity, subject, or stage (e.g. `build`, `run`,
      `receiver`, `exporter`).
    - `<verb>` is the action or occurrence (e.g. `start`, `complete`, `fail`,
      `reload`, `shutdown`).

node.shutdown.complete is of the form <entity>.<verb>.<verb> isn't it? Or is shutdown now the "thing"?
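
To make the question above concrete, here is a minimal sketch of a checker for the `otelcol.<entity>[.<thing>].<verb>` pattern quoted from the guide. Everything in it is an assumption made for illustration: `parse_event_name`, `EventName`, and the `VERBS` allow-list (copied from the guide's examples) are hypothetical and not part of otel-arrow. Note that the names discussed in this PR omit the `otelcol.` prefix, so the sketch only accepts prefixed names.

```rust
/// Verbs copied from the guide's examples. Treating them as a closed
/// allow-list is an assumption of this sketch.
const VERBS: [&str; 5] = ["start", "complete", "fail", "reload", "shutdown"];

/// The parts of a name shaped like `otelcol.<entity>[.<thing>].<verb>`.
#[derive(Debug, PartialEq)]
pub struct EventName<'a> {
    pub entity: &'a str,
    pub thing: Option<&'a str>,
    pub verb: &'a str,
}

/// Hypothetical helper: split a name into its parts, or return `None`
/// when the name does not fit the pattern.
pub fn parse_event_name(name: &str) -> Option<EventName<'_>> {
    let rest = name.strip_prefix("otelcol.")?;
    let segments: Vec<&str> = rest.split('.').collect();

    // Every segment must be non-empty and made of lowercase ASCII letters
    // or underscores.
    let well_formed = segments
        .iter()
        .all(|s| !s.is_empty() && s.chars().all(|c| c.is_ascii_lowercase() || c == '_'));
    if !well_formed {
        return None;
    }

    match segments.as_slice() {
        &[entity, verb] if VERBS.contains(&verb) => Some(EventName {
            entity,
            thing: None,
            verb,
        }),
        &[entity, thing, verb] if VERBS.contains(&verb) => Some(EventName {
            entity,
            thing: Some(thing),
            verb,
        }),
        _ => None,
    }
}
```

Under these assumptions, `otelcol.node.shutdown.complete` parses with `shutdown` in the `<thing>` slot and `complete` as the verb, even though the guide also lists `shutdown` as a verb. That is the ambiguity the question points at.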

jmacd · 2026-02-10T23:16:40Z

I'm so, so very tempted to share my opinion, but then I realized it's wasteful of all of our time. I agree with Cijo we should follow OTel and there is a nuanced question about whether Level is considered independent of event identity to be addressed. I approve if @AaronRM and @cijothomas and others agree.

AaronRM · 2026-02-10T23:24:43Z

> I'm so, so very tempted to share my opinion, but then I realized it's wasteful of all of our time. I agree with Cijo we should follow OTel and there is a nuanced question about whether Level is considered independent of event identity to be addressed. I approve if @AaronRM and @cijothomas and others agree.

I agree. 👍 My main concern is that the guidelines be consistent, so that we know how to address the comments in #1988 properly.

cijothomas · 2026-02-12T00:20:38Z

> node.shutdown.complete is of the form <entity>.<verb>.<verb> isn't it? Or is shutdown now the "thing"?

I'll follow-up separately for this one.

The only goal of this PR is to clarify whether we should use the same EventName for different outcomes of an operation.

One event:
process.shutdown with attribute status=Success/Fail, severity depends on status

vs

Two events:
process.shutdown.ok, INFO
process.shutdown.failed, WARN/ERROR, with attributes reason, error_code, ...

We currently have the former, and this PR changes to the latter.

Both approaches are valid, but distinct event names better reflect that a successful shutdown and a failed shutdown are fundamentally different event types with different operational significance and mental models—success is a routine lifecycle transition while failure is an anomalous condition requiring investigation.
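
The two options can be put side by side in a short Rust sketch. It is illustrative only: the event names, `status`, `reason`, and `error_code` come from the comment above, while the types (`ProcessShutdown`, `ShutdownOutcome`, `Severity`), the `i32` error code, and the choice of ERROR over WARN for failures are assumptions.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Severity {
    Info,
    Error,
}

/// Option 1: a single event name. The outcome is an attribute, and the
/// severity is derived from it.
pub struct ProcessShutdown {
    pub succeeded: bool,
}

impl ProcessShutdown {
    pub fn name(&self) -> &'static str {
        "process.shutdown"
    }

    /// Value of the `status` attribute.
    pub fn status(&self) -> &'static str {
        if self.succeeded { "Success" } else { "Fail" }
    }

    pub fn severity(&self) -> Severity {
        if self.succeeded { Severity::Info } else { Severity::Error }
    }
}

/// Option 2: one event name per outcome. Only the failure variant carries
/// the failure-specific attributes.
pub enum ShutdownOutcome {
    Succeeded,
    Failed { reason: String, error_code: i32 },
}

impl ShutdownOutcome {
    pub fn name(&self) -> &'static str {
        match self {
            ShutdownOutcome::Succeeded => "process.shutdown.ok",
            ShutdownOutcome::Failed { .. } => "process.shutdown.failed",
        }
    }

    pub fn severity(&self) -> Severity {
        match self {
            ShutdownOutcome::Succeeded => Severity::Info,
            ShutdownOutcome::Failed { .. } => Severity::Error,
        }
    }
}
```

With option 1, a filter on the exact name returns both outcomes. With option 2, the failure event has its own schema, and a consumer needs a prefix match on `process.shutdown` to see both.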

This could be a time-consuming discussion without much return! Let's give it another day to see if anyone has a strong preference either way.

lquerel · 2026-02-12T17:22:57Z

There are indeed a few ambiguities in our guidelines around events that need clarification.

I believe the semantic conventions lean toward Approach 1, but I'm adding @lmolkova and @jsuereth (semconv and weaver maintainers) to get their perspective on this topic.

Concretely, my thinking is that the event name would be:

<ns>.node.shutdown

where node represents the thing, and the attributes would include:

<ns>.shutdown.outcome = "success" | "failure"
error.type            # when severity is ERROR or WARN
error.message         # when severity is ERROR or WARN

I've intentionally left the <ns> placeholder undefined for now, since there is still some discussion about which namespace we should use for this project (otelcol or something else).
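
As a rough sketch of the shape proposed above, the helpers below build the event name and attribute list for a given namespace. The function names and the tuple-based attribute representation are hypothetical. The namespace is a parameter because `<ns>` is still undecided, and the sketch assumes the error attributes are present exactly when the outcome is a failure.

```rust
/// Event name for a node shutdown under the given namespace.
pub fn shutdown_event_name(ns: &str) -> String {
    format!("{ns}.node.shutdown")
}

/// Attributes for a node shutdown event. `error` is `None` on success and
/// `Some((error_type, error_message))` on failure.
pub fn shutdown_attributes(ns: &str, error: Option<(&str, &str)>) -> Vec<(String, String)> {
    let outcome = if error.is_some() { "failure" } else { "success" };
    let mut attrs = vec![(format!("{ns}.shutdown.outcome"), outcome.to_string())];

    // The error attributes are only attached when the shutdown failed.
    if let Some((error_type, error_message)) = error {
        attrs.push(("error.type".to_string(), error_type.to_string()));
        attrs.push(("error.message".to_string(), error_message.to_string()));
    }
    attrs
}
```

For example, `shutdown_attributes("otelcol", None)` yields only the outcome attribute, while passing `Some(("timeout", "drain deadline exceeded"))` adds `error.type` and `error.message`.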

cijothomas · 2026-02-12T18:15:52Z

https://github.com/open-telemetry/opentelemetry-dotnet/blob/main/src/OpenTelemetry.Exporter.OpenTelemetryProtocol/Implementation/OpenTelemetryProtocolExporterEventSource.cs#L98-L114
In OTel .NET's internal logging, we follow Approach 2: a normal export and a failed export are distinct events. I'm not saying that is the best option, but it has certainly influenced my proposal. Here's why:
Separate EventNames would help when we eventually use weaver-like tools to auto-generate the code that reports each event. With distinct event names, it is easier to enforce things: the generated code for the node.shutdown.fail event can require passing an error.type.
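
The enforcement argument can be sketched in Rust as follows. This is not output from weaver or any existing generator; the `Event` trait and both structs are hypothetical stand-ins for what generated code might look like. The point is only that a dedicated failure type cannot be constructed without an `error.type` value.

```rust
/// Hypothetical trait that generated event types would implement.
pub trait Event {
    const NAME: &'static str;
    fn attributes(&self) -> Vec<(&'static str, String)>;
}

/// Successful shutdown: no required attributes.
pub struct NodeShutdownComplete;

/// Failed shutdown: the private field plus `new` make `error.type` mandatory.
pub struct NodeShutdownFail {
    error_type: String,
}

impl NodeShutdownFail {
    pub fn new(error_type: impl Into<String>) -> Self {
        Self { error_type: error_type.into() }
    }
}

impl Event for NodeShutdownComplete {
    const NAME: &'static str = "node.shutdown.complete";

    fn attributes(&self) -> Vec<(&'static str, String)> {
        Vec::new()
    }
}

impl Event for NodeShutdownFail {
    const NAME: &'static str = "node.shutdown.fail";

    fn attributes(&self) -> Vec<(&'static str, String)> {
        vec![("error.type", self.error_type.clone())]
    }
}
```

With a single `node.shutdown` event, the same guarantee would need a runtime check that `error.type` is set whenever the outcome is a failure.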

lquerel · 2026-02-12T18:20:05Z

@cijothomas

> Here's why: Separate EventNames would help when we eventually use weaver-like tools to auto-generate the code that reports each event. With distinct event names, it is easier to enforce things: the generated code for the node.shutdown.fail event can require passing an error.type.

An interesting argument to take into consideration.

jmacd · 2026-02-12T20:58:47Z

I think we should merge this. I do not think we should wait for the philosophical question or OTel to change its mind.
Mostly, it's because I want #1988 to merge!

lmolkova · 2026-02-12T21:09:29Z

+the event. For example, `node.shutdown.complete` at INFO for a graceful
+shutdown and `node.shutdown.fail` at ERROR for a critical failure -- these are


This is not formal guidance (at least not yet), but event naming follows the same principles as metric naming: one name for a specific kind of event, with the outcome recorded as an attribute.

E.g. node.shutdown with attribute error.type = something_specific.

Caveat: if there are events along the way during the shutdown process, there could be node.shutdown.error event(s), and the shutdown process itself can also be recorded as a span (or as an event if spans don't work).
TL;DR: the node.shutdown event represents the shutdown process with any outcome (success, failure, something in between). node.shutdown.error represents a specific occurrence of an error during the shutdown process. Naming comes from what you want to record.

Severity considerations (in review): open-telemetry/semantic-conventions#3311
TL;DR: if it's the end of the world situation - FATAL, if it clearly affects user experience - ERROR, if it's retriable and effect on user experience is minimal/not known - WARN or below
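
The severity TL;DR above can be read as a small decision function. The sketch below is only one reading of that summary, and the linked semantic-conventions proposal is still in review. The `Impact` categories and the function name are hypothetical. Since the last rule says "WARN or below", the function returns the highest severity that would be appropriate.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Severity {
    Warn,
    Error,
    Fatal,
}

/// Hypothetical impact categories mirroring the three cases in the summary.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Impact {
    /// The process cannot continue.
    Catastrophic,
    /// User experience is clearly affected.
    AffectsUsers,
    /// Retriable, or the effect on users is minimal or not known.
    RetriableOrUnknown,
}

/// Highest severity that fits the given impact. For `RetriableOrUnknown`,
/// callers may still choose a lower level than WARN.
pub fn max_severity(impact: Impact) -> Severity {
    match impact {
        Impact::Catastrophic => Severity::Fatal,
        Impact::AffectsUsers => Severity::Error,
        Impact::RetriableOrUnknown => Severity::Warn,
    }
}
```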

cijothomas · 2026-02-13

Thanks for sharing this, @lmolkova. Looking at the OTel Collector's internal metrics, I see they use separate metric names for different outcomes:
otelcol_receiver_accepted_spans and otelcol_receiver_refused_spans
rather than:
otelcol_receiver_spans{status="accepted|refused"}

Of course, that is the Collector, and semantic conventions may not have existed while it was being built. Anyway, I can open an issue in the semconv repo to continue this discussion in a better forum.

cijothomas · 2026-02-13

> TL;DR: node.shutdown event represents the shutdown process with any outcome (success, failure, something in between). node.shutdown.error represents a specific occurrence of an error during shutdown process. Naming comes from what you want to record.

This is quite reasonable.

lmolkova · 2026-02-13

I believe there is an issue on the Collector: open-telemetry/opentelemetry-collector#14350. Having arrow interested in a similar set of metrics is a good justification to move the issue to semconv (or create a new one).

We've gone through a round of internal OTel SDK metric renames recently (https://github.com/open-telemetry/semantic-conventions/blob/main/docs/otel/sdk-metrics.md), and I imagine the Collector will do something similar at some point.

jmacd · 2026-02-13

SDKs and Collectors/pipelines can use the same metrics, but I've begun to see the SDK case as special compared with the pipeline case.

@lmolkova please see https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/rfcs/component-universal-telemetry.md, which I have been working to emulate.

lquerel · 2026-02-13

Thank you, @lmolkova, for the clarification. We will incorporate this into our guidelines.


Tweak EventName guidance

a8dd3d4

github-project-automation Bot added this to OTel-Arrow Feb 10, 2026

github-actions Bot added the rust Pull requests that update Rust code label Feb 10, 2026

AaronRM mentioned this pull request Feb 10, 2026

feat: durable event names to quiver logging #1988

Merged

jmacd approved these changes Feb 10, 2026


cijothomas added 2 commits February 10, 2026 09:38

sanity

afd6199

revert

170c640

cijothomas changed the title ~~Tweak EventName guidance~~ Clarify event naming for different outcomes Feb 10, 2026

cijothomas marked this pull request as ready for review February 10, 2026 17:43

cijothomas requested a review from a team as a code owner February 10, 2026 17:43

Merge branch 'main' into cijothomas/tuneguidance

79b0e7a

AaronRM reviewed Feb 10, 2026


albertlockett reviewed Feb 10, 2026


Merge branch 'main' into cijothomas/tuneguidance

a4e6057

Merge branch 'main' into cijothomas/tuneguidance

21f51b7

jmacd approved these changes Feb 12, 2026


lmolkova reviewed Feb 12, 2026


jmacd added this pull request to the merge queue Feb 23, 2026

Merged via the queue into open-telemetry:main with commit 10c66b6 Feb 23, 2026
62 checks passed

github-project-automation Bot moved this to Done in OTel-Arrow Feb 23, 2026

cijothomas deleted the cijothomas/tuneguidance branch February 24, 2026 19:50
