Commit 17646a7

mx-psi, dmathieu, and jade-guiton-dd authored
[chore][RFC] Semantic conventions migrations in the Collector (#14273)
#### Description

RFC about dealing with semantic conventions migrations. This would be applicable to semantic conventions migrations related to RPC, system metrics, Kubernetes metrics and attributes...

#### RFC Checklist

- [x] Announced on the Jan 14, 2026 Collector SIG meeting
- [x] Mentioned in #otel-collector-dev https://cloud-native.slack.com/archives/C07CCCMRXBK/p1768400424432639

---------

Co-authored-by: Damien Mathieu <42@dmathieu.com>
Co-authored-by: Jade Guiton <jade.guiton@datadoghq.com>
1 parent 0632615 commit 17646a7

2 files changed: +239 -0 lines changed

.github/workflows/utils/cspell.json

Lines changed: 1 addition & 0 deletions
@@ -22,6 +22,7 @@
 "Distro",
 "Dmitrii",
 "Dockerhub",
+"Dont",
 "Drutu",
 "Dynatrace",
 "Excalidraw",

docs/rfcs/semconv-feature-gates.md

Lines changed: 238 additions & 0 deletions
@@ -0,0 +1,238 @@
1+
# Semantic conventions migrations in the Collector
2+
3+
## Overview
4+
5+
The OpenTelemetry Collector components emit telemetry that often conforms to semantic conventions.
6+
Semantic conventions have [varying levels of stability][1] and often have an SDK-focused migration
7+
guide.
8+
9+
This RFC defines a Collector-native way to handle migrations in Collector components whose
10+
semantic conventions move to a stable version.
11+
12+
## Scope and goals
13+
14+
This RFC provides general guidelines for semantic convention-mandated migrations of telemetry created by Collector components (usually receivers) and output into the Collector's pipeline. It explicitly does not attempt to cover:
15+
- telemetry created by an application and forwarded by a Collector receiver;
16+
- internal telemetry of Collector components;
17+
- guidelines for the migration of specific semantic conventions.
18+
19+
The migration mechanism should have the following characteristics:
20+
21+
1. **Collector native**: the mechanism should work in a similar way to other Collector migrations
22+
and should feel natural and intuitive to users.
23+
2. **Simple**: a user should have to make a small number of changes to their Collector deployment to
24+
migrate to a new set of conventions.
25+
3. **Easy to understand**: It should be easy to understand how to migrate a particular set of
26+
conventions.
27+
4. **Flexible (double publish)**: The mechanism should allow you to 'double publish' v0 and v1
28+
conventions.
29+
5. **Flexible (other conventions)**: The mechanism should still allow for evolution of other
30+
semantic conventions that are not being migrated.
31+
32+
## Background
33+
34+
### Setup
35+
36+
We want to write guidance for when we have a component that emits telemetry from a common
37+
`area` that is undergoing a migration mandated by the Semantic Conventions SIG. In the rest of this
38+
document we refer to the **v0** conventions and the **v1** conventions, which are the conventions
39+
in this area before and after the migration.
40+
41+
When the semantic conventions are specific to a component we use
42+
- `kind` to refer to the component kind (receiver, exporter...)
43+
- `id` for the component id (e.g. `hostmetrics`)
44+
45+
### What does the semconv spec say?
46+
47+
The semantic conventions specification defines an environment variable named
48+
`OTEL_SEMCONV_STABILITY_OPT_IN`, a comma-separated list that takes up to two values per area:
49+
1. One value representing the new semantic conventions (e.g. `http`, `gen_ai_latest_experimental`)
50+
2. Once mature enough, a second value ending in `/dup` that emits both the old conventions and the
51+
new ones.
52+
53+
This is not specified in a generic way, but it is a consistent pattern across all semantic
54+
conventions areas that are being actively worked on:
55+
56+
<details>
57+
58+
<summary> Example 1: HTTP compatibility warning </summary>
59+
60+
Taken from [semconv v1.38.0][2]:
61+
62+
> **Warning**
63+
> Existing HTTP instrumentations that are using
64+
> [v1.20.0 of this document](https://github.com/open-telemetry/opentelemetry-specification/blob/v1.20.0/specification/trace/semantic_conventions/http.md)
65+
> (or prior):
66+
>
67+
> * SHOULD NOT change the version of the HTTP or networking conventions that they emit
68+
> until the HTTP semantic conventions are marked stable (HTTP stabilization will
69+
> include stabilization of a core set of networking conventions which are also used
70+
> in HTTP instrumentations). Conventions include, but are not limited to, attributes,
71+
> metric and span names, and unit of measure.
72+
> * SHOULD introduce an environment variable `OTEL_SEMCONV_STABILITY_OPT_IN`
73+
> in the existing major version which is a comma-separated list of values.
74+
> The only values defined so far are:
75+
> * `http` - emit the new, stable HTTP and networking conventions,
76+
> and stop emitting the old experimental HTTP and networking conventions
77+
> that the instrumentation emitted previously.
78+
> * `http/dup` - emit both the old and the stable HTTP and networking conventions,
79+
> allowing for a seamless transition.
80+
> * The default behavior (in the absence of one of these values) is to continue
81+
> emitting whatever version of the old experimental HTTP and networking conventions
82+
> the instrumentation was emitting previously.
83+
> * Note: `http/dup` has higher precedence than `http` in case both values are present
84+
> * SHOULD maintain (security patching at a minimum) the existing major version
85+
> for at least six months after it starts emitting both sets of conventions.
86+
> * SHOULD drop the environment variable in the next major version (stable
87+
> next major version SHOULD NOT be released prior to October 1, 2023).
88+
89+
</details>
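
As a rough illustration of the opt-in rules quoted above, the following Python sketch resolves the variable for a single area. The function name `semconv_mode` and the return labels `old`, `new` and `dup` are hypothetical, and tolerating whitespace around list items is an assumption. The precedence of `/dup` is only stated for `http` in the quoted text; applying it to other areas is also an assumption.

```python
import os
from typing import Optional


def semconv_mode(area: str, raw: Optional[str] = None) -> str:
    """Return "old", "new" or "dup" for one area (e.g. "http").

    Hypothetical helper: the name and the return labels are illustrative only.
    """
    if raw is None:
        # Fall back to the real environment variable when no value is passed in.
        raw = os.environ.get("OTEL_SEMCONV_STABILITY_OPT_IN", "")
    # The variable is a comma-separated list; blanks and stray spaces are
    # ignored here (an assumption, the quoted text does not cover whitespace).
    values = {item.strip() for item in raw.split(",") if item.strip()}
    if f"{area}/dup" in values:
        return "dup"  # "<area>/dup" wins when both values are present
    if area in values:
        return "new"
    return "old"  # default: keep emitting the previous conventions
```

For example, `semconv_mode("http", "http,http/dup")` resolves to `dup`, while an unrelated value such as `gen_ai_latest_experimental` leaves the `http` area on the old conventions.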
90+
91+
<details>
92+
93+
<summary> Example 2: GenAI compatibility warning </summary>
94+
95+
From [semconv v1.38.0][3]:
96+
97+
> [!Warning]
98+
>
99+
> Existing GenAI instrumentations that are using
100+
> [v1.36.0 of this document](https://github.com/open-telemetry/semantic-conventions/blob/v1.36.0/docs/gen-ai/README.md)
101+
> (or prior):
102+
>
103+
> * SHOULD NOT change the version of the GenAI conventions that they emit by default.
104+
> Conventions include, but are not limited to, attributes, metric, span and event names,
105+
> span kind and unit of measure.
106+
> * SHOULD introduce an environment variable `OTEL_SEMCONV_STABILITY_OPT_IN`
107+
> as a comma-separated list of category-specific values. The list of values
108+
> includes:
109+
> * `gen_ai_latest_experimental` - emit the latest experimental version of
110+
> GenAI conventions (supported by the instrumentation) and do not emit the
111+
> old one (v1.36.0 or prior).
112+
> * The default behavior is to continue emitting whatever version of the GenAI
113+
> conventions the instrumentation was emitting (1.36.0 or prior).
114+
>
115+
> This transition plan will be updated to include stable version before the
116+
> GenAI conventions are marked as stable.
117+
118+
</details>
119+
120+
<details>
121+
122+
<summary> Example 3: K8s compatibility warning </summary>
123+
124+
From [semconv v1.38.0][3]:
125+
126+
> When existing K8s instrumentations published by OpenTelemetry are
127+
> updated to the stable K8s semantic conventions, they:
128+
>
129+
> - SHOULD introduce an environment variable `OTEL_SEMCONV_STABILITY_OPT_IN` in
130+
> their existing major version, which accepts:
131+
> - `k8s` - emit the stable k8s conventions, and stop emitting
132+
> the old k8s conventions that the instrumentation emitted previously.
133+
> - `k8s/dup` - emit both the old and the stable k8s conventions,
134+
> allowing for a phased rollout of the stable semantic conventions.
135+
> - The default behavior (in the absence of one of these values) is to continue
136+
> emitting whatever version of the old k8s conventions the
137+
> instrumentation was emitting previously.
138+
> - Need to maintain (security patching at a minimum) their existing major version
139+
> for at least six months after it starts emitting both sets of conventions.
140+
> - May drop the environment variable in their next major version and emit only
141+
> the stable k8s conventions.
142+
143+
> Specifically for the Opentelemetry Collector:
144+
145+
> The transition will happen through two different feature gates.
146+
> One for enabling the new schema called `semconv.k8s.enableStable`,
147+
> and one for disabling the old schema called `semconv.k8s.disableLegacy`. Then:
148+
149+
> - On alpha the old schema is enabled by default (`semconv.k8s.disableLegacy` defaults to false),
150+
> while the new schema is disabled by default (`semconv.k8s.enableStable` defaults to false).
151+
> - On beta/stable the old schema is disabled by default (`semconv.k8s.disableLegacy` defaults to true),
152+
> while the new is enabled by default (`semconv.k8s.enableStable` defaults to true).
153+
> - It is an error to disable both schemas
154+
> - Both schemas can be enabled with `--feature-gates=-semconv.k8s.disableLegacy,+semconv.k8s.enableStable`.
155+
156+
</details>
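
To make the K8s rules in Example 3 concrete, here is a small Python sketch of how the two `semconv.k8s.*` gates could be resolved from a `--feature-gates` value. It is not the Collector's Go implementation: the helper names are hypothetical, and treating an ID without a `+`/`-` prefix as "enable" is an assumption about the flag syntax. The stage defaults are taken from the quoted text.

```python
from typing import Dict, Tuple

# Defaults per stage, as described in the quoted K8s text.
K8S_DEFAULTS = {
    "alpha": {"semconv.k8s.disableLegacy": False, "semconv.k8s.enableStable": False},
    "beta": {"semconv.k8s.disableLegacy": True, "semconv.k8s.enableStable": True},
}


def parse_feature_gates(flag_value: str) -> Dict[str, bool]:
    """Parse "+gateA,-gateB" into {"gateA": True, "gateB": False}."""
    overrides = {}
    for item in flag_value.split(","):
        item = item.strip()
        if not item:
            continue
        if item.startswith("-"):
            overrides[item[1:]] = False
        elif item.startswith("+"):
            overrides[item[1:]] = True
        else:
            overrides[item] = True  # assumption: no prefix means "enable"
    return overrides


def k8s_schemas(stage: str, flag_value: str = "") -> Tuple[bool, bool]:
    """Return (emit_legacy, emit_stable) for a stage and a flag value."""
    gates = {**K8S_DEFAULTS[stage], **parse_feature_gates(flag_value)}
    emit_legacy = not gates["semconv.k8s.disableLegacy"]
    emit_stable = gates["semconv.k8s.enableStable"]
    if not emit_legacy and not emit_stable:
        # "It is an error to disable both schemas"
        raise ValueError("both the legacy and the stable schema are disabled")
    return emit_legacy, emit_stable
```

With the flag value from the quote, `k8s_schemas("beta", "-semconv.k8s.disableLegacy,+semconv.k8s.enableStable")` enables both schemas.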
157+
158+
## Proposed mechanism
159+
160+
Suppose the `<id>` (e.g. `hostmetrics`) `<kind>` (e.g. `receiver`) component is migrating from v0 to
161+
v1 semantic conventions on the area `<area>` (e.g. `process`). The semantic conventions specification
162+
defines the set of conventions that are in scope for a particular migration.
163+
164+
To support this migration, the component defines two feature gates: `<kind>.<id>.EmitV1<Area>Conventions` (e.g.
165+
`receiver.hostmetrics.EmitV1ProcessConventions`) and `<kind>.<id>.DontEmitV0<Area>Conventions`
166+
(e.g. `receiver.hostmetrics.DontEmitV0ProcessConventions`). These feature gates work as follows:
167+
168+
| `<kind>.<id>.EmitV1<Area>Conventions` status | `<kind>.<id>.DontEmitV0<Area>Conventions` status | Resulting behavior |
169+
|-----------------------------------------------|-------------------------------------------------------|-----------------------------------------------------------|
170+
| Disabled | Disabled | Emit telemetry under the 'v0' conventions |
171+
| Disabled | Enabled | Error at startup since this would not emit any telemetry |
172+
| Enabled | Disabled | Emit telemetry under both the v0 and the v1 conventions |
173+
| Enabled | Enabled | Emit telemetry under the v1 conventions |
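
The naming pattern and the table above can be sketched in a few lines of Python. This is an illustration only, not Collector code: `gate_ids` and `conventions_to_emit` are hypothetical names, and capitalizing only the first letter of the area is inferred from the `process` example (the RFC does not say how multi-word areas would be spelled).

```python
from typing import Set, Tuple


def gate_ids(kind: str, component_id: str, area: str) -> Tuple[str, str]:
    """Build the two gate IDs following the <kind>.<id>.* pattern."""
    prefix = f"{kind}.{component_id}"
    # Assumption: only the first letter of the area is upper-cased.
    area_part = area[:1].upper() + area[1:]
    return (
        f"{prefix}.EmitV1{area_part}Conventions",
        f"{prefix}.DontEmitV0{area_part}Conventions",
    )


def conventions_to_emit(emit_v1: bool, dont_emit_v0: bool) -> Set[str]:
    """Map the state of the two gates to the set of conventions to emit."""
    if dont_emit_v0 and not emit_v1:
        # Second row of the table: nothing would be emitted, so fail at startup.
        raise ValueError("DontEmitV0 is enabled but EmitV1 is disabled")
    emitted = set()
    if not dont_emit_v0:
        emitted.add("v0")
    if emit_v1:
        emitted.add("v1")
    return emitted
```

For instance, `gate_ids("receiver", "hostmetrics", "process")` yields the two example IDs given above, and `conventions_to_emit(True, False)` corresponds to the double-publish row.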
174+
175+
Both feature gates evolve at the same pace through the feature gate stages, so that the progression
176+
is as follows:
177+
1. Initially both are at **alpha** stage (disabled by default). This means that the default behavior
178+
is to emit only the 'v0' conventions. Users can opt in to emit the v1 conventions alongside the
179+
v0 conventions or to emit only the v1 conventions. A warning message must be logged by the component at startup indicating the upcoming change.
180+
2. Whenever there is a semantic conventions release that marks these as stable, the feature gates are promoted to the
181+
**beta** stage on the same Collector release. The new default behavior is therefore to emit only the
182+
'v1' conventions. Users can opt out to emit the v1 conventions alongside the v0 conventions or
183+
to emit only the v0 conventions.
184+
3. After 4 minor releases, the feature gates are promoted to the **stable** stage. At this point users
185+
can only use the v1 conventions.
186+
4. After an additional 4 minor releases, the feature gates are removed.
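
The progression above can be illustrated with a short Python sketch that derives the emitted conventions from the stage and any user overrides. The names are hypothetical, and rejecting an attempt to disable a gate at the stable stage is an assumption about how "users can only use the v1 conventions" would be enforced.

```python
from typing import Optional, Set

# Default value of both gates at each stage, following the list above.
STAGE_DEFAULT = {"alpha": False, "beta": True, "stable": True}


def effective_gate(stage: str, override: Optional[bool] = None) -> bool:
    """Resolve one gate from its stage default and an optional user override."""
    if stage == "stable":
        if override is False:
            # Assumption: a stable gate can no longer be disabled.
            raise ValueError("a stable feature gate cannot be disabled")
        return True
    return STAGE_DEFAULT[stage] if override is None else override


def emitted_conventions(
    stage: str,
    emit_v1: Optional[bool] = None,
    dont_emit_v0: Optional[bool] = None,
) -> Set[str]:
    """Return the conventions emitted at a stage, given optional overrides."""
    v1_on = effective_gate(stage, emit_v1)
    v0_off = effective_gate(stage, dont_emit_v0)
    if v0_off and not v1_on:
        raise ValueError("no conventions would be emitted")
    return {name for name, on in (("v0", not v0_off), ("v1", v1_on)) if on}
```

Under these assumptions, the defaults move from `{"v0"}` at alpha to `{"v1"}` at beta, and a beta user who sets only `dont_emit_v0=False` gets both sets during the transition.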
187+
188+
This mechanism does not cover any sort of transition for experimental semantic conventions. These
189+
presumably would be covered by separate feature gates or some other mechanism.
190+
191+
## Alternative mechanisms
192+
193+
There are some other possibilities:
194+
195+
### Environment variable
196+
197+
We could just use the `OTEL_SEMCONV_STABILITY_OPT_IN` mechanism. However, this does not feel
198+
"Collector native": Collector users expect experimental features to be controlled via feature gates
199+
and as such this could be a surprising mechanism. In particular, users would expect that they are
200+
able to 'roll back' to the previous behavior even after a Collector upgrade, something that the
201+
environment variable mechanism explicitly does not support.
202+
203+
### More granular feature gate pairs
204+
205+
The granularity of the feature gates described could be changed: we could have a pair per convention
206+
or even a pair for the whole Collector. I argue 'per component' strikes the right balance between
207+
simplicity and flexibility:
208+
- per convention would lead to dozens of feature gates on some of the areas we want to stabilize. It
209+
would also be unclear how these interact on edge cases (semantic conventions may only make sense
210+
holistically)
211+
- a single pair of feature gates would effectively be forever unstable and would not be flexible
212+
enough to allow people to migrate on a per dashboard basis
213+
214+
### Meta feature gate
215+
216+
We could have both a feature gate pair per component and a meta feature gate pair that allows
217+
you to enable/disable all v1 conventions at the same time. This is effectively a superset of the
218+
proposed mechanism, so I argue we can postpone this for later: if users ask for it, we can always
219+
add it in the future.
220+
221+
## Open questions and future possibilities
222+
223+
This document does not cover how to deal with experimental semantic conventions after the 'big'
224+
migration has been completed in one particular area. What to do here in part depends on the
225+
[stabilization changes][4]. Quoting the blog post:
226+
> Instrumentation stability should be decoupled from semantic convention stability. We have a lot of
227+
> stable instrumentation that is safe to run in production, but has data that may change in the
228+
> future. Users have told us that conflating these two levels of stability is confusing and limits
229+
> their options.
230+
231+
How to deal with these remains an open question that should be tackled in OTEPs first.
232+
233+
As mentioned above, the 'Meta feature gate' remains a possibility even when adopting this mechanism.
234+
235+
[1]: https://opentelemetry.io/docs/specs/semconv/general/semantic-convention-groups/#group-stability
236+
[2]: https://github.com/open-telemetry/semantic-conventions/blob/v1.38.0/docs/http/README.md
237+
[3]: https://github.com/open-telemetry/semantic-conventions/blob/v1.38.0/docs/gen-ai/README.md
238+
[4]: https://opentelemetry.io/blog/2025/stability-proposal-announcement/
