[OTEP] Federate Semantic Conventions #4906
jsuereth wants to merge 2 commits into open-telemetry:main
| ``` | ||
| #### Registry Requirements | ||
|
|
||
| - The registry MUST declare a dependency on core semantic conventions. |
So that there is no chance of having conflicts, vendoring in an incompatible manner, etc.?
I like it!
|
|
||
| - The registry MUST declare a dependency on core semantic conventions. | ||
| - The registry MUST use a Dependabot or Renovate bot to keep dependencies up-to-date. | ||
| - The registry MUST enforce semantic convention policies via GitHub workflow, e.g. |
nit: maybe explain what policies are (link to some section/doc), I imagine it's not a commonly known concept
| - name: verify template packages | ||
| run: weaver registry check \ | ||
| -r {my_registry_dir} \ | ||
| -p https://github.com/open-telemetry/opentelemetry-weaver-packages.git[policies/check/naming_conventions] \ |
nit: is it the same one as the last one? (policies/check/naming)
| #### Independent Versioning | ||
| - It releases `v2.0.0` of the `jvm` federated registry. | ||
| - This release is **completely independent** of the core `semconv` registry (which might still be at `v1.45.0`) and other registries like `http` or `messaging`. | ||
| - Users who want the new JVM metrics can opt-in by updating their instrumentation to point to the new `schema_url`. |
nit:
I believe we now have an opt-in mechanism that's slightly different from picking a schema URL; it was just introduced in declarative config: https://github.com/open-telemetry/opentelemetry-configuration/blob/4351ebd2805746d047a23588f8d3abe89f40f79f/snippets/ExperimentalInstrumentation_kitchen_sink.yaml#L52
E.g.
instrumentation:
  general:
    rpc:
      semconv:
        version: 1
        experimental: false
        dual_emit: true
| - This release is **completely independent** of the core `semconv` registry (which might still be at `v1.45.0`) and other registries like `http` or `messaging`. | ||
| - Users who want the new JVM metrics can opt-in by updating their instrumentation to point to the new `schema_url`. | ||
| - Existing users of `v1.x.x` are unaffected and continue to see the old OTLP output. | ||
| - **Policy Enforcement**: The federated registry uses a `weaver.yaml` configuration to enforce official OpenTelemetry policies (e.g., naming conventions, stability rules) even while iterating independently. |
nit: it's not part of weaver.yaml now, right? do we want to make it?
Or should we ship GH actions and/or have some shareable workflows that repos would reuse or copy-paste?
|
|
||
| - **Pinning**: An instrumentation library MUST specify the `schema_url` of the federated registry version it targets in its `Scope` metadata, via `get {Meter|Tracer|Logger}` operations. | ||
| - **Breaking Changes**: If an instrumentation library adopts a new major version of a federated registry that results in breaking changes to its OTLP output, the library ITSELF must perform a major version bump. For example, if `opentelemetry-java-instrumentation` moves from `jvm/v1` to `jvm/v2`, it must release a new major version of its instrumentation package. | ||
| - **Stable by Default**: Following OTEP 4813, instrumentation can be marked as stable once its code and OTLP output are production-ready, this means marking any federated registry as stable, in tandem with the library. |
Do we want to allow stable (federated) registry to depend on unstable parts of otel conventions?
I think we can go either way.
If we don't allow it: they can take dependency on stable parts only and vendor in unstable parts if necessary.
If we allow it, they should be able to communicate breaking changes in unstable core via semver.
Either way, we need some forcing factors for them to
- update to newer underlying core - it will be hard regardless
- bring common concepts to core conventions
I'd rather not allow it at least initially and encourage to vendor in experimental stuff.
|
|
||
| To solve the "cohesive whole" problem and provide obvious version conformance, OpenTelemetry will periodically publish **Platform Releases**. | ||
|
|
||
| A **Platform Release** is a manifest (using the schema format from OTEP 4815) that acts as a "BOM" (Bill of Materials). It does not contain new conventions itself but rather lists specific, tested-together versions of federated registries. |
This would probably be the forcing factor to update federated registries that I mentioned in my previous comment.
If library A depends on semconv core v1.40 and library B on semconv core v1.140 they probably can't coherently work in the same distro.
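The coherence concern above can be made concrete with a small check. This is only a sketch, assuming each library reports the `schema_url`s it targets in the `https://opentelemetry.io/schemas/<registry>/<version>` shape used in the manifest example; `parse_schema_url` and `find_version_conflicts` are hypothetical helper names, not part of weaver or any SDK.

```python
from collections import defaultdict
from urllib.parse import urlparse


def parse_schema_url(schema_url: str) -> tuple[str, tuple[int, ...]]:
    """Split a schema URL into (registry name, numeric version tuple)."""
    parts = urlparse(schema_url).path.strip("/").split("/")
    # Assumed path shape: schemas/<registry>/<version>
    registry, version = parts[-2], parts[-1]
    return registry, tuple(int(p) for p in version.split("."))


def find_version_conflicts(
    libraries: dict[str, list[str]],
) -> dict[str, dict[str, tuple[int, ...]]]:
    """Return the registries that different libraries pin at different versions."""
    pinned: dict[str, dict[str, tuple[int, ...]]] = defaultdict(dict)
    for library, schema_urls in libraries.items():
        for url in schema_urls:
            registry, version = parse_schema_url(url)
            pinned[registry][library] = version
    return {
        registry: by_library
        for registry, by_library in pinned.items()
        if len(set(by_library.values())) > 1
    }


# Hypothetical libraries mirroring the v1.40 vs v1.140 example above.
libraries = {
    "library-a": ["https://opentelemetry.io/schemas/semconv/1.40.0"],
    "library-b": ["https://opentelemetry.io/schemas/semconv/1.140.0"],
}
conflicts = find_version_conflicts(libraries)
print(conflicts)
```

A Platform Release would, in effect, require this map to be empty for every registry it lists.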
| registries: | ||
| - schema_url: https://opentelemetry.io/schemas/semconv/1.42.0 | ||
| - schema_url: https://opentelemetry.io/schemas/jvm/2.1.0 | ||
| - schema_url: https://opentelemetry.io/schemas/http/1.15.0 |
nit: would we separate HTTP from core conventions? Probably not until it needs v2
|
|
||
| A **Platform Release** is a manifest (using the schema format from OTEP 4815) that acts as a "BOM" (Bill of Materials). It does not contain new conventions itself but rather lists specific, tested-together versions of federated registries. | ||
|
|
||
| **Example Platform Release Manifest (`OpenTelemetry 2026.1`):** |
This is awesome!
I also like the year as a major version - we should create expectation of some breaking changes on a predictable cadence
There was a problem hiding this comment.
This is date-based versioning, no? This is not semver.
| The Semantic Conventions SIG maintains the root `opentelemetry.io/schemas/` namespace. Any new "official" federated registry (e.g., `/jvm`, `/http`) must be an approved OpenTelemetry project and will use tooling provided for federated semantic convention SIGs. | ||
|
|
||
| For third-party or experimental registries, authors are encouraged to use their own domains (e.g., `acme.com/schemas/`) to avoid collisions. The `weaver` tool will also validate that a registry does not redefine attributes or signals already present in its dependencies, so any opentelemetry registry MUST depend on the core `semantic-conventions` registry. | ||
|
|
we should maintain a list of federated conventions in semconv (or somewhere) for discoverability. And using it we can even have a weekly check that validates all federated registries together to find conflicts or tests them individually against latest core. If some collisions are found, we could automatically create issues and notify maintainers about conflicts.
It would also be a good forcing factor for conventions to stay in sync with core when they can.
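A rough sketch of what such a periodic cross-registry check might look like, assuming each registry can be flattened into the set of attribute names it defines. The `find_collisions` helper and the sample registries are illustrative only; this is not how weaver performs its validation.

```python
from itertools import combinations


def find_collisions(
    registries: dict[str, set[str]],
) -> list[tuple[str, str, list[str]]]:
    """List attribute names that are defined by more than one registry."""
    collisions = []
    for name_a, name_b in combinations(sorted(registries), 2):
        shared = sorted(registries[name_a] & registries[name_b])
        if shared:
            collisions.append((name_a, name_b, shared))
    return collisions


# Made-up sample data: "acme" redefines a name that "jvm" already owns.
registries = {
    "core": {"http.request.method"},
    "jvm": {"jvm.memory.used", "jvm.thread.count"},
    "acme": {"acme.queue.depth", "jvm.thread.count"},
}
for name_a, name_b, shared in find_collisions(registries):
    print(f"{name_a} and {name_b} both define: {', '.join(shared)}")
```

Each reported pair could then be turned into an automatically filed issue for the affected maintainers.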
|
|
||
| 1. **Incubating**: Registry exists outside the core, managed by a specific SIG. It uses a unique namespace, both for schema_url (e.g., `opentelemetry.io/schemas/jvm`) and for signals/attributes (e.g. `jvm`). | ||
| 2. **Maturity Progression**: The federated registry progresses through `development` -> `beta` -> `stable` according to its own usage and feedback. | ||
| 3. **Criteria for Promotion**: To be merged into core `semconv`, a federated registry MUST: |
[update] I see you have it addressed later
assuming it's not required, what would be the motivation to even start bringing semconv to core?
I can imagine these:
- offload maintenance of a stable thing to core repo
- need to align between different federated repos (e.g. align db server with db client)
- need to align compatibility for platform release
But unless something starts to break, I don't see people being too interested in this work. And I think it's usually fine, but it's likely that we'll grow into 5 different versions of some conventions across 5 different repos.
We should probably require sigs like GenAI to be language-agnostic to avoid it.
There was a problem hiding this comment.
Absolutely agree. I think that's the inevitability of federating and we'll need to balance cost / benefit here. I expect we'll have two types of federation:
- jvm/goruntime-like things, where it's really a specific technology that needs to be addressed. It's unlikely we'll ever need to merge these back in.
- GenAI, db, http-type things, which need to be cross-language and cross-cutting. These may want to come back into core, so their attributes can be re-used in further federated registries without as much dependency hell, but they do not need to.
|
|
||
| As OpenTelemetry's semantic conventions expand, a monolithic registry and versioning scheme create friction: | ||
| 1. **Slow Evolution**: Highly specialized or domain-specific conventions (e.g., JVM metrics, cloud-provider-specific resources) are often gated by the slower stabilization process of the core registry. | ||
| 2. **Coupled Breaking Changes**: A major version bump in one sub-domain (e.g., a total overhaul of database conventions) should not force the entire OpenTelemetry ecosystem to adopt a major version bump. |
How does this propagate to natively instrumented libraries and what incentive do they have to move to newer conventions? There always is a transform layer running?
| As OpenTelemetry's semantic conventions expand, a monolithic registry and versioning scheme create friction: | ||
| 1. **Slow Evolution**: Highly specialized or domain-specific conventions (e.g., JVM metrics, cloud-provider-specific resources) are often gated by the slower stabilization process of the core registry. | ||
| 2. **Coupled Breaking Changes**: A major version bump in one sub-domain (e.g., a total overhaul of database conventions) should not force the entire OpenTelemetry ecosystem to adopt a major version bump. | ||
| 3. **Instrumentation Stability**: Instrumentation libraries need a clear way to declare stability for their OTLP output by pinning to specific versions of the conventions they implement, regardless of whether those conventions are "core" or "federated". |
are conventions in a particular version all uniformly considered "stable"?
|
|
||
| ## Goals | ||
|
|
||
| 1. **Independent Lifecycle**: Enable domain-specific semantic convention registries to have their own SemVer lifecycle. |
what are the rules of semver here. Like dropping an attribute is "breaking" - what if it was optional?
|
|
||
| 1. **Independent Lifecycle**: Enable domain-specific semantic convention registries to have their own SemVer lifecycle. | ||
| 2. **Instrumentation Pinning**: Allow instrumentation libraries to declare stability by pinning to specific federated registry versions. | ||
| 3. **Platform Releases**: Provide a mechanism (Platform Releases) to bundle specific versions of federated registries into a "tested-together" cohesive set. |
What's the acceptance criteria for "tested together", what if an instrumentation fails, is it removed from release?
| 1. **Independent Lifecycle**: Enable domain-specific semantic convention registries to have their own SemVer lifecycle. | ||
| 2. **Instrumentation Pinning**: Allow instrumentation libraries to declare stability by pinning to specific federated registry versions. | ||
| 3. **Platform Releases**: Provide a mechanism (Platform Releases) to bundle specific versions of federated registries into a "tested-together" cohesive set. | ||
| 4. **Promotion Path**: Define a clear path for federated conventions to be consolidated into the core OpenTelemetry registry. |
What if end-users want to opt out of expensive / PII risking conventions. Do these still land in core? What is the final say for something deserving to be in core. I'm unfamiliar here.
There was a problem hiding this comment.
We allow expensive / PII in core, with opt_in today and there's guidance around it. I think this is an orthogonal concern to this proposal, but an important issue that needs a solution across various components of OpenTelemetry.
For this proposal, we continue with the current behavior we have.
| **The JVM Metrics Example**: | ||
| The JVM Metrics registry (e.g., `opentelemetry.io/schemas/jvm`) identifies a need to overhaul its metric names to align with a new runtime standard. |
this is a good example, but also centralized with the maintainers of Java. So it seems like a simpler example? It's harder to do this when there is no central vendor.
| The JVM Metrics registry (e.g., `opentelemetry.io/schemas/jvm`) identifies a need to overhaul its metric names to align with a new runtime standard. | ||
|
|
||
| #### Registry Structure | ||
| A federated registry like `jvm-metrics` would contain a manifest, the convention definitions, and a policy enforcement GitHub action. |
Where do these actions run: on OTel repos only? How do they run on native libraries?
There was a problem hiding this comment.
So we only "enforce" this structure for OTEL-owned repositories/conventions.
For native libraries, we provide open-telemetry/weaver as a tool they can use to participate in this federation, but we do not enforce any structure / policy unless they are planning to move their definitions back into the OpenTelemetry project.
| #### Registry Requirements | ||
|
|
||
| - The registry MUST declare a dependency on core semantic conventions. | ||
| - A stable federated registry MUST NOT depend on unstable or experimental core conventions. |
what if the federated conventions need the unstable conventions for it to make sense - example being session / threading of agents.
There was a problem hiding this comment.
Then the federated convention would be marked "unstable" as well. Basically you can't declare a "child" convention stable unless the "parent" is also stable.
This doesn't preclude usage, just stability declaration.
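The parent/child rule described here can be sketched as a simple check. This is a simplification that treats each dependency as a unit with a single stability level, using the `development`, `beta`, and `stable` levels from the proposal; the `Registry` class and `stability_violations` function are hypothetical names for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Registry:
    name: str
    stability: str  # "development", "beta", or "stable"
    depends_on: list["Registry"] = field(default_factory=list)


def stability_violations(registry: Registry) -> list[str]:
    """Return the non-stable dependencies of a registry that is declared stable."""
    if registry.stability != "stable":
        return []  # the rule constrains stability declarations, not usage
    return [dep.name for dep in registry.depends_on if dep.stability != "stable"]


# Hypothetical example: an agents registry built on an unstable session convention.
session = Registry("core.session", "development")
agents = Registry("agents", "stable", depends_on=[session])
print(stability_violations(agents))

# Declaring the child as unstable is always allowed.
agents.stability = "development"
print(stability_violations(agents))
```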
| #### Independent Versioning | ||
| - It releases `v2.0.0` of the `jvm` federated registry. | ||
| - This release is **completely independent** of the core `semconv` registry (which might still be at `v1.45.0`) and other registries like `http` or `messaging`. | ||
| - Users who want the new JVM metrics can opt-in by utilizing declarative configuration mechanisms (e.g., `version: 2`, `dual_emit: true`). |
| - This release is **completely independent** of the core `semconv` registry (which might still be at `v1.45.0`) and other registries like `http` or `messaging`. | ||
| - Users who want the new JVM metrics can opt-in by utilizing declarative configuration mechanisms (e.g., `version: 2`, `dual_emit: true`). | ||
| This should implicitly update the `schema_url` of telemetry, allowing users to see which version is which and address versioning issues. | ||
| - Existing users of `v1.x.x` are unaffected and continue to see the old OTLP output. |
which users are we talking about. The end-user?
| Instrumentation libraries "own" the stability of the OTLP they produce. To maintain this stability: | ||
|
|
||
| - **Pinning**: An instrumentation library MUST specify the `schema_url` of the federated registry version it targets in its `Scope` metadata, via `get {Meter|Tracer|Logger}` operations. | ||
| - **Breaking Changes**: If an instrumentation library adopts a new major version of a federated registry that results in breaking changes to its OTLP output, the library ITSELF must perform a major version bump. For example, if `opentelemetry-java-instrumentation` moves from `jvm/v1` to `jvm/v2`, it must release a new major version of its instrumentation package. |
This is not so simple. Instrumentation now depends on two versions: the library it instruments AND the semconv that it tracks. What if the instrumentation needs to move a major version for library changes, then subsequently needs to backfill support for an older version? Now there is no tenable way to track both semconv breaking changes and library breaking changes.
There was a problem hiding this comment.
I think I intend this to mean the opposite, or at least I don't see them as coupled.
- If you make a breaking change in semconv, you need a major version bump on your library.
- You may need major version bump on your library for other reasons, you do NOT need to bump semconv major version for this.
The key here is folks understand they need to do the first, and that the output signals of an instrumentation library are part of its stability.
Regarding tracking both semconv breaking + library breaking changes - I wanted to keep this simpler where you mostly just pay attention to library major version bumps, and then schema_url and backends can help you handle incompatibilities / conversions downstream.
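The one-directional rule in the two bullets above can be written down as a sketch. It assumes that every major bump of the pinned registry changes the library's OTLP output (the proposal only requires a library bump when the output actually breaks); `major` and `min_library_major` are hypothetical helper names.

```python
def major(version: str) -> int:
    """Extract the major component from a version such as 'v2.0.0' or '3.1.4'."""
    return int(version.lstrip("v").split(".")[0])


def min_library_major(library: str, old_semconv: str, new_semconv: str) -> int:
    """Smallest library major version allowed after changing the pinned registry version."""
    if major(new_semconv) > major(old_semconv):
        # A breaking semconv change forces a library major bump.
        return major(library) + 1
    # Otherwise the registry change imposes no constraint on the library.
    return major(library)


print(min_library_major("3.2.0", "v1.9.0", "v2.0.0"))
print(min_library_major("3.2.0", "v1.9.0", "v1.10.0"))
```

Nothing in the check runs the other way: a library may bump its own major version for unrelated reasons while staying pinned to the same registry version.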
|
|
||
| - **Pinning**: An instrumentation library MUST specify the `schema_url` of the federated registry version it targets in its `Scope` metadata, via `get {Meter|Tracer|Logger}` operations. | ||
| - **Breaking Changes**: If an instrumentation library adopts a new major version of a federated registry that results in breaking changes to its OTLP output, the library ITSELF must perform a major version bump. For example, if `opentelemetry-java-instrumentation` moves from `jvm/v1` to `jvm/v2`, it must release a new major version of its instrumentation package. | ||
| - **Stable by Default**: Following OTEP 4813, instrumentation can be marked as stable once its code and OTLP output are production-ready, this means marking any federated registry as stable, in tandem with the library. |
We would allow semconv to be marked as stable prior to libraries being fully stable. This is more to prevent the opposite in otel - libraries which are production-ready NOT declaring stability, because of a looming threat of semconv changes.
| No. Engaging with `schema_url` and federated registries is optional. OpenTelemetry continues to work as-is for users who do not require automated schema transformation or validation. Existing observability ecosystems suffer from these same semantic fragmentation issues today, but they are typically only discovered at the storage or query layer. This proposal provides the metadata necessary to *address* these issues upstream, but it does not mandate that every SDK or Collector component must be schema-aware. | ||
|
|
||
| ### 8. Won't this lead to "version fatigue" if instrumentation libraries have to major bump frequently to adopt federated registry changes? | ||
| No. Instrumentation libraries in OpenTelemetry are subject to the [specification's versioning and stability policies](/specification/versioning-and-stability.md). These policies strongly discourage frequent major version bumps and require minimum support periods (e.g., one year for contrib/instrumentation packages) for older major versions. This inherent bias towards stability ensures that instrumentation libraries do not adopt breaking changes from federated registries at a high cadence. Instead, maintainers will typically batch such changes or only adopt them when the value to users clearly outweighs the significant cost of a major release and the subsequent long-term support requirement. |
The problem here is "cost": the cost to one OTLP backend might be significant. It can decide whether a business deal is made in a quarter or lost to a competitor that doesn't comply with OTel at all. This puts vendors that are all in on OTel at risk, as their upstream signals are being dictated by parties that have no actual financial overhead or incentives to move the standards forward.
|
|
||
| This is a scenario that exists today in open source observability. As | ||
| `schema_url` becomes more widely adopted, we expect backends to support | ||
| automatic or semi-automatic (e.g., agent-aided) transitions and translations to handle this problem. |
How can we say that backends will use these agent-aided transformations? That seems like something that gives certain vendors, such as companies with privileged foundation model access, a competitive advantage. This seems unfair.
There was a problem hiding this comment.
Good point, and not the real intention here.
"Agent-aided" meant that we may want to start leveraging agents within open-telemetry to define transitions at definition time, and publicizing these transitions for everyone to consume without agents. We cannot have an agent in the loop on hot paths; that's intractable. Leveraging an agent to help improve the coverage of how many version transitions we can safely provide is what this was intended to mean.
The true goal is that we have an open model and easy-to-use transition "definition" that folks can consume to translate between versions of schema, or even (if we're successful) between different schema-urls entirely.
That's still experimental / long-term exploration for the project. For now, we have targeted version-bump transitions as something for which we believe we can provide an automatic transition model that should NOT be costly or unfair for the ecosystem to adopt.
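To illustrate what consuming a published transition without any agent in the hot path might look like, here is a minimal sketch. It assumes a transition is just an ordered list of attribute rename tables, one per version step; the `translate` function and the attribute names are made up for illustration and are not the format the project has committed to.

```python
def translate(
    attributes: dict[str, object],
    steps: list[dict[str, str]],
) -> dict[str, object]:
    """Apply each version step's rename table in order and return the result."""
    for renames in steps:
        attributes = {renames.get(key, key): value for key, value in attributes.items()}
    return attributes


# Hypothetical rename tables for two consecutive version bumps.
v1_to_v2 = {"example.old_name": "example.interim_name"}
v2_to_v3 = {"example.interim_name": "example.new_name"}

telemetry = {"example.old_name": 42, "example.untouched": "kept"}
print(translate(telemetry, [v1_to_v2, v2_to_v3]))
```

Because the tables are plain data, a backend can apply them with a dictionary lookup per attribute, regardless of how the tables were authored.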
| ### Automatic Schema Transformation | ||
|
|
||
| Tooling (like `weaver`) could automatically generate the necessary OTLP transformations when a user moves from a Platform Release `2026.1` to `2026.2`. We could also | ||
| leverage `weaver`'s MCP server to automatically generate OTLP transformation and configuration today as tooling to help with major version bumps needed in OpenTelemetry |
I'm not familiar with the weaver MCP server. How does this work?
There was a problem hiding this comment.
Today it's pretty light. You run weaver mcp and it will load in a registry and allow the agent to interact with a version of semantics and lookup the model, definitions, notes, etc. It can also run live-check to enforce conformance on the version. We hope to expand this further to be more of an aide when defining conventions and allow more ad-hoc agent-initiated flows, e.g. "Help me create a semantic convention registry from this integration test"
DO NOT MERGE
This is a draft proposal to federate the semantic-conventions repository to allow for faster innovation and evolution across the ecosystem.