
@dmitryax (Member) commented Jan 15, 2026

Replaces #13784

This RFC proposes a roadmap for introducing configuration schemas to OpenTelemetry Collector components. It establishes a schema-first approach in which Go structs, JSON schemas, and documentation are all generated from a single YAML source of truth.
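To make the single-source-of-truth idea concrete, here is a minimal Go sketch in which one field definition is rendered into the three artifacts the RFC names: a Go struct field, a JSON Schema property, and a Markdown documentation row. `FieldSpec` and the helper functions are hypothetical and are not the generator's actual API. The `ping_count` field is only an illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// FieldSpec is a hypothetical, heavily simplified in-memory form of one
// field declared in a component's YAML config schema.
type FieldSpec struct {
	Name        string // YAML key, e.g. "ping_count"
	GoName      string // exported Go field name
	GoType      string // Go type emitted in the struct
	JSONType    string // JSON Schema "type" keyword
	Description string
}

// goField renders the struct field a generator could emit.
func goField(f FieldSpec) string {
	return fmt.Sprintf("%s %s `mapstructure:%q`", f.GoName, f.GoType, f.Name)
}

// jsonSchemaProperty renders the machine-facing JSON Schema fragment.
// json.Marshal sorts map keys, so the output is deterministic.
func jsonSchemaProperty(f FieldSpec) (string, error) {
	b, err := json.Marshal(map[string]any{
		f.Name: map[string]any{"type": f.JSONType, "description": f.Description},
	})
	return string(b), err
}

// markdownRow renders the human-facing documentation table row.
func markdownRow(f FieldSpec) string {
	return fmt.Sprintf("| `%s` | %s | %s |", f.Name, f.JSONType, f.Description)
}

func main() {
	f := FieldSpec{Name: "ping_count", GoName: "PingCount", GoType: "int", JSONType: "integer", Description: "Number of echo requests to send."}
	schema, err := jsonSchemaProperty(f)
	if err != nil {
		panic(err)
	}
	// All three artifacts come from the same FieldSpec, so they cannot drift.
	fmt.Println(strings.Join([]string{goField(f), schema, markdownRow(f)}, "\n"))
}
```

Because every output is derived from the same definition, changing a description or type in one place updates the struct, the schema, and the docs together, which is the synchronization property the RFC is after.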

This RFC is the result of discussions among contributors involved in this effort:
- @atoulme
- @evan-bradley
- @jkoronaAtCisco
- @mx-psi
- @pavolloffay

Related Issues / PRs:
- open-telemetry/opentelemetry-collector-contrib#42214
- open-telemetry#9769
- open-telemetry#14288
- open-telemetry/opentelemetry-collector-contrib#27003

@codspeed-hq bot commented Jan 15, 2026

Merging this PR will not alter performance

⚠️ Unknown Walltime execution environment detected. Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

✅ 61 untouched benchmarks
⏩ 20 skipped benchmarks (baseline results were used instead)

Comparing dmitryax:add-schemagen-rfc (1e41a11) with main (55399d4)
@codecov bot commented Jan 15, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 91.83%. Comparing base (52935f0) to head (1e41a11).
⚠️ Report is 15 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #14433      +/-   ##
==========================================
+ Coverage   91.81%   91.83%   +0.02%     
==========================================
  Files         677      677              
  Lines       42536    42680     +144     
==========================================
+ Hits        39056    39197     +141     
- Misses       2421     2427       +6     
+ Partials     1059     1056       -3     



**Why YAML schema format for the source of truth?**
- **Human-readable**: Easier for component developers to author and maintain than JSON
- **Integration with existing infrastructure**: A natural extension of the `metadata.yaml` approach used by `mdatagen`, which already uses YAML to generate metrics builder configs
Contributor commented:

Should config schema specification be a part of metadata.yaml or separate file?

@dmitryax (Member Author) replied Jan 15, 2026:

I don't want to argue strongly either way here. However, I believe `metadata.yaml` is the natural place. If we are worried about it growing too big, we could simply reference another file from it.

- Produces JSON schemas for config validation
- Creates synchronized documentation
- Generated JSON schemas pass validation tests with real collector configurations
- Generated documentation accurately reflects all configuration options
Member commented:

How does the generated documentation differ from the generated JSON schema?

@dmitryax (Member Author) replied:

Documentation is user-friendly Markdown, the kind we currently write by hand in each component's README.md. The JSON schema is for machines.

allOf:
- $ref: "go.opentelemetry.io/collector/scraper/scraperhelper#/$defs/ControllerConfig"
- $ref: "#/$defs/MetricsBuilderConfig"
properties:
Member commented:

It would be useful to include a validation example here.

@dmitryax (Member Author) replied:

What validation example do you mean?

Contributor replied:

My assumption is that @pavolloffay means something like

ping_count:
  [...]
  validation:
    greater_than: 0

or similar, to show how we can add field validations to the schemas.
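A rule such as `greater_than: 0` could plausibly be generated into both a JSON Schema keyword (for example `exclusiveMinimum: 0`) and a Go check. The sketch below shows only the Go side, as a hypothetical generated `Validate() error` method that mirrors the Collector's convention of config structs exposing such a method. The struct, field, and error text are assumptions, not output of any existing generator.

```go
package main

import (
	"errors"
	"fmt"
)

// Config is a hypothetical generated struct for a schema that declares
// ping_count with `validation: {greater_than: 0}`.
type Config struct {
	PingCount int `mapstructure:"ping_count"`
}

// Validate enforces the schema's greater_than: 0 rule for ping_count.
// Errors are collected so additional generated rules could be appended.
func (c *Config) Validate() error {
	var errs []error
	if c.PingCount <= 0 {
		errs = append(errs, fmt.Errorf("ping_count: must be greater than 0, got %d", c.PingCount))
	}
	// errors.Join returns nil when errs is empty.
	return errors.Join(errs...)
}

func main() {
	fmt.Println((&Config{PingCount: 0}).Validate())
	fmt.Println((&Config{PingCount: 4}).Validate())
}
```

Generating the check from the same YAML rule that produces the JSON Schema keyword would keep editor-time and runtime validation in agreement.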

@evan-bradley (Contributor) left a comment:

My comments are just naming nitpicks; I think we can adjust those later if needed. Overall this approach makes sense to me.

@dmitryax dmitryax enabled auto-merge January 24, 2026 00:41
@dmitryax dmitryax added this pull request to the merge queue Jan 24, 2026
Merged via the queue into open-telemetry:main with commit 0632615 Jan 24, 2026
62 checks passed
@dmitryax dmitryax deleted the add-schemagen-rfc branch January 24, 2026 01:21
github-merge-queue bot pushed a commit that referenced this pull request Jan 27, 2026
#### Description

Schemas describing shared configurations used by collector components
have been added. This is part of the work related to the implementation
of configuration schemas for all components.
See more: #14433

#### Link to tracking issue
Issue: #42214