Commit 0632615

[chore] Add an RFC for Component Configuration Management Roadmap (#14433)
Replaces #13784

This RFC proposes a roadmap for introducing configuration schemas to OpenTelemetry Collector components. It establishes a schema-first approach in which Go structs, JSON schemas, and documentation are all generated from a single YAML source of truth.

This RFC is the result of discussions among contributors involved in this effort:
- @atoulme
- @evan-bradley
- @iblancasa
- @jkoronaAtCisco
- @mx-psi
- @pavolloffay

Related Issues / PRs:
- open-telemetry/opentelemetry-collector-contrib#42214
- #9769
- #14288
- open-telemetry/opentelemetry-collector-contrib#27003

1 parent 55399d4 commit 0632615

2 files changed: +136 -0 lines changed

.github/workflows/utils/cspell.json

Lines changed: 2 additions & 0 deletions
@@ -148,6 +148,7 @@
 "configparser",
 "configretry",
 "configrpc",
+"configschema",
 "configsource",
 "configtelemetry",
 "configtest",
@@ -425,6 +426,7 @@
 "sarama",
 "sattributes",
 "sattributesprocessor",
+"schemagen",
 "scrapererror",
 "scraperhelper",
 "scrapertest",
Lines changed: 134 additions & 0 deletions
@@ -0,0 +1,134 @@
# Component Configuration Management Roadmap

## Motivation

The OpenTelemetry Collector ecosystem lacks a unified approach to configuration management, leading to several problems:

1. **Documentation Drift**: Go configuration structs and documentation are maintained independently and frequently diverge over time
2. **Inconsistent Developer Experience**: There are no standardized patterns for defining component configurations
3. **No Config Validation Capabilities**: The lack of JSON schemas prevents autocompletion and validation in configuration editors

## Current state

- Go configuration structs in each component with validation implemented via custom code and defaults set in `setDefaultConfig` functions
- Manual documentation that often becomes outdated
- No standardized JSON schemas for configuration validation

## Desired state

**Goal**: Establish a single source of truth for component configuration that generates:
1. **Go configuration structs** with proper mapstructure tags, validation, and default values
2. **JSON schemas** for configuration validation and editor autocompletion
3. **Documentation** that stays automatically synchronized with implementation

## Previous and current approaches

### Past attempts

- [Previously available contrib configschema tool](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/v0.102.0/cmd/configschema): Retired due to incompleteness, complexity, and maintenance burden. It required dynamic analysis of Go code and pulling in all dependencies.

- [PR #27003](https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/27003): Failed because it tried to cover all corner cases in the design phase instead of iterating quickly from a simpler approach.

- [PR #10694](https://github.com/open-telemetry/opentelemetry-collector/pull/10694): An attempt to generate config structs from the schema defined in `metadata.yaml` using github.com/atombender/go-jsonschema. It ran into some limitations of the library, but it was abandoned mostly due to a lack of reviewer involvement.

### Current initiatives

- [opentelemetry-collector-contrib/cmd/schemagen/](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/cmd/schemagen): Generates JSON schemas from Go structs with limited support for validation and default values. It uses AST parsing with module-aware loading of dependencies to handle shared libraries.

- [PR #14288](https://github.com/open-telemetry/opentelemetry-collector/pull/14288): Also uses AST parsing to generate JSON schemas from Go structs for the component configurations, but without support for shared config. It is written as part of the `mdatagen` tool.

Parsing Go code to generate schemas is inherently limited. Community consensus recommends reversing the process: generate Go code from schemas instead. Generating code from schemas is a widely established practice in other ecosystems, and the OTel Collector already generates Go code and documentation for other parts of its codebase.

## Suggested approach

### Overview

This RFC proposes an approach that transitions from the current Go-struct-first model to a schema-first configuration generation system:

1. **Bootstrap Phase**: Use the existing `schemagen` tool to generate initial schema specifications
2. **Tool Development Phase**: Create new tooling that generates Go structs, JSON schemas, and documentation from YAML schema specifications
3. **Migration Phase**: Migrate all components to the new schema-first approach

Use of the `schemagen` tool is dictated by the modularity of the Collector components. It allows generating schemas for shared libraries (e.g., scraperhelper) that can be referenced by individual components.

### Reasoning behind this approach

- **Explicit validation**: Schema specifications can explicitly capture validation rules and default config values that cannot be extracted from Go code
- **Rich documentation**: Schemas can include descriptions, examples, and constraints that enhance generated documentation
- **Simplified tooling**: Template-based code generation is more predictable than AST parsing
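
To illustrate the template-based generation mentioned above, here is a minimal sketch that renders a Go config struct with the standard `text/template` package. The `Field` type, the `RenderStruct` function, the template text, and the `pingreceiver` package name are all hypothetical and chosen for illustration only; the RFC does not prescribe a template layout.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Field is a hypothetical, simplified view of one schema property.
type Field struct {
	GoName string // exported Go field name, e.g. "PingCount"
	GoType string // Go type to emit, e.g. "int"
	Key    string // configuration key used in the mapstructure tag
	Doc    string // description copied from the schema
}

// structTemplateText uses explicit \n and \t escapes so the whitespace of
// the generated code is unambiguous.
const structTemplateText = "package {{.Package}}\n\n" +
	"// {{.Name}} is generated from the component schema. DO NOT EDIT.\n" +
	"type {{.Name}} struct {\n" +
	"{{range .Fields}}\t// {{.Doc}}\n" +
	"\t{{.GoName}} {{.GoType}} `mapstructure:\"{{.Key}}\"`\n" +
	"{{end}}}\n"

var structTemplate = template.Must(template.New("config").Parse(structTemplateText))

// RenderStruct executes the template for one configuration struct.
func RenderStruct(pkg, name string, fields []Field) (string, error) {
	data := struct {
		Package string
		Name    string
		Fields  []Field
	}{Package: pkg, Name: name, Fields: fields}

	var sb strings.Builder
	if err := structTemplate.Execute(&sb, data); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	src, err := RenderStruct("pingreceiver", "TargetConfig", []Field{
		{GoName: "Host", GoType: "string", Key: "host", Doc: "Target hostname or IP address"},
		{GoName: "PingCount", GoType: "int", Key: "ping_count", Doc: "Number of pings to send"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Print(src)
}
```

Because the output depends only on the template and the field list, the same input always yields the same source text, which is the predictability the bullet above refers to.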

**Why YAML schema format for the source of truth?**
- **Human-readable**: Easier for component developers to author and maintain than JSON
- **Integration with existing infrastructure**: A natural extension of the `metadata.yaml` approach used by `mdatagen`, which already uses YAML to generate metrics builder configs
- **Extensibility**: YAML allows for custom fields that capture domain-specific configuration and provide escape hatches for config fields that still require a custom implementation, validation, or default value setters

### Example schema format

```yaml
config:
  allOf:
    - $ref: "go.opentelemetry.io/collector/scraper/scraperhelper#/$defs/ControllerConfig"
    - $ref: "#/$defs/MetricsBuilderConfig"
  properties:
    targets:
      type: array
      items:
        type: object
        properties:
          host:
            type: string
            description: "Target hostname or IP address"
          ping_count:
            type: integer
            description: "Number of pings to send"
            default: 3
          ping_interval:
            type: string
            format: duration
            x-customType: "time.Duration"
            description: "Interval between pings"
            default: "1s"
        required: ["host"]
```

`#/$defs/MetricsBuilderConfig` would be generated automatically by `mdatagen`, using the same process that generates the Go structs and documentation today.

`go.opentelemetry.io/collector/scraper/scraperhelper#/$defs/ControllerConfig` would be generated by the new tool from the schema definition in the `scraperhelper` shared library.
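
To make the example more concrete, the following is a rough sketch of Go code that a schema-first generator might emit for the schema above. It is an assumption about the output shape, not the tool's actual output: `ControllerConfig` is a local stand-in for the shared scraperhelper type (with a single illustrative field), the `MetricsBuilderConfig` reference is omitted, and the names `TargetConfig`, `NewDefaultTargetConfig`, and `Validate` are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// ControllerConfig is a local stand-in for the shared type referenced via
// $ref; the real type lives in the collector's scraperhelper package.
type ControllerConfig struct {
	CollectionInterval time.Duration `mapstructure:"collection_interval"`
}

// TargetConfig corresponds to one item of the "targets" array.
type TargetConfig struct {
	// Target hostname or IP address
	Host string `mapstructure:"host"`

	// Number of pings to send
	PingCount int `mapstructure:"ping_count"`

	// Interval between pings (x-customType: time.Duration)
	PingInterval time.Duration `mapstructure:"ping_interval"`
}

// Config is what a generator might emit for the top-level "config" node.
type Config struct {
	ControllerConfig `mapstructure:",squash"`

	Targets []TargetConfig `mapstructure:"targets"`
}

// NewDefaultTargetConfig applies the "default" values from the schema.
func NewDefaultTargetConfig() TargetConfig {
	return TargetConfig{PingCount: 3, PingInterval: time.Second}
}

// Validate enforces the schema's "required" list for every target.
func (c *Config) Validate() error {
	var errs []error
	for i, t := range c.Targets {
		if t.Host == "" {
			errs = append(errs, fmt.Errorf("targets[%d]: host is required", i))
		}
	}
	return errors.Join(errs...)
}

func main() {
	first := NewDefaultTargetConfig()
	first.Host = "example.com"

	// The second target keeps the defaults but never sets a host.
	cfg := Config{Targets: []TargetConfig{first, NewDefaultTargetConfig()}}
	fmt.Println(cfg.Validate())
}
```

In this sketch the `default` and `required` keywords of the schema become a constructor and a `Validate` method, so the rules live in the schema rather than in hand-written code.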

#### Extensibility

The YAML schema specification can be extended with custom fields (e.g., `x-customType`) to capture domain-specific types and validation rules that are not natively supported in JSON schema. Additionally, we may introduce custom fields that make the generator emit references to hand-written structs or validation functions for cases that require more complex logic.
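
As an illustration of how an extension field such as `x-customType` could influence generation, here is a small sketch of a type-mapping function. The `Property` type and `GoType` function are hypothetical, and the mapping of JSON schema types to Go types is an assumption made for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// Property is a hypothetical, reduced model of one schema property.
type Property struct {
	Type       string    // JSON schema "type"
	Format     string    // JSON schema "format", informational here
	CustomType string    // value of the "x-customType" extension, if any
	Items      *Property // element schema when Type is "array"
}

// GoType picks the Go type a generator could emit for a property.
// An explicit x-customType always takes precedence over the schema type.
func GoType(p Property) (string, error) {
	if p.CustomType != "" {
		return p.CustomType, nil
	}
	switch p.Type {
	case "string":
		return "string", nil
	case "integer":
		return "int", nil
	case "number":
		return "float64", nil
	case "boolean":
		return "bool", nil
	case "array":
		if p.Items == nil {
			return "", errors.New("array property has no items schema")
		}
		elem, err := GoType(*p.Items)
		if err != nil {
			return "", err
		}
		return "[]" + elem, nil
	}
	return "", fmt.Errorf("unsupported schema type %q", p.Type)
}

func main() {
	// Mirrors the ping_interval property from the example schema.
	pingInterval := Property{Type: "string", Format: "duration", CustomType: "time.Duration"}
	goType, err := GoType(pingInterval)
	if err != nil {
		panic(err)
	}
	fmt.Println(goType)
}
```

Without the extension, `ping_interval` would map to a plain `string`; the custom field is what lets the schema stay valid JSON schema while still producing a `time.Duration` field.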

### Roadmap

#### Phase 1: Bootstrap initial schemas

**Objective**: Use the `schemagen` tool to generate initial schema specifications for all components

**Success Criteria:**
- YAML schemas generated for all components in core and contrib repositories
- CI check set up to ensure schemas remain up-to-date with Go structs
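
One way such a CI check could work is to compare the property names in a schema with the `mapstructure` keys of the corresponding Go struct. The sketch below is only an assumption about the approach: `structKeys`, `Drift`, and the sample `TargetConfig` are hypothetical, and embedded or squashed structs and nested properties are not handled.

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
	"strings"
)

// TargetConfig is a hypothetical hand-written struct under test.
type TargetConfig struct {
	Host      string `mapstructure:"host"`
	PingCount int    `mapstructure:"ping_count"`
}

// structKeys lists the configuration keys declared via mapstructure tags.
// It expects a struct value and ignores fields without a named tag.
func structKeys(v any) []string {
	t := reflect.TypeOf(v)
	var keys []string
	for i := 0; i < t.NumField(); i++ {
		tag := t.Field(i).Tag.Get("mapstructure")
		name, _, _ := strings.Cut(tag, ",")
		if name != "" && name != "-" {
			keys = append(keys, name)
		}
	}
	sort.Strings(keys)
	return keys
}

// Drift reports keys that exist on only one side, in sorted order.
func Drift(schemaProps []string, v any) (missingInSchema, missingInStruct []string) {
	inSchema := make(map[string]bool, len(schemaProps))
	for _, p := range schemaProps {
		inSchema[p] = true
	}
	inStruct := make(map[string]bool)
	for _, k := range structKeys(v) {
		inStruct[k] = true
		if !inSchema[k] {
			missingInSchema = append(missingInSchema, k)
		}
	}
	sorted := append([]string(nil), schemaProps...)
	sort.Strings(sorted)
	for _, p := range sorted {
		if !inStruct[p] {
			missingInStruct = append(missingInStruct, p)
		}
	}
	return missingInSchema, missingInStruct
}

func main() {
	schemaProps := []string{"host", "ping_count", "ping_interval"}
	missingInSchema, missingInStruct := Drift(schemaProps, TargetConfig{})
	// ping_interval is declared in the schema but absent from the struct.
	fmt.Println(missingInSchema, missingInStruct)
}
```

A CI job could fail whenever either returned slice is non-empty, which would flag a schema that has fallen behind its Go struct or the reverse.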

#### Phase 2: Implement new generation tool

**Objective**: Implement a new tool that takes a YAML schema from the user and generates Go structs, a combined JSON schema, and documentation per component.

**Success Criteria:**
- The new tool generates Go structs that are API-compatible with existing implementations and provides the following features:
  - Parses YAML schema specifications
  - Generates Go configuration structs with proper validation
  - Produces JSON schemas for config validation
  - Creates synchronized documentation
- Generated JSON schemas pass validation tests with real collector configurations
- Generated documentation accurately reflects all configuration options
- Pilot components successfully replace hand-written implementations

If existing config structs don't follow the naming patterns produced by the generated code, the implementation may break Go API compatibility in favor of consistent Go API naming standards and long-term maintainability. However, the configuration file format MUST remain compatible for end users.

#### Phase 3: Migrate all components

**Objective**: Migrate all components to the new tool introduced in Phase 2

**Success Criteria:**
- All core and contrib components migrated to schema-first approach
- All new components use schema-first tooling by default
