Commit b380e4a

[chore] Add an RFC for Component Configuration Management Roadmap
Replaces #13784

This RFC proposes a roadmap for introducing configuration schemas to OpenTelemetry Collector components. It establishes a schema-first approach in which Go structs, JSON schemas, and documentation are all generated from a single YAML source of truth.

This RFC is the result of discussions among contributors involved in this effort:
- @atoulme
- @evan-bradley
- @jkoronaAtCisco
- @mx-psi
- @pavolloffay

Related Issues / PRs:
- open-telemetry/opentelemetry-collector-contrib#42214
- #9769
- #14288
- open-telemetry/opentelemetry-collector-contrib#27003
1 parent 52935f0 commit b380e4a

2 files changed: +133 -0 lines changed


.github/workflows/utils/cspell.json

Lines changed: 2 additions & 0 deletions
@@ -148,6 +148,7 @@
 "configparser",
 "configretry",
 "configrpc",
+"configschema",
 "configsource",
 "configtelemetry",
 "configtest",
@@ -425,6 +426,7 @@
 "sarama",
 "sattributes",
 "sattributesprocessor",
+"schemagen",
 "scrapererror",
 "scraperhelper",
 "scrapertest",
Lines changed: 131 additions & 0 deletions
@@ -0,0 +1,131 @@
# Component Configuration Management Roadmap

## Motivation

The OpenTelemetry Collector ecosystem lacks a unified approach to configuration management, leading to several problems:

1. **Documentation Drift**: Go configuration structs and documentation exist independently and frequently diverge over time
2. **Inconsistent Developer Experience**: No standardized patterns for defining component configurations
3. **No Config Validation Capabilities**: Lack of JSON schemas prevents autocompletion and validation in configuration editors

## Current state

- Go configuration structs in each component, with validation implemented via custom code and defaults set in `setDefaultConfig` functions
- Manual documentation that often becomes outdated
- No standardized JSON schemas for configuration validation

## Desired state

**Goal**: Establish a single source of truth for component configuration that generates:
1. **Go configuration structs** with proper mapstructure tags, validation, and default values
2. **JSON schemas** for configuration validation and editor autocompletion
3. **Documentation** that stays automatically synchronized with implementation

## Previous and current approaches

### Past attempts

- [Previously available contrib configschema tool](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/v0.102.0/cmd/configschema): Retired due to incompleteness, complexity, and maintenance burden. It required dynamic analysis of Go code and pulling in all dependencies.

- [PR #27003](https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/27003): Failed because it tried to cover all corner cases in the design phase instead of iterating quickly from a simpler approach.

### Current initiatives

- [opentelemetry-collector-contrib/cmd/schemagen/](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/cmd/schemagen): Generates JSON schemas from Go structs, with limited support for validation and default values. It uses AST parsing with module-aware loading of dependencies to handle shared libraries.

- [PR #14288](https://github.com/open-telemetry/opentelemetry-collector/pull/14288): Also uses AST parsing to generate JSON schemas from Go structs for component configurations, but without support for shared configs. Written as part of the `mdatagen` tool.

Parsing Go code to generate schemas is inherently limited. Community consensus recommends reversing the process: generate Go code from schemas instead. Generating code from a declarative source is already a widely established practice in other ecosystems, and is already used to generate Go code and documentation for other parts of the OTel Collector.
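
To illustrate why parsing Go code is limited, the following is a minimal sketch built on the standard `go/parser` and `go/ast` packages. It does not reproduce how `schemagen` or PR #14288 work internally; the `Config` struct, the `defaultConfig` function, and the `fieldKeys` helper are hypothetical names invented for this example. Field keys are easy to read from struct tags, but the default value lives in ordinary function code that a struct walker never sees.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"reflect"
	"strconv"
)

// src is a hypothetical component config, used only for illustration.
const src = `package demo

type Config struct {
	Host      string ` + "`mapstructure:\"host\"`" + `
	PingCount int    ` + "`mapstructure:\"ping_count\"`" + `
}

func defaultConfig() *Config {
	return &Config{PingCount: 3}
}
`

// fieldKeys returns the mapstructure key of every tagged field in the named struct.
func fieldKeys(source, structName string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "config.go", source, 0)
	if err != nil {
		return nil, err
	}
	var keys []string
	ast.Inspect(file, func(n ast.Node) bool {
		ts, ok := n.(*ast.TypeSpec)
		if !ok || ts.Name.Name != structName {
			return true // keep descending until the struct is found
		}
		st, ok := ts.Type.(*ast.StructType)
		if !ok {
			return false
		}
		for _, f := range st.Fields.List {
			if f.Tag == nil {
				continue
			}
			// The tag literal includes its backticks, so unquote it first.
			raw, err := strconv.Unquote(f.Tag.Value)
			if err != nil {
				continue
			}
			if key := reflect.StructTag(raw).Get("mapstructure"); key != "" {
				keys = append(keys, key)
			}
		}
		return false
	})
	return keys, nil
}

func main() {
	keys, err := fieldKeys(src, "Config")
	if err != nil {
		panic(err)
	}
	// The keys are recoverable; the default of 3 set in defaultConfig is not,
	// unless the tool also interprets function bodies.
	fmt.Println(keys)
}
```

Recovering defaults or validation rules this way would require interpreting arbitrary function bodies, which is the kind of complexity a schema-first approach avoids.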

## Suggested approach

### Overview

This RFC proposes an approach that transitions from the current Go-struct-first model to a schema-first configuration generation system:

1. **Bootstrap Phase**: Use the existing `schemagen` tool to generate initial schema specifications
2. **Tool Development Phase**: Create new tooling that generates Go structs, JSON schemas, and documentation from YAML schema specifications
3. **Migration Phase**: Migrate all components to the new schema-first approach

The `schemagen` tool is used for bootstrapping because of the modularity of the Collector components: it can generate schemas for shared libraries (e.g., `scraperhelper`) that individual components can then reference.

### Reasoning behind this approach

- **Explicit validation**: Schema specifications can explicitly capture validation rules and default config values that cannot be extracted from Go code
- **Rich documentation**: Schemas can include descriptions, examples, and constraints that enhance generated documentation
- **Simplified tooling**: Template-based code generation is more predictable than AST parsing

**Why YAML schema format for the source of truth?**
- **Human-readable**: Easier for component developers to author and maintain than JSON
- **Integration with existing infrastructure**: Natural extension of the `metadata.yaml` approach used by `mdatagen`, which already uses YAML to generate metrics builder configs
- **Extensibility**: YAML allows for custom tags and types to capture domain-specific configuration

### Example schema format

```yaml
config:
  allOf:
    - $ref: "go.opentelemetry.io/collector/scraper/scraperhelper#/$defs/ControllerConfig"
    - $ref: "#/$defs/MetricsBuilderConfig"
  properties:
    targets:
      type: array
      items:
        type: object
        properties:
          host:
            type: string
            description: "Target hostname or IP address"
          ping_count:
            type: integer
            description: "Number of pings to send"
            default: 3
          ping_interval:
            type: string
            format: duration
            description: "Interval between pings"
            default: "1s"
        required: ["host"]
```

`#/$defs/MetricsBuilderConfig` would be automatically generated by `mdatagen` with the same process used to generate the Go structs and documentation today.

`go.opentelemetry.io/collector/scraper/scraperhelper#/$defs/ControllerConfig` would be generated by the new tool from the schema definition in the `scraperhelper` component.

#### Extensibility

The YAML schema specification can be extended with custom tags (e.g., `x-customType`) to capture domain-specific types and validation rules that are not natively supported in JSON schema. Additionally, we may introduce custom tags for fields that reference hand-written structs or validation functions, for cases where the logic is too complex to generate and needs a manual implementation.
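
One way a generator could honor such a tag is sketched below. The `Property` type, the `goType` function, and the specific type mappings (including `format: duration` to `time.Duration`) are assumptions for illustration; the RFC does not define how `x-customType` would be interpreted.

```go
package main

import "fmt"

// Property is a hypothetical subset of a parsed schema property.
type Property struct {
	Type       string // JSON schema type, e.g. "string"
	Format     string // optional format, e.g. "duration"
	CustomType string // value of the hypothetical `x-customType` tag
}

// goType picks a Go type for a property. An explicit custom type wins over
// the built-in mapping, so authors can point at hand-written types.
func goType(p Property) (string, error) {
	if p.CustomType != "" {
		return p.CustomType, nil
	}
	switch {
	case p.Type == "string" && p.Format == "duration":
		return "time.Duration", nil
	case p.Type == "string":
		return "string", nil
	case p.Type == "integer":
		return "int", nil
	case p.Type == "boolean":
		return "bool", nil
	}
	return "", fmt.Errorf("unsupported schema type %q", p.Type)
}

func main() {
	for _, p := range []Property{
		{Type: "string"},
		{Type: "string", Format: "duration"},
		{Type: "object", CustomType: "mypkg.RetrySettings"}, // hypothetical type name
	} {
		t, err := goType(p)
		if err != nil {
			panic(err)
		}
		fmt.Println(t)
	}
}
```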

### Roadmap

#### Phase 1: Bootstrap initial schemas

**Objective**: Use the `schemagen` tool to generate initial schema specifications for all components

**Success Criteria:**
- YAML schemas generated for all components in core and contrib repositories
- CI check set up to ensure schemas remain up to date with Go structs

#### Phase 2: Implement new generation tool

**Objective**: Implement a new tool that takes a YAML schema from the user and generates Go structs, a combined JSON schema, and documentation per component.

**Success Criteria:**
- New tool generates Go structs that are API-compatible with existing implementations and has the following features:
  - Parses YAML schema specifications
  - Generates Go configuration structs with proper validation
  - Produces JSON schemas for config validation
  - Creates synchronized documentation
- Generated JSON schemas pass validation tests with real collector configurations
- Generated documentation accurately reflects all configuration options
- Pilot components successfully replace hand-written implementations

If existing config structs don't follow the naming patterns produced by the generated code, the implementation may break Go API compatibility in favor of consistent Go API naming and long-term maintainability. However, the configuration file format MUST remain compatible for end users.

#### Phase 3: Migrate all components

**Objective**: Migrate all components to the new tool introduced in Phase 2

**Success Criteria:**
- All core and contrib components migrated to schema-first approach
- All new components use schema-first tooling by default
