Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .github/workflows/utils/cspell.json
Original file line number Diff line number Diff line change
Expand Up @@ -420,6 +420,7 @@
"tracesonmemory",
"tracetranslator",
"tracez",
"transformprocessor",
"triager",
"triagers",
"triaging",
Expand Down
274 changes: 274 additions & 0 deletions docs/rfcs/configuration-merging-strategy.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,274 @@
# Configuration merging

## Background

As part of issue [#8754](https://github.com/open-telemetry/opentelemetry-collector/issues/8754), a new feature gate has been introduced to support merging component lists instead of replacing them ([first PR](https://github.com/open-telemetry/opentelemetry-collector/pull/12097)). This enhancement enables configurations from multiple sources to be combined, preserving all defined components in the final configuration.

More information about this feature can be found in the [confmap README](https://github.com/open-telemetry/opentelemetry-collector/blob/d4539dd6b4e554e15066226fa975b156af7b1510/confmap/README.md#experimental-append-merging-strategy-for-lists).

The main motivation for this change was to allow users to define configuration fragments in different sources, and have them merged in such a way that all specified components are included under `service::pipeline` in the final configuration.

Previously, we relied on Koanf’s default merging strategy, which overrides static values and slices (such as strings, numbers, and lists). This behavior often resulted in configurations being unintentionally overwritten when merged from multiple sources.

This issue has been highlighted in several discussions and feature requests:
- https://github.com/open-telemetry/opentelemetry-collector/issues/8394
- https://github.com/open-telemetry/opentelemetry-collector/issues/8754
- https://github.com/open-telemetry/opentelemetry-collector/issues/10370

## Motivation and Scope

We’ve already implemented a feature gate and foundational logic that supports merging lists across configuration files. Currently, this logic is hardcoded to merge lists only for specific keys: receivers, exporters, and extensions. The relevant implementation can be found [here](https://github.com/open-telemetry/opentelemetry-collector/blob/main/confmap/internal/merge.go).

The current implementation lacks flexibility. Ideally, users should be able to specify which configuration paths should be merged, rather than relying on hardcoded defaults.
This RFC proposes extending the existing functionality by introducing a user-configurable mechanism to define merge behavior.

This RFC builds on top of feedback gathered from the [original PR](https://github.com/open-telemetry/opentelemetry-collector/pull/12097).
More specifically, this RFC aims to:
1. Add an option to specify which configuration paths should be merged.
2. Introduce support for prepend and append operations when merging list values:
- This is good to have for lists that rely on certain ordering, such as:
- `processors`
- `transformprocessor` statements

## Proposed approaches:

### Approach 1 (Recommended): Use yaml tags
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I discussed this over Slack with Vihas but mentioning it here: it would be good to make sure that our officially released Helm charts and operator do not break these YAML tags (they shouldn't but I think it's a good check to do before deciding)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've reached out on #otel-helm on CNCF slack for more eyes on this. I'll run some tests on this and keep you updated.

Copy link
Contributor Author

@VihasMakwana VihasMakwana Aug 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mx-psi Unfortunately, YAML tags are not preserved in helm chart. I think this is due to the fact that our helm chart deals with yaml to dict conversion internally and tags are lost in this process.

The same applies to the operator. The operator marshals the config into a struct and internally, it is map[string]interface{}. Hence, the tags are lost in the conversion.

That being said, do users need this feature in helm/operator? Our helm chart internally deals with all the merging and appending to the lists, when necessary.

For operator, users deploy a single configuration like following:

kubectl apply -f - <<EOF
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: simplest
spec:
  config:
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318

    exporters:
      debug: {}

    service:
      pipelines:
        traces:
          receivers: [otlp]
          exporters: [debug]
EOF

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^^ @mx-psi

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From 2025-09-10 Collector SIG meeting it seems like this should not be a concern but let's wait a bit more for the Operator SIG to answe


The first approach relies on the concept of [yaml tags](https://tutorialreference.com/yaml/yaml-tags). We can specify a custom tag in our configuration files to indicate the lists we want to merge.

Consider following two configurations:

```yaml
#main.yaml
receivers:
...
exporters:
...
extensions:
extension1:

service:
extensions: [extension1]
pipelines:
logs:
receivers: [...]
exporters: [...]
```

```yaml
#extra.yaml
extensions:
extension2:

service:
extensions: !mode=append [extension2]
```

After running the collector with above configurations, the `service::extensions` path will get merged and final configuration will look like:

```yaml
#final.yaml
receivers:
...
exporters:
...
extensions:
extension1:
extension2:

service:
extensions: [extension1, extension2]
pipelines:
logs:
receivers: [...]
exporters: [...]
```

This approach completely relies on the configuration files and doesn't introduce any command line option.
Internally, we will loop through the yaml tree and fetch the paths we want to merged based on user-defined tags, then merge the lists found at those paths.

Our "custom" yaml tag will be in the format of URI query parameters. For starters, we can support following options:
1. `mode`:
- This setting will control the ordering of merged list.
- The value will be either of `append` or `prepend`.
- Default: `append`
2. `duplicates`:
- This setting controls the duplication of elements in the final list.
- If the user wants to allow duplicates, they can simply turn this flag on.
- Default: `false`
3. `recursive`:
- This setting controls the merging of child elements of the given yaml path.
- This is useful if user wants to merge all the lists under a give sub-tree.
- For eg. merging all the lists under the `service` section.

#### Examples of first approach

1. _Merge the `service::extensions` list_:

```yaml
#main.yaml
receivers:
...
exporters:
...
extensions:
extension1:

service:
extensions: [extension1]
pipelines:
logs:
receivers: [...]
exporters: [...]

---

#extra.yaml
extensions:
extension2:

service:
extensions: !mode=append [extension2]
```

```yaml
#final.yaml
receivers:
...
exporters:
...
extensions:
extension1:
extension2:

service:
extensions: [extension1, extension2]
pipelines:
logs:
receivers: [...]
exporters: [...]
```

2. _Merge all the lists under the `service::*` section_:


```yaml
#main.yaml
receivers:
...
exporters:
...
extensions:
extension1:

service:
extensions: [extension1]
pipelines:
logs:
receivers: [...]
exporters: [...]

---

#extra.yaml
extensions:
extension2:
receiver:
receiver2:

service: !mode=append&recursive=true
extensions: [extension2]
pipelines:
logs:
receivers: [receiver2]
```

```yaml
#final.yaml
receivers:
...
receiver2:
exporters:
...
extensions:
extension1:
extension2:

service:
extensions: [extension1, extension2]
pipelines:
logs:
receivers: [..., receiver2]
exporters: [...]
```

### Approach 2: URI params

The proposed approach will rely on concept of URI query parameters([_RFC 3986_](https://datatracker.ietf.org/doc/html/rfc3986#page-23)). Our configuration URIs already adhere to this syntax and we can extend it to support query params instead adding new CLI flags.

For now, the new merging strategy is only enabled under `confmap.enableMergeAppendOption` gate. If user specifies the options and tries to run the collector without gate, we will merge as per default behaviour.

We will support new parameters to config URIs as follows:
1. `merge_paths`: A comma-separated list of glob patterns which will be used while config merging
- This setting will control the paths user wants to merge from the given config.
- Example:
- `otelcol --config main.yaml --config extra.yaml?merge_paths=service::extensions,service::**::receivers`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think using the query params is fine. However, we need to keep in mind that some confmap providers use the same pattern to pass extra information. With this approach, they won't be able to use these keys which is probably fine.

If anyone else have any other ideas, please share.

cc @open-telemetry/collector-approvers

Copy link
Contributor Author

@VihasMakwana VihasMakwana Jun 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, we can extract the parameters here

for _, uri := range mr.uris {
and process them for our merge function.

Then we can remove our parameters from URI and let resolver do its job:

u, _ := url.Parse("otel.yaml?key=val&merge_mode=append")
q := u.Query()
q.Del("merge_mode")
u.RawQuery = q.Encode()
// uri will now be otel.yaml?key=val&key2=val2

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed this in this RFC and came up with a few issues with query parameters that we'll want to consider before going that route.

@dmitryax I believe you, @mx-psi, and I had an offline conversation at KubeCon and tacitly agreed we should go the route of configuring these inside a config file (this option in the RFC). Do you still think we should go this route? I think we should probably at least explore it before deciding on an approach.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@evan-bradley @dmitryax approach 1 does not deal with params, does this resolve your concerns here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't mind calling out the "Configuring Confmap Providers" RFC just to make it clear there are other options we can explore, but if approach 1 is our preference, I'm good with what's here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's fair

- In this example, we will merge the list of extensions and receivers from pipeline, excluding lists in the rest of the config.
- `otelcol --config main.yaml --config ext.yaml?merge_paths=service::extensions --config rec.yaml?merge_paths=service::**::receivers`
- In this example, we will merge all list of extensions from `ext.yml` and list of receivers from `rec.yaml`, excluding lists in the rest of the config.
2. `merge_mode`: One of `prepend` or `append`.
Copy link
Member

@dmitryax dmitryax Jun 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably need more options. For example, if I want to add a receiver in a merging config while it may be already defined in the base config. Also, we need the default replace behavior. What about the following list:

  • append: append entries to the list
  • append_unique: append only unique entries to the list
  • prepend: insert entries in front of the list
  • prepend_unique: insert only unique entries in front of the list
    replace (default): overwrite the list
    keep: keep the original list

Also, I would call the options list_merge_mode and list_merge_paths to make it clearer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dmitryax I see. I'll update the RFC

Copy link
Member

@douglascamata douglascamata Jun 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you touch a good point there with the _unique modifiers, @dmitryax. Shouldn't both append and prepend always ensure lists containing only "basic types" (strings, ints, etc) do not have duplicated values? I don't know about any scenario in the Collector involving lists of basic types where duplicated values wouldn't be a problem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For service components, duplicates will cause error.
I'm not sure about other places though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For service components, duplicates will cause error.

Right. For example, I have a list of receivers in a basic config file which I don't control. Then, I want to add another receiver and override it in case it's already defined in the basic config. In that case --config=extra_receiver.yaml?merge_mode=append&merge_paths=service::**::receivers would cause a failure if the basic config already has my receiver. I need append_unique instead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can start with a small number of options, in any case we can discuss that in the implementation phase

- This setting will control the ordering of merged list.

#### Examples of second approach

Here are some examples:

1. _Append to default mergeable components_:
```bash
otelcol --config=main.yaml --config=extra_components.yaml?merge_mode=append --feature-gates=confmap.enableMergeAppendOption
```

- After running above command, the final configuration will include:
- Merged component(s) (`receivers`, `exporters` and `extensions`) from `extra_components.yaml`

2. _Specify exact paths for merging_:
```bash
otelcol \
--config=main.yaml \
--config=extra_extension.yaml?merge_mode=append&merge_paths=service::extensions \
--config=extra_receiver.yaml?merge_mode=append&merge_paths=service::**::receivers \
--feature-gates=confmap.enableMergeAppendOption
```

- After running above command, the final configuration will include:
- Merged extension(s) from `extra_extension.yaml`
- Merged receiver(s) from `extra_receiver.yaml`


3. _Prepend processors_:
```bash
otelcol --config=main.yaml --config=extra_processor.yaml?merge_mode=prepend&merge_paths=service::**::processors --feature-gates=confmap.enableMergeAppendOption
```

- After running above command, the final configuration will include:
- Merged processor(s) from `extra_processor.yaml`, but prepend the existing list.

4. _Exclude a config file from lists merging process_:
```bash
otelcol --config=main.yaml --config=extra_components.yaml?merge_mode=append --config override_components.yaml --feature-gates=confmap.enableMergeAppendOption
```

- In the above command, we have no specified any options for `override_components.yaml`. Hence, it will override all the conflicting lists from previous configuration, which is the default behaviour.

## Open questions

- What to do if an invalid option is provided for `merge_mode` or `merge_paths`?
- I can think of two possibilities:
1. Error out.
2. Log an error and merge the default way
- What to do if an invalid query param is provided in config URI?
- In this case, I strongly feel that we should error out.

## Extensibility

This URI-based approach is highly extensible. In the future, it can enable advanced operations such as map overriding. Currently, it's impossible to do so.
Loading