Description
The purpose and use-cases of the new component
I would like to propose a new processor for partitioning batches using OTTL. The processor would be configured with a mapping of partitioning keys to OTTL expressions (a minimal sketch follows the list below), and would:
- Infer the relevant context from the expressions (e.g. request, resource, log, datapoint, etc.)
- Evaluate the expressions against each batch
- Partition the batch according to the partitioning keys
- Send each batch partition to the next consumer, adding the partitioning keys & values to the request metadata
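A minimal sketch of what such a mapping could look like (the processor name, key names, and attribute names are illustrative only); the context would be inferred from each expression's path prefix, so resource-scoped and log-scoped keys could be mixed:

```yaml
processors:
  partitioner:
    keys:
      # resource-scoped expression: all data for a given resource shares a partition
      environment: resource.attributes["deployment.environment"]
      # log-scoped expression: individual log records may land in different partitions
      severity: log.severity_text
```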
Some use cases below:
Topic routing and message key/partitioning for message broker exporters
The idea for this came from a PoC I implemented for routing and partitioning data before exporting to Kafka -- see #38888 (comment). This is intended to address several issues:
- At the moment the Kafka exporter supports configuration for determining a topic from a resource attribute, but this does not behave correctly when a batch contains multiple resources -- see #37470 (topic_from_attribute does not work as expected)
- the exporter supports partitioning traces by trace_id, but does not support the same for logs
- the exporter does not support partitioning by arbitrary attributes -- see #38484 ([exporter/kafkaexporter] Enable Partitioning by Specific Attribute for Logs & Metrics)
It's just a matter of time before users ask for the same functionality in other exporters, e.g. Pulsar.
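For illustration, a hedged sketch of how this could compose with the Kafka exporter. The key and attribute names are illustrative, and topic_from_metadata_key is not an existing exporter option -- it stands in for whatever mechanism the exporter would use to read the topic from request metadata:

```yaml
processors:
  partitioner:
    keys:
      kafka_topic: Concat([resource.attributes["tenant.id"], "otlp_logs"], ".")
      kafka_message_key: log.trace_id

exporters:
  kafka:
    # hypothetical setting, illustration only: read the topic from the
    # request metadata key set by the partitioner
    topic_from_metadata_key: kafka_topic
```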
Batching by arbitrary data and metadata attributes
In open-telemetry/opentelemetry-collector#10825 it has been suggested that the exporterhelper batch sender should be enhanced with support for batching by request metadata or by pdata attribute. The former case is simple, while the latter tends towards OTTL, so I think this would be best covered by a separate processor like this one.
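As a sketch of how this could compose with existing metadata-based batching, assuming the partitioner propagates its keys as request metadata as described above (the tenant key and attribute names are illustrative); the batch processor's metadata_keys option already batches separately per distinct metadata value:

```yaml
processors:
  partitioner:
    keys:
      tenant: resource.attributes["tenant.id"]
  batch:
    # create a separate batch per distinct value of the tenant metadata key
    metadata_keys:
      - tenant
    metadata_cardinality_limit: 100
```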
General purpose "groupby" processor
Finally, this processor would provide arbitrary "group by" functionality, which could enable replacing existing processors:
- we could replace the groupbyattrsprocessor (https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/groupbyattrsprocessor) by using the new partitioning processor followed by the transform processor to set resource attributes from the partitioning key (see the sketch after this list)
- we could replace the groupbytraceprocessor (https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/groupbytraceprocessor) by using the new partitioning processor followed by a new "buffering" processor, i.e. one that buffers batches by some request metadata keys and emits them after a configurable duration
(Whether we want to replace them is another matter, just saying it's a possible use case.)
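A hedged sketch of the groupbyattrs replacement idea. This assumes the evaluated partitioning key would be visible to a downstream transform processor via the proposed request metadata; the request.metadata path does not exist in the transform processor today, and the key and attribute names are illustrative:

```yaml
processors:
  partitioner:
    keys:
      host: log.attributes["host.name"]
  transform:
    log_statements:
      - context: resource
        statements:
          # hypothetical: promote the partitioning key to a resource attribute
          - set(attributes["host.name"], request.metadata["host"])
```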
Example configuration for the component
```yaml
processors:
  partitioner:
    keys:
      logs_topic: Concat([request.metadata["X-Tenant-Id"], "otlp_logs"], ".")
      logs_message_key: log.trace_id
```
In this example, requests would be partitioned by the X-Tenant-Id client request header, and the data would be further partitioned by trace ID. Each partitioned batch would carry the request metadata keys logs_topic and logs_message_key. For example, a request with the header X-Tenant-Id: acme would produce partitions whose logs_topic is acme.otlp_logs, split further by trace ID via logs_message_key.
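For completeness, a sketch of where the processor would sit in a pipeline (the receiver and exporter names are illustrative):

```yaml
service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [partitioner]
      exporters: [kafka]
```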
Telemetry data types supported
Logs, metrics, traces. When OTTL supports profiles, profiles too.
Code Owner(s)
axw
Sponsor (optional)
No response
Additional context
No response