feat: add content_router processor #2030
Implements a content-based routing processor (urn:otel:content_router:processor) that routes telemetry signals to named output ports based on a resource attribute value. Designed for multi-tenant pipelines where each tenant's data goes to a dedicated exporter based on e.g. microsoft.resourceId.

Key features:
- Zero-copy routing via RawLogsData/RawMetricsData/RawTraceData protobuf views
- Native OtapLogsView for Arrow logs (no OTLP round-trip)
- Single-pass fold for multi-resource batch consistency (no per-batch allocation)
- Destination-aware mixed-batch detection with permanent NACK
- Configurable: routing_key, routes map, default_output, case_sensitive
- Comprehensive config validation (empty values, case-insensitive collisions)
- Telemetry metrics: received, routed, routed_default, nacked, no_routing_key, conversion_error
- 27 tests covering config, routing, metrics/traces/logs, mixed batches, Arrow ConversionError, default-output, and NACK behavior
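For readers skimming the feature list, the validation rules it names (empty values, case-insensitive collisions) could look roughly like the sketch below. This is an illustration only: `RouterConfigSketch`, `ConfigError`, and `validate` are hypothetical names, and the field types are assumed from the option names rather than taken from the PR.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for the processor configuration. Field names follow
/// the option list in the description; the real struct may differ.
#[derive(Debug, Clone)]
pub struct RouterConfigSketch {
    pub routing_key: String,
    /// Attribute value -> output port name.
    pub routes: HashMap<String, String>,
    pub default_output: Option<String>,
    pub case_sensitive: bool,
}

/// Hypothetical error type for the checks sketched here.
#[derive(Debug, PartialEq, Eq)]
pub enum ConfigError {
    EmptyRoutingKey,
    EmptyRouteValue,
    EmptyPortName,
    /// Two route values that become identical once lowercased.
    CaseInsensitiveCollision(String),
}

pub fn validate(cfg: &RouterConfigSketch) -> Result<(), ConfigError> {
    if cfg.routing_key.trim().is_empty() {
        return Err(ConfigError::EmptyRoutingKey);
    }
    if matches!(&cfg.default_output, Some(p) if p.trim().is_empty()) {
        return Err(ConfigError::EmptyPortName);
    }
    let mut seen = HashSet::new();
    for (value, port) in &cfg.routes {
        if value.trim().is_empty() {
            return Err(ConfigError::EmptyRouteValue);
        }
        if port.trim().is_empty() {
            return Err(ConfigError::EmptyPortName);
        }
        // Only relevant when matching ignores case: "TenantA" and "tenanta"
        // would otherwise be ambiguous at lookup time.
        if !cfg.case_sensitive {
            let lowered = value.to_lowercase();
            if !seen.insert(lowered.clone()) {
                return Err(ConfigError::CaseInsensitiveCollision(lowered));
            }
        }
    }
    Ok(())
}
```

With `case_sensitive: true` the same two keys are distinct and the sketch accepts them; with `case_sensitive: false` they are rejected up front instead of silently shadowing each other.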
…de_config in create_content_router
…atch, non-string attr, and Matched+NoMatch mixed batch
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #2030 +/- ##
==========================================
+ Coverage 86.92% 86.96% +0.03%
==========================================
Files 535 536 +1
Lines 171591 172511 +920
==========================================
+ Hits 149157 150019 +862
- Misses 21900 21958 +58
Partials 534 534
…nicode correctness)
//! ```
//!
//! Each route value corresponds to a named output port that must be wired
//! in the pipeline configuration.
nit: Should we add a sentence or two to the module docs here clarifying the behaviour for unmatched or mix-matched batches?
Good call - added a section to the module docs covering both cases.
On mixed batches specifically: we intentionally NACK the entire batch rather than splitting it per-resource (which is what the Go collector's routingconnector does). Splitting requires deserializing, partitioning the resource arrays, and re-serializing separate payloads per destination - which would destroy the zero-copy advantage we get from routing on protobuf views. Batch-level routing keeps it to a single-pass fold over resource attributes with O(1) forwarding of the untouched payload.
In practice, mixed batches should be rare - SDKs typically produce batches from a single resource, and upstream collectors should already be grouping by tenant. The permanent NACK surfaces it as a configuration issue to fix upstream rather than silently absorbing the splitting cost. If splitting were ever needed, it would belong in a separate upstream processor, not in the router.
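To make the batch-level decision concrete, here is a minimal sketch of a destination-aware single-pass fold. It is not the PR's code: `Destination`, `BatchRoute`, and `resolve_batch` are hypothetical names, and the iterator of optional routing-key values stands in for whatever the protobuf or Arrow view actually yields per resource.

```rust
use std::collections::HashMap;

/// Where a single resource would be sent (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Destination<'a> {
    /// A named output port found in the routes map.
    Port(&'a str),
    /// No routing key, or no matching route: the default output.
    Default,
}

/// Outcome for the whole batch (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum BatchRoute<'a> {
    /// The batch contained no resources.
    Empty,
    /// Every resource resolved to the same destination.
    Single(Destination<'a>),
    /// At least two resources disagree; the router would NACK permanently.
    Mixed,
}

/// Folds over the per-resource routing-key values once, without allocating.
/// Comparison is on destinations, not raw values, so two tenants mapped to
/// the same port do not count as a mixed batch.
pub fn resolve_batch<'a, 'b, I>(
    resource_keys: I,
    routes: &'a HashMap<String, String>,
) -> BatchRoute<'a>
where
    I: IntoIterator<Item = Option<&'b str>>,
{
    resource_keys.into_iter().fold(BatchRoute::Empty, |acc, key| {
        let dest = key
            .and_then(|k| routes.get(k))
            .map(|port| Destination::Port(port.as_str()))
            .unwrap_or(Destination::Default);
        match acc {
            BatchRoute::Empty => BatchRoute::Single(dest),
            BatchRoute::Single(prev) if prev == dest => BatchRoute::Single(prev),
            _ => BatchRoute::Mixed,
        }
    })
}
```

In this sketch a matched resource next to an unmatched one is also reported as `Mixed`, mirroring the Matched+NoMatch case mentioned in the commit titles. The payload itself is never touched, which is what keeps forwarding O(1).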
jmacd left a comment
Very nice, we do want the best performance we can have for a pure-OTLP pipeline.
// Use native OTAP Arrow view for logs (avoids clone + OTLP round-trip)
SignalType::Logs => self.resolve_arrow_logs_route(arrow_records),
// Metrics/Traces Arrow views not yet available — convert to OTLP.
// TODO: Use OtapMetricsView/OtapTracesView when available.
Do we have a GH issue for that?
Description: Implements the content router processor described in #2029
- Registered as `urn:otel:content_router:processor`
- Zero-copy routing via `RawLogsData`/`RawMetricsData`/`RawTraceData` protobuf views
- Native `OtapLogsView` for Arrow logs (metrics/traces Arrow views pending — falls back to OTLP conversion)
- Destination-aware mixed-batch detection with single-pass fold
use std::sync::Arc;

/// URN for the ContentRouter processor
pub const CONTENT_ROUTER_URN: &str = "urn:otel:content_router:processor";
The name content_router feels too generic. It doesn't communicate what about the content drives routing. Something like resource_attribute_router would be more descriptive for what this processor currently does.
However, before we settle on a name, I think it's worth discussing the broader routing story. Ideally, for multi-tenant routing, we'd want to extract the tenant identifier from HTTP headers or gRPC metadata. That way we can route without parsing the payload at all, which would be significantly cheaper.
That raises a design question: Should we have one routing component that can route based on either headers or resource attributes (configured via the routing key source), or separate processors for each (e.g., header_router, resource_attribute_router)? The naming of this component would depend on which direction we take.
Great questions - let me share my thinking on both:
On the name: I'd prefer to keep content_router. I agree it sounds generic if you only look at what it does today (resource attributes), but that's intentional. We can extend it in the future to support other content-level keys - scope attributes, or even context metadata stamped by the receiver. The name leaves room for that without a rename. resource_attribute_router would lock us in.
On header-based routing: Architecturally, a separate header_router processor can't exist in our design - headers and gRPC metadata are consumed at the transport layer by the receiver. By the time a processor sees the message, it only has OtapPdata. If we ever needed header values for routing, the receiver would have to stamp them into the Context struct that travels with each message, and then content_router could read them. So it would naturally be a config option within this same component, not a separate processor.
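A rough sketch of that idea, under heavy assumptions: the real Context struct and OtapPdata are not shown in this thread, so `MessageContext`, `routing_hint`, `Message`, and `routing_value` below are all hypothetical stand-ins. The point is only that a receiver-stamped value can be read first, with the resource-attribute scan as the fallback.

```rust
/// Hypothetical per-message context. The field is an assumption used to show
/// a value stamped by the receiver from a header or gRPC metadata entry.
#[derive(Debug, Default, Clone)]
pub struct MessageContext {
    pub routing_hint: Option<String>,
}

/// Hypothetical message: a context plus resource attributes as (key, value)
/// pairs, standing in for a view over the payload.
pub struct Message<'a> {
    pub context: MessageContext,
    pub resource_attributes: Vec<(&'a str, &'a str)>,
}

/// Returns the value to route on: the stamped hint if present, otherwise the
/// first resource attribute whose key equals `attribute_key`.
pub fn routing_value<'m>(msg: &'m Message<'_>, attribute_key: &str) -> Option<&'m str> {
    if let Some(hint) = msg.context.routing_hint.as_deref() {
        // Cheap path: no payload inspection at all.
        return Some(hint);
    }
    msg.resource_attributes
        .iter()
        .find(|(k, _)| *k == attribute_key)
        .map(|(_, v)| *v)
}
```

Whether the hint should override the attribute or be selected explicitly through configuration is a design choice this sketch does not settle.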
> If we ever needed header values for routing, the receiver would have to stamp them into the Context struct that travels with each message, and then content_router could read them. So it would naturally be a config option within this same component, not a separate processor.
Yes, I think we should consider doing that. It would be more efficient to look up the Context than to iterate over resource attributes on the hot path. So, it sounds like we're aligned on having this same processor extend its capabilities over time. I prefer that too.
In that case, I'd suggest we drop the word "content" and simply call it the router processor. Just like how we use retry and batch processors and not content_retry or content_batch processors.
> In that case, I'd suggest we drop the word "content" and simply call it the router processor. Just like how we use retry and batch processors and not content_retry or content_batch processors.
We already have signal_type_router in the codebase - calling this one router would be ambiguous (router based on what?). The prefix here disambiguates: signal_type_router routes by signal type, content_router routes by content values.
Also, regarding naming: content-based routing (CBR) is a well-established messaging pattern in which a router inspects message content to determine the destination. The naming of the retry and batch processors makes sense, but here we have two different types of routers.
I think this is pretty good reasoning, even if the link to "CBR" looks to be from the 1990s😀.
@lquerel please take a look.
On this thread, I think we should rename signal_type_router to signal_router.
The URN of the signal type router is already urn:otel:type_router:processor, so I would be in favor of using the URN urn:otel:content_router:processor for this new processor. We should also rename the file signal_type_router.rs to type_router.rs for consistency.
I'm still reviewing this PR.
Agreed on the URN - urn:otel:content_router:processor is already in place. On the renaming of signal_type_router.rs -> type_router.rs: happy to do this, but would prefer to keep it out of this PR to keep the diff focused. Can follow up in a separate cleanup PR.
Co-authored-by: Utkarsh Umesan Pillai <66651184+utpilla@users.noreply.github.com>
lquerel left a comment
I approve the PR, but I think there are a few minor improvements to make. I trust you to address them before merging.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ContentRouterConfig {
    /// The resource attribute key used for routing decisions.
    pub routing_key: String,
Since we may extend the scope of this router, I think we should find a way to make this routing key more explicit. It is currently a routing key that targets resource attributes. The question we should ask is how we would evolve this configuration to support routing based on scope attributes or signal attributes.
Good point - introduced a RoutingKeyExpr enum to make the source explicit and extensible:

    pub enum RoutingKeyExpr {
        ResourceAttribute(String),
        // Future: ScopeAttribute(String), MetricName, ...
    }

The config format is now routing_key: { resource_attribute: "service.namespace" }. Adding support for scope attributes, signal attributes, or metric names (👍 @jmacd) only requires a new variant and a corresponding match arm in resolve_route.
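As an illustration of how small such an extension might be, the sketch below adds a `ScopeAttribute` variant and the matching arm. Only `RoutingKeyExpr` and `ResourceAttribute` come from the comment above; `ScopeAttribute` as a real variant, `SignalView`, and `resolve_key` are assumptions made for the example and do not reflect the actual resolve_route signature.

```rust
/// Routing key source. `ScopeAttribute` is a hypothetical future variant.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum RoutingKeyExpr {
    ResourceAttribute(String),
    ScopeAttribute(String),
}

/// Hypothetical flattened view: attribute (key, value) pairs per level,
/// standing in for the protobuf or Arrow views used by the processor.
pub struct SignalView<'a> {
    pub resource: &'a [(&'a str, &'a str)],
    pub scope: &'a [(&'a str, &'a str)],
}

/// Picks the attribute set named by the expression, then looks up the key.
/// Supporting a new source means one new variant and one new match arm.
pub fn resolve_key<'a>(expr: &RoutingKeyExpr, view: &SignalView<'a>) -> Option<&'a str> {
    let (attrs, key) = match expr {
        RoutingKeyExpr::ResourceAttribute(k) => (view.resource, k),
        RoutingKeyExpr::ScopeAttribute(k) => (view.scope, k),
    };
    attrs
        .iter()
        .find(|(name, _)| *name == key.as_str())
        .map(|(_, v)| *v)
}
```

Because the match is exhaustive, the compiler flags every place that needs updating when a variant is added, which fits the "new variant plus match arm" description.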
Thanks, comments are relevant. Have addressed them.
@lquerel - do you think this can be merged now?