
feat: add content_router processor#2030

Merged
lquerel merged 18 commits intoopen-telemetry:mainfrom
lalitb:routing_processor
Feb 19, 2026

Conversation


@lalitb lalitb commented Feb 12, 2026

Description:

Implements the content router processor described in #2029

  • Registered as urn:otel:content_router:processor
  • Zero-copy routing via RawLogsData/RawMetricsData/RawTraceData protobuf views
  • Native OtapLogsView for Arrow logs (metrics/traces Arrow views pending — falls back to OTLP conversion)
  • Destination-aware mixed-batch detection with single-pass fold

Implements a content-based routing processor (urn:otel:content_router:processor)
that routes telemetry signals to named output ports based on a resource attribute
value. Designed for multi-tenant pipelines where each tenant's data goes to a
dedicated exporter keyed by an attribute such as microsoft.resourceId.

Key features:
- Zero-copy routing via RawLogsData/RawMetricsData/RawTraceData protobuf views
- Native OtapLogsView for Arrow logs (no OTLP round-trip)
- Single-pass fold for multi-resource batch consistency (no per-batch allocation)
- Destination-aware mixed-batch detection with permanent NACK
- Configurable: routing_key, routes map, default_output, case_sensitive
- Comprehensive config validation (empty values, case-insensitive collisions)
- Telemetry metrics: received, routed, routed_default, nacked, no_routing_key,
  conversion_error
- 27 tests covering config, routing, metrics/traces/logs, mixed batches,
  Arrow ConversionError, default-output, and NACK behavior
@lalitb lalitb requested a review from a team as a code owner February 12, 2026 12:39
@github-actions github-actions Bot added the rust Pull requests that update Rust code label Feb 12, 2026
@lalitb lalitb marked this pull request as draft February 12, 2026 12:39

codecov Bot commented Feb 12, 2026

Codecov Report

❌ Patch coverage is 94.23913% with 53 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.96%. Comparing base (30f4786) to head (437de1d).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2030      +/-   ##
==========================================
+ Coverage   86.92%   86.96%   +0.03%     
==========================================
  Files         535      536       +1     
  Lines      171591   172511     +920     
==========================================
+ Hits       149157   150019     +862     
- Misses      21900    21958      +58     
  Partials      534      534              
Components Coverage Δ
otap-dataflow 89.10% <94.23%> (+0.03%) ⬆️
query_abstraction 80.61% <ø> (ø)
query_engine 90.31% <ø> (ø)
syslog_cef_receivers ∅ <ø> (∅)
otel-arrow-go 53.50% <ø> (ø)
quiver 91.73% <ø> (ø)

@lalitb lalitb marked this pull request as ready for review February 12, 2026 14:07
Member

@albertlockett albertlockett left a comment


LGTM, thanks @lalitb

//! ```
//!
//! Each route value corresponds to a named output port that must be wired
//! in the pipeline configuration.
Member


nit: Should we add a sentence or two to the module docs here clarifying the behaviour for unmatched or mix-matched batches?

Member Author


Good call - added a section to the module docs covering both cases.

On mixed batches specifically: we intentionally NACK the entire batch rather than splitting it per-resource (which is what the Go collector's routingconnector does). Splitting requires deserializing, partitioning the resource arrays, and re-serializing separate payloads per destination - which would destroy the zero-copy advantage we get from routing on protobuf views. Batch-level routing keeps it to a single-pass fold over resource attributes with O(1) forwarding of the untouched payload.

In practice, mixed batches should be rare - SDKs typically produce batches from a single resource, and upstream collectors should already be grouping by tenant. The permanent NACK surfaces it as a configuration issue to fix upstream rather than silently absorbing the splitting cost. If splitting were ever needed, it would belong in a separate upstream processor, not in the router.

Contributor

@jmacd jmacd left a comment


Very nice, we do want the best performance we can have for a pure-OTLP pipeline.

// Use native OTAP Arrow view for logs (avoids clone + OTLP round-trip)
SignalType::Logs => self.resolve_arrow_logs_route(arrow_records),
// Metrics/Traces Arrow views not yet available — convert to OTLP.
// TODO: Use OtapMetricsView/OtapTracesView when available.
Contributor


👍

Contributor


Do we have a GH issue for that?

Member Author


We didn't have one; created it now - #2053

@jmacd jmacd added this pull request to the merge queue Feb 13, 2026
github-merge-queue Bot pushed a commit that referenced this pull request Feb 13, 2026
Comment thread rust/otap-dataflow/crates/otap/src/content_router.rs Outdated
@utpilla utpilla removed this pull request from the merge queue due to a manual request Feb 13, 2026
use std::sync::Arc;

/// URN for the ContentRouter processor
pub const CONTENT_ROUTER_URN: &str = "urn:otel:content_router:processor";
Contributor


The name content_router feels too generic. It doesn't communicate what about the content drives routing. Something like resource_attribute_router would be more descriptive for what this processor currently does.

However, before we settle on a name, I think it's worth discussing the broader routing story. Ideally, for multi-tenant routing, we'd want to extract the tenant identifier from HTTP headers or gRPC metadata. That way we can route without parsing the payload at all, which would be significantly cheaper.

That raises a design question: Should we have one routing component that can route based on either headers or resource attributes (configured via the routing key source), or separate processors for each (e.g., header_router, resource_attribute_router)? The naming of this component would depend on which direction we take.

Member Author

@lalitb lalitb Feb 13, 2026


Great questions - let me share my thinking on both:

On the name: I'd prefer to keep content_router. I agree it sounds generic if you only look at what it does today (resource attributes), but that's intentional. We can extend it in the future to support other content-level keys - scope attributes, or even context metadata stamped by the receiver. The name leaves room for that without a rename. resource_attribute_router would lock us in.

On header-based routing: Architecturally, a separate header_router processor can't exist in our design: headers and gRPC metadata are consumed at the transport layer by the receiver. By the time a processor sees the message, it only has OtapPdata. If we ever needed header values for routing, the receiver would have to stamp them into the Context struct that travels with each message, and then content_router could read them. So it would naturally be a config option within this same component, not a separate processor.

Contributor


If we ever needed header values for routing, the receiver would have to stamp them into the Context struct that travels with each message, and then content_router could read them. So it would naturally be a config option within this same component, not a separate processor.

Yes, I think we should consider doing that. It would be more efficient to look up the Context instead of iterating over resource attributes on the hot path. So, it sounds like we're aligned on having this same processor extend its capabilities over time. I prefer that too.

In that case, I'd suggest we drop the word "content" and simply call it the router processor. Just like how we use retry and batch processors and not content_retry or content_batch processors.

Member Author

@lalitb lalitb Feb 13, 2026


In that case, I'd suggest we drop the word "content" and simply call it the router processor. Just like how we use retry and batch processors and not content_retry or content_batch processors.

We already have signal_type_router in the codebase - calling this one router would be ambiguous (router based on what?). The prefix here disambiguates: signal_type_router routes by signal type, content_router routes by content values.

Also, regarding naming: content-based routing (CBR) is a well-established messaging pattern - a router that inspects message content to determine the destination. The names retry and batch make sense for those processors, but here we have two different types of routers.

Contributor


I think this is pretty good reasoning, even if the link to "CBR" looks to be from the 1990s😀.

@lquerel please take a look.

On this thread, I think we should rename signal_type_router to signal_router.

Contributor


The URN of the signal type router is already urn:otel:type_router:processor, so I would be in favor of using the URN urn:otel:content_router:processor for this new processor. We should also rename the file signal_type_router.rs to type_router.rs for consistency.

I'm still reviewing this PR.

Member Author

@lalitb lalitb Feb 17, 2026


Agreed on the URN - urn:otel:content_router:processor is already in place. On the renaming of signal_type_router.rs -> type_router.rs: happy to do this, but would prefer to keep it out of this PR to keep the diff focused. Can follow up in a separate cleanup PR.

Contributor

@lquerel lquerel left a comment


I approve the PR, but I think there are a few minor improvements to make. I trust you to address them before merging.

Comment thread rust/otap-dataflow/crates/otap/src/content_router.rs Outdated
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ContentRouterConfig {
/// The resource attribute key used for routing decisions.
pub routing_key: String,
Contributor


Since we may extend the scope of this router, I think we should find a way to make this routing key more explicit. It is currently a routing key that targets resource attributes. The question we should ask is how we would evolve this configuration to support routing based on scope attributes or signal attributes.

Contributor


(or metric name)!

Member Author

@lalitb lalitb Feb 17, 2026


Good point - introduced a RoutingKeyExpr enum to make the source explicit and extensible:

pub enum RoutingKeyExpr {
    ResourceAttribute(String),
    // Future: ScopeAttribute(String), MetricName, ...
}

The config format is now routing_key: { resource_attribute: "service.namespace" }. Adding support for scope attributes, signal attributes, or metric names (👍 @jmacd) only requires a new variant and a corresponding match arm in resolve_route.

Comment thread rust/otap-dataflow/crates/otap/src/content_router.rs Outdated


lalitb commented Feb 17, 2026

I approve the PR, but I think there are a few minor improvements to make. I trust you to address them before merging.

Thanks - the comments are relevant, and I've addressed them.


lalitb commented Feb 19, 2026

@lquerel - do you think this can be merged now?

@lquerel lquerel added this pull request to the merge queue Feb 19, 2026
Merged via the queue into open-telemetry:main with commit 74b09ca Feb 19, 2026
62 checks passed
antonmry pushed a commit to antonmry/otel-arrow that referenced this pull request Feb 23, 2026

Co-authored-by: Utkarsh Umesan Pillai <66651184+utpilla@users.noreply.github.com>
