Component donation: Span Pruning Processor #45654
Description
Building and distributing outside this repository
- I understand I can use and distribute my component outside of this repository.
Existing component implementation
https://github.com/grafana/opentelemetry-collector-extras
Components covering similar use cases
| Component | Relationship |
|---|---|
| tailsamplingprocessor | Reduces trace volume by sampling complete traces based on policies. Span pruning is complementary: it reduces volume within traces by aggregating repetitive spans rather than dropping entire traces. |
| filterprocessor | Drops entire spans/metrics/logs matching criteria. Span pruning preserves information by aggregating rather than dropping. |
| groupbytraceprocessor | Groups spans by trace ID (often used upstream of tail sampling). Span pruning operates on complete traces and would typically be placed after groupbytrace. |
| transformprocessor | Modifies telemetry using OTTL. Could theoretically implement some aggregation logic, but span pruning provides purpose-built aggregation with statistics. |
No direct equivalent exists for aggregating similar leaf spans within a trace while preserving statistical information.
The purpose and use-cases of the component
Problem Statement
Modern distributed applications often generate traces with many repetitive leaf spans, such as N+1 database queries, batch HTTP calls, and fan-out operations. These patterns create:
- High storage costs from redundant span data
- Noisy trace views that obscure the actual request flow
- Cardinality pressure on backends
Solution
The spanpruningprocessor identifies similar leaf spans within a trace and replaces each group with a single summary span containing aggregated statistics. This reduces trace volume while preserving meaningful operational information.
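To make the idea concrete, here is a minimal Python sketch of leaf-span grouping. It is not the processor's Go implementation. The `Span` record, `group_key`, and `prune_leaves` are hypothetical names used only for illustration, and the sketch assumes a group is aggregated once it reaches the `min_spans_to_aggregate` threshold.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from typing import Optional


@dataclass
class Span:
    # Hypothetical minimal span record; not the collector's pdata type.
    span_id: str
    parent_id: Optional[str]
    name: str
    kind: str = "client"
    status: str = "ok"
    duration_ns: int = 0
    attributes: dict = field(default_factory=dict)


def group_key(span, patterns):
    """Key on name, kind, status, parent, and attributes matching a glob."""
    matched = tuple(sorted(
        (k, str(v)) for k, v in span.attributes.items()
        if any(fnmatchcase(k, p) for p in patterns)
    ))
    return (span.name, span.kind, span.status, span.parent_id, matched)


def prune_leaves(spans, patterns, min_spans=5):
    """Return (kept_spans, summaries) for the spans of one trace."""
    parent_ids = {s.parent_id for s in spans if s.parent_id is not None}
    groups = defaultdict(list)
    for s in spans:
        if s.span_id not in parent_ids:  # leaf: no span names it as parent
            groups[group_key(s, patterns)].append(s)

    pruned_ids = set()
    summaries = []
    for key, members in groups.items():
        if len(members) < min_spans:  # assumed threshold semantics
            continue
        durations = [m.duration_ns for m in members]
        summaries.append({
            "name": key[0],
            "parent_id": key[3],
            "span_count": len(members),
            "duration_min_ns": min(durations),
            "duration_max_ns": max(durations),
            "duration_total_ns": sum(durations),
            "duration_avg_ns": sum(durations) // len(durations),
        })
        pruned_ids.update(m.span_id for m in members)

    kept = [s for s in spans if s.span_id not in pruned_ids]
    return kept, summaries
```

Attributes that match no pattern (a per-request ID, for example) are left out of the key, so spans that differ only in those attributes still fall into the same group.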
Key Features
| Feature | Description |
|---|---|
| Leaf span aggregation | Groups spans by name, kind, status, parent, and configurable attributes (glob patterns supported) |
| Recursive parent aggregation | When all children of a parent are aggregated, the parent can also be aggregated (configurable depth) |
| Statistical summaries | Summary spans include count, min/max/avg/total duration, and histogram buckets |
| Outlier detection | IQR or MAD-based detection identifies slow spans; optionally correlates outliers with attribute values |
| Outlier preservation | Keeps outlier spans as individuals for debugging while aggregating normal spans |
| Attribute loss tracking | Optional metrics and annotations showing what attribute diversity is lost during aggregation |
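The two outlier methods named in the table can be sketched as follows. This is an illustration under assumptions, not the processor's code: it assumes only the upper fence matters (the goal is to find slow spans), and it uses Python's inclusive quartile method, which may differ from the quartile convention the component uses. The function names are hypothetical.

```python
from statistics import median, quantiles


def iqr_outliers(durations, multiplier=1.5):
    """Flag durations above Q3 + multiplier * IQR (upper fence only)."""
    q1, _, q3 = quantiles(durations, n=4, method="inclusive")
    fence = q3 + multiplier * (q3 - q1)
    return [d for d in durations if d > fence]


def mad_outliers(durations, multiplier=3.0):
    """Flag durations above median + multiplier * MAD."""
    med = median(durations)
    # Median absolute deviation: median distance from the median.
    mad = median(abs(d - med) for d in durations)
    fence = med + multiplier * mad
    return [d for d in durations if d > fence]
```

For durations of 5, 6, and 7 ms repeated three times plus one 500 ms span, both functions flag only the 500 ms value. Note that MAD collapses to zero when more than half the values are identical, so a real implementation would need to handle that case.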
Use Cases
- Database query optimization - Aggregate N+1 query patterns into a single summary showing query count and latency distribution
- Batch operations - Consolidate fan-out calls (parallel HTTP requests, message publishes) into representative summaries
- Cost reduction - Reduce trace storage costs for repetitive workloads while retaining operational insight
- Debugging support - Preserve outlier spans with correlated attributes to identify root causes of latency issues
Example
Before (10 similar SELECT spans):
handler
├── SELECT - 5ms
├── SELECT - 6ms
├── SELECT - 7ms
├── SELECT - 500ms (outlier, cache_hit=false)
└── ... (6 more)
After (with outlier preservation):
handler
├── SELECT (summary, span_count=9)
│   ├── aggregation.duration_avg_ns: 8000000
│   ├── aggregation.duration_median_ns: 7000000
│   ├── aggregation.outlier_correlated_attributes: "cache_hit=false(100%/0%)"
│   └── aggregation.histogram_bucket_counts: [3, 6, 8, 9]
└── SELECT - 500ms (preserved outlier)
    ├── aggregation.is_preserved_outlier: true
    └── cache_hit: false
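The `outlier_correlated_attributes` value in the example can be read as an attribute value followed by its occurrence rate among outliers and among normal spans. That reading is an assumption, as is the sketch below, which shows one way such a correlation could be computed with thresholds like `correlation_min_occurrence` and `correlation_max_normal_occurrence`. The function name and the ranking rule are hypothetical.

```python
from collections import Counter


def correlated_attributes(outliers, normals, min_occurrence=0.75,
                          max_normal_occurrence=0.25, max_attributes=5):
    """Find attribute values common among outliers but rare among normals.

    outliers and normals are lists of attribute dicts, one per span.
    """
    def fractions(spans):
        if not spans:
            return {}
        counts = Counter((k, str(v)) for attrs in spans for k, v in attrs.items())
        return {pair: n / len(spans) for pair, n in counts.items()}

    out_frac = fractions(outliers)
    norm_frac = fractions(normals)

    hits = []
    for (key, value), of in out_frac.items():
        nf = norm_frac.get((key, value), 0.0)
        if of >= min_occurrence and nf <= max_normal_occurrence:
            # Assumed output format: key=value(outlier%/normal%).
            hits.append((of - nf, f"{key}={value}({of:.0%}/{nf:.0%})"))

    # Rank by the gap between outlier and normal occurrence (an assumption).
    hits.sort(key=lambda h: (-h[0], h[1]))
    return [text for _, text in hits[:max_attributes]]
```

With one outlier carrying `cache_hit` set to the string `"false"` and nine normal spans carrying `"true"`, the sketch returns `cache_hit=false(100%/0%)`. An attribute shared by every span, such as `db.system`, is filtered out because it is just as common among normal spans.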
Example configuration for the component
processors:
  spanpruning:
    # Attribute patterns for grouping (glob syntax)
    group_by_attributes:
      - "db.*"
      - "http.method"
    # Minimum spans required before aggregation (default: 5)
    min_spans_to_aggregate: 5
    # Parent aggregation depth: 0=none, -1=unlimited (default: 1)
    max_parent_depth: 1
    # Prefix for aggregation attributes (default: "aggregation.")
    aggregation_attribute_prefix: "aggregation."
    # Histogram bucket bounds (default: 5ms to 10s)
    aggregation_histogram_buckets: [5ms, 10ms, 25ms, 50ms, 100ms, 250ms, 500ms, 1s, 2.5s, 5s, 10s]
    # Enable attribute loss analysis (default: false)
    enable_attribute_loss_analysis: false
    # Enable outlier detection (default: false)
    enable_outlier_analysis: false
    # Outlier analysis settings
    outlier_analysis:
      method: iqr # or "mad"
      iqr_multiplier: 1.5
      mad_multiplier: 3.0
      min_group_size: 7
      correlation_min_occurrence: 0.75
      correlation_max_normal_occurrence: 0.25
      max_correlated_attributes: 5
      preserve_outliers: false
      max_preserved_outliers: 2
      preserve_only_with_correlation: false
Telemetry data types supported
- Traces (alpha stability)
Code Owners
@portertech, @csmarchbanks, @sherinabr
Sponsor
Additional context
- Pull Request: Add Trace Span Pruning Processor #45617
- Test Coverage: >80% with unit tests, benchmarks, and integration tests
- Documentation: Comprehensive README with configuration reference and examples