
Component donation: Span Pruning Processor #45654

@portertech

Building and distributing outside this repository

  • I understand I can use and distribute my component outside of this repository.

Existing component implementation

https://github.com/grafana/opentelemetry-collector-extras

Components covering similar use cases

| Component | Relationship |
| --- | --- |
| tailsamplingprocessor | Reduces trace volume by sampling complete traces based on policies. Span pruning is complementary: it reduces volume within traces by aggregating repetitive spans rather than dropping entire traces. |
| filterprocessor | Drops entire spans/metrics/logs matching criteria. Span pruning preserves information by aggregating rather than dropping. |
| groupbytraceprocessor | Groups spans by trace ID (often used upstream of tail sampling). Span pruning operates on complete traces and would typically be placed after groupbytrace. |
| transformprocessor | Modifies telemetry using OTTL. Could theoretically implement some aggregation logic, but span pruning provides purpose-built aggregation with statistics. |

No direct equivalent exists for aggregating similar leaf spans within a trace while preserving statistical information.

The purpose and use-cases of the component

Problem Statement

Modern distributed applications often generate traces with many repetitive leaf spans, such as N+1 database queries, batch HTTP calls, and fan-out operations. These patterns create:

  • High storage costs from redundant span data
  • Noisy trace views that obscure the actual request flow
  • Cardinality pressure on backends

Solution

The spanpruningprocessor identifies similar leaf spans within a trace and replaces each group with a single summary span containing aggregated statistics. This reduces trace volume while preserving meaningful operational information.
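
To make the grouping step concrete, here is a minimal Python sketch of how similar leaf spans might be bucketed. It is not the processor's Go implementation: the `Span` dataclass, `group_leaf_spans`, and the exact shape of the grouping key are hypothetical, and glob matching is approximated with Python's `fnmatch`.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from typing import Optional


@dataclass
class Span:
    # Hypothetical, simplified span model used only for this sketch.
    span_id: str
    parent_id: Optional[str]
    name: str
    kind: str
    status: str
    duration_ns: int
    attributes: dict = field(default_factory=dict)


def group_leaf_spans(spans, attribute_patterns, min_spans=5):
    """Group the leaf spans of one trace by parent, name, kind, status,
    and the attributes whose keys match any of the glob patterns."""
    # A span is a leaf if no other span in the trace names it as parent.
    parent_ids = {s.parent_id for s in spans if s.parent_id is not None}
    leaves = [s for s in spans if s.span_id not in parent_ids]

    groups = defaultdict(list)
    for span in leaves:
        matched = tuple(sorted(
            (key, str(value))
            for key, value in span.attributes.items()
            if any(fnmatchcase(key, pattern) for pattern in attribute_patterns)
        ))
        group_key = (span.parent_id, span.name, span.kind, span.status, matched)
        groups[group_key].append(span)

    # Only groups that reach the minimum size are candidates for aggregation.
    return {k: v for k, v in groups.items() if len(v) >= min_spans}
```

In this sketch, attributes that match no pattern (a per-request ID, for example) do not split a group, which is what lets otherwise identical spans collapse into one summary.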

Key Features

| Feature | Description |
| --- | --- |
| Leaf span aggregation | Groups spans by name, kind, status, parent, and configurable attributes (glob patterns supported) |
| Recursive parent aggregation | When all children of a parent are aggregated, the parent can also be aggregated (configurable depth) |
| Statistical summaries | Summary spans include count, min/max/avg/total duration, and histogram buckets |
| Outlier detection | IQR or MAD-based detection identifies slow spans; optionally correlates outliers with attribute values |
| Outlier preservation | Keeps outlier spans as individuals for debugging while aggregating normal spans |
| Attribute loss tracking | Optional metrics and annotations showing what attribute diversity is lost during aggregation |
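
For readers unfamiliar with the two outlier methods, the sketch below shows the textbook upper-fence forms of IQR and MAD detection in Python. The function names are hypothetical, and the quartile method and the absence of a MAD scaling constant are assumptions; the processor may compute either differently.

```python
import statistics


def iqr_outliers(durations, multiplier=1.5):
    """Return durations above Q3 + multiplier * (Q3 - Q1).

    Needs at least two data points; the quartile method is an assumption.
    """
    q1, _, q3 = statistics.quantiles(durations, n=4, method="inclusive")
    upper_fence = q3 + multiplier * (q3 - q1)
    return [d for d in durations if d > upper_fence]


def mad_outliers(durations, multiplier=3.0):
    """Return durations above median + multiplier * MAD.

    MAD is the median absolute deviation from the median. No scaling
    constant is applied here, which is also an assumption.
    """
    median = statistics.median(durations)
    mad = statistics.median([abs(d - median) for d in durations])
    threshold = median + multiplier * mad
    return [d for d in durations if d > threshold]
```

Only the slow side is checked in both functions, since the feature is described as identifying slow spans. For example, `iqr_outliers([5, 6, 7, 5, 6, 7, 6, 5, 7, 500])` flags only the 500 ms span.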

Use Cases

  1. Database query optimization - Aggregate N+1 query patterns into a single summary showing query count and latency distribution
  2. Batch operations - Consolidate fan-out calls (parallel HTTP requests, message publishes) into representative summaries
  3. Cost reduction - Reduce trace storage costs for repetitive workloads while retaining operational insight
  4. Debugging support - Preserve outlier spans with correlated attributes to identify root causes of latency issues

Example

Before (10 similar SELECT spans):

handler
├── SELECT - 5ms
├── SELECT - 6ms
├── SELECT - 7ms
├── SELECT - 500ms (outlier, cache_hit=false)
└── ... (6 more)

After (with outlier preservation):

handler
├── SELECT (summary, span_count=9)
│   ├── aggregation.duration_avg_ns: 8000000
│   ├── aggregation.duration_median_ns: 7000000
│   ├── aggregation.outlier_correlated_attributes: "cache_hit=false(100%/0%)"
│   └── aggregation.histogram_bucket_counts: [3, 6, 8, 9]
└── SELECT - 500ms (preserved outlier)
    ├── aggregation.is_preserved_outlier: true
    └── cache_hit: false
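
The summary values above can be reproduced with a short Python sketch. The nine durations and the four bucket bounds below are hypothetical, chosen only so the arithmetic matches the example, and the histogram counts are assumed to be cumulative (the example's counts never decrease). Attribute names other than `duration_avg_ns`, `duration_median_ns`, and `histogram_bucket_counts` are illustrative.

```python
import statistics

MS = 1_000_000  # nanoseconds per millisecond


def summarize(durations_ns, bucket_bounds_ns):
    """Compute summary statistics for one group of aggregated spans."""
    total = sum(durations_ns)
    return {
        "span_count": len(durations_ns),
        "duration_min_ns": min(durations_ns),
        "duration_max_ns": max(durations_ns),
        "duration_total_ns": total,
        "duration_avg_ns": total // len(durations_ns),
        "duration_median_ns": int(statistics.median(durations_ns)),
        # Cumulative counts: spans with duration <= each upper bound.
        "histogram_bucket_counts": [
            sum(1 for d in durations_ns if d <= bound)
            for bound in bucket_bounds_ns
        ],
    }


# Hypothetical durations for the nine non-outlier SELECT spans.
normal_spans = [d * MS for d in (5, 5, 6, 7, 7, 7, 9, 10, 16)]
# Hypothetical bucket bounds; the real defaults run from 5ms to 10s.
bounds = [b * MS for b in (6, 8, 10, 20)]

summary = summarize(normal_spans, bounds)
```

With these inputs the average is 8 ms, the median is 7 ms, and the cumulative counts are `[3, 6, 8, 9]`. The 500 ms span is excluded because it is preserved as an individual outlier.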

Example configuration for the component

processors:
  spanpruning:
    # Attribute patterns for grouping (glob syntax)
    group_by_attributes:
      - "db.*"
      - "http.method"
    
    # Minimum spans required before aggregation (default: 5)
    min_spans_to_aggregate: 5
    
    # Parent aggregation depth: 0=none, -1=unlimited (default: 1)
    max_parent_depth: 1
    
    # Prefix for aggregation attributes (default: "aggregation.")
    aggregation_attribute_prefix: "aggregation."
    
    # Histogram bucket bounds (default: 5ms to 10s)
    aggregation_histogram_buckets: [5ms, 10ms, 25ms, 50ms, 100ms, 250ms, 500ms, 1s, 2.5s, 5s, 10s]
    
    # Enable attribute loss analysis (default: false)
    enable_attribute_loss_analysis: false
    
    # Enable outlier detection (default: false)
    enable_outlier_analysis: false
    
    # Outlier analysis settings
    outlier_analysis:
      method: iqr                    # or "mad"
      iqr_multiplier: 1.5
      mad_multiplier: 3.0
      min_group_size: 7
      correlation_min_occurrence: 0.75
      correlation_max_normal_occurrence: 0.25
      max_correlated_attributes: 5
      preserve_outliers: false
      max_preserved_outliers: 2
      preserve_only_with_correlation: false
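
One plausible reading of the correlation thresholds is sketched below in Python. It assumes that `correlation_min_occurrence` is the minimum fraction of outlier spans carrying an attribute value, that `correlation_max_normal_occurrence` is the maximum fraction of normal spans carrying it, and that the two percentages in `cache_hit=false(100%/0%)` are those fractions in that order. The function name and the ranking rule are hypothetical.

```python
from collections import Counter


def correlated_attributes(outliers, normals, min_occurrence=0.75,
                          max_normal_occurrence=0.25, max_attributes=5):
    """Find attribute values common among outliers but rare among normal spans.

    `outliers` and `normals` are lists of attribute dicts, one per span.
    """
    if not outliers:
        return []
    outlier_counts = Counter(
        (k, str(v)) for attrs in outliers for k, v in attrs.items())
    normal_counts = Counter(
        (k, str(v)) for attrs in normals for k, v in attrs.items())

    candidates = []
    for (key, value), count in outlier_counts.items():
        outlier_frac = count / len(outliers)
        normal_frac = (normal_counts[(key, value)] / len(normals)
                       if normals else 0.0)
        if (outlier_frac >= min_occurrence
                and normal_frac <= max_normal_occurrence):
            label = f"{key}={value}({outlier_frac:.0%}/{normal_frac:.0%})"
            candidates.append((outlier_frac - normal_frac, label))

    # Rank by how much more often the value appears among outliers
    # (an assumed ordering), breaking ties alphabetically.
    candidates.sort(key=lambda c: (-c[0], c[1]))
    return [label for _, label in candidates[:max_attributes]]
```

Under these assumptions, an attribute shared by every span, such as `db.system=postgres`, is filtered out because it is just as common among normal spans, while `cache_hit=false` survives.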

Telemetry data types supported

  • Traces (alpha stability)

Code Owners

@portertech, @csmarchbanks, @sherinabr

Sponsor

@jmacd

Additional context

  • Pull Request: Add Trace Span Pruning Processor #45617
  • Test Coverage: >80% with unit tests, benchmarks, and integration tests
  • Documentation: Comprehensive README with configuration reference and examples
