
New component: processor/drain — log template annotation via the Drain algorithm #47235

@MikeGoldsmith

Description


The problem

Large-scale deployments routinely ingest millions of log records per minute, where a small number of structural patterns accounts for the majority of volume. Without a way to group logs by their underlying pattern, operators cannot:

  • Identify which log classes are generating the most volume
  • Write reliable filter rules that survive log message variations (e.g. different IP addresses, user names, request IDs)
  • Build cardinality-safe dashboards — grouping by raw log body is impractical

Existing approaches require operators to write and maintain regular expressions by hand, which doesn't scale and misses patterns they haven't anticipated.

Proposed solution

A new processor/drain that applies the Drain log clustering algorithm to log records as they pass through the pipeline. Drain builds a parse tree from log token structure and automatically derives template strings (e.g. "user <*> logged in from <*>") by replacing variable tokens with wildcards as similar lines accumulate.

The processor annotates each record with the following attribute:

Attribute            Type    Example
log.record.template  string  "user <*> logged in from <*>"

This aligns with the proposed OTel semantic convention in open-telemetry/semantic-conventions#1283 and #2064.

The processor annotates only — it does not filter. Downstream processors (e.g. filter) act on the attributes, keeping concerns separated and the processor composable.

Key features

  • Configurable Drain parse tree parameters (depth, similarity threshold, max clusters with LRU eviction)
  • Pre-seeding via known template strings or example log lines for stable templates across restarts
  • passthrough warmup mode (default): annotates immediately from the first record
  • buffer warmup mode: holds records until the tree has stabilized, then flushes with abstracted templates applied
  • Optional body_field for pipelines where the log body is a structured map and the message field cannot be promoted to a plain string body upstream. Pipelines that do control their intake should promote the field with a move operator instead.
  • Internal telemetry: processor_drain_clusters_active gauge, processor_drain_log_records_annotated and processor_drain_log_records_unannotated counters
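For the structured-body case, a body_field configuration could look like the sketch below. The key name "message" is illustrative, not prescribed; it names whichever top-level body key holds the log text:

```yaml
processors:
  drain:
    # Read the template source from body["message"] instead of the string body.
    body_field: message
```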

Example

processors:
  drain:
    log_cluster_depth: 4
    sim_threshold: 0.4
    seed_templates:
      - "user <*> logged in from <*>"
      - "connected to <*>"
    warmup_mode: buffer
    warmup_min_clusters: 20
    warmup_buffer_max_logs: 5000

  filter/drop_noisy:
    error_mode: ignore
    logs:
      log_record:
        - attributes["log.record.template"] == "heartbeat ping <*>"

service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [drain, filter/drop_noisy]
      exporters: [otlp]

Alternatives considered

  • transform processor + OTTL: can match patterns but requires operators to enumerate every pattern manually as regex rules. Doesn't discover new patterns automatically.
  • attributes processor: attribute renaming only; no clustering capability.

Intentional scope limitations (deferred)

  • body_field supports only a single top-level key. Full OTTL path expressions are a natural follow-on but are out of scope for the initial implementation.
  • Snapshot persistence (save/restore the Drain tree across restarts) would eliminate the need for seeding. The internal drain package is designed to support this, but the plumbing into the collector lifecycle is deferred.
  • Multi-instance synchronization for consistent templates across horizontally scaled deployments.

Telemetry data types

Logs only.

Code owners

@MikeGoldsmith

Labels

Sponsor Needed (New component seeking sponsor)
