[processor/drain] add new drain processor for log template annotation#47236

Open
MikeGoldsmith wants to merge 4 commits into open-telemetry:main from MikeGoldsmith:mike/drain-processor

@MikeGoldsmith MikeGoldsmith commented Mar 30, 2026

Closes #47235

Overview

Adds processor/drain, which applies the Drain log clustering algorithm to annotate log records with a derived template string.

| Attribute | Type | Example |
|---|---|---|
| log.record.template | string | "user <*> logged in from <*>" |

The processor annotates only — it does not filter. Use the filter processor downstream to act on the template attribute.

Key features

  • Configurable parse tree parameters: tree_depth, merge_threshold, max_node_children, max_clusters (LRU eviction)
  • Optional body_field extraction for structured (map) log bodies
  • Pre-seeding via seed_templates or seed_logs for stable templates across restarts
  • Optional warmup_min_clusters: trains from the first record but suppresses annotation until N distinct clusters are observed — records pass through immediately, no buffering or added latency
  • Internal telemetry: processor_drain_clusters_active gauge, processor_drain_log_records_annotated and processor_drain_log_records_unannotated counters
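
As a rough picture of how the options listed above might map onto a collector config struct, the following Go sketch uses the setting names from this description as `mapstructure` keys. The Go field names, the field types, and every rule in `Validate` are assumptions, not code from this PR.

```go
package main

import (
	"errors"
	"fmt"
)

// Config is a hypothetical shape for the processor settings. Only the
// mapstructure keys come from the PR description; names and types are guesses.
type Config struct {
	TreeDepth         int      `mapstructure:"tree_depth"`
	MergeThreshold    float64  `mapstructure:"merge_threshold"`
	MaxNodeChildren   int      `mapstructure:"max_node_children"`
	MaxClusters       int      `mapstructure:"max_clusters"`
	BodyField         string   `mapstructure:"body_field"`
	SeedTemplates     []string `mapstructure:"seed_templates"`
	SeedLogs          []string `mapstructure:"seed_logs"`
	WarmupMinClusters int      `mapstructure:"warmup_min_clusters"`
}

// Validate collects every problem instead of stopping at the first one.
// The specific bounds are assumed for illustration.
func (c *Config) Validate() error {
	var errs []error
	if c.TreeDepth < 1 {
		errs = append(errs, errors.New("tree_depth must be positive"))
	}
	if c.MergeThreshold < 0 || c.MergeThreshold > 1 {
		errs = append(errs, errors.New("merge_threshold must be in [0, 1]"))
	}
	if c.MaxNodeChildren < 1 {
		errs = append(errs, errors.New("max_node_children must be positive"))
	}
	if c.MaxClusters < 0 {
		errs = append(errs, errors.New("max_clusters must not be negative"))
	}
	if c.WarmupMinClusters < 0 {
		errs = append(errs, errors.New("warmup_min_clusters must not be negative"))
	}
	// errors.Join returns nil when no errors were collected.
	return errors.Join(errs...)
}

func main() {
	cfg := Config{TreeDepth: 4, MergeThreshold: 1.5, MaxNodeChildren: 100}
	fmt.Println(cfg.Validate())
}
```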

Dependencies

This processor introduces one new third-party dependency:

| Module | Version | Licence | Purpose |
|---|---|---|---|
| github.com/Jaeyo/go-drain3 | v0.1.2 | MIT | Drain algorithm implementation |

go-drain3 is a Go port of the Python Drain3 reference library. It was chosen for its complete implementation of the algorithm including JSON marshal/unmarshal hooks, which are the foundation for future snapshot persistence support.

Test plan

  • Unit tests covering annotation, seeding, warmup suppression, custom attribute name, body field extraction, empty body skipping, multi-resource batches
  • Race detector clean (go test -race ./...)
  • Linter clean (golangci-lint run ./...)
  • Generated files produced by mdatagen

Changes from initial proposal (post-SIG 2026-04-01)

  • Library: corrected to Jaeyo/go-drain3 (was incorrectly listed as faceair/drain)
  • Buffer warmup mode removed: it held records in memory until the tree stabilised, which is incompatible with collector latency requirements
  • warmup_min_clusters introduced as a zero-latency alternative: suppresses annotation during warmup without buffering
  • Config renames: log_cluster_depth → tree_depth, sim_threshold → merge_threshold, max_children → max_node_children; the original algorithm names are referenced in the docs
  • Sponsor: @atoulme
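
The zero-latency warmup described in this list can be pictured with a small gate that always learns from a record but only allows annotation once enough distinct clusters exist. This is a sketch under stated assumptions: `warmupGate` and `observe` are hypothetical names, templates stand in for cluster identities, and a real processor would read the cluster count from the Drain tree and guard shared state against concurrent access.

```go
package main

import "fmt"

// warmupGate is a hypothetical helper: it tracks distinct templates seen so
// far and reports whether annotation should be applied yet.
type warmupGate struct {
	minClusters int
	seen        map[string]struct{}
}

func newWarmupGate(minClusters int) *warmupGate {
	return &warmupGate{minClusters: minClusters, seen: make(map[string]struct{})}
}

// observe records the template (training is never skipped) and returns true
// once at least minClusters distinct templates have been observed. The record
// itself is never held back, so no buffering is involved.
func (g *warmupGate) observe(template string) bool {
	g.seen[template] = struct{}{}
	return len(g.seen) >= g.minClusters
}

func main() {
	gate := newWarmupGate(2)
	templates := []string{"user <*> logged in", "user <*> logged in", "disk <*> full"}
	for _, tpl := range templates {
		// Every record passes through; only the annotation is conditional.
		fmt.Printf("%q annotate=%v\n", tpl, gate.observe(tpl))
	}
}
```

Because the set in this sketch only grows, the gate stays open once warmup ends. Whether the actual processor re-enters warmup after LRU eviction shrinks the cluster count is not stated in this description.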

…n-telemetry#47235]

Adds a new `drain` processor that applies the Drain log clustering
algorithm to log records as they pass through the pipeline. For each
record it derives a template string (e.g. "user <*> logged in from <*>")
and attaches it as the `log.record.template` attribute.

Key features:
- Configurable Drain parse tree parameters (depth, sim_threshold,
  max_children, max_clusters, extra_delimiters)
- Seeding via seed_templates or seed_logs for stable templates at startup
- Two warmup modes: passthrough (annotate immediately) and buffer (hold
  records until the tree has stabilized, then flush fully annotated)
- Optional body_field extraction for structured map log bodies
- Internal telemetry: clusters_active gauge, log_records_annotated and
  log_records_unannotated counters

Assisted-by: Claude Sonnet 4.6
@MikeGoldsmith MikeGoldsmith changed the title from "feat(processor): add drain processor for log template annotation" to "[processor/drain] add log template annotation via the Drain algorithm" Mar 31, 2026
@MikeGoldsmith MikeGoldsmith changed the title from "[processor/drain] add log template annotation via the Drain algorithm" to "[processor/drain] add new drain processor for log template annotation" Mar 31, 2026
go mod tidy promotes otel/metric, otel/sdk/metric, and otel/trace from
indirect to direct following the addition of the TelemetryBuilder.

Assisted-by: Claude Sonnet 4.6
@MikeGoldsmith MikeGoldsmith marked this pull request as ready for review March 31, 2026 13:42
@MikeGoldsmith MikeGoldsmith requested a review from a team as a code owner March 31, 2026 13:42
@MikeGoldsmith MikeGoldsmith requested a review from edmocosta March 31, 2026 13:42
@iblancasa iblancasa added the "Sponsor Needed" (New component seeking sponsor) label Mar 31, 2026
@songy23 songy23 requested a review from atoulme April 1, 2026 16:58
@songy23 songy23 added the "Accepted Component" (New component has been sponsored) label and removed the "Sponsor Needed" (New component seeking sponsor) label Apr 1, 2026
processor/deltatocumulativeprocessor/ @open-telemetry/collector-contrib-approvers @RichieSams
processor/deltatorateprocessor/ @open-telemetry/collector-contrib-approvers @Aneurysm9
processor/dnslookupprocessor/ @open-telemetry/collector-contrib-approvers @andrzej-stencel @kaisecheng @edmocosta
processor/drainprocessor/ @open-telemetry/collector-contrib-approvers @MikeGoldsmith
Contributor

As per our new component policy, we need at least three code owners for your component to be accepted. If @atoulme is willing to be the second one, we'd still need to find a third one to meet that requirement.

Member Author

Thanks @edmocosta - I'll keep looking for other contributors who would be interested in supporting this component.

Labels

Accepted Component (New component has been sponsored), cmd/otelcontribcol (otelcontribcol command), reports/distributions/contrib.yaml

Development

Successfully merging this pull request may close these issues.

New component: processor/drain — log template annotation via the Drain algorithm

5 participants