Event Deltas: Heatmap overflow bucket indicators for outlier detection #1914

@alex-fedotyev

Problem

The heatmap uses quantile(0.01) for the lower bound and actual max() for the upper bound. Values below the lower quantile land in an underflow bucket (bucket 0).

Previously, a quantile(0.99) upper bound was also used, but this hid latency spikes above the 99th percentile — the exact anomalies (timeouts, slow queries) that users need a heatmap to detect. The upper bound was changed to actual max() since log scale handles wide ranges naturally.

However, using actual max() means a single extreme outlier (e.g., one 60s timeout when p99 is 500ms) can stretch the axis far beyond the range where most of the data lies. Overflow-bucket indicators would let us use a tighter quantile range for the axis without hiding data: users would see a visual signal that data exists beyond the visible range.
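To put a rough number on the stretching, here is a small sketch that computes how much of a log-scaled axis ends up above p99 for the example figures (p99 = 500ms, max = 60s). The 1ms lower bound and the helper name `logAxisFractionAbove` are assumptions for illustration, not values taken from the heatmap code.

```typescript
// Fraction of a log-scaled axis that lies above `value`.
// All arguments share one unit (milliseconds here) and must be > 0.
function logAxisFractionAbove(axisMin: number, axisMax: number, value: number): number {
  const span = Math.log10(axisMax) - Math.log10(axisMin);
  return (Math.log10(axisMax) - Math.log10(value)) / span;
}

// p99 = 500ms and max = 60s come from the example; axisMin = 1ms is assumed.
const fractionAboveP99 = logAxisFractionAbove(1, 60000, 500);
console.log(fractionAboveP99.toFixed(2)); // "0.44"
```

Under these assumptions roughly 44% of the rows would be spent on the top 1% of values, even with a log scale.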

Current overflow behavior

  • Bucket 0: all values ≤ effectiveMin (fast failures, Duration=0)
  • Bucket N+1: all values ≥ max (only the exact max value due to widthBucket semantics)

These overflow buckets are rendered as normal cells, so users can't tell that values have been lumped together. With a quantile-clamped axis, they also couldn't distinguish:

  • A timeout at 10s vs 60s (both in the top overflow bucket)
  • A fast failure at 0ms vs 0.5ms (both in the bottom overflow bucket)
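
The bucket assignment described above can be sketched as follows. This assumes the conventional SQL `width_bucket` semantics (0 below the lower bound, N+1 at or above the upper bound); `widthBucketSketch` is a hypothetical stand-in, and the exact comparison at the lower bound in the real query may differ (the list above says ≤ effectiveMin, the sketch uses a strict <).

```typescript
// Assign a value to a bucket index in [0, count + 1].
// 0 is the underflow bucket, count + 1 is the overflow bucket,
// and 1..count are equal-width buckets covering [low, high).
function widthBucketSketch(value: number, low: number, high: number, count: number): number {
  if (value < low) return 0;            // underflow: everything below the axis
  if (value >= high) return count + 1;  // overflow: everything at or above the axis top
  const width = (high - low) / count;
  return Math.floor((value - low) / width) + 1;
}

// With high set to the actual max, only the max itself reaches the overflow bucket.
console.log(widthBucketSketch(100, 0, 100, 10)); // 11
console.log(widthBucketSketch(99.9, 0, 100, 10)); // 10
console.log(widthBucketSketch(-1, 0, 100, 10)); // 0
```

For a log-scaled axis the same function would be applied to the logarithm of the value and of the bounds.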

Why this would improve UX

With overflow-bucket indicators, we could re-introduce quantile-based range clamping (e.g., p0.1–p99.9) for the axis to keep the chart focused on the most relevant range, while still giving users a clear signal that outlier data exists beyond the visible boundaries. This is the "smart lumping" approach — the axis stays tight and readable, but spikes aren't silently hidden.
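
A minimal sketch of the clamping idea, assuming the durations are available client-side as an array (in practice the quantiles would more likely be computed in the query). `clampRange`, `nearestRankQuantile`, and the `ClampedRange` shape are hypothetical names, and nearest-rank is just one of several quantile definitions.

```typescript
interface ClampedRange {
  axisMin: number;
  axisMax: number;
  belowCount: number; // values that would be lumped below the axis
  aboveCount: number; // values that would be lumped above the axis
}

// Nearest-rank quantile over a non-empty array sorted in ascending order.
function nearestRankQuantile(sorted: number[], q: number): number {
  const rank = Math.ceil(q * sorted.length) - 1;
  const idx = Math.min(sorted.length - 1, Math.max(0, rank));
  return sorted[idx];
}

// Pick a quantile-based axis range and count what falls outside it,
// so the UI can flag overflow rows instead of silently hiding data.
function clampRange(values: number[], lowQ: number, highQ: number): ClampedRange {
  const sorted = [...values].sort((a, b) => a - b);
  const axisMin = nearestRankQuantile(sorted, lowQ);
  const axisMax = nearestRankQuantile(sorted, highQ);
  return {
    axisMin,
    axisMax,
    belowCount: sorted.filter((v) => v < axisMin).length,
    aboveCount: sorted.filter((v) => v > axisMax).length,
  };
}
```

A non-zero `belowCount` or `aboveCount` is the signal that the corresponding overflow row should get an indicator; calling `clampRange(durationsMs, 0.001, 0.999)` would correspond to the p0.1–p99.9 range mentioned above.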

Use cases

  • Fast failures: Auth rejected, validation errors, connection refused — duration ~0ms, clustered at the bottom. A spike in these indicates an error wave.
  • Slow timeouts: Gateway timeouts, stuck queries — duration 10-60s, clustered at the top. These are often the most critical incidents to spot.

Proposal

Visually distinguish overflow buckets from regular buckets so users know data is being lumped:

  1. Visual indicator: Render overflow rows with a subtle hatched/striped pattern or different border to signal "this bucket contains clamped values"
  2. Tooltip context: When hovering an overflow bucket, show the actual min/max range of values in that bucket (e.g., "0ms – 0.01ms, 523 spans" or "30s – 120s, 12 spans")
  3. Selection accuracy: When selecting an overflow bucket, use the actual data range (not the bucket boundary) for the downstream filter
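
Points 2 and 3 could share one piece of data: a per-bucket summary of the values that were actually lumped. The sketch below is only an illustration under that assumption; `summarizeOverflow`, `tooltipLabel`, and `filterRange` are hypothetical helpers, not existing functions in the codebase.

```typescript
interface OverflowSummary {
  count: number;
  actualMin: number;
  actualMax: number;
}

// Summarize the values lumped into one overflow bucket:
// "under" collects values < axisMin, "over" collects values >= axisMax.
function summarizeOverflow(
  values: number[],
  axisMin: number,
  axisMax: number,
  side: "under" | "over",
): OverflowSummary | null {
  let summary: OverflowSummary | null = null;
  for (const v of values) {
    const lumped = side === "under" ? v < axisMin : v >= axisMax;
    if (!lumped) continue;
    if (summary === null) {
      summary = { count: 1, actualMin: v, actualMax: v };
    } else {
      summary.count += 1;
      summary.actualMin = Math.min(summary.actualMin, v);
      summary.actualMax = Math.max(summary.actualMax, v);
    }
  }
  return summary; // null when nothing was lumped, so no indicator is needed
}

// Tooltip text in the format suggested above, e.g. "30s – 120s, 12 spans".
function tooltipLabel(s: OverflowSummary, unit: string): string {
  return `${s.actualMin}${unit} – ${s.actualMax}${unit}, ${s.count} spans`;
}

// Downstream filter built from the real data range, not the bucket boundary.
function filterRange(s: OverflowSummary): { gte: number; lte: number } {
  return { gte: s.actualMin, lte: s.actualMax };
}
```

Returning `null` for an empty overflow bucket also gives the renderer a simple way to decide whether the hatched pattern from point 1 is needed at all.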

Labels: enhancement