Description
Problem
The heatmap uses quantile(0.01) for the lower bound and actual max() for the upper bound. Values below the lower quantile land in an underflow bucket (bucket 0).
Previously, a quantile(0.99) upper bound was also used, but this hid latency spikes above the 99th percentile — the exact anomalies (timeouts, slow queries) that users need a heatmap to detect. The upper bound was changed to actual max() since log scale handles wide ranges naturally.
However, using actual max() means a single extreme outlier (e.g., one 60s timeout when p99 is 500ms) can stretch the axis well beyond where most of the data lies. Overflow-bucket indicators would let us use a tighter quantile range for the axis without hiding data: users would see a visual signal that data exists beyond the visible range.
Current overflow behavior
- Bucket 0: all values ≤ effectiveMin (fast failures, Duration=0)
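- Bucket N+1: all values ≥ max (in practice only the exact max value, because the upper bound is the actual max() and widthBucket places values equal to the upper bound in the overflow bucket)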
These overflow buckets are rendered as normal cells, so users can't distinguish:
- A timeout at 10s vs 60s (both in the top overflow bucket)
- A fast failure at 0ms vs 0.5ms (both in the bottom overflow bucket)
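To make the bucket assignment concrete, here is a minimal sketch of SQL-style width_bucket semantics applied on a log scale. The names `widthBucket` and `logBucket` are hypothetical helpers written for this issue, not the actual implementation, and the exact boundary handling (`<` vs `≤` at the lower bound) is an assumption that may differ from what the heatmap query does.

```typescript
// Hypothetical helper modeled on SQL-style width_bucket semantics:
// returns 0 below `low`, `count + 1` at or above `high`, else 1..count.
function widthBucket(value: number, low: number, high: number, count: number): number {
  if (value < low) return 0; // underflow bucket
  if (value >= high) return count + 1; // overflow bucket
  const index = Math.floor(((value - low) / (high - low)) * count) + 1;
  return Math.min(count, index); // guard against floating-point edge cases
}

// Assumption: log-scale bucketing is done by bucketing log10(value).
// Non-positive durations (e.g. Duration=0) have no logarithm, so they
// are sent straight to the underflow bucket.
function logBucket(value: number, low: number, high: number, count: number): number {
  if (!(value > 0)) return 0;
  return widthBucket(Math.log10(value), Math.log10(low), Math.log10(high), count);
}

// Example: durations in ms, lower bound fixed at 1ms for illustration,
// upper bound set to the actual max as described above.
const durationsMs = [0, 0.4, 3, 40, 480, 10_000, 60_000];
const bucketCount = 8;
const upperBound = Math.max(...durationsMs);
const buckets = durationsMs.map((d) => logBucket(d, 1, upperBound, bucketCount));

console.log(buckets);
```

In this example both 0ms and 0.4ms fall into bucket 0, and the only value in bucket N+1 is the 60s maximum itself, which matches the behavior described above.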
Why this would improve UX
With overflow-bucket indicators, we could re-introduce quantile-based range clamping (e.g., p0.1–p99.9) for the axis to keep the chart focused on the most relevant range, while still giving users a clear signal that outlier data exists beyond the visible boundaries. This is the "smart lumping" approach — the axis stays tight and readable, but spikes aren't silently hidden.
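As a rough illustration of quantile-based clamping, the sketch below computes a p0.1–p99.9 axis range client-side and counts how many values fall outside it. The `quantile` and `clampRange` functions and the `ClampedRange` shape are hypothetical; in practice the quantiles would more likely come from the database query, and the interpolation method shown here is just one common choice.

```typescript
// Hypothetical result shape: the axis range plus how much data it excludes.
interface ClampedRange {
  min: number;
  max: number;
  underflowCount: number;
  overflowCount: number;
}

// Linear-interpolation quantile over an ascending, non-empty array.
function quantile(sorted: number[], q: number): number {
  const pos = (sorted.length - 1) * q;
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
}

// Clamp the axis to [qLow, qHigh] and count values outside that range,
// so the UI can signal that lumped data exists beyond the axis.
function clampRange(values: number[], qLow = 0.001, qHigh = 0.999): ClampedRange {
  const sorted = [...values].sort((a, b) => a - b);
  const min = quantile(sorted, qLow);
  const max = quantile(sorted, qHigh);
  return {
    min,
    max,
    underflowCount: sorted.filter((v) => v < min).length,
    overflowCount: sorted.filter((v) => v > max).length,
  };
}

// Example: 999 spans between 100ms and ~500ms plus a single 60s timeout.
const latencies = Array.from({ length: 999 }, (_, i) => 100 + i * 0.4);
latencies.push(60_000);
const range = clampRange(latencies);

console.log(range);
```

With this data the upper axis bound stays close to the bulk of the distribution instead of jumping to 60s, while `overflowCount` still reports that one span lies beyond it.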
Use cases
- Fast failures: Auth rejected, validation errors, connection refused — duration ~0ms, clustered at the bottom. A spike in these indicates an error wave.
- Slow timeouts: Gateway timeouts, stuck queries — duration 10-60s, clustered at the top. These are often the most critical incidents to spot.
Proposal
Visually distinguish overflow buckets from regular buckets so users know data is being lumped:
- Visual indicator: Render overflow rows with a subtle hatched/striped pattern or different border to signal "this bucket contains clamped values"
- Tooltip context: When hovering an overflow bucket, show the actual min/max range of values in that bucket (e.g., "0ms – 0.01ms, 523 spans" or "30s – 120s, 12 spans")
- Selection accuracy: When selecting an overflow bucket, use the actual data range (not the bucket boundary) for the downstream filter
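The tooltip and selection points above could be supported by tracking the actual value range inside each overflow bucket. The following is a minimal sketch under stated assumptions: `OverflowSummary`, `summarizeOverflow`, `tooltipLabel`, and `selectionFilter` are hypothetical names, values are assumed to be durations in ms, and values strictly outside the axis range are treated as overflow.

```typescript
// Hypothetical summary of the values lumped into one overflow bucket.
interface OverflowSummary {
  count: number;
  actualMin: number;
  actualMax: number;
}

// Returns null for an empty bucket so callers can skip rendering it.
function summarize(values: number[]): OverflowSummary | null {
  if (values.length === 0) return null;
  let actualMin = Infinity;
  let actualMax = -Infinity;
  for (const v of values) {
    if (v < actualMin) actualMin = v;
    if (v > actualMax) actualMax = v;
  }
  return { count: values.length, actualMin, actualMax };
}

// Split out the values below and above the visible axis range.
function summarizeOverflow(values: number[], axisMin: number, axisMax: number) {
  return {
    under: summarize(values.filter((v) => v < axisMin)),
    over: summarize(values.filter((v) => v > axisMax)),
  };
}

// Tooltip text in the format suggested in the proposal.
function tooltipLabel(s: OverflowSummary): string {
  return `${s.actualMin}ms – ${s.actualMax}ms, ${s.count} spans`;
}

// Downstream filter built from the actual data range, not the bucket boundary.
function selectionFilter(s: OverflowSummary): { gte: number; lte: number } {
  return { gte: s.actualMin, lte: s.actualMax };
}

// Example: axis clamped to 1ms..1000ms.
const spans = [0, 0.01, 5, 120, 480, 30_000, 120_000];
const { under, over } = summarizeOverflow(spans, 1, 1000);

if (under) console.log(tooltipLabel(under));
if (over) console.log(tooltipLabel(over), selectionFilter(over));
```

Keeping the summary separate from the bucket index means the axis range can change (quantile-based or max-based) without affecting what the tooltip and selection report.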
Related
- Event Deltas: Heatmap symlog scale for zero-inclusive metrics #1910 — symlog scale (alternative approach to zero-handling)
- Event Deltas: Heatmap visualization overhaul - log scale, color refinement, legend #1909 — heatmap visualization overhaul (parent)
- Event Deltas: Heatmap hover tooltip with percentile context #1911 — hover tooltip with percentile context