Canonical definitions for all score and impact fields in Elastic ML anomaly detection.
| Field | Scope | Range | Description |
|---|---|---|---|
| record_score | Single anomaly record | 0–100 | Current normalized severity. May change over time as the model sees more extreme data. |
| initial_record_score | Single anomaly record | 0–100 | Score at detection time; never changes. Use for alerting on fresh anomalies. |
| anomaly_score | Bucket (time window) | 0–100 | Aggregate severity across all detectors in a bucket. |
| initial_anomaly_score | Bucket | 0–100 | Bucket score at detection time; never changes. |
| influencer_score | Entity × bucket | 0–100 | How anomalous a specific entity (host, user, service) is within that bucket. |
| Band | Range | Interpretation |
|---|---|---|
| Critical | > 75 | High-confidence anomaly; warrants immediate investigation |
| Warning | 50–75 | Notable deviation; triage and correlate with other signals |
| Minor | 25–50 | Potentially interesting; aggregate with cross-job signals |
| Informational | < 25 | Weak signal; useful for context, not standalone action |
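The bands above can be applied mechanically. The following is a minimal sketch, not an Elastic API: `severity_band` is a hypothetical helper, and because the table leaves the boundary values 25 and 50 ambiguous, the sketch assumes each boundary belongs to the higher band (75 stays in Warning, since Critical is strictly greater than 75).

```python
def severity_band(score: float) -> str:
    """Map a 0-100 anomaly score to the band names used in the table above."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be in [0, 100], got {score}")
    if score > 75:
        return "Critical"
    if score >= 50:  # assumption: exactly 50 counts as Warning
        return "Warning"
    if score >= 25:  # assumption: exactly 25 counts as Minor
        return "Minor"
    return "Informational"


for s in (92, 75, 50, 30, 10):
    print(s, severity_band(s))
```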
Cross-job composite signal: Low scores (25–50) across many jobs simultaneously are often more significant than a single high score. Five jobs each scoring 30 = composite signal 150, pointing to a systemic root cause.
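One way to compute the composite signal is sketched below. It assumes anomaly records have already been fetched as dictionaries carrying `job_id`, `timestamp`, and `record_score`, and that the jobs share a bucket alignment so that equal timestamps mean the same time window. `composite_signals` is a hypothetical helper, and keeping one score per job is a design choice of this sketch rather than something the definition above requires.

```python
from collections import defaultdict


def composite_signals(records, low=25.0, high=50.0):
    """Sum low-band record scores per bucket timestamp across distinct jobs."""
    per_bucket = defaultdict(dict)
    for rec in records:
        score = rec["record_score"]
        if low <= score <= high:
            jobs = per_bucket[rec["timestamp"]]
            # Keep only the highest low-band score per job so that one
            # noisy job cannot inflate the composite on its own.
            jobs[rec["job_id"]] = max(jobs.get(rec["job_id"], 0.0), score)
    return {
        ts: {"job_count": len(jobs), "composite": sum(jobs.values())}
        for ts, jobs in per_bucket.items()
    }


# Five jobs each scoring 30 in the same bucket, plus one unrelated high score.
records = [
    {"job_id": f"job-{i}", "timestamp": 1700000000000, "record_score": 30.0}
    for i in range(5)
]
records.append({"job_id": "job-9", "timestamp": 1700000000000, "record_score": 90.0})
print(composite_signals(records))
```

With the sample data, the five low-band records add up to a composite of 150 across 5 jobs, and the single score of 90 is left out because it falls outside the 25–50 band.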
multi_bucket_impact: scale from -5 to +5 indicating whether the anomaly spans multiple consecutive time buckets.
| Value | Meaning |
|---|---|
| 0 | One-off event, no sustained pattern |
| 1–2 | Mild persistence across a few buckets |
| ≥ 3 | Genuine behavioral shift — not a transient spike |
| Negative | Predominantly a single-bucket anomaly; the surrounding buckets look normal |
Values ≥ 3 strongly suggest a real system change (e.g., a resource exhaustion event that persists) rather than a momentary blip.
Elasticsearch continuously renormalizes scores relative to the most extreme anomaly ever seen by the job. A score of 90 today may become 60 if a more extreme event appears later — by design, so the "worst ever" event always scores near 100.
When to use each:
| Use case | Field |
|---|---|
| Alerting on newly detected anomalies | initial_record_score — captures severity at detection time |
| Ranking historical anomalies by current importance | record_score — reflects how bad this was relative to all history |
| Detecting renormalization (model calibrated away an anomaly) | Compare: if initial_record_score >> record_score, the model saw worse events later |
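For the alerting use case, a search body along these lines could be used. This is a sketch under assumptions: results are read from the default `.ml-anomalies-*` index pattern, `my-job` is a placeholder job ID, and the threshold of 75 and the 15-minute lookback are illustrative values rather than recommendations. The function only builds the request body; sending it is left to whichever client is in use.

```python
import json


def fresh_anomaly_query(job_id: str, min_score: float = 75.0,
                        lookback: str = "now-15m") -> dict:
    """Build a search body for recent records, filtered on the score at detection time."""
    return {
        "size": 100,
        "query": {
            "bool": {
                "filter": [
                    {"term": {"result_type": "record"}},
                    {"term": {"job_id": job_id}},
                    # initial_record_score is not renormalized, so the alert
                    # condition does not shift after the fact.
                    {"range": {"initial_record_score": {"gte": min_score}}},
                    {"range": {"timestamp": {"gte": lookback}}},
                ]
            }
        },
        "sort": [{"initial_record_score": {"order": "desc"}}],
    }


print(json.dumps(fresh_anomaly_query("my-job"), indent=2))
```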
Quantify drift: score_drift = initial_record_score - record_score
- Large positive drift = renormalized away (model calibrated)
- Small drift = score is stable and genuine
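The drift calculation can be sketched as follows. The text does not define how large "large" is, so the threshold of 20 points is an assumption, and `is_renormalized_away` is a hypothetical helper name.

```python
def score_drift(record: dict) -> float:
    """score_drift = initial_record_score - record_score."""
    return record["initial_record_score"] - record["record_score"]


def is_renormalized_away(record: dict, threshold: float = 20.0) -> bool:
    """True when the score dropped by at least `threshold` points since detection."""
    # Assumption: 20 points is an illustrative cut-off for "large positive drift".
    return score_drift(record) >= threshold


record = {"initial_record_score": 90.0, "record_score": 60.0}
print(score_drift(record), is_renormalized_away(record))
```

The sample record mirrors the case of a score of 90 at detection that later sits at 60: the drift is 30, which this sketch flags as renormalized away.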
When available, the anomaly_score_explanation field breaks down the factors that contributed to the final score.
| Component | Effect on score | What it means |
|---|---|---|
| anomaly_length | ↑ increases | More consecutive anomalous buckets, i.e. a sustained deviation |
| single_bucket_impact | ↑ increases | Lower statistical probability → more surprising → higher impact |
| multi_bucket_impact | ↑ increases | Contribution from sustained pattern across multiple buckets |
| anomaly_characteristics_impact | ↑ increases | Mean shift (value moved) vs. variance change (volatility increased) |
| high_variance_penalty | ↓ decreases | Historically noisy data; wide confidence bounds absorb the spike |
| incomplete_bucket_penalty | ↓ decreases | Bucket has less data than expected (ingest lag, sparse events) |
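To read these components programmatically, a sketch like the one below splits them into raising and lowering factors. It assumes the record is a dictionary whose explanation object sits under `anomaly_score_explanation` and that the two penalties are boolean flags; `summarize_explanation` is a hypothetical helper. Records without an explanation yield empty results.

```python
RAISES = (
    "anomaly_length",
    "single_bucket_impact",
    "multi_bucket_impact",
    "anomaly_characteristics_impact",
)
LOWERS = ("high_variance_penalty", "incomplete_bucket_penalty")


def summarize_explanation(record: dict) -> dict:
    """Group explanation components by whether they raise or lower the score."""
    explanation = record.get("anomaly_score_explanation") or {}
    return {
        # Keep only components that are present and non-zero.
        "raising": {k: explanation[k] for k in RAISES if explanation.get(k)},
        # Penalties are treated as flags: list the ones that are set.
        "lowering": [k for k in LOWERS if explanation.get(k)],
    }


record = {
    "record_score": 42.0,
    "anomaly_score_explanation": {
        "single_bucket_impact": 40,
        "multi_bucket_impact": 0,
        "high_variance_penalty": True,
    },
}
print(summarize_explanation(record))
```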
When actual << typical with count, low_count, low_mean, or low_sum functions, a low or zero value indicates a real-world absence, not just a numerically low observation:
- Zero count when traffic is normally constant → pipeline stopped, service unavailable
- low_mean(response_time) → requests completing too fast (cache hit storm, bypassed processing)
- Very low sum(bytes_sent) → network partition or data source failure
Key insight: A record_score of 80 with actual = 0 and typical = 5000 is an outage signal, not just a low number.
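A check for this pattern might look like the sketch below. It assumes records are dictionaries in which `function` names the detector function and `actual` and `typical` hold either a number or a list of numbers. The 1% ratio used to approximate "actual << typical" is an assumption, and `looks_like_absence` is a hypothetical helper.

```python
LOW_SIDE_FUNCTIONS = {"count", "low_count", "low_mean", "low_sum"}


def _first(value):
    """Return the first element of a list-like value, or the value itself."""
    if isinstance(value, (list, tuple)):
        return value[0] if value else None
    return value


def looks_like_absence(record: dict, ratio: float = 0.01) -> bool:
    """True when actual is at most `ratio` of typical for a low-side function."""
    if record.get("function") not in LOW_SIDE_FUNCTIONS:
        return False
    actual = _first(record.get("actual"))
    typical = _first(record.get("typical"))
    if actual is None or typical is None or typical <= 0:
        return False
    # Assumption: "actual << typical" is approximated as actual <= 1% of typical.
    return actual <= typical * ratio


outage = {"function": "count", "actual": [0.0], "typical": [5000.0], "record_score": 80.0}
print(looks_like_absence(outage))
```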
Common reasons a score comes out lower than expected:
- high_variance_penalty: Metric is historically noisy; wide model bounds absorb the spike.
- Renormalization: A more extreme anomaly appeared later, pushing this score down.
- Insufficient training: Model needs ≥ 3 weeks for weekly seasonality, ≥ 2 full cycles for any period.
- bucket_span too large: Long span smooths short-duration spikes; use a smaller span for high-frequency detection.
- Detector function mismatch: mean vs high_mean, count vs high_count. Wrong function = missed direction.
- incomplete_bucket_penalty: Ingest latency or sparse events reduced bucket data volume.
- custom_rules: A detector filter may be suppressing or conditioning the anomaly.
Common reasons a score comes out higher than expected:
- Insufficient training history: Early in training, moderate deviations can be flagged as extreme.
- High-cardinality split: Too few data points per entity per bucket → unreliable probabilities.
- use_null set to true: Missing entities produce "null" anomalies that may not be operationally meaningful.
- anomaly-detection-functions.md — Function selection guide
- protocols/investigation.md — 14-step investigation workflow
- worked-example.md — End-to-end investigation walkthrough