name

kibana-anomaly-detection

description

Elastic ML anomaly detection skill — investigation/RCA, score explanation, job operations (create, datafeed, start/stop, results), and troubleshooting (missing docs, memory limits, datafeed health, lifecycle). Operates against Kibana Agent Builder MCP tools (`ad_*`) on `.ml-anomalies-*`, `.ml-config`, `.ml-notifications-*`, `.ml-annotations-*`. Use when answering "what broke?"/"which entity?"/RCA, "why is score high/low?"/renormalization, "datafeed stopped"/"memory limit", or any request to set up or configure an ML anomaly detection job.

metadata

author	version
elastic	0.2.0

compatibility

Kibana 8.x–9.x with Agent Builder and Workflows; Elasticsearch 8.x–9.x with machine learning

Elastic ML Anomaly Detection

Single skill covering all anomaly detection work against Kibana Agent Builder MCP at {KIBANA_URL}/api/agent_builder/mcp. Use the Mode Selector below to pick the right approach for the user's question — modes share the same tool surface and concepts.

Platform

Read path: ES|QL against .ml-anomalies-*, .ml-config, .ml-notifications-*, .ml-annotations-*
Always-available: platform.core.execute_esql (plus additional platform tools for search, index mapping, and documentation — see scripts/agent_builder_constants.json)
ML API spec (if available): .kibana_ai_openapi_spec_elasticsearch — see references/anomaly-detection-openapi-spec-discover.md for discovery pattern.
Run ad_validate_ml_tool_permissions first when tools return empty/misleading results — missing privileges are the most common cause of false negatives. Full permissions matrix: references/permissions-matrix.md.

Mode Selector

User intent	Mode
"What broke?" / RCA / cross-job / blast radius / influencers / log categories	Investigate
"Why score high/low?" / renormalization / model bounds / forecasts	Explain
Missing docs / memory limit / datafeed stopped / CCS / lifecycle / calendars	Troubleshoot
Create a job / configure a datafeed / start analysis / retrieve results	Manage
Security framing (attack chains, MITRE, exfil)	Investigate + references/security-anomaly-expert.md
Observability/SRE framing (degradation, capacity, deployment regression)	Investigate + references/observability-anomaly-expert.md

When a question spans modes: Investigate → Explain → Troubleshoot. Don't blend mode logic — finish one before moving on.

Score Quick Reference

record_score bands: >75 critical · 50–75 warning · 25–50 minor · <25 informational
multi_bucket_impact ≥ 3 → sustained shift (not a transient spike)
initial_record_score >> record_score → renormalization (model saw worse anomalies later)
actual << typical with count/low_count/low_mean → absence/outage, not just low value
Low scores across many jobs can outweigh one high score: a composite cross-job signal often beats single-detector severity

Full score definitions, renormalization mechanics, and anomaly_score_explanation components: references/score-reference.md.
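The bands and the sustained-shift threshold above can be expressed as a small helper. This is a minimal sketch, not part of the skill's tooling: the function names are hypothetical, and placing the boundary values 75, 50, and 25 in the lower-severity, "warning", and "minor" bands respectively is an assumption, since the bands as written leave those edges open.

```python
def score_band(record_score: float) -> str:
    """Map a record_score (0-100) to the band names used in this skill."""
    if record_score > 75:
        return "critical"
    if record_score >= 50:  # assumption: exactly 75 and 50 fall in "warning"
        return "warning"
    if record_score >= 25:  # assumption: exactly 25 falls in "minor"
        return "minor"
    return "informational"


def is_sustained(multi_bucket_impact: float) -> bool:
    """True when multi_bucket_impact indicates a sustained shift (>= 3)."""
    return multi_bucket_impact >= 3


print(score_band(82.4), is_sustained(4.0))
```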

Core concepts

Treat .ml-anomalies-* as three layers, accessed via result_type:

bucket — bucket-level unusualness per bucket_span. anomaly_score is the aggregate across all detectors.
record — finest-grained rows with actual vs typical, probability, record_score, anomaly_score_explanation.
influencer — entity contributions ranked within a bucket (influencer_score).

Read scores this way:

anomaly_score / record_score = current normalized values (move as the model sees new extremes).
initial_anomaly_score / initial_record_score = immutable snapshots from detection time.
Compare actual to typical; use probability for raw likelihood.
Map entities via partition_field_value / by_field_value / over_field_value.
Read multi_bucket_impact (-5 to +5) to separate single-bucket spikes from sustained trends.
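As an illustration of the record layer, the sketch below builds an ES|QL query that could be passed to platform.core.execute_esql. The helper name and its parameters are hypothetical; the field names come from the descriptions above. It assumes the listed columns exist in the results mapping (ES|QL rejects a KEEP on an unknown column, so trim the list if needed) and does no escaping of job_id.

```python
def records_query(job_id: str, min_score: float = 25, limit: int = 20) -> str:
    """Build an ES|QL query for the record-level results of one job."""
    return "\n".join([
        "FROM .ml-anomalies-*",
        # result_type selects the layer: bucket, record, or influencer.
        f'| WHERE result_type == "record" AND job_id == "{job_id}"',
        f"| WHERE record_score >= {min_score}",
        # Keep both the current and the detection-time score for comparison.
        "| KEEP timestamp, job_id, record_score, initial_record_score, "
        "partition_field_value, by_field_value, over_field_value",
        "| SORT record_score DESC",
        f"| LIMIT {limit}",
    ])


print(records_query("my-job"))
```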

Mode: Investigate — RCA

When: "what broke?", "which entity caused this?", cross-job correlation, blast radius, attack/cascade chains.

Tool chain

Phase	Tools
Discovery	`ad_get_available_metadata`, `ad_get_jobs`, `ad_discover_related_jobs`, `ad_discover_jobs_by_datafeed_index`
Timeline / scope	`ad_query_anomaly_timeline`
Cross-job / entities	`ad_rca_cross_job_entity_match`, `ad_rca_multi_job_entities`, `ad_rca_entity_profile`
Records / influencers	`ad_query_anomaly_records`, `ad_query_influencers`
RCA depth	`ad_rca_detector_fingerprint`, `ad_rca_correlation`, `ad_rca_blast_radius`, `ad_rca_score_reassessment`
Evidence / categories	`ad_get_job_datafeed_config`, `ad_rca_source_evidence`, `ad_get_categories`, `ad_search_log_category_examples`

Protocol

Follow the 14-step sequence in references/protocols/investigation.md. High level: ad_get_available_metadata → pair ad_discover_jobs_by_datafeed_index with ad_discover_related_jobs → ad_query_anomaly_timeline → rank with ad_rca_multi_job_entities (min_job_count=2) → ad_rca_detector_fingerprint → drill with ad_query_anomaly_records + ad_query_influencers (low min_score=25) → profile with ad_rca_entity_profile → order with ad_rca_correlation → confirm with ad_rca_source_evidence. When by_field_name == "mlcategory", compare with ad_get_categories + paired ad_search_log_category_examples (baseline vs. anomaly window).

Finish with a written RCA: root cause entity · affected jobs · temporal progression · fault class (resource/network/application) · severity · recommended actions. Worked example: references/worked-example.md. Full ES|QL templates and parameters: references/investigate-anomaly-esql-tools.md.

Rules

Multi-job entities are prime suspects; single-job entities are usually victims. Use min_job_count=2.
Earliest anomaly timestamp wins — sort ad_rca_correlation by timestamp; first-appearing entity = origin.
multi_bucket_impact ≥ 3 = sustained behavioral shift; weight it higher than transient spikes.
Never close an RCA without ad_rca_source_evidence — raw source documents are ground truth.
Use low min_score (25 or lower) for influencer queries — high thresholds miss correlated entities.
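The first two rules can be sketched as a ranking over anomaly rows. This is an illustrative sketch only: the input shape (dicts with `entity`, `job_id`, `timestamp`) and the function name are assumptions, not the output format of ad_rca_multi_job_entities or ad_rca_correlation.

```python
from collections import defaultdict


def rank_suspects(records, min_job_count=2):
    """Keep entities seen in >= min_job_count jobs, earliest first."""
    jobs = defaultdict(set)
    first_seen = {}
    for r in records:
        entity = r["entity"]
        jobs[entity].add(r["job_id"])
        first_seen[entity] = min(first_seen.get(entity, r["timestamp"]), r["timestamp"])
    suspects = [
        {"entity": e, "job_count": len(j), "first_seen": first_seen[e]}
        for e, j in jobs.items()
        if len(j) >= min_job_count  # single-job entities are treated as victims
    ]
    # Earliest anomaly wins; ties go to the entity seen in more jobs.
    return sorted(suspects, key=lambda s: (s["first_seen"], -s["job_count"]))


# Hypothetical rows: timestamps are epoch milliseconds.
rows = [
    {"entity": "host-a", "job_id": "cpu", "timestamp": 100},
    {"entity": "host-a", "job_id": "net", "timestamp": 200},
    {"entity": "host-b", "job_id": "cpu", "timestamp": 150},
    {"entity": "host-b", "job_id": "mem", "timestamp": 160},
    {"entity": "host-c", "job_id": "cpu", "timestamp": 50},
]
print(rank_suspects(rows))
```

The candidate origin is the first element; the ranking still needs confirmation against raw source documents.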

Mode: Explain — Score / model behavior

When: "why is my score 30/90?", "score dropped overnight", "what is renormalization?", "why wasn't this detected?".

Score types

Field	Scope	Meaning
`record_score`	Single record	Normalized severity after renormalization.
`initial_record_score`	Single record	Score at detection time. Gap vs `record_score` = renormalization drift.
`anomaly_score`	Bucket	Aggregate severity across all detectors in a bucket.
`influencer_score`	Entity × bucket	How anomalous a specific entity is in that bucket.

`anomaly_score_explanation` components

Component	Effect	What it means
`anomaly_length`	↑ score	More consecutive anomalous buckets
`single_bucket_impact`	↑ score	Lower probability → higher impact
`multi_bucket_impact`	↑ score	Sustained pattern contribution
`anomaly_characteristics_impact`	↑ score	Mean shift vs. variance change
`high_variance_penalty`	↓ score	Noisy data → wide bounds → anomaly less surprising
`incomplete_bucket_penalty`	↓ score	Bucket has less data than expected (ingest lag, sparse data)

Why a score looks wrong

Unexpectedly low: high_variance_penalty, renormalization, <3 weeks training for weekly seasonality, bucket_span too large, wrong detector function (mean vs high_mean), incomplete_bucket_penalty, suppression by custom_rules.
Unexpectedly high: insufficient history (early training over-flags), high-cardinality split (too few points per entity), use_null: true on a sparse field.

Tool chain

Purpose	Tools
Records + explanation	`ad_query_anomaly_records` (exact `job_id_pattern`)
Renormalization drift	`ad_rca_score_reassessment` (`score_drift = initial_record_score - record_score`)
Model bounds (visual)	`ad_get_model_plot` — actual outside `model_lower`/`model_upper` = anomaly
Forecast overlap	`ad_get_forecast_results`
Influencer attribution	`ad_query_influencers`
Config & detector	`ad_get_job_datafeed_config` — `bucket_span`, function, `custom_rules`, `use_null`
Categorization	`ad_get_categories`
Model snapshots	`ad_get_model_snapshots`
Structured diagnostic	`ad_wf_troubleshoot_anomaly_score` (full decision tree)

Decision tree (`ad_wf_troubleshoot_anomaly_score`)

ad_get_jobs — ≥3 weeks data for weekly seasonality?
ad_ts_model_memory_health — memory_status healthy?
ad_ts_delayed_data_annotations — no incomplete buckets?
ad_query_anomaly_records — compare record_score vs initial_record_score.
ad_get_job_datafeed_config — bucket_span, detector function, custom_rules, use_null.
ad_get_model_plot — wide bounds → high_variance_penalty.
ad_rca_score_reassessment — renormalization drift across history.
Explain anomaly_score_explanation factors.

Rules

Always show both initial_record_score and record_score — the gap is the renormalization story.
Explain renormalization before diagnosing config — score drift is the most common "score dropped" cause and needs no config change.
actual << typical with count/low_count is an absence anomaly — distinguish outages from value spikes.
high_variance_penalty and incomplete_bucket_penalty explain most "low score" surprises without remediation.
Weekly seasonality needs ≥3 weeks of training data — flag young jobs as the cause.
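Two of these rules reduce to simple arithmetic on a record. The sketch below is illustrative: the function names are hypothetical, `actual` and `typical` are treated as single numbers for simplicity, and the 0.1 ratio used to approximate "actual << typical" is an assumed threshold, not a value from the model.

```python
# Detector functions for which a very low actual means missing data,
# as listed in the Score Quick Reference.
ABSENCE_FUNCTIONS = {"count", "low_count", "low_mean"}


def score_drift(record: dict) -> float:
    """Renormalization drift: detection-time score minus current score."""
    return record["initial_record_score"] - record["record_score"]


def is_absence(record: dict, ratio: float = 0.1) -> bool:
    """Flag 'actual << typical' on a count-style detector (assumed ratio)."""
    if record["function"] not in ABSENCE_FUNCTIONS:
        return False
    return record["actual"] < ratio * record["typical"]


# Hypothetical record values for illustration.
rec = {
    "initial_record_score": 90,
    "record_score": 55,
    "function": "low_count",
    "actual": 2,
    "typical": 400,
}
print(score_drift(rec), is_absence(rec))
```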

For detector function selection details, see references/anomaly-detection-functions.md.

Mode: Troubleshoot — Job ops

When: "missing documents", "datafeed stopped", "hard_limit", "results look wrong", lifecycle changes, calendars, CCS.

Common issues → fast paths

Issue	Fast path	Full decision tree
Missing docs / `query_delay` warning	`ad_ts_delayed_data_annotations` → `ad_ts_bucket_event_gaps` → `ad_ts_ingest_latency_estimate` → `ad_update_datafeed_query_delay`	`ad_wf_troubleshoot_query_delay`
Memory `soft_limit` / `hard_limit`	`ad_ts_model_memory_health` → `ad_wf_ts_field_cardinality` → `ad_estimate_memory_requirement` → `ad_update_model_memory_limit`	`ad_wf_troubleshoot_memory_limit`
Datafeed not running / job state	`ad_get_jobs` (state) → `ad_get_job_messages` → `ad_manage_datafeed`	—
CCS / `remote_cluster:` indices	`ad_ts_ccs_diagnostics`	—
Score sanity check	—	`ad_wf_troubleshoot_anomaly_score`

hard_limit corrupts model state and causes downstream missing-doc false alarms (categorizer silently skips events for unknown categories). Fix memory before fixing query_delay.

Memory concepts

Field	Meaning
`model_bytes`	Current memory used
`peak_model_bytes`	High-water mark since job opened
`model_bytes_memory_limit`	Configured `model_memory_limit`
`memory_status`	`ok` / `soft_limit` (pruning) / `hard_limit` (critical)
`total_by_field_count > 100k`	`by_field` cardinality too high — dominant driver
`total_partition_field_count > 10k`	Partition explosion
`total_category_count > 10k`	Too many distinct log patterns

Prefer ad_estimate_memory_requirement (samples cardinality from source, calls Estimate Model Memory API) over heuristics like peak_model_bytes * 1.3 — the heuristic ignores pure influencer and categorization memory.
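The thresholds in the table can be checked mechanically before deciding on a new limit. This is a minimal sketch with a hypothetical function name; it assumes the stats arrive as a flat dict using the field names above, which may not match the exact shape returned by ad_ts_model_memory_health.

```python
# Cardinality thresholds taken from the Memory concepts table.
THRESHOLDS = {
    "total_by_field_count": 100_000,
    "total_partition_field_count": 10_000,
    "total_category_count": 10_000,
}


def memory_findings(stats: dict) -> list:
    """Return human-readable findings for one job's model size stats."""
    findings = []
    status = stats.get("memory_status", "ok")
    if status != "ok":
        findings.append(f"memory_status is {status}")
    for field, limit in THRESHOLDS.items():
        value = stats.get(field, 0)
        if value > limit:
            findings.append(f"{field}={value} exceeds {limit}")
    return findings


# Hypothetical stats for illustration.
print(memory_findings({"memory_status": "hard_limit", "total_by_field_count": 250_000}))
```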

Datafeed & timing concepts

query_delay: how far behind real time the datafeed queries. Too small → missing docs; too large → slower alerts. Set to P95 ingest latency + buffer. When unset, Elasticsearch picks a random default between 60s and 120s.
delayed_data_check_config: how aggressively the datafeed checks for late data.
bucket_span: analysis interval. Align with data granularity and detection window.
frequency: how often the datafeed runs its scheduled query in real time. Defaults to the bucket span for short bucket spans, or a fraction of the bucket span for longer ones.
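The "P95 ingest latency + buffer" guidance for query_delay can be worked through as follows. This is a sketch under stated assumptions: the function names are hypothetical, latency samples are in seconds, the percentile uses the nearest-rank method, and the 30-second buffer is an arbitrary illustrative default. In practice ad_ts_ingest_latency_estimate supplies the latency figure.

```python
import math


def p95(samples):
    """Nearest-rank 95th percentile of a non-empty list of numbers."""
    if not samples:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples)
    # 1-based rank = ceil(0.95 * n), computed with integers to avoid float error.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def suggest_query_delay(latencies_s, buffer_s=30):
    """Return a query_delay string such as '125s'."""
    return f"{math.ceil(p95(latencies_s) + buffer_s)}s"


# Hypothetical samples: ingest latencies of 1..100 seconds.
print(suggest_query_delay(list(range(1, 101))))
```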

Lifecycle for config changes (memory limit, query_delay)

Stop datafeed: ad_manage_datafeed (action=_stop)
Close job
Update config: ad_update_model_memory_limit, ad_update_datafeed_query_delay, ad_update_delayed_data_check_config
Open job: ad_open_job
Start datafeed: ad_manage_datafeed (action=_start)

Recover a corrupted period without resetting the whole model: ad_revert_model_snapshot.
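For reference, the lifecycle above corresponds to the following Elasticsearch ML REST calls when changing the memory limit. This is a sketch: the function name is hypothetical, and the assumption that the ad_* tools wrap exactly these endpoints is not confirmed by this document. The plan is returned as data rather than executed.

```python
def memory_limit_change_plan(job_id: str, datafeed_id: str, model_memory_limit: str):
    """Return (method, path, body) tuples in the required order."""
    return [
        ("POST", f"_ml/datafeeds/{datafeed_id}/_stop", None),
        ("POST", f"_ml/anomaly_detectors/{job_id}/_close", None),
        (
            "POST",
            f"_ml/anomaly_detectors/{job_id}/_update",
            {"analysis_limits": {"model_memory_limit": model_memory_limit}},
        ),
        ("POST", f"_ml/anomaly_detectors/{job_id}/_open", None),
        ("POST", f"_ml/datafeeds/{datafeed_id}/_start", None),
    ]


# Hypothetical IDs and limit for illustration.
for method, path, body in memory_limit_change_plan("my-job", "datafeed-my-job", "512mb"):
    print(method, path, body or "")
```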

Tool surface

Category	Tools
Permissions / metadata	`ad_validate_ml_tool_permissions`, `ad_get_available_metadata`, `ad_get_jobs`
Job + datafeed state	`ad_get_job_datafeed_config`, `ad_get_job_messages`, `ad_manage_datafeed`, `ad_preview_datafeed_with_latency`
Timing / missing docs	`ad_ts_delayed_data_annotations`, `ad_ts_bucket_event_gaps`, `ad_ts_ingest_latency_estimate`, `ad_update_datafeed_query_delay`, `ad_update_delayed_data_check_config`, `ad_wf_troubleshoot_query_delay`
Memory	`ad_ts_model_memory_health`, `ad_wf_ts_field_cardinality`, `ad_estimate_memory_requirement`, `ad_update_model_memory_limit`, `ad_wf_troubleshoot_memory_limit`
Model / lifecycle	`ad_get_model_snapshots`, `ad_revert_model_snapshot`, `ad_open_job`, `ad_create_job`
CCS	`ad_ts_ccs_diagnostics`
Calendars	`ad_get_calendar_events`, `ad_create_calendar_event`

Full parameter tables, ES|QL templates, and REST step lists: references/troubleshoot-anomaly-tool-reference.md.

Rules

ad_validate_ml_tool_permissions first — missing privileges produce misleading empty results.
Fix memory before query_delay — hard_limit corrupts state; query_delay fixes on a memory-limited job are wasted.
Stop the datafeed before updating it. Updating a running datafeed is rejected.
Close the job before updating the memory limit; follow the lifecycle sequence above.
Prefer workflow tools (ad_wf_*) over manually chaining diagnostics for complex decisions.
ad_preview_datafeed_with_latency before starting — confirm the datafeed returns data after config changes.

Mode: Manage — Create / configure jobs

When: "set up a job", "create an ML detector", "monitor X over time", "detect rare/unusual/anomalous values".

4-step workflow

PUT  _ml/anomaly_detectors/<job_id>          # 1. Define job        (ad_create_job)
PUT  _ml/datafeeds/datafeed-<job_id>         # 2. Define datafeed   (ad_create_datafeed)
POST _ml/anomaly_detectors/<job_id>/_open    # 3a. Open job         (ad_open_job)
POST _ml/datafeeds/datafeed-<job_id>/_start  # 3b. Start datafeed   (ad_manage_datafeed action=_start)
GET  _ml/anomaly_detectors/<job_id>/results/records  # 4. Read results

Process

Build configs. Parse the user request into job + datafeed JSON with no null fields.

Apply smart defaults:

Field	Default	Override when
`bucket_span`	`"15m"`	User specifies a different span
`time_field`	`"@timestamp"`	User names a different timestamp field
`index`	`"logs-*"`	User specifies an index or pattern
`datafeed_query`	`{"match_all": {}}`	User mentions filters, processes, or time windows
`influencers`	by/over/partition fields from detectors	User adds extra influencer fields
`job_id`	Generated from user description	User provides an explicit ID
`query_delay`	`"60s"`	P95 ingest latency is higher

Choose detector function from user intent — full table in references/anomaly-detection-functions.md:
- "high CPU" / "unusually large" → high_mean or high_sum
- "rare logins" / "unusual values" → rare (variants below)
- "too many requests" / "spike in count" → high_count
rare variants:
- Infrequent globally → rare by_field_name: X
- Infrequent vs peers → rare by_field_name: X over_field_name: Y
- Infrequent per segment → rare by_field_name: X partition_field_name: Y
- Infrequent per segment vs peers → rare by_field_name: X over_field_name: Y partition_field_name: Z
Validate. platform.core.get_index_mapping on the target index to verify field existence/types → ad_validate_job_spec. If errors, fix and re-validate (max 3 attempts).
Present and confirm. Show the complete job + datafeed bodies formatted as the exact API calls. Ask for approval once. If feedback, incorporate and re-present (up to 3 rounds).
Deploy. After confirmation: ad_create_job → ad_create_datafeed → ad_open_job → ad_manage_datafeed (action=_start). Report final job_id and datafeed_id.

For batch analysis on historical data, pass start and end to the datafeed start call.

Worked examples (rare-username, DNS exfil, large-downloads) with full JSON bodies and datafeed filters: references/job-creation-recipes.md.
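The "Build configs" and "Apply smart defaults" steps can be sketched as one function that returns the two request bodies. The function name and signature are hypothetical; the defaults come from the table above, and the body layout follows the standard anomaly detection job and datafeed formats. Unset fields are dropped rather than sent as null.

```python
def build_configs(job_id, function, field_name=None, by_field_name=None,
                  over_field_name=None, partition_field_name=None,
                  index="logs-*", bucket_span="15m",
                  time_field="@timestamp", query=None, query_delay="60s"):
    """Return (job_body, datafeed_body) with the skill's default values."""
    split_fields = {
        "by_field_name": by_field_name,
        "over_field_name": over_field_name,
        "partition_field_name": partition_field_name,
    }
    detector = {"function": function}
    if field_name is not None:
        detector["field_name"] = field_name
    # Drop unset keys so the body carries no null fields.
    detector.update({k: v for k, v in split_fields.items() if v is not None})

    job = {
        "analysis_config": {
            "bucket_span": bucket_span,
            "detectors": [detector],
            # Default influencers: the by/over/partition fields in use.
            "influencers": [v for v in split_fields.values() if v is not None],
        },
        "data_description": {"time_field": time_field},
    }
    datafeed = {
        "job_id": job_id,
        "indices": [index],
        "query": query if query is not None else {"match_all": {}},
        "query_delay": query_delay,
    }
    return job, datafeed


# "Infrequent per segment": rare by user, partitioned by host (example fields).
job, datafeed = build_configs("rare-user-per-host", "rare",
                              by_field_name="user.name",
                              partition_field_name="host.name")
print(job)
print(datafeed)
```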

Rules

Create job before datafeed. Datafeed references job by ID.
Open job before starting datafeed. Start on a closed job is rejected.
query_delay = P95 ingest latency + buffer (60s–120s safe default).
Forecasts require non-population jobs — over_field_name jobs cannot be forecasted; warn before attempting.
by_field_name vs over_field_name: by compares entity to its own history; over compares to peer group in the same bucket. partition_field_name = fully independent sub-model with its own normalization.
bucket_span matches detection granularity — 15m for high-frequency, 1h for operational metrics, 1d for daily patterns. Larger smooths short spikes; smaller increases noise.
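The forecast rule lends itself to a pre-flight check on the job body. A minimal sketch, assuming the job is available as a dict in the standard job format; the function name is hypothetical.

```python
def can_forecast(job: dict) -> bool:
    """False for population jobs, i.e. any detector with over_field_name."""
    detectors = job.get("analysis_config", {}).get("detectors", [])
    return not any(d.get("over_field_name") for d in detectors)


# Hypothetical job bodies for illustration.
population_job = {"analysis_config": {"detectors": [
    {"function": "rare", "by_field_name": "dns.question.name", "over_field_name": "host.name"},
]}}
metric_job = {"analysis_config": {"detectors": [
    {"function": "high_mean", "field_name": "system.cpu.total.pct", "partition_field_name": "host.name"},
]}}
print(can_forecast(population_job), can_forecast(metric_job))
```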

Registration (Kibana Agent Builder)

Requires Node.js 18+. Defaults to elastic/changeme when no credentials are supplied.

cd skills/kibana/kibana-anomaly-detection

# tools → workflows → skills
node scripts/kibana-agent-builder.mjs all register --kibana-url http://localhost:5601

# HTTPS with self-signed cert
node scripts/kibana-agent-builder.mjs all register --kibana-url https://localhost:5601 --insecure

all register runs tools register, then workflows register, then skills register. Kibana allows at most five tool_ids per skill; the script fills them by scanning SKILL.md for tool mentions (in document order), then appends ids from references/kibana/tools/esql/*.json until the cap (workflow-only tools omitted by default). If you run skills register alone, run tools register first so those ids exist.

Workflow tool exclusions and prefixes live in scripts/agent_builder_constants.json.

MCP API key permissions:

Kibana: read_onechat, space_read
Index: read, view_index_metadata on .ml-anomalies-*, .ml-annotations-*, .ml-notifications-*, .ml-config
For source evidence: read on source data indices
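The index privileges above could be expressed as a request body for the Elasticsearch create API key endpoint (`POST /_security/api_key`). This is a partial sketch: the function and role names are hypothetical, and the Kibana privileges (read_onechat, space_read) are deliberately left out because their application-privilege encoding is not given in this document.

```python
ML_SYSTEM_INDICES = [
    ".ml-anomalies-*",
    ".ml-annotations-*",
    ".ml-notifications-*",
    ".ml-config",
]


def api_key_body(name: str, source_indices=()):
    """Build a create-API-key body covering the index privileges only."""
    indices = [{"names": ML_SYSTEM_INDICES, "privileges": ["read", "view_index_metadata"]}]
    if source_indices:
        # Needed only for source-evidence lookups on the raw data indices.
        indices.append({"names": list(source_indices), "privileges": ["read"]})
    return {"name": name, "role_descriptors": {f"{name}-role": {"indices": indices}}}


# Hypothetical key name and source index pattern.
print(api_key_body("ad-mcp", source_indices=["logs-*"]))
```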

Tool inventory

ES|QL tool specs live under references/kibana/tools/esql/*.json; workflow definitions under references/kibana/workflows/*.yaml. Each Mode section above lists the tools it uses. Full surface: references/tools.md (ES|QL) and references/workflow-tools.md (workflows).

Key system indices

Index	Relevant content
`.ml-anomalies-*`	`record`, `bucket`, `influencer`, `model_plot`, `model_forecast`, `model_snapshot`, `category_definition`, `model_size_stats`
`.ml-config`	job/datafeed documents (visible even for never-run jobs)
`.ml-annotations-*`	delayed data (`event == "delayed_data"`)
`.ml-notifications-*`	job messages (`level`: info/warning/error)

Examples

RCA: "Something caused a spike in our error rate at 2pm — what broke?" → Investigate → ad_get_available_metadata → ad_query_anomaly_timeline → ad_rca_cross_job_entity_match → ad_rca_multi_job_entities → RCA report.

Score drop: "My anomaly score went from 90 to 55 — did the model change?" → Explain → ad_rca_score_reassessment for drift → explain renormalization if score_drift is large.

Memory limit: "Job status shows hard_limit and results look wrong." → Troubleshoot → ad_ts_model_memory_health → ad_wf_ts_field_cardinality → ad_estimate_memory_requirement → ad_update_model_memory_limit (lifecycle: stop datafeed → close → update → open → start).

New job: "Detect unusual error rates per host on nginx access logs." → Manage → high_count detector with by_field_name: "host.keyword" → validate → present → deploy.

Multi-mode: "We had an incident last night, scores were high but now low — is the job healthy?" → Investigate the incident → Explain the score drift → Troubleshoot if hard_limit or delayed data is suspected.

Guidelines

Pick a mode first. Don't blend RCA logic with score-explanation logic in one response.
ad_validate_ml_tool_permissions first on empty results — privileges are the most common false-negative cause.
Score bands are absolute thresholds: >75 critical, 50–75 warning, 25–50 minor, <25 informational.
Multi-job entities are prime suspects. Use min_job_count=2 in ad_rca_multi_job_entities.
Show initial_record_score alongside record_score — the gap tells the renormalization story.
Fix memory before query_delay. hard_limit invalidates downstream diagnostics.
Stop datafeed → close job → update config → open job → start datafeed for any config change to memory or query delay.
Confirm RCAs with ad_rca_source_evidence. Raw source documents are ground truth.
