| name | kibana-anomaly-detection |
|---|---|
| description | Elastic ML anomaly detection skill: investigation/RCA, score explanation, job operations (create, datafeed, start/stop, results), and troubleshooting (missing docs, memory limits, datafeed health, lifecycle). Operates against Kibana Agent Builder MCP tools (`ad_*`) on `.ml-anomalies-*`, `.ml-config`, `.ml-notifications-*`, `.ml-annotations-*`. Use when answering "what broke?"/"which entity?"/RCA, "why is score high/low?"/renormalization, "datafeed stopped"/"memory limit", or any request to set up or configure an ML anomaly detection job. |
| metadata | |
| compatibility | Kibana 8.x–9.x with Agent Builder and Workflows; Elasticsearch 8.x–9.x with machine learning |

Single skill covering all anomaly detection work against Kibana Agent Builder MCP at
{KIBANA_URL}/api/agent_builder/mcp. Use the Mode Selector below to pick the right approach for the user's question
— modes share the same tool surface and concepts.
- Read path: ES|QL against `.ml-anomalies-*`, `.ml-config`, `.ml-notifications-*`, `.ml-annotations-*`
- Always-available: `platform.core.execute_esql` (plus additional platform tools for search, index mapping, and documentation; see `scripts/agent_builder_constants.json`)
- ML API spec (if available): `.kibana_ai_openapi_spec_elasticsearch`; see references/anomaly-detection-openapi-spec-discover.md for the discovery pattern.
- Run `ad_validate_ml_tool_permissions` first when tools return empty or misleading results; missing privileges are the most common cause of false negatives. Full permissions matrix: references/permissions-matrix.md.

| User intent | Mode |
|---|---|
| "What broke?" / RCA / cross-job / blast radius / influencers / log categories | Investigate |
| "Why score high/low?" / renormalization / model bounds / forecasts | Explain |
| Missing docs / memory limit / datafeed stopped / CCS / lifecycle / calendars | Troubleshoot |
| Create a job / configure a datafeed / start analysis / retrieve results | Manage |
| Security framing (attack chains, MITRE, exfil) | Investigate + references/security-anomaly-expert.md |
| Observability/SRE framing (degradation, capacity, deployment regression) | Investigate + references/observability-anomaly-expert.md |
When a question spans modes: Investigate → Explain → Troubleshoot. Don't blend mode logic — finish one before moving on.
- `record_score` bands: >75 critical · 50–75 warning · 25–50 minor · <25 informational
- `multi_bucket_impact ≥ 3` → sustained shift (not a transient spike)
- `initial_record_score >> record_score` → renormalization (model saw worse anomalies later)
- `actual << typical` with `count`/`low_count`/`low_mean` → absence/outage, not just low value
- Low scores across many jobs > one high score: a composite cross-job signal often beats single-detector severity

Full score definitions, renormalization mechanics, and `anomaly_score_explanation` components: references/score-reference.md.
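
The score bands above can be sketched as a small lookup. This is a minimal illustration, not part of the skill's tooling: the function name `severity_band` is hypothetical, and the handling of the exact boundary values (25, 50, 75) is an assumption, since the bands as written overlap at their edges.

```python
def severity_band(record_score: float) -> str:
    """Map a 0-100 record_score to the severity bands listed above.

    Assumption: lower bounds are inclusive, so exactly 75 is still
    "warning" (only scores strictly above 75 count as "critical").
    """
    if not 0 <= record_score <= 100:
        raise ValueError(f"record_score must be in [0, 100], got {record_score}")
    if record_score > 75:
        return "critical"
    if record_score >= 50:
        return "warning"
    if record_score >= 25:
        return "minor"
    return "informational"


if __name__ == "__main__":
    for score in (90, 75, 50, 25, 10):
        print(score, severity_band(score))
```
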
Treat `.ml-anomalies-*` as three layers, accessed via `result_type`:
- `bucket`: bucket-level unusualness per `bucket_span`. `anomaly_score` is the aggregate across all detectors.
- `record`: finest-grained rows with `actual` vs `typical`, `probability`, `record_score`, `anomaly_score_explanation`.
- `influencer`: entity contributions ranked within a bucket (`influencer_score`).

Read scores this way:
- `anomaly_score` / `record_score` = current normalized values (they move as the model sees new extremes).
- `initial_anomaly_score` / `initial_record_score` = immutable snapshots from detection time.
- Compare `actual` to `typical`; use `probability` for raw likelihood.
- Map entities via `partition_field_value` / `by_field_value` / `over_field_value`.
- Read `multi_bucket_impact` (-5 to +5) to separate single-bucket spikes from sustained trends.
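
As a rough illustration of the reading rules above, the sketch below turns one record into a list of plain-language notes. It makes several assumptions that are not from the skill itself: the record is a flattened dict with scalar `actual` and `typical` values and a top-level `multi_bucket_impact`, the helper name `read_record` is hypothetical, and the thresholds for "much greater" (`drift_threshold`) and "much smaller" (`absence_ratio`) are arbitrary placeholders.

```python
from typing import Any

# Detector functions for which a very low actual suggests missing data.
ABSENCE_FUNCTIONS = {"count", "low_count", "low_mean"}


def read_record(
    record: dict[str, Any],
    drift_threshold: float = 20.0,  # assumed cut-off for "initial >> current"
    absence_ratio: float = 0.1,     # assumed cut-off for "actual << typical"
) -> list[str]:
    """Return human-readable observations about a single anomaly record."""
    notes: list[str] = []

    # Gap between the immutable initial score and the current normalized score.
    drift = record["initial_record_score"] - record["record_score"]
    if drift >= drift_threshold:
        notes.append(f"renormalized: score dropped {drift:.0f} points since detection")

    # multi_bucket_impact of 3 or more marks a sustained shift.
    if record.get("multi_bucket_impact", 0) >= 3:
        notes.append("sustained shift rather than a transient spike")

    # A count-like detector far below its typical value points at an absence.
    actual, typical = record["actual"], record["typical"]
    if (
        record.get("function") in ABSENCE_FUNCTIONS
        and typical > 0
        and actual <= typical * absence_ratio
    ):
        notes.append("possible absence or outage")

    return notes


if __name__ == "__main__":
    example = {
        "initial_record_score": 90, "record_score": 55,
        "multi_bucket_impact": 4, "function": "low_count",
        "actual": 2, "typical": 100,
    }
    for note in read_record(example):
        print(note)
```
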
When: "what broke?", "which entity caused this?", cross-job correlation, blast radius, attack/cascade chains.
| Phase | Tools |
|---|---|
| Discovery | ad_get_available_metadata, ad_get_jobs, ad_discover_related_jobs, ad_discover_jobs_by_datafeed_index |
| Timeline / scope | ad_query_anomaly_timeline |
| Cross-job / entities | ad_rca_cross_job_entity_match, ad_rca_multi_job_entities, ad_rca_entity_profile |
| Records / influencers | ad_query_anomaly_records, ad_query_influencers |
| RCA depth | ad_rca_detector_fingerprint, ad_rca_correlation, ad_rca_blast_radius, ad_rca_score_reassessment |
| Evidence / categories | ad_get_job_datafeed_config, ad_rca_source_evidence, ad_get_categories, ad_search_log_category_examples |
Follow the 14-step sequence in references/protocols/investigation.md. High
level: ad_get_available_metadata → pair ad_discover_jobs_by_datafeed_index with ad_discover_related_jobs →
ad_query_anomaly_timeline → rank with ad_rca_multi_job_entities (min_job_count=2) → ad_rca_detector_fingerprint
→ drill with ad_query_anomaly_records + ad_query_influencers (low min_score=25) → profile with
ad_rca_entity_profile → order with ad_rca_correlation → confirm with ad_rca_source_evidence. When
by_field_name == "mlcategory", compare with ad_get_categories + paired ad_search_log_category_examples (baseline
vs. anomaly window).
Finish with a written RCA: root cause entity · affected jobs · temporal progression · fault class (resource/network/application) · severity · recommended actions. Worked example: references/worked-example.md. Full ES|QL templates and parameters: references/investigate-anomaly-esql-tools.md.
- Multi-job entities are prime suspects; single-job entities are usually victims. Use `min_job_count=2`.
- Earliest anomaly timestamp wins: sort `ad_rca_correlation` by timestamp; first-appearing entity = origin.
- `multi_bucket_impact ≥ 3` = sustained behavioral shift, weight higher than transient spikes.
- Never close an RCA without `ad_rca_source_evidence`; raw source documents are ground truth.
- Use low `min_score` (25 or lower) for influencer queries; high thresholds miss correlated entities.
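
The first two rules can be illustrated offline. The sketch below is not how `ad_rca_multi_job_entities` or `ad_rca_correlation` are implemented; it only shows the ranking idea on already-fetched rows. The row shape (`entity`, `job_id`, `timestamp` keys) and the function name `rank_suspects` are assumptions made for the example.

```python
from collections import defaultdict


def rank_suspects(records: list[dict], min_job_count: int = 2) -> list[str]:
    """Keep entities seen in at least min_job_count jobs, earliest first."""
    jobs_by_entity: dict[str, set] = defaultdict(set)
    first_seen: dict[str, int] = {}

    for rec in records:
        entity = rec["entity"]
        jobs_by_entity[entity].add(rec["job_id"])
        ts = rec["timestamp"]
        if entity not in first_seen or ts < first_seen[entity]:
            first_seen[entity] = ts

    # Multi-job entities are the suspects; single-job entities are dropped.
    suspects = [e for e, jobs in jobs_by_entity.items() if len(jobs) >= min_job_count]

    # Earliest anomaly timestamp wins; entity name breaks ties deterministically.
    return sorted(suspects, key=lambda e: (first_seen[e], e))


if __name__ == "__main__":
    rows = [
        {"entity": "host-a", "job_id": "cpu", "timestamp": 100},
        {"entity": "host-a", "job_id": "errors", "timestamp": 90},
        {"entity": "host-b", "job_id": "cpu", "timestamp": 50},
        {"entity": "host-c", "job_id": "errors", "timestamp": 80},
        {"entity": "host-c", "job_id": "latency", "timestamp": 85},
    ]
    print(rank_suspects(rows))
```

In this example `host-b` appears earliest but only in one job, so it is treated as a likely victim and excluded.
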
When: "why is my score 30/90?", "score dropped overnight", "what is renormalization?", "why wasn't this detected?".
| Field | Scope | Meaning |
|---|---|---|
| `record_score` | Single record | Normalized severity after renormalization. |
| `initial_record_score` | Single record | Score at detection time. Gap vs `record_score` = renormalization drift. |
| `anomaly_score` | Bucket | Aggregate severity across all detectors in a bucket. |
| `influencer_score` | Entity × bucket | How anomalous a specific entity is in that bucket. |

| Component | Effect | What it means |
|---|---|---|
| `anomaly_length` | ↑ score | More consecutive anomalous buckets |
| `single_bucket_impact` | ↑ score | Lower probability → higher impact |
| `multi_bucket_impact` | ↑ score | Sustained pattern contribution |
| `anomaly_characteristics_impact` | ↑ score | Mean shift vs. variance change |
| `high_variance_penalty` | ↓ score | Noisy data → wide bounds → anomaly less surprising |
| `incomplete_bucket_penalty` | ↓ score | Bucket has less data than expected (ingest lag, sparse data) |
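
To make the component table concrete, here is a minimal sketch that splits an `anomaly_score_explanation` object into factors that raised the score and penalties that lowered it. The helper name `summarize_explanation` is hypothetical, and the code assumes the impact fields are numbers and the penalty fields are truthy flags when applied.

```python
# Components from the table above, grouped by their effect on the score.
BOOSTERS = (
    "anomaly_length",
    "single_bucket_impact",
    "multi_bucket_impact",
    "anomaly_characteristics_impact",
)
PENALTIES = ("high_variance_penalty", "incomplete_bucket_penalty")


def summarize_explanation(explanation: dict) -> dict:
    """Split explanation components into score raisers and score reducers."""
    # Only positive contributions count as raising the score.
    raised = {k: explanation[k] for k in BOOSTERS if (explanation.get(k) or 0) > 0}
    # Penalties are treated as flags: present and truthy means applied.
    lowered = [k for k in PENALTIES if explanation.get(k)]
    return {"raises_score": raised, "lowers_score": lowered}


if __name__ == "__main__":
    print(summarize_explanation({
        "anomaly_length": 3,
        "single_bucket_impact": 40,
        "multi_bucket_impact": 0,
        "high_variance_penalty": True,
    }))
```
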
- Unexpectedly low: `high_variance_penalty`, renormalization, <3 weeks training for weekly seasonality, `bucket_span` too large, wrong detector function (`mean` vs `high_mean`), `incomplete_bucket_penalty`, suppression by `custom_rules`.
- Unexpectedly high: insufficient history (early training over-flags), high-cardinality split (too few points per entity), `use_null: true` on a sparse field.
| Purpose | Tools |
|---|---|
| Records + explanation | ad_query_anomaly_records (exact job_id_pattern) |
| Renormalization drift | ad_rca_score_reassessment (score_drift = initial_record_score - record_score) |
| Model bounds (visual) | ad_get_model_plot — actual outside model_lower/model_upper = anomaly |
| Forecast overlap | ad_get_forecast_results |
| Influencer attribution | ad_query_influencers |
| Config & detector | ad_get_job_datafeed_config — bucket_span, function, custom_rules, use_null |
| Categorization | ad_get_categories |
| Model snapshots | ad_get_model_snapshots |
| Structured diagnostic | ad_wf_troubleshoot_anomaly_score (full decision tree) |
1. `ad_get_jobs`: ≥3 weeks data for weekly seasonality?
2. `ad_ts_model_memory_health`: `memory_status` healthy?
3. `ad_ts_delayed_data_annotations`: no incomplete buckets?
4. `ad_query_anomaly_records`: compare `record_score` vs `initial_record_score`.
5. `ad_get_job_datafeed_config`: `bucket_span`, detector function, `custom_rules`, `use_null`.
6. `ad_get_model_plot`: wide bounds → `high_variance_penalty`.
7. `ad_rca_score_reassessment`: renormalization drift across history.
8. Explain `anomaly_score_explanation` factors.

- Always show both `initial_record_score` and `record_score`; the gap is the renormalization story.
- Explain renormalization before diagnosing config; score drift is the most common "score dropped" cause and needs no config change.
- `actual << typical` with `count`/`low_count` is an absence anomaly; distinguish outages from value spikes.
- `high_variance_penalty` and `incomplete_bucket_penalty` explain most "low score" surprises without remediation.
- Weekly seasonality needs ≥3 weeks of training data; flag young jobs as the cause.

For detector function selection details, see references/anomaly-detection-functions.md.
When: "missing documents", "datafeed stopped", "hard_limit", "results look wrong", lifecycle changes, calendars, CCS.
| Issue | Fast path | Full decision tree |
|---|---|---|
| Missing docs / `query_delay` warning | ad_ts_delayed_data_annotations → ad_ts_bucket_event_gaps → ad_ts_ingest_latency_estimate → ad_update_datafeed_query_delay | ad_wf_troubleshoot_query_delay |
| Memory `soft_limit` / `hard_limit` | ad_ts_model_memory_health → ad_wf_ts_field_cardinality → ad_estimate_memory_requirement → ad_update_model_memory_limit | ad_wf_troubleshoot_memory_limit |
| Datafeed not running / job state | ad_get_jobs (state) → ad_get_job_messages → ad_manage_datafeed | n/a |
| CCS / `remote_cluster:` indices | ad_ts_ccs_diagnostics | n/a |
| Score sanity check | n/a | ad_wf_troubleshoot_anomaly_score |

`hard_limit` corrupts model state and causes downstream missing-doc false alarms (categorizer silently skips events for unknown categories). Fix memory before fixing `query_delay`.

| Field | Meaning |
|---|---|
| `model_bytes` | Current memory used |
| `peak_model_bytes` | High-water mark since job opened |
| `model_bytes_memory_limit` | Configured `model_memory_limit` |
| `memory_status` | ok / soft_limit (pruning) / hard_limit (critical) |
| `total_by_field_count` > 100k | by_field cardinality too high; dominant driver |
| `total_partition_field_count` > 10k | Partition explosion |
| `total_category_count` > 10k | Too many distinct log patterns |

Prefer ad_estimate_memory_requirement (samples cardinality from source, calls Estimate Model Memory API) over
heuristics like peak_model_bytes * 1.3 — the heuristic ignores pure influencer and categorization memory.
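
The thresholds in the memory field table can be applied mechanically to a `model_size_stats` document. The sketch below is only an illustration of that reading: the function name `diagnose_memory` is hypothetical, the input is assumed to be a plain dict with the field names from the table, and it does not replace `ad_estimate_memory_requirement` for sizing.

```python
def diagnose_memory(stats: dict) -> list[str]:
    """Flag memory status and cardinality drivers using the table's thresholds."""
    findings: list[str] = []

    status = stats.get("memory_status", "ok")
    if status == "hard_limit":
        findings.append("hard_limit: fix memory before any query_delay tuning")
    elif status == "soft_limit":
        findings.append("soft_limit: model is pruning")

    # Cardinality thresholds taken from the field table above.
    if stats.get("total_by_field_count", 0) > 100_000:
        findings.append("by_field cardinality too high")
    if stats.get("total_partition_field_count", 0) > 10_000:
        findings.append("partition explosion")
    if stats.get("total_category_count", 0) > 10_000:
        findings.append("too many distinct log patterns")

    return findings


if __name__ == "__main__":
    print(diagnose_memory({
        "memory_status": "hard_limit",
        "total_by_field_count": 250_000,
        "total_partition_field_count": 12,
        "total_category_count": 0,
    }))
```
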
- `query_delay`: how far behind real time the datafeed queries. Too small → missing docs; too large → slower alerts. Set to P95 ingest latency + buffer (default `60s`–`120s`).
- `delayed_data_check_config`: how aggressively the datafeed checks for late data.
- `bucket_span`: analysis interval. Align with data granularity and detection window.
- `frequency`: defaults to `min(query_delay, bucket_span / 2)`.
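
The "P95 ingest latency + buffer" rule for `query_delay` can be worked through with a few lines of arithmetic. This is a sketch under assumptions: the latency samples are taken to be in seconds and already collected (for example from `ad_ts_ingest_latency_estimate`), the nearest-rank percentile method is my choice, and the 60-second default buffer and the name `recommend_query_delay` are placeholders.

```python
import math


def recommend_query_delay(latencies_s: list[float], buffer_s: int = 60) -> str:
    """Return a query_delay string: nearest-rank P95 latency plus a buffer."""
    if not latencies_s:
        raise ValueError("need at least one latency sample")

    ordered = sorted(latencies_s)
    # Nearest-rank P95 using integer arithmetic: ceil(0.95 * n).
    rank = (95 * len(ordered) + 99) // 100
    p95 = ordered[rank - 1]

    # Round the latency up to whole seconds before adding the buffer.
    return f"{math.ceil(p95) + buffer_s}s"


if __name__ == "__main__":
    samples = [float(s) for s in range(1, 101)]  # 1s .. 100s
    print(recommend_query_delay(samples))
```
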
1. Stop datafeed: `ad_manage_datafeed` (`action=_stop`)
2. Close job
3. Update config: `ad_update_model_memory_limit`, `ad_update_datafeed_query_delay`, `ad_update_delayed_data_check_config`
4. Open job: `ad_open_job`
5. Start datafeed: `ad_manage_datafeed` (`action=_start`)

Recover a corrupted period without resetting the whole model: `ad_revert_model_snapshot`.
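
For readers who think in REST terms, the lifecycle sequence above can be sketched as an ordered plan of Elasticsearch ML calls for a memory limit change. The mapping from the `ad_*` tools to these endpoints is an assumption for illustration, and the function `memory_update_plan` is hypothetical; it only builds the plan and sends nothing.

```python
def memory_update_plan(job_id: str, datafeed_id: str, new_limit: str) -> list[tuple]:
    """Build the ordered (method, path, body) calls for a memory limit change."""
    return [
        # 1. Stop the datafeed so no new data is fed during the change.
        ("POST", f"_ml/datafeeds/{datafeed_id}/_stop", {}),
        # 2. Close the job before touching its memory limit.
        ("POST", f"_ml/anomaly_detectors/{job_id}/_close", {}),
        # 3. Update the configured model_memory_limit.
        ("POST", f"_ml/anomaly_detectors/{job_id}/_update",
         {"analysis_limits": {"model_memory_limit": new_limit}}),
        # 4. Reopen the job.
        ("POST", f"_ml/anomaly_detectors/{job_id}/_open", {}),
        # 5. Restart the datafeed.
        ("POST", f"_ml/datafeeds/{datafeed_id}/_start", {}),
    ]


if __name__ == "__main__":
    for method, path, body in memory_update_plan(
        "nginx-errors", "datafeed-nginx-errors", "512mb"
    ):
        print(method, path, body)
```
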
| Category | Tools |
|---|---|
| Permissions / metadata | ad_validate_ml_tool_permissions, ad_get_available_metadata, ad_get_jobs |
| Job + datafeed state | ad_get_job_datafeed_config, ad_get_job_messages, ad_manage_datafeed, ad_preview_datafeed_with_latency |
| Timing / missing docs | ad_ts_delayed_data_annotations, ad_ts_bucket_event_gaps, ad_ts_ingest_latency_estimate, ad_update_datafeed_query_delay, ad_update_delayed_data_check_config, ad_wf_troubleshoot_query_delay |
| Memory | ad_ts_model_memory_health, ad_wf_ts_field_cardinality, ad_estimate_memory_requirement, ad_update_model_memory_limit, ad_wf_troubleshoot_memory_limit |
| Model / lifecycle | ad_get_model_snapshots, ad_revert_model_snapshot, ad_open_job, ad_create_job |
| CCS | ad_ts_ccs_diagnostics |
| Calendars | ad_get_calendar_events, ad_create_calendar_event |
Full parameter tables, ES|QL templates, and REST step lists: references/troubleshoot-anomaly-tool-reference.md.
- `ad_validate_ml_tool_permissions` first: missing privileges produce misleading empty results.
- Fix memory before `query_delay`: `hard_limit` corrupts state; `query_delay` fixes on a memory-limited job are wasted.
- Stop the datafeed before updating it. Updating a running datafeed is rejected.
- Close the job before updating memory limit. Sequence above.
- Prefer workflow tools (`ad_wf_*`) over manually chaining diagnostics for complex decisions.
- `ad_preview_datafeed_with_latency` before starting: confirm the datafeed returns data after config changes.

When: "set up a job", "create an ML detector", "monitor X over time", "detect rare/unusual/anomalous values".
PUT _ml/anomaly_detectors/<job_id> # 1. Define job (ad_create_job)
PUT _ml/datafeeds/datafeed-<job_id> # 2. Define datafeed (ad_create_datafeed)
POST _ml/anomaly_detectors/<job_id>/_open # 3a. Open job (ad_open_job)
POST _ml/datafeeds/datafeed-<job_id>/_start # 3b. Start datafeed (ad_manage_datafeed action=_start)
GET _ml/anomaly_detectors/<job_id>/results/records # 4. Read results
1. Build configs. Parse the user request into job + datafeed JSON with no null fields.
2. Apply smart defaults:

   | Field | Default | Override when |
   |---|---|---|
   | `bucket_span` | `"15m"` | User specifies a different span |
   | `time_field` | `"@timestamp"` | User names a different timestamp field |
   | `index` | `"logs-*"` | User specifies an index or pattern |
   | `datafeed_query` | `{"match_all": {}}` | User mentions filters, processes, or time windows |
   | `influencers` | by/over/partition fields from detectors | User adds extra influencer fields |
   | `job_id` | Generated from user description | User provides an explicit ID |
   | `query_delay` | `"60s"` | P95 ingest latency is higher |

3. Choose detector function from user intent (full table in references/anomaly-detection-functions.md):
   - "high CPU" / "unusually large" → `high_mean` or `high_sum`
   - "rare logins" / "unusual values" → `rare` (variants below)
   - "too many requests" / "spike in count" → `high_count`

   `rare` variants:
   - Infrequent globally → `rare by_field_name: X`
   - Infrequent vs peers → `rare by_field_name: X over_field_name: Y`
   - Infrequent per segment → `rare by_field_name: X partition_field_name: Y`
   - Infrequent per segment vs peers → `rare by_field_name: X over_field_name: Y partition_field_name: Z`
4. Validate. `platform.core.get_index_mapping` on the target index to verify field existence/types → `ad_validate_job_spec`. If errors, fix and re-validate (max 3 attempts).
5. Present and confirm. Show the complete job + datafeed bodies formatted as the exact API calls. Ask for approval once. If feedback, incorporate and re-present (up to 3 rounds).
6. Deploy. After confirmation: `ad_create_job` → `ad_create_datafeed` → `ad_open_job` → `ad_manage_datafeed` (`action=_start`). Report final `job_id` and `datafeed_id`.

For batch analysis on historical data, pass start and end to the datafeed start call.
Worked examples (rare-username, DNS exfil, large-downloads) with full JSON bodies and datafeed filters: references/job-creation-recipes.md.
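
The "build configs" and "smart defaults" steps can be sketched as one function that produces the job and datafeed bodies. This is an illustration under assumptions, not the skill's own builder: the function name `build_job_and_datafeed` and its parameter list are hypothetical, it handles a single detector only, and it expresses "no null fields" by dropping any detector key whose value is `None`.

```python
from typing import Optional


def build_job_and_datafeed(
    job_id: str,
    function: str,
    field_name: Optional[str] = None,
    by_field_name: Optional[str] = None,
    over_field_name: Optional[str] = None,
    partition_field_name: Optional[str] = None,
    bucket_span: str = "15m",
    time_field: str = "@timestamp",
    index: str = "logs-*",
    query: Optional[dict] = None,
    query_delay: str = "60s",
) -> tuple[dict, dict]:
    """Return (job_body, datafeed_body) with the defaults from the table above."""
    detector = {
        "function": function,
        "field_name": field_name,
        "by_field_name": by_field_name,
        "over_field_name": over_field_name,
        "partition_field_name": partition_field_name,
    }
    # "No null fields": drop every key the caller left unset.
    detector = {k: v for k, v in detector.items() if v is not None}

    # Default influencers are the by/over/partition fields of the detector.
    split_keys = ("by_field_name", "over_field_name", "partition_field_name")
    influencers = [detector[k] for k in split_keys if k in detector]

    job = {
        "analysis_config": {
            "bucket_span": bucket_span,
            "detectors": [detector],
            "influencers": influencers,
        },
        "data_description": {"time_field": time_field},
    }
    datafeed = {
        "job_id": job_id,
        "indices": [index],
        "query": query if query is not None else {"match_all": {}},
        "query_delay": query_delay,
    }
    return job, datafeed


if __name__ == "__main__":
    job, feed = build_job_and_datafeed(
        "nginx-errors", "high_count", by_field_name="host.keyword"
    )
    print(job)
    print(feed)
```
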
- Create job before datafeed. Datafeed references job by ID.
- Open job before starting datafeed. Start on a closed job is rejected.
- `query_delay` = P95 ingest latency + buffer (`60s`–`120s` safe default).
- Forecasts require non-population jobs: `over_field_name` jobs cannot be forecasted; warn before attempting.
- `by_field_name` vs `over_field_name`: `by` compares entity to its own history; `over` compares to peer group in the same bucket.
- `partition_field_name` = fully independent sub-model with its own normalization.
- `bucket_span` matches detection granularity: 15m for high-frequency, 1h for operational metrics, 1d for daily patterns. Larger smooths short spikes; smaller increases noise.

Requires Node.js 18+. Defaults to elastic/changeme when no credentials are supplied.
cd skills/kibana/kibana-anomaly-detection
# tools → workflows → skills
node scripts/kibana-agent-builder.mjs all register --kibana-url http://localhost:5601
# HTTPS with self-signed cert
node scripts/kibana-agent-builder.mjs all register --kibana-url https://localhost:5601 --insecure

`all register` runs `tools register`, then `workflows register`, then `skills register`. Kibana allows at most five
`tool_ids` per skill; the script fills them by scanning SKILL.md for tool mentions (in document order), then appends
ids from references/kibana/tools/esql/*.json until the cap (workflow-only tools omitted by default). If you run
`skills register` alone, run `tools register` first so those ids exist.
Workflow tool exclusions and prefixes live in scripts/agent_builder_constants.json.
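
The `tool_ids` fill rule described above (mentions in document order, then fallback ids, up to the cap) can be sketched as follows. This is not the code in `scripts/kibana-agent-builder.mjs`; it is a Python illustration of the described behavior. The regex for spotting tool mentions, the `ad_wf_` exclusion prefix, and the name `select_tool_ids` are all assumptions.

```python
import re


def select_tool_ids(
    skill_text: str,
    fallback_ids: list[str],
    cap: int = 5,
    excluded_prefixes: tuple[str, ...] = ("ad_wf_",),  # assumed workflow prefix
) -> list[str]:
    """Pick up to `cap` unique tool ids: document mentions first, then fallbacks."""
    # Assumed mention pattern: any identifier starting with "ad_".
    mentioned = re.findall(r"\bad_[a-z0-9_]+\b", skill_text)

    chosen: list[str] = []
    for tool_id in [*mentioned, *fallback_ids]:
        if len(chosen) == cap:
            break
        # Skip duplicates and excluded (workflow-only) tools.
        if tool_id in chosen or tool_id.startswith(excluded_prefixes):
            continue
        chosen.append(tool_id)
    return chosen


if __name__ == "__main__":
    text = (
        "Use ad_get_jobs, then ad_wf_troubleshoot_query_delay, "
        "ad_get_jobs again, and ad_query_influencers."
    )
    extras = ["ad_get_categories", "ad_get_model_plot",
              "ad_get_forecast_results", "ad_rca_correlation"]
    print(select_tool_ids(text, extras))
```
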
MCP API key permissions:
- Kibana: `read_onechat`, `space_read`
- Index: `read`, `view_index_metadata` on `.ml-anomalies-*`, `.ml-annotations-*`, `.ml-notifications-*`, `.ml-config`
- For source evidence: `read` on source data indices

ES|QL tool specs live under references/kibana/tools/esql/*.json; workflow definitions under
references/kibana/workflows/*.yaml. Each Mode section above lists the tools it uses. Full surface:
references/tools.md (ES|QL) and references/workflow-tools.md
(workflows).
| Index | Relevant content |
|---|---|
| `.ml-anomalies-*` | record, bucket, influencer, model_plot, model_forecast, model_snapshot, category_definition, model_size_stats |
| `.ml-config` | job/datafeed documents (visible even for never-run jobs) |
| `.ml-annotations-*` | delayed data (`event == "delayed_data"`) |
| `.ml-notifications-*` | job messages (level: info/warning/error) |
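
As an example of the ES|QL read path against `.ml-anomalies-*`, the sketch below builds a query string for the top records of one job, suitable for passing to `platform.core.execute_esql`. The helper name `records_query` is hypothetical, the selected columns are one reasonable choice rather than the skill's templates (those live in the references), and the job id check is a defensive assumption so the id can be embedded in the query safely.

```python
import re

# Job ids: lowercase alphanumerics, hyphens, underscores; alphanumeric at both ends.
_JOB_ID = re.compile(r"^[a-z0-9](?:[a-z0-9_-]*[a-z0-9])?$")


def records_query(job_id: str, min_score: float = 25, limit: int = 20) -> str:
    """Build an ES|QL query for the highest-scoring records of one job."""
    if not _JOB_ID.match(job_id):
        raise ValueError(f"unexpected job id: {job_id!r}")
    return (
        "FROM .ml-anomalies-*\n"
        f'| WHERE result_type == "record" AND job_id == "{job_id}" '
        f"AND record_score >= {min_score}\n"
        "| SORT record_score DESC\n"
        "| KEEP timestamp, job_id, record_score, initial_record_score\n"
        f"| LIMIT {limit}"
    )


if __name__ == "__main__":
    print(records_query("nginx-errors"))
```

Keeping both `record_score` and `initial_record_score` in the output makes renormalization drift visible without a second query.
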
RCA: "Something caused a spike in our error rate at 2pm — what broke?" → Investigate → ad_get_available_metadata →
ad_query_anomaly_timeline → ad_rca_cross_job_entity_match → ad_rca_multi_job_entities → RCA report.
Score drop: "My anomaly score went from 90 to 55 — did the model change?" → Explain → ad_rca_score_reassessment
for drift → explain renormalization if score_drift is large.
Memory limit: "Job status shows hard_limit and results look wrong." → Troubleshoot → ad_ts_model_memory_health →
ad_wf_ts_field_cardinality → ad_estimate_memory_requirement → ad_update_model_memory_limit (lifecycle: stop
datafeed → close → update → open → start).
New job: "Detect unusual error rates per host on nginx access logs." → Manage → high_count detector with
by_field_name: "host.keyword" → validate → present → deploy.
Multi-mode: "We had an incident last night, scores were high but now low — is the job healthy?" → Investigate the
incident → Explain the score drift → Troubleshoot if hard_limit or delayed data is suspected.
- Pick a mode first. Don't blend RCA logic with score-explanation logic in one response.
- `ad_validate_ml_tool_permissions` first on empty results; privileges are the most common false-negative cause.
- Score bands are absolute thresholds: `>75` critical, `50–75` warning, `25–50` minor, `<25` informational.
- Multi-job entities are prime suspects. Use `min_job_count=2` in `ad_rca_multi_job_entities`.
- Show `initial_record_score` alongside `record_score`; the gap tells the renormalization story.
- Fix memory before `query_delay`. `hard_limit` invalidates downstream diagnostics.
- Stop datafeed → close job → update config → open job → start datafeed for any config change to memory or query delay.
- Confirm RCAs with `ad_rca_source_evidence`. Raw source documents are ground truth.