Add find-security-rules skill for the agent builder by nkhristinin · Pull Request #269089 · elastic/kibana

nkhristinin · 2026-05-13T11:16:35Z

Summary

Adds the find-security-rules skill to Security AI Assistant's Agent Builder integration. This skill enables natural-language rule discovery queries (listing, filtering, counting, sorting detection rules) via two inline tools:

security.find_rules — lists, filters, sorts, and counts detection rules using flat parameters (searchTerm, enabled, ruleSource, severity, ruleType, tags, excludeTags, mitreTechnique, ruleId, sortField, sortOrder, perPage). Delegates to the existing convertRulesFilterToKQL() for base filtering.
security.discover_rule_tags — discovers all available rule tag values (no parameters). Must be called before any tag-based filtering to avoid hallucinated tag names.

The skill also references the existing security.alerts registry tool for noisy-rules queries that correlate alert volume with rule metadata via kibana.alert.rule.rule_id.
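To make the flat-parameter shape concrete, here is a minimal sketch of how such parameters might be folded into a KQL filter string. It is not the PR's `buildToolFilter`: the `FindRulesParams` type, the `buildFilterSketch` and `quote` helpers, and the `alert.attributes.*` field paths are assumptions for illustration, and the real tool delegates base filtering to `convertRulesFilterToKQL()`. Tag values are ORed here, matching the behavior a later commit in this PR describes.

```typescript
// Hypothetical subset of the flat parameters named in the summary.
interface FindRulesParams {
  enabled?: boolean;
  severity?: string;
  tags?: string[];
  excludeTags?: string[];
}

// Quote a value for KQL, escaping backslashes and double quotes.
const quote = (value: string): string =>
  `"${value.replace(/\\/g, '\\\\').replace(/"/g, '\\"')}"`;

// Build a KQL string: clauses are ANDed together, tag values are ORed.
// The field paths are assumed, not taken from the PR.
function buildFilterSketch(params: FindRulesParams): string {
  const clauses: string[] = [];
  if (params.enabled !== undefined) {
    clauses.push(`alert.attributes.enabled: ${params.enabled}`);
  }
  if (params.severity) {
    clauses.push(`alert.attributes.params.severity: ${quote(params.severity)}`);
  }
  if (params.tags?.length) {
    clauses.push(`alert.attributes.tags: (${params.tags.map(quote).join(' OR ')})`);
  }
  if (params.excludeTags?.length) {
    clauses.push(`NOT alert.attributes.tags: (${params.excludeTags.map(quote).join(' OR ')})`);
  }
  return clauses.join(' AND ');
}

// alert.attributes.enabled: true AND alert.attributes.tags: ("Network" OR "Endpoint")
const example = buildFilterSketch({ enabled: true, tags: ['Network', 'Endpoint'] });
```

Keeping the parameters flat means the model fills in independent optional fields rather than composing a nested filter expression, which is the trade-off the later refactor in this PR leans on.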

Changes

Skill code: find_rules_skill.ts, find_rules_tool.ts, discover_rule_tags_tool.ts
Unit tests: 48 tests covering filter building, KQL generation, tool handlers, skill registration, and allowlist membership
Eval suite: 16 rule-discovery examples + 6 distractor examples + 1 multi-turn conversation test
Fixtures: 10 seeded detection rules + 50 synthetic alerts with scoped cleanup (only deletes fixture rules/alerts by name)

Eval scores (Sonnet 4.6, 22 examples)

Rule discovery (16 examples):

| Evaluator | Mean | Min | Max |
| --- | --- | --- | --- |
| Groundedness | 0.97 | 0.79 | 1.00 |
| Sequence Accuracy | 0.94 | 0.00 | 1.00 |
| ToolUsageOnly | 1.00 | 1.00 | 1.00 |
| Factuality | 0.75 | 0.00 | 1.00 |
| Relevance | 0.64 | 0.28 | 1.00 |

Distractor routing (6 examples) and multi-turn (1 example) also pass. Distractor Factuality is expectedly low (0.16) — those examples test routing away from the skill, so the expected outputs are vague intent statements that the strict claim-by-claim scorer penalizes.

Trace-based evaluators (Latency, Token counts, Skill Invoked) require a trace ES endpoint and are not reported here.

Test plan

Unit tests pass (48/48)
Eval suite passes (3/3 tests)
Manual verification: ask AI Assistant "List all enabled detection rules tagged with MITRE" and confirm it uses the find-security-rules skill
Manual verification: ask "Show me my network detection rules" and confirm tag discovery happens before filtering

patrykkopycinski · 2026-05-13T14:23:12Z

Skill review: findings + lessons captured

Thanks @nkhristinin — this skill surfaces several reusable patterns I haven't seen all together before, and a handful of issues that turn out to be reusable lessons too. Both have been formalized into elastic/agent-builder-skill-dev-cursor-plugin so future authors of similar skills get the patterns by default and don't repeat the mistakes.

✅ Strong patterns now codified for re-use

These nine patterns are now ## SKILL DESIGN PATTERNS (cross-cutting) in knowledge/domain-patterns.md — captured in skill-dev-plugin#12 (merged):

| Pattern | Where in this PR |
| --- | --- |
| A. Negative routing matrix ("Do NOT Use When") | `find_rules_skill.ts`: the strongest single defense against false-positive activation |
| B. Hallucination guard ("Never Invent Values") | `find_rules_skill.ts`: Grounding section names specific fabrication shapes |
| C. Discover-then-filter w/ structured exception | content "Tag Discovery" + MITRE-IDs-skip-discovery exemption |
| D. Read-only "Action Limitations" w/ named escape hatches | content pre-empts the three things the agent will try (other tools / sub-agent / connector lookup) |
| E. Atomic-condition DNF filter (one-field-per-object) | `find_rules_tool.ts::conditionSchema` + `andGroupSchema` |
| F. Truncation flag in tool output | `find_rules_tool.ts` aggregate handler emits `otherDocCount` + steers the LLM in the message |
| G. Two-tier zero-result hint | `find_rules_tool.ts` find handler: different hint for "filter too narrow" vs "tag value doesn't exist" |
| H. Distractor block + multi-turn refinement test | `find_rules.spec.ts`: three describe blocks, the multi-turn one catches "agent paraphrases prior turn from memory" |
| I. Self-documenting dataset description | `find_rules.spec.ts`: fixture intent encoded in `dataset.description` |

If the next person picking up a similar skill learns nothing else from this PR, they should pick up A, E, and H.
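Patterns F and G from the table above can be sketched roughly as follows. This is an illustration under assumptions, not the handlers in `find_rules_tool.ts`: the `Bucket` type, the function names, and the message wording are hypothetical; only the `otherDocCount` field name and the two hint cases come from the review.

```typescript
interface Bucket {
  key: string;
  count: number;
}

// Pattern F: surface truncation explicitly so the model does not present
// a top-N aggregation as the complete picture.
function summarizeBuckets(buckets: Bucket[], otherDocCount: number) {
  const truncated = otherDocCount > 0;
  return {
    buckets,
    otherDocCount,
    truncated,
    message: truncated
      ? `Showing top ${buckets.length} values; ${otherDocCount} more rules fall outside these buckets. Do not present this as a complete list.`
      : `Showing all ${buckets.length} values.`,
  };
}

// Pattern G: pick a different hint depending on why nothing matched.
// knownTags would come from a prior tag-discovery call.
function zeroResultHint(requestedTags: string[], knownTags: ReadonlySet<string>): string {
  const unknown = requestedTags.filter((tag) => !knownTags.has(tag));
  if (unknown.length > 0) {
    return `No rules matched. These tag values do not exist: ${unknown.join(', ')}. Run tag discovery and retry.`;
  }
  return 'No rules matched. The filter may be too narrow; try removing a condition.';
}

const hint = zeroResultHint(['Netwrk'], new Set(['Network', 'Endpoint']));
```

The point of both helpers is that the tool output itself tells the model what went wrong, instead of leaving it to guess from an empty or partial result.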

Findings — fix before merge

🔴 Blocker — `evaluate_dataset.ts` hardcodes `skillName: 'data-exploration'`

x-pack/platform/packages/shared/agent-builder/kbn-evals-suite-agent-builder/src/evaluate_dataset.ts wires createSkillInvocationEvaluator({ skillName: 'data-exploration' }) while the same file is the shared evaluator-wiring for five sibling suite domains: esql/, external/, kb/, product_documentation/, security/. The skill-activation column is therefore noise for every leaf skill that isn't data-exploration — including this PR's find-rules.

Compounding: the file then defines a ~60-line ExpectedSkillInvocation evaluator that re-implements the platform primitive's ES|QL query parameterised by metadata.expectedSkill. Two skill-activation evaluators in one suite is always wrong — pick the platform primitive parameterised correctly, and delete the duplicate.

Fix: replace the literal with per-example resolution from metadata.expectedSkill. If createSkillInvocationEvaluator doesn't yet support a resolveSkillName overload, push the overload upstream into @kbn/evals first, then apply the recipe locally.
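As a rough illustration of per-example resolution, the sketch below scores skill activation against `metadata.expectedSkill` instead of a literal. The `EvalExample` type, `resolveSkillName`, and `scoreSkillInvocation` are hypothetical names; the actual signature of `createSkillInvocationEvaluator` is not shown in this thread and may differ.

```typescript
interface EvalExample {
  input: string;
  metadata?: { expectedSkill?: string };
}

// Hypothetical resolver: returns the skill this example expects, or
// undefined so the caller can skip the check rather than fall back to a
// hardcoded name.
function resolveSkillName(example: EvalExample): string | undefined {
  return example.metadata?.expectedSkill;
}

// 1 when the expected skill was invoked, 0 when it was not, and null
// (not applicable) when the example declares no expectation.
function scoreSkillInvocation(example: EvalExample, invokedSkills: string[]): number | null {
  const expected = resolveSkillName(example);
  if (expected === undefined) return null;
  return invokedSkills.includes(expected) ? 1 : 0;
}

const annotated: EvalExample = {
  input: 'List all enabled detection rules',
  metadata: { expectedSkill: 'find-security-rules' },
};
const activation = scoreSkillInvocation(annotated, ['find-security-rules']);
```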

Lesson captured: skill-dev-plugin#11 (merged):

anti-pattern #19 — full wrong/right pair
shared-eval-dataset-hardcoded-skill-name — the verbatim fix recipe
rules/shared-eval-dataset-skill-name-hardcoded.mdc — auto-fires on **/evaluate_dataset.ts so the next agent reviewing a suite-shared evaluator-wiring file flags this before the PR opens

🔴 Blocker — no `tool_sequence` annotations + no `createTrajectoryEvaluator`

find_rules.spec.ts asserts toolCalls.length and toolCalls.some(...) ad-hoc per example but does not run a real trajectory evaluator. With createTrajectoryEvaluator (LCS-based scoring, zero per-example LLM cost) the same assertions become a structured per-example score that lights up regression in nightly runs and gives you a stable "did the agent call the right tools in roughly the right order" signal across iterations.

Fix: add tool_sequence?: string[] to dataset examples (minimum-sufficient sequences only — extra tools the agent calls don't lower the LCS score, but a too-long golden produces false regressions when the prompt is improved) and wire createTrajectoryEvaluator with the wrapper that returns N/A when tool_sequence is absent so unannotated examples don't get penalized.
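A minimal sketch of LCS-based trajectory scoring with the N/A wrapper follows. It is not the `createTrajectoryEvaluator` implementation, whose exact formula is not shown here. Dividing by the golden length is an assumption, chosen so that extra tool calls do not lower the score; `lcsLength` and `trajectoryScore` are hypothetical names.

```typescript
// Length of the longest common subsequence of two tool-name lists.
function lcsLength(a: string[], b: string[]): number {
  const dp: number[][] = Array.from({ length: a.length + 1 }, () =>
    new Array<number>(b.length + 1).fill(0)
  );
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      dp[i][j] =
        a[i - 1] === b[j - 1]
          ? dp[i - 1][j - 1] + 1
          : Math.max(dp[i - 1][j], dp[i][j - 1]);
    }
  }
  return dp[a.length][b.length];
}

// Returns null (treated as N/A) when the example has no golden sequence,
// so unannotated examples are not penalized.
function trajectoryScore(actual: string[], toolSequence?: string[]): number | null {
  if (!toolSequence || toolSequence.length === 0) return null;
  return lcsLength(actual, toolSequence) / toolSequence.length;
}

const goldenSequence = ['security.discover_rule_tags', 'security.find_rules'];
// The extra security.alerts call does not lower the score.
const withExtraCall = trajectoryScore(
  ['security.discover_rule_tags', 'security.alerts', 'security.find_rules'],
  goldenSequence
);
// Calling the tools in the wrong order keeps only one of the two in sequence.
const wrongOrder = trajectoryScore(
  ['security.find_rules', 'security.discover_rule_tags'],
  goldenSequence
);
```

Under this normalization a longer golden sequence has more entries the agent can miss, which is one way to read the advice to annotate only minimum-sufficient sequences.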

Reference implementation already in tree: x-pack/solutions/security/packages/kbn-evals-suite-alerts-rag/src/evaluate_dataset.ts::createAlertsRagTrajectoryEvaluator.

Lesson already captured (predates this PR): skill-dev-plugin anti-pattern #17 + recipe add-trajectory-evaluator — the recipe renders the full three-edit sequence (dataset type / toDatasetExample / wrapped evaluator) verbatim.

🟡 Important — `conditionSchema` union has 12 variants (OSS budget is 8)

find_rules_tool.ts::conditionSchema declares 12 z.object({...}).strict() variants. Per skill-dev-plugin's tool-conventions.mdc schema-budget rule, OSS models start dropping schema fidelity above ~8 union variants.

Fix options (pick one):

Collapse range pairs: { riskScore: { gte?: number; lte?: number } } saves 2 variants from riskScoreGte / riskScoreLte. Same applies to any other *Gte/*Lte pairs.
Promote rare fields to top-level args: enabled is Boolean-valued only — moving it from the union to a top-level enabled?: boolean arg removes one variant without losing expressiveness.
Split the tool: find_rules for the common conditions + find_rules_advanced for the long tail (uncommon enough to live behind a separate registration).
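The first two options might look like the sketch below, written with plain TypeScript types instead of the zod schema to keep it dependency-free. The `Condition` union is cut down to two variants for illustration, and the type names, `conditionToKql`, `argsToKql`, and the `alert.attributes.*` field paths are assumptions, not code from this PR.

```typescript
// One range variant replaces the separate riskScoreGte / riskScoreLte variants.
interface RiskScoreRange {
  gte?: number;
  lte?: number;
}

// Illustrative two-variant union; the real union has more variants.
type Condition = { riskScore: RiskScoreRange } | { severity: string };

// `enabled` is promoted out of the union to a top-level argument.
interface FindRulesArgs {
  enabled?: boolean;
  conditions: Condition[];
}

// Field paths below are assumed for illustration.
function conditionToKql(condition: Condition): string {
  if ('riskScore' in condition) {
    const { gte, lte } = condition.riskScore;
    const parts: string[] = [];
    if (gte !== undefined) parts.push(`alert.attributes.params.riskScore >= ${gte}`);
    if (lte !== undefined) parts.push(`alert.attributes.params.riskScore <= ${lte}`);
    if (parts.length === 0) throw new Error('riskScore range needs gte or lte');
    return parts.join(' AND ');
  }
  return `alert.attributes.params.severity: "${condition.severity}"`;
}

function argsToKql(args: FindRulesArgs): string {
  const clauses = args.conditions.map(conditionToKql);
  if (args.enabled !== undefined) {
    clauses.unshift(`alert.attributes.enabled: ${args.enabled}`);
  }
  return clauses.join(' AND ');
}

const rangeKql = conditionToKql({ riskScore: { gte: 50, lte: 80 } });
```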

Even after consolidation the schema is large. Worth previewing it with tooling { action: "preview-for-skill", skill_id: "find-rules" } to see exactly the JSON shape the agent receives.

🟡 Important — KQL building reimplements existing UI utilities

find_rules_tool.ts::buildFullFilter hand-rolls escape logic for nameContains and ruleUuid. Both have shipped utility functions used by the Detection UI for the same field families — importing them keeps you aligned with whatever escape rules ship in future Kibana versions instead of drifting. Search x-pack/solutions/security/plugins/security_solution/public/detection_engine/.../helpers.ts for buildKql* / escapeQuotes.

(I didn't trace the exact import path because the change is straightforward once you grep — happy to dig further if you want a specific recommendation.)

🟡 Important — `SKILL.md` missing "Response Format" section

skill-dev-plugin's skill-conventions.mdc requires sections in this order: When to Use → Process → Examples → Guardrails → Response Format. Your skill body has the first 4 (excellently) but the Response Format section is missing. Skills that emit lists/aggregates particularly benefit because the LLM otherwise invents the markdown shape per-call.
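The ordering requirement lends itself to a mechanical check. The sketch below is one hypothetical way to do it, assuming each section appears as a markdown heading whose text matches the section name exactly; it is not the rule shipped in skill-dev-plugin.

```typescript
const REQUIRED_SECTIONS = ['When to Use', 'Process', 'Examples', 'Guardrails', 'Response Format'];

// Returns the required sections that are missing or appear out of order.
function missingOrMisorderedSections(markdown: string): string[] {
  const headings = markdown
    .split('\n')
    .map((line) => /^#{1,6}\s+(.*)$/.exec(line.trim())?.[1]?.trim())
    .filter((heading): heading is string => heading !== undefined);

  const problems: string[] = [];
  let cursor = -1;
  for (const section of REQUIRED_SECTIONS) {
    // Each section must appear after the previous one that was found.
    const index = headings.indexOf(section, cursor + 1);
    if (index === -1) {
      problems.push(section);
    } else {
      cursor = index;
    }
  }
  return problems;
}

// A skill body with the first four sections only.
const skillBody = ['## When to Use', '## Process', '## Examples', '## Guardrails'].join('\n');
const sectionProblems = missingOrMisorderedSections(skillBody); // ['Response Format']
```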

Lesson captured: recipe add-response-format renders a template-and-example shape ready to paste.

🟢 Nice-to-have — stale `ruleId` reference in fixture comment

find_rules_fixtures.ts has a comment referencing ruleId but the actual field on the seeded rules is ruleUuid. One-line fix.

🟢 Nice-to-have — hardcoded alerts index in fixtures

find_rules_fixtures.ts writes synthetic alerts to a hardcoded index name. Brittle if the fixture ever needs to run against a non-default space or a custom alerts data view. Suggest reading from a config constant or accepting via the evaluate.fixture arg.

TL;DR

9 patterns from this PR are now in domain-patterns.md so the next 9 skill authors copy them by default.
3 fixable issues, two of which (the anti-pattern #19 / #17 lessons) are now auto-detected by repo-local rules in skill-dev-plugin so they can't slip back in.
Highest-impact one is the hardcoded skillName — landing it as-is would make the activation column noise across all five suite domains, not just find-rules.

Happy to pair on any of the above. The "fix the eval-suite first, then iterate the skill changes" sequence usually pays off because you can then watch the trajectory + activation evaluators flip green per-iteration.

When no security.rule attachment is present (e.g. the user selected a rule from find-rules output), the skill now directs the agent to call resolve_rule_attachment first, render the result, then proceed with the normal diagnostic branches. Also updates the skill description to reflect that the attachment is no longer a prerequisite for triggering the skill, and adds a TODO comment marking where SECURITY_FIND_RULES_TOOL_ID should be added to getRegistryTools once elastic#269089 lands.

- Drop 3 conditions (mitreTactic, indexPattern, riskScoreMax) from rule_filter.ts
- Remove verbose fields from summarizeRule (ruleTypeId, index, threat, interval, createdAt)
- Shorten skill and tool descriptions
- Rewrite eval expected outputs to be data-agnostic with rule details
- Add tool_sequence annotations to all eval examples
- Add pre-seed cleanup to fixtures preventing leftover contamination
- Add distractor suite (6 examples) and multi-turn conversation test
- Revert evaluate_dataset.ts to main (no shared file changes)

Eval results (Sonnet, 22 examples): Groundedness: 0.92, Sequence Accuracy: 0.97, ToolUsageOnly: 1.00, Relevance: 0.54, Factuality: 0.18

Rewrite 16 expected outputs to comprehensively cover LLM response claims: add rule types, explicit counts, tool usage patterns, and per-rule severity/risk/enabled details. Factuality 0.19 → 0.65.

… custom filter DSL

Replace AND/OR group filter language with flat params delegating to the existing convertRulesFilterToKQL(). Simplify discover_rule_tags to accept no parameters. Delete rule_filter.ts and consolidate buildToolFilter into find_rules_tool.ts. Use CreatedRule interface in eval fixtures.

…port

- Rename skill id/name from 'find-rules' to 'find-security-rules' for disambiguation with non-detection rule types
- Handle tags with OR semantics instead of delegating to convertRuleTagsToKQL (which uses AND), matching the documented behavior
- Fix stale buildFullFilter export in index.ts (now buildToolFilter)
- Update eval expectedSkill references

The skill id was renamed but the server-side allowlist still had the old name, preventing the skill from loading.

…re cleanup

- Rename nameContains to searchTerm to reflect that convertRulesFilterToKQL searches name, index patterns, and MITRE fields
- Use EXPECTED_MAX_TAGS constant instead of hardcoded 500 for tag aggregation
- Add isAllowedBuiltinSkill test to prevent allowlist drift
- Fixture cleanup now scopes to fixture rule names instead of deleting all rules/alerts
- Remove tool_sequence from eval metadata, fix stale eval examples

nkhristinin · 2026-05-21T14:59:24Z

@elasticmachine merge upstream

elasticmachine · 2026-05-21T14:59:27Z

There are no new commits on the base branch.

… find-rules-skill

nkhristinin · 2026-05-26T11:29:19Z

@elasticmachine merge upstream

sdesalas · 2026-05-27T10:31:17Z

Hi Nikita.

Reviewing on behalf of @elastic/security-detection-rule-management.

Code we own seems OK: a single addition to rule_fields.ts, a file containing constants. Nothing out of the ordinary.

I also checked through the security_solution files and tested the functionality locally. Here are some things I noticed:

Good news

BEFORE / AFTER: This skill definitely becomes available

Can find rules using filters

Bad news

Counts do not match up with MITRE ATT&CK page

This is understandable but poor UX. Let me explain.

The MITRE ATT&CK page looks at actual technique relationships, not tags (a rule has an array of tactics/techniques and a separate array of tags). If a rule is related to a technique but not tagged as such, it will be missed from the output.

So here 76 rules ARE RELATED to the "Initial Access" tactic, but we have not been keeping the tags updated, and the agent cannot search through these relationships, so the result is incorrect. Or rather, it is a discrepancy that appears depending on which source of truth you look at.

This problem has always been there (because there is more than one source of truth), and it could already be surfaced by searching these two separate screens independently. However, it appears worse now because the agent speaks so confidently and makes it easier to surface the discrepancy on the same screen (the MITRE ATT&CK one).
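The two sources of truth can be illustrated with a small sketch. The `Rule` shape below is a simplified assumption (a `tags` array plus a structured `threat` array), and the sample rules and the `Tactic: Initial Access` tag value are made up for illustration, not taken from the screenshots.

```typescript
interface Rule {
  name: string;
  tags: string[];
  // Structured MITRE mapping, maintained separately from tags.
  threat: { tactic: { name: string } }[];
}

// Count rules by tag, which is what a tag-based filter sees.
const countByTag = (rules: Rule[], tag: string): number =>
  rules.filter((rule) => rule.tags.includes(tag)).length;

// Count rules by structured tactic relationship.
const countByTactic = (rules: Rule[], tactic: string): number =>
  rules.filter((rule) => rule.threat.some((entry) => entry.tactic.name === tactic)).length;

// Hypothetical data: rule B is mapped to the tactic but was never tagged.
const sampleRules: Rule[] = [
  {
    name: 'A',
    tags: ['Tactic: Initial Access'],
    threat: [{ tactic: { name: 'Initial Access' } }],
  },
  { name: 'B', tags: [], threat: [{ tactic: { name: 'Initial Access' } }] },
];

const byTag = countByTag(sampleRules, 'Tactic: Initial Access'); // 1
const byTactic = countByTactic(sampleRules, 'Initial Access'); // 2
```

Any rule that carries the relationship without the tag widens the gap between the two counts, which is the discrepancy described above.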

Not sure this is enough to hold up the PR, but it is worth socialising before approval.

sdesalas · 2026-05-27T11:25:40Z

Continuing on that last thread (discrepancies on the MITRE ATT&CK page).

Here are some more screenshots:

The AI Agent is aware that there is more than one way to measure techniques for a detection rule.

But it struggles to get accurate measures when using the tactic-rule relationship instead of the tag:

CORRECT

Prompt: I noticed a discrepancy. In mitre attack page I'm seeing 76 enabled rules that have initial access technique. Can you search by relationship instead of by mitre tag?

INCORRECT

Prompt: How many "persistence" tactic are enabled, using the relationship, not the tag.

Prompt: Can you get me the counts on MITRE attack page?

… find-rules-skill

+    searchTerm: z
+      .string()


+      .optional()
+      .describe('Rule types to include (OR). E.g. ["query", "eql"].'),
+    tags: z
+      .array(z.string().min(1))


+        'Exact tag values to include (OR). Discover values first via `security.discover_rule_tags`.'
+      ),
+    excludeTags: z
+      .array(z.string().min(1))


+    mitreTechnique: z
+      .string()


+    mitreTactic: z
+      .string()


+    ruleId: z
+      .string()


nkhristinin · 2026-05-29T11:58:11Z

@elasticmachine merge upstream

joemcelroy · 2026-05-29T12:51:54Z

+This skill is read-only. Never suggest or offer to enable, disable, edit, delete, duplicate, or bulk-edit rules. Do not prompt the user to take any action on the rules returned. If the user asks to modify a rule, direct them to the Detection Rules UI.`,
+    getRegistryTools: () => [SECURITY_ALERTS_TOOL_ID],
+    getInlineTools: () => [
+      createFindRulesInlineTool({ getStartServices, logger }),


A question, no change needed here: you created these tools inline because you don't want them showing up in the registry?

nkhristinin · 2026-06-01T10:06:18Z

@elasticmachine merge upstream

kibanamachine · 2026-06-01T10:56:28Z

💔 Build Failed

Buildkite Build
Commit: afd847a
Build duration: 60 mins

Failed CI Steps

Test Failures

[job] [logs] FTR Configs #212 / Cloud Security Posture - Group 5 (KSPM + Flyouts) Security Network Page - Graph visualization expanded flyout - filter by node
[job] [logs] FTR Configs #212 / Cloud Security Posture - Group 5 (KSPM + Flyouts) Security Network Page - Graph visualization expanded flyout - filter by node
[job] [logs] FTR Configs #63 / Endpoint plugin @ess @serverless @skipInServerlessMKI Endpoint Scripts Library RBAC Download API "before each" hook for "should return script file download when user has READ privileges"
[job] [logs] Scout Lane #3 - stateful-classic / default / local-stateful-classic - UptimeIntegrationDeprecation - returns true when non-managed synthetics policies exist

Metrics [docs]

Unknown metric groups

ESLint disabled line counts

| id | before | after | diff |
| --- | --- | --- | --- |
| `securitySolution` | 731 | 733 | +2 |

Total ESLint disabled count

| id | before | after | diff |
| --- | --- | --- | --- |
| `securitySolution` | 837 | 839 | +2 |

History

💛 Build #450256 was flaky 66c3571
💛 Build #449553 was flaky a32cb69
💔 Build #448077 failed 6355329
💛 Build #446942 was flaky 3b495d0
💛 Build #446351 was flaky 74f117f

nkhristinin added 2 commits May 13, 2026 09:55

add find rules skill

0897602

Merge branch 'main' into find-rules-skill

08d706d

rylnd mentioned this pull request May 14, 2026

[AutoDex] Investigate Rule Skill #269241

Draft

10 tasks

Merge branch 'main' into find-rules-skill

89bf76e

nkhristinin added 2 commits May 21, 2026 11:14

refactor skill to use 2 different tools inline

a8a3615

nkhristinin changed the title ~~Find rules skill~~ Add find-rules skill for Security AI Assistant May 21, 2026

nkhristinin added 6 commits May 21, 2026 14:25

Improve eval expected outputs for higher Factuality scores

36b6cb3

Rewrite 16 expected outputs to comprehensively cover LLM response claims: add rule types, explicit counts, tool usage patterns, and per-rule severity/risk/enabled details. Factuality 0.19 → 0.65.

alert.attributes.params.ruleId

a7218c7

Fix skill allowlist: update find-rules to find-security-rules

0b37b57

The skill id was renamed but the server-side allowlist still had the old name, preventing the skill from loading.

nkhristinin changed the title ~~Add find-rules skill for Security AI Assistant~~ Add find-security-rules skill for Security AI Assistant May 21, 2026

nkhristinin changed the title ~~Add find-security-rules skill for Security AI Assistant~~ Add find-security-rules skill for the agent builder May 21, 2026

nkhristinin marked this pull request as ready for review May 21, 2026 14:54

nkhristinin requested review from a team as code owners May 21, 2026 14:54

nkhristinin requested a review from sdesalas May 21, 2026 14:54

nkhristinin added the backport:skip (This PR does not require backporting) and release_note:feature (Makes this part of the condensed release notes) labels May 21, 2026

nkhristinin force-pushed the find-rules-skill branch from 653676f to c22723f Compare May 21, 2026 14:57

Merge branch 'main' into find-rules-skill

7668c07

kibanamachine added 2 commits May 21, 2026 15:05

Changes from node scripts/lint_ts_projects --fix

82348fd

Changes from node scripts/regenerate_moon_projects.js --update

74f117f

nkhristinin added 2 commits May 22, 2026 15:50

skill refactor for multitern

b285d6c

Merge branch 'find-rules-skill' of github.com:nkhristinin/kibana into…

3b495d0

… find-rules-skill

Merge branch 'main' into find-rules-skill

6355329

nkhristinin added 2 commits May 28, 2026 13:38

updates for mitre

9f7c466

Merge branch 'find-rules-skill' of github.com:nkhristinin/kibana into…

a32cb69

… find-rules-skill

github-advanced-security AI found potential problems May 28, 2026


Merge branch 'main' into find-rules-skill

66c3571

joemcelroy reviewed May 29, 2026


joemcelroy approved these changes May 29, 2026


Merge branch 'main' into find-rules-skill

afd847a

nkhristinin wants to merge 21 commits into elastic:main from nkhristinin:find-rules-skill

7 participants
