
Add find-security-rules skill for the agent builder#269089

Open

nkhristinin wants to merge 21 commits into elastic:main from nkhristinin:find-rules-skill


@nkhristinin
Contributor

@nkhristinin nkhristinin commented May 13, 2026

Summary

Adds the find-security-rules skill to Security AI Assistant's Agent Builder integration. This skill enables natural-language rule discovery queries (listing, filtering, counting, sorting detection rules) via two inline tools:

  • security.find_rules — lists, filters, sorts, and counts detection rules using flat parameters (searchTerm, enabled, ruleSource, severity, ruleType, tags, excludeTags, mitreTechnique, ruleId, sortField, sortOrder, perPage). Delegates to the existing convertRulesFilterToKQL() for base filtering.
  • security.discover_rule_tags — discovers all available rule tag values (no parameters). Must be called before any tag-based filtering to avoid hallucinated tag names.
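A minimal sketch of how the flat tag parameters might become a KQL clause, assuming the OR semantics the PR later documents for tags and assuming tags are queryable on the rule saved object as alert.attributes.tags. FindRulesParams, buildTagFilter, and quoteKql are hypothetical names, and the hand-rolled quoting is for illustration only; a real tool should reuse the shared KQL escape helpers.

```typescript
// Hypothetical subset of the flat find_rules parameters (names from the PR summary).
interface FindRulesParams {
  tags?: string[]; // include rules carrying ANY of these tags (OR)
  excludeTags?: string[]; // drop rules carrying ANY of these tags
}

// Assumption: tags live on the rule saved object at this KQL field.
const TAGS_FIELD = "alert.attributes.tags";

// Illustrative quoting only: escape backslashes and double quotes for a KQL phrase.
function quoteKql(value: string): string {
  return `"${value.replace(/\\/g, "\\\\").replace(/"/g, '\\"')}"`;
}

// Builds an OR clause for included tags and a NOT clause for excluded tags.
function buildTagFilter(params: FindRulesParams): string | undefined {
  const clauses: string[] = [];
  if (params.tags && params.tags.length > 0) {
    clauses.push(`${TAGS_FIELD}:(${params.tags.map(quoteKql).join(" OR ")})`);
  }
  if (params.excludeTags && params.excludeTags.length > 0) {
    clauses.push(`NOT ${TAGS_FIELD}:(${params.excludeTags.map(quoteKql).join(" OR ")})`);
  }
  return clauses.length > 0 ? clauses.join(" AND ") : undefined;
}

console.log(
  buildTagFilter({ tags: ["Domain: Network", "Tactic: Initial Access"], excludeTags: ["Deprecated"] })
);
// alert.attributes.tags:("Domain: Network" OR "Tactic: Initial Access") AND NOT alert.attributes.tags:("Deprecated")
```

Returning undefined when both lists are empty lets the caller skip the tag clause entirely instead of emitting an empty group.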

The skill also references the existing security.alerts registry tool for noisy-rules queries that correlate alert volume with rule metadata via kibana.alert.rule.rule_id.
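The noisy-rules correlation can be sketched as a join between per-rule alert counts and rule metadata. This assumes the counts were already aggregated by kibana.alert.rule.rule_id (the rule's stable rule_id, not its saved-object id) and that the rule metadata carries the same ruleId; AlertBucket, RuleSummary, and correlateNoisyRules are hypothetical.

```typescript
// Hypothetical shapes: alert-count buckets keyed by kibana.alert.rule.rule_id,
// and rule metadata as returned by a rule search.
interface AlertBucket {
  ruleId: string;
  alertCount: number;
}
interface RuleSummary {
  ruleId: string;
  name: string;
  severity: string;
  enabled: boolean;
}
interface NoisyRule extends RuleSummary {
  alertCount: number;
}

// Joins alert volume onto rule metadata and returns the noisiest rules first.
function correlateNoisyRules(buckets: AlertBucket[], rules: RuleSummary[], limit = 5): NoisyRule[] {
  const byRuleId = new Map(rules.map((r) => [r.ruleId, r] as const));
  return buckets
    .filter((b) => byRuleId.has(b.ruleId)) // drop alerts whose rule no longer exists
    .map((b) => ({ ...(byRuleId.get(b.ruleId) as RuleSummary), alertCount: b.alertCount }))
    .sort((a, b) => b.alertCount - a.alertCount)
    .slice(0, limit);
}

const noisy = correlateNoisyRules(
  [
    { ruleId: "r-1", alertCount: 3 },
    { ruleId: "r-2", alertCount: 40 },
    { ruleId: "deleted", alertCount: 99 },
  ],
  [
    { ruleId: "r-1", name: "Rule One", severity: "low", enabled: true },
    { ruleId: "r-2", name: "Rule Two", severity: "high", enabled: true },
  ]
);
console.log(noisy.map((r) => `${r.name}: ${r.alertCount}`).join(", ")); // Rule Two: 40, Rule One: 3
```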

Changes

  • Skill code: find_rules_skill.ts, find_rules_tool.ts, discover_rule_tags_tool.ts
  • Unit tests: 48 tests covering filter building, KQL generation, tool handlers, skill registration, and allowlist membership
  • Eval suite: 16 rule-discovery examples + 6 distractor examples + 1 multi-turn conversation test
  • Fixtures: 10 seeded detection rules + 50 synthetic alerts with scoped cleanup (only deletes fixture rules/alerts by name)

Eval scores (Sonnet 4.6, 22 examples)

Rule discovery (16 examples):

Evaluator           Mean   Min    Max
Groundedness        0.97   0.79   1.00
Sequence Accuracy   0.94   0.00   1.00
ToolUsageOnly       1.00   1.00   1.00
Factuality          0.75   0.00   1.00
Relevance           0.64   0.28   1.00

Distractor routing (6 examples) and multi-turn (1 example) also pass. Distractor Factuality is low (0.16), as expected: those examples test routing away from the skill, so the expected outputs are vague intent statements that the strict claim-by-claim scorer penalizes.

Trace-based evaluators (Latency, Token counts, Skill Invoked) require a trace ES endpoint and are not reported here.

Test plan

  • Unit tests pass (48/48)
  • Eval suite passes (3/3 tests)
  • Manual verification: ask AI Assistant "List all enabled detection rules tagged with MITRE" and confirm it uses the find-security-rules skill
  • Manual verification: ask "Show me my network detection rules" and confirm tag discovery happens before filtering

@patrykkopycinski
Contributor

Skill review: findings + lessons captured

Thanks @nkhristinin — this skill surfaces several reusable patterns I haven't seen all together before, and a handful of issues that turn out to be reusable lessons too. Both have been formalized into elastic/agent-builder-skill-dev-cursor-plugin so future authors of similar skills get the patterns by default and don't repeat the mistakes.

✅ Strong patterns now codified for re-use

These nine patterns are now codified under the "## SKILL DESIGN PATTERNS (cross-cutting)" heading in knowledge/domain-patterns.md, captured in skill-dev-plugin#12 (merged):

Pattern | Where in this PR
A. Negative routing matrix ("Do NOT Use When") | find_rules_skill.ts: the strongest single defense against false-positive activation
B. Hallucination guard ("Never Invent Values") | find_rules_skill.ts: the Grounding section names specific fabrication shapes
C. Discover-then-filter with structured exception | content: "Tag Discovery" + MITRE-IDs-skip-discovery exemption
D. Read-only "Action Limitations" with named escape hatches | content: pre-empts the three things the agent will try (other tools / sub-agent / connector lookup)
E. Atomic-condition DNF filter (one-field-per-object) | find_rules_tool.ts::conditionSchema + andGroupSchema
F. Truncation flag in tool output | find_rules_tool.ts: the aggregate handler emits otherDocCount and steers the LLM in the message
G. Two-tier zero-result hint | find_rules_tool.ts: the find handler gives a different hint for "filter too narrow" vs "tag value doesn't exist"
H. Distractor block + multi-turn refinement test | find_rules.spec.ts: three describe blocks; the multi-turn one catches "agent paraphrases prior turn from memory"
I. Self-documenting dataset description | find_rules.spec.ts: fixture intent encoded in dataset.description

If the next person picking up a similar skill learns nothing else from this PR, they should pick up A, E, and H.
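Patterns F and G are mostly about how the tool phrases its output. A hypothetical sketch of both, assuming the tool knows the requested tags, the set of tags that actually exist, and the terms aggregation's otherDocCount; the function names and message wording are illustrative, not the PR's handlers.

```typescript
interface FindResult {
  total: number;
  rules: { name: string }[];
}
interface TagBucket {
  key: string;
  docCount: number;
}
interface TagAggregation {
  buckets: TagBucket[];
  otherDocCount: number; // documents that fell into buckets not returned
}

// Pattern G: distinguish "tag value doesn't exist" from "filters too narrow".
function zeroResultHint(result: FindResult, requestedTags: string[], knownTags: Set<string>): string | undefined {
  if (result.total > 0) return undefined;
  const unknown = requestedTags.filter((t) => !knownTags.has(t));
  if (unknown.length > 0) {
    return `No rules matched. These tags do not exist: ${unknown.join(", ")}. Call security.discover_rule_tags and retry with exact values.`;
  }
  return "No rules matched. The filters are valid but too narrow; try removing one.";
}

// Pattern F: surface truncation so the LLM does not present partial buckets as complete.
function describeAggregation(agg: TagAggregation): string {
  const lines = agg.buckets.map((b) => `${b.key}: ${b.docCount}`);
  if (agg.otherDocCount > 0) {
    lines.push(`(truncated: ${agg.otherDocCount} more rules fall into buckets not shown)`);
  }
  return lines.join("\n");
}
```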

Findings — fix before merge

🔴 Blocker — evaluate_dataset.ts hardcodes skillName: 'data-exploration'

x-pack/platform/packages/shared/agent-builder/kbn-evals-suite-agent-builder/src/evaluate_dataset.ts wires createSkillInvocationEvaluator({ skillName: 'data-exploration' }) while the same file is the shared evaluator-wiring for five sibling suite domains: esql/, external/, kb/, product_documentation/, security/. The skill-activation column is therefore noise for every leaf skill that isn't data-exploration — including this PR's find-rules.

Compounding this, the same file also defines a ~60-line ExpectedSkillInvocation evaluator that re-implements the platform primitive's ES|QL query, parameterised by metadata.expectedSkill. Two skill-activation evaluators in one suite is always wrong: pick the platform primitive, parameterise it correctly, and delete the duplicate.

Fix: replace the literal with per-example resolution from metadata.expectedSkill. If createSkillInvocationEvaluator doesn't yet support a resolveSkillName overload, push the overload upstream into @kbn/evals first, then apply the recipe locally.

Lesson captured: skill-dev-plugin#11 (merged).
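A self-contained sketch of per-example skill resolution. EvalExample, EvalResult, checkSkillInvoked, and evaluateSkillActivation are hypothetical stand-ins, not the @kbn/evals API; the point is only that the skill name comes from metadata.expectedSkill and that examples without one score N/A instead of 0.

```typescript
// Hypothetical example/result shapes; the real @kbn/evals types will differ.
interface EvalExample {
  input: string;
  metadata?: { expectedSkill?: string };
}
interface EvalResult {
  score: number | null; // null = N/A
  label: string;
}
type SkillInvocationCheck = (skillName: string, invokedSkills: string[]) => EvalResult;

// Stand-in for the platform primitive: did the agent invoke the named skill?
const checkSkillInvoked: SkillInvocationCheck = (skillName, invokedSkills) =>
  invokedSkills.includes(skillName)
    ? { score: 1, label: `invoked ${skillName}` }
    : { score: 0, label: `did not invoke ${skillName}` };

// Resolve the skill per example instead of hardcoding 'data-exploration'.
function resolveSkillName(example: EvalExample): string | undefined {
  return example.metadata?.expectedSkill;
}

function evaluateSkillActivation(example: EvalExample, invokedSkills: string[]): EvalResult {
  const skillName = resolveSkillName(example);
  if (!skillName) {
    // Unannotated examples: report N/A rather than a misleading 0.
    return { score: null, label: "N/A: no expectedSkill" };
  }
  return checkSkillInvoked(skillName, invokedSkills);
}
```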

🔴 Blocker — no tool_sequence annotations + no createTrajectoryEvaluator

find_rules.spec.ts asserts toolCalls.length and toolCalls.some(...) ad-hoc per example but does not run a real trajectory evaluator. With createTrajectoryEvaluator (LCS-based scoring, zero per-example LLM cost) the same assertions become a structured per-example score that lights up regression in nightly runs and gives you a stable "did the agent call the right tools in roughly the right order" signal across iterations.

Fix: add tool_sequence?: string[] to dataset examples (minimum-sufficient sequences only — extra tools the agent calls don't lower the LCS score, but a too-long golden produces false regressions when the prompt is improved) and wire createTrajectoryEvaluator with the wrapper that returns N/A when tool_sequence is absent so unannotated examples don't get penalized.

Reference implementation already in tree: x-pack/solutions/security/packages/kbn-evals-suite-alerts-rag/src/evaluate_dataset.ts::createAlertsRagTrajectoryEvaluator.

Lesson already captured (predates this PR): skill-dev-plugin anti-pattern #17 + recipe add-trajectory-evaluator — the recipe renders the full three-edit sequence (dataset type / toDatasetExample / wrapped evaluator) verbatim.
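For readers unfamiliar with LCS-based trajectory scoring, a minimal sketch of the idea, including the N/A wrapper for unannotated examples. This is not the createAlertsRagTrajectoryEvaluator implementation; TrajectoryExample, lcsLength, and scoreTrajectory are hypothetical, and scoring as LCS length divided by golden-sequence length is an assumption consistent with "extra tools the agent calls don't lower the LCS score".

```typescript
interface TrajectoryExample {
  metadata?: { tool_sequence?: string[] };
}

// Length of the longest common subsequence of two tool-name sequences (classic DP).
function lcsLength(a: string[], b: string[]): number {
  const dp: number[][] = Array.from({ length: a.length + 1 }, () =>
    new Array<number>(b.length + 1).fill(0)
  );
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      dp[i][j] = a[i - 1] === b[j - 1] ? dp[i - 1][j - 1] + 1 : Math.max(dp[i - 1][j], dp[i][j - 1]);
    }
  }
  return dp[a.length][b.length];
}

// Score = fraction of the golden sequence recovered in order. Extra tool calls do not
// lower it; a missing or out-of-order expected call does. null means N/A.
function scoreTrajectory(example: TrajectoryExample, actualToolCalls: string[]): number | null {
  const expected = example.metadata?.tool_sequence;
  if (!expected || expected.length === 0) return null; // unannotated: don't penalize
  return lcsLength(expected, actualToolCalls) / expected.length;
}
```

This is also why a minimum-sufficient golden sequence matters: every entry in tool_sequence is a call the agent is penalized for skipping.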

🟡 Important — conditionSchema union has 12 variants (OSS budget is 8)

find_rules_tool.ts::conditionSchema declares 12 z.object({...}).strict() variants. Per skill-dev-plugin's tool-conventions.mdc schema-budget rule, OSS models start dropping schema fidelity above ~8 union variants.

Fix options (pick one):

  1. Collapse range pairs: { riskScore: { gte?: number; lte?: number } } replaces the two riskScoreGte / riskScoreLte variants with a single one. The same applies to any other *Gte/*Lte pairs.
  2. Promote rare fields to top-level args: enabled is Boolean-valued only — moving it from the union to a top-level enabled?: boolean arg removes one variant without losing expressiveness.
  3. Split the tool: find_rules for the common conditions + find_rules_advanced for the long tail (uncommon enough to live behind a separate registration).

Even after consolidation the schema is large. It is worth previewing with the tooling call { action: "preview-for-skill", skill_id: "find-rules" } to see exactly the JSON shape the agent receives.
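Option 1 can be sketched in plain TypeScript: one riskScore condition carrying an optional range replaces the separate Gte/Lte variants, and a single helper turns it into KQL. The field path alert.attributes.params.riskScore and the names RangeCondition and rangeToKql are assumptions for illustration.

```typescript
// Before (hypothetical): two union variants, { riskScoreGte: number } | { riskScoreLte: number }.
// After: one variant whose value is an optional range.
interface RangeCondition {
  gte?: number;
  lte?: number;
}
type RiskScoreCondition = { riskScore: RangeCondition };

// Assumption: the rule risk score is queryable at this KQL field.
const RISK_SCORE_FIELD = "alert.attributes.params.riskScore";

// Turns a range into KQL; returns undefined when neither bound is set.
function rangeToKql(field: string, range: RangeCondition): string | undefined {
  const parts: string[] = [];
  if (range.gte !== undefined) parts.push(`${field} >= ${range.gte}`);
  if (range.lte !== undefined) parts.push(`${field} <= ${range.lte}`);
  return parts.length > 0 ? parts.join(" AND ") : undefined;
}

const condition: RiskScoreCondition = { riskScore: { gte: 47, lte: 73 } };
console.log(rangeToKql(RISK_SCORE_FIELD, condition.riskScore));
// alert.attributes.params.riskScore >= 47 AND alert.attributes.params.riskScore <= 73
```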

🟡 Important — KQL building reimplements existing UI utilities

find_rules_tool.ts::buildFullFilter hand-rolls escape logic for nameContains and ruleUuid. Both have shipped utility functions used by the Detection UI for the same field families — importing them keeps you aligned with whatever escape rules ship in future Kibana versions instead of drifting. Search x-pack/solutions/security/plugins/security_solution/public/detection_engine/.../helpers.ts for buildKql* / escapeQuotes.

(I didn't trace the exact import path because the change is straightforward once you grep — happy to dig further if you want a specific recommendation.)

🟡 Important — SKILL.md missing "Response Format" section

skill-dev-plugin's skill-conventions.mdc requires sections in this order: When to Use → Process → Examples → Guardrails → Response Format. Your skill body has the first 4 (excellently) but the Response Format section is missing. Skills that emit lists/aggregates particularly benefit because the LLM otherwise invents the markdown shape per-call.

Lesson captured: recipe add-response-format renders a template-and-example shape ready to paste.
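A hypothetical Response Format section, plus a small check for the section order the convention requires. The ## heading level, the wording, and checkSectionOrder are assumptions; the add-response-format recipe remains the template to follow.

```typescript
// Section order required by skill-conventions.mdc, per the review.
const REQUIRED_SECTIONS = ["When to Use", "Process", "Examples", "Guardrails", "Response Format"] as const;

// Hypothetical Response Format section; wording is illustrative only.
const RESPONSE_FORMAT = `## Response Format
- Start with a one-line total, e.g. "Found 12 enabled rules tagged \`Domain: Network\`."
- Then a table with columns: Name | Severity | Risk score | Enabled | Tags.
- If the tool reports truncation, say so and suggest a narrower filter.
- If nothing matched, repeat the filters that were applied.`;

// Returns the required sections that are missing from, or out of order in, a skill body.
function checkSectionOrder(body: string): string[] {
  const problems: string[] = [];
  let lastIndex = -1;
  for (const section of REQUIRED_SECTIONS) {
    const idx = body.indexOf(`## ${section}`);
    if (idx === -1) problems.push(`missing: ${section}`);
    else if (idx < lastIndex) problems.push(`out of order: ${section}`);
    else lastIndex = idx;
  }
  return problems;
}
```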

🟢 Nice-to-have — stale ruleId reference in fixture comment

find_rules_fixtures.ts has a comment referencing ruleId but the actual field on the seeded rules is ruleUuid. One-line fix.

🟢 Nice-to-have — hardcoded alerts index in fixtures

find_rules_fixtures.ts writes synthetic alerts to a hardcoded index name. This is brittle if the fixture ever needs to run against a non-default space or a custom alerts data view. Suggest reading it from a config constant or accepting it via the evaluate.fixture arg.
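A minimal sketch of resolving the index instead of hardcoding it, assuming the common .alerts-security.alerts-<spaceId> alias convention and a hypothetical FixtureOptions argument; wire the override to whatever the evaluate.fixture arg actually provides.

```typescript
// Assumption: security alerts are written to a per-space alias of this form.
const DEFAULT_ALERTS_INDEX_PREFIX = ".alerts-security.alerts";

// Hypothetical fixture arguments.
interface FixtureOptions {
  spaceId?: string;
  alertsIndex?: string;
}

function resolveAlertsIndex(options: FixtureOptions = {}): string {
  // An explicit index (e.g. supplied through the fixture arg) wins over the derived default.
  if (options.alertsIndex) return options.alertsIndex;
  return `${DEFAULT_ALERTS_INDEX_PREFIX}-${options.spaceId ?? "default"}`;
}
```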

TL;DR

  • 9 patterns from this PR are now in domain-patterns.md so the next 9 skill authors copy them by default.
  • 3 fixable issues, two of which (the #19 / #17 lessons) are now auto-detected by repo-local rules in skill-dev-plugin so they can't slip back in.
  • Highest-impact one is the hardcoded skillName — landing it as-is would make the activation column noise across all five suite domains, not just find-rules.

Happy to pair on any of the above. The "fix the eval-suite first, then iterate the skill changes" sequence usually pays off because you can then watch the trajectory + activation evaluators flip green per-iteration.

@rylnd rylnd mentioned this pull request May 14, 2026
rylnd added a commit to rylnd/kibana that referenced this pull request May 20, 2026
When no security.rule attachment is present (e.g. the user selected a
rule from find-rules output), the skill now directs the agent to call
resolve_rule_attachment first, render the result, then proceed with
the normal diagnostic branches.

Also updates the skill description to reflect that the attachment is no
longer a prerequisite for triggering the skill, and adds a TODO comment
marking where SECURITY_FIND_RULES_TOOL_ID should be added to
getRegistryTools once elastic#269089 lands.
- Drop 3 conditions (mitreTactic, indexPattern, riskScoreMax) from rule_filter.ts
- Remove verbose fields from summarizeRule (ruleTypeId, index, threat, interval, createdAt)
- Shorten skill and tool descriptions
- Rewrite eval expected outputs to be data-agnostic with rule details
- Add tool_sequence annotations to all eval examples
- Add pre-seed cleanup to fixtures preventing leftover contamination
- Add distractor suite (6 examples) and multi-turn conversation test
- Revert evaluate_dataset.ts to main (no shared file changes)

Eval results (Sonnet, 22 examples):
  Groundedness:       0.92
  Sequence Accuracy:  0.97
  ToolUsageOnly:      1.00
  Relevance:          0.54
  Factuality:         0.18
@nkhristinin nkhristinin changed the title Find rules skill Add find-rules skill for Security AI Assistant May 21, 2026
Rewrite 16 expected outputs to comprehensively cover LLM response
claims: add rule types, explicit counts, tool usage patterns, and
per-rule severity/risk/enabled details. Factuality 0.19 → 0.65.
… custom filter DSL

Replace AND/OR group filter language with flat params delegating to the
existing convertRulesFilterToKQL(). Simplify discover_rule_tags to accept
no parameters. Delete rule_filter.ts and consolidate buildToolFilter into
find_rules_tool.ts. Use CreatedRule interface in eval fixtures.
…port

- Rename skill id/name from 'find-rules' to 'find-security-rules' for
  disambiguation with non-detection rule types
- Handle tags with OR semantics instead of delegating to
  convertRuleTagsToKQL (which uses AND), matching the documented behavior
- Fix stale buildFullFilter export in index.ts (now buildToolFilter)
- Update eval expectedSkill references
The skill id was renamed but the server-side allowlist still had the old
name, preventing the skill from loading.
…re cleanup

- Rename nameContains to searchTerm to reflect that convertRulesFilterToKQL
  searches name, index patterns, and MITRE fields
- Use EXPECTED_MAX_TAGS constant instead of hardcoded 500 for tag aggregation
- Add isAllowedBuiltinSkill test to prevent allowlist drift
- Fixture cleanup now scopes to fixture rule names instead of deleting all rules/alerts
- Remove tool_sequence from eval metadata, fix stale eval examples
@nkhristinin nkhristinin changed the title Add find-rules skill for Security AI Assistant Add find-security-rules skill for Security AI Assistant May 21, 2026
@nkhristinin nkhristinin changed the title Add find-security-rules skill for Security AI Assistant Add find-security-rules skill for the agent builder May 21, 2026
@nkhristinin nkhristinin marked this pull request as ready for review May 21, 2026 14:54
@nkhristinin nkhristinin requested review from a team as code owners May 21, 2026 14:54
@nkhristinin nkhristinin requested a review from sdesalas May 21, 2026 14:54
@nkhristinin nkhristinin added backport:skip This PR does not require backporting release_note:feature Makes this part of the condensed release notes labels May 21, 2026
@nkhristinin
Contributor Author

@elasticmachine merge upstream

@elasticmachine
Contributor

There are no new commits on the base branch.

@nkhristinin
Contributor Author

@elasticmachine merge upstream

@sdesalas
Member

sdesalas commented May 27, 2026

Hi Nikita.

Reviewing on behalf of @elastic/security-detection-rule-management.

Code we own seems OK: a single addition to rule_fields.ts, a file containing constants. Nothing out of the ordinary.

I also checked through the security_solution files and tested the functionality locally. Here are some things I noticed:

Good news

BEFORE / AFTER: This skill definitely becomes available

Screenshot 2026-05-27 at 12 25 55 Screenshot 2026-05-27 at 12 27 33

Can find rules using filters

Screenshot 2026-05-27 at 12 20 19 Screenshot 2026-05-27 at 12 18 13

Bad news

Counts do not match up with the MITRE ATT&CK page

This is understandable but poor UX. Let me explain.

The MITRE ATT&CK page looks at actual technique relationships, not tags (a rule has an array of tactics/techniques and a separate array of tags). If a rule is related to a technique but not tagged as such, it will be missing from the output.

Screenshot 2026-05-27 at 12 16 03

So here 76 rules ARE RELATED to the "Initial Access" tactic, but we have not been keeping the tags updated, and the agent cannot search through these relationships, so the result is incorrect. Or rather, it is a discrepancy that appears depending on which source of truth you look at.
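The discrepancy is easy to reproduce with a toy model: a tag-based filter and a relationship-based count disagree whenever a rule is mapped to a tactic but not tagged with it. The rule shape below is a simplified, hypothetical subset of a detection rule's threat mapping, the tag string "Tactic: Initial Access" is only an example, and the data is made up.

```typescript
// Simplified rule shape: threat mirrors the rule's MITRE mapping (tactic plus techniques);
// tags are free-form and maintained separately.
interface ThreatMapping {
  tactic: { id: string; name: string };
  technique?: { id: string; name: string }[];
}
interface DetectionRule {
  name: string;
  enabled: boolean;
  tags: string[];
  threat: ThreatMapping[];
}

// What a tag-based filter sees.
function countByTag(rules: DetectionRule[], tag: string): number {
  return rules.filter((r) => r.enabled && r.tags.includes(tag)).length;
}

// What a relationship-based view (like the MITRE ATT&CK page) sees.
function countByTactic(rules: DetectionRule[], tacticName: string): number {
  return rules.filter((r) => r.enabled && r.threat.some((t) => t.tactic.name === tacticName)).length;
}

const rules: DetectionRule[] = [
  { name: "A", enabled: true, tags: ["Tactic: Initial Access"], threat: [{ tactic: { id: "TA0001", name: "Initial Access" } }] },
  { name: "B", enabled: true, tags: [], threat: [{ tactic: { id: "TA0001", name: "Initial Access" } }] }, // mapped but untagged
];
console.log(countByTag(rules, "Tactic: Initial Access"), countByTactic(rules, "Initial Access")); // 1 2
```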

This problem has always been there (because there is more than one source of truth), and it could be surfaced by searching these two separate screens independently. However, it appears worse now because the agent speaks so confidently and makes it easier to surface the discrepancy on the same screen (the MITRE ATT&CK one).

Not sure this is enough to hold up the PR, but it is worth socialising before approval.

@sdesalas
Member

sdesalas commented May 27, 2026

Continuing on that last thread (discrepancies on the MITRE ATT&CK page).

Here are some more screenshots:

The AI Agent is aware that there is more than one way to measure techniques for a detection rule.

But it struggles to get accurate measures when using the tactic-rule relationship instead of the tag:

CORRECT

Prompt: I noticed a discrepancy. In mitre attack page I'm seeing 76 enabled rules that have initial access technique. Can you search by relationship instead of by mitre tag?

Screenshot 2026-05-27 at 13 10 32

INCORRECT

Prompt: How many "persistence" tactic are enabled, using the relationship, not the tag.

Screenshot 2026-05-27 at 13 16 16

Prompt: Can you get me the counts on MITRE attack page?

Screenshot 2026-05-27 at 13 19 11

Comment on lines +45 to +46
searchTerm: z
.string()
.optional()
.describe('Rule types to include (OR). E.g. ["query", "eql"].'),
tags: z
.array(z.string().min(1))
'Exact tag values to include (OR). Discover values first via `security.discover_rule_tags`.'
),
excludeTags: z
.array(z.string().min(1))
Comment on lines +77 to +78
mitreTechnique: z
.string()
Comment on lines +82 to +83
mitreTactic: z
.string()
Comment on lines +91 to +92
ruleId: z
.string()
@nkhristinin
Contributor Author

@elasticmachine merge upstream

This skill is read-only. Never suggest or offer to enable, disable, edit, delete, duplicate, or bulk-edit rules. Do not prompt the user to take any action on the rules returned. If the user asks to modify a rule, direct them to the Detection Rules UI.`,
getRegistryTools: () => [SECURITY_ALERTS_TOOL_ID],
getInlineTools: () => [
createFindRulesInlineTool({ getStartServices, logger }),
Member


A question, no change needed here: did you create these tools inline because you don't want them showing up in the registry?

@nkhristinin
Contributor Author

@elasticmachine merge upstream

@kibanamachine
Contributor

kibanamachine commented Jun 1, 2026

💔 Build Failed

Failed CI Steps

Test Failures

  • [job] [logs] FTR Configs #212 / Cloud Security Posture - Group 5 (KSPM + Flyouts) Security Network Page - Graph visualization expanded flyout - filter by node
  • [job] [logs] FTR Configs #212 / Cloud Security Posture - Group 5 (KSPM + Flyouts) Security Network Page - Graph visualization expanded flyout - filter by node
  • [job] [logs] FTR Configs #63 / Endpoint plugin @ess @serverless @skipInServerlessMKI Endpoint Scripts Library RBAC Download API "before each" hook for "should return script file download when user has READ privileges"
  • [job] [logs] Scout Lane #3 - stateful-classic / default / local-stateful-classic - UptimeIntegrationDeprecation - returns true when non-managed synthetics policies exist

Metrics [docs]

Unknown metric groups

ESLint disabled line counts

id before after diff
securitySolution 731 733 +2

Total ESLint disabled count

id before after diff
securitySolution 837 839 +2


7 participants