[Spike][Security Solution] Detection Emulation Skill (Epic #15974) — substrate + orchestration + production-opt-in readiness pack by patrykkopycinski · Pull Request #269019 · elastic/kibana

patrykkopycinski · 2026-05-13T03:58:03Z

Summary

Lands the Detection Emulation Skill end-to-end on a single branch. An Agent Builder skill takes a candidate detection rule + target host(s), runs an emulated attack (real EDR action OR ECS log injection), polls the Detection Engine for the alerts the rule produces, and returns a ValidationReport with a confidence score.

Every action is gated by:

two off-by-default experimental flags (detectionEmulationRealExecution, detectionEmulationLogInjection)
a runtime kill switch (xpack.securitySolution.detectionEmulation.realExecution.enabled) for fast disable without redeploy
a default-deny endpoint allowlist (operator must explicitly permit hosts)
per-command RBAC re-checked at every tool boundary
a Zod-enforced ≤5 endpoint fanout cap per call
per-space (100/h) and per-host (3/h) rate limits with atomic acquire-with-rollback
≤1 in-flight real_execution scenario per Kibana space
mandatory HITL confirmation on every destructive command path
a regex allowlist for free-form execute payloads (allowedExecuteCommandPatterns)
full actor attribution (user vs agent-builder, with conversationId / runId / toolCallId / SHA-256 prompt hash) in the audit trail
idempotency cache on tool dispatch gates (prevents double-fire on retried LLM tool calls)
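The per-space and per-host limits have to be charged together or not at all. Here is a minimal sketch of that acquire-with-rollback idea. It assumes in-memory sliding-window buckets, an injected clock (`now` in ms), and one per-space slot charged per call. `SlidingWindowLimiter` and `acquireAll` are hypothetical names; the PR's limiter is not shown here and may track state differently.

```typescript
// Hypothetical in-memory sliding-window limiter; not the PR's implementation.
class SlidingWindowLimiter {
  private readonly hits = new Map<string, number[]>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  /** Records a hit for `key` if the bucket has capacity; returns false when it is exhausted. */
  tryAcquire(key: string, now: number): boolean {
    // Drop hits that have aged out of the window.
    const recent = (this.hits.get(key) ?? []).filter((t) => now - t < this.windowMs);
    if (recent.length >= this.limit) {
      this.hits.set(key, recent);
      return false;
    }
    recent.push(now);
    this.hits.set(key, recent);
    return true;
  }

  /** Removes the most recent hit for `key`, undoing the last successful tryAcquire. */
  release(key: string): void {
    this.hits.get(key)?.pop();
  }
}

const HOUR_MS = 60 * 60 * 1000;
const perSpace = new SlidingWindowLimiter(100, HOUR_MS); // 100 per space per hour
const perHost = new SlidingWindowLimiter(3, HOUR_MS); // 3 per host per hour

type AcquireResult = { ok: true } | { ok: false; blockedEndpoints: string[] };

/** All-or-nothing: either every bucket is charged, or none is. */
function acquireAll(spaceId: string, endpointIds: string[], now: number): AcquireResult {
  if (!perSpace.tryAcquire(spaceId, now)) {
    return { ok: false, blockedEndpoints: [] };
  }
  const acquired: string[] = [];
  const blocked: string[] = [];
  for (const id of endpointIds) {
    const hostKey = `${spaceId}:${id}`;
    if (perHost.tryAcquire(hostKey, now)) {
      acquired.push(hostKey);
    } else {
      blocked.push(id);
    }
  }
  if (blocked.length > 0) {
    // Roll back every slot taken by this call before reporting the blocked hosts.
    acquired.forEach((hostKey) => perHost.release(hostKey));
    perSpace.release(spaceId);
    return { ok: false, blockedEndpoints: blocked };
  }
  return { ok: true };
}
```

The rollback step is what keeps a call that is blocked on one host from burning quota on the other hosts in the same request.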

93 files / +17,020 against upstream/main. Server-side orchestration only; existing UI surfaces unchanged. Allowlist + rate-limiter knobs are exposed as Stack Advanced Settings.

Architecture

What's new

Component	Path	Notes
Multi-EDR runner + route	`lib/detection_emulation/execution/runner.ts`, `api/dispatch/route.ts`	Vendor-agnostic dispatcher (`execute`, `runscript`, `kill-process`, `isolate`, `release`).
Payload library	`lib/detection_emulation/payloads/{payloads.json,index.ts}`	12 Wave-1 entries, hard-capped at 15. Data-driven.
Scenario generator	`lib/detection_emulation/scenario_generator.ts`	Pure function. Deterministic `scenarioId = sha256(...)`. Typed errors `no_mitre_tags`, `no_supported_techniques`.
Log injection mode	`lib/detection_emulation/log_injection/{generator,index_template,executor}.ts`	Dedicated `.kibana-security-emulation-logs-<spaceId>-*` index template w/ 7-day ILM. Synthetic ECS docs stamped with `event.dataset`, `event.module: 'emulation'`, and tagged with `emulation: { mode, emulationId, scenarioId }`.
Telemetry collector	`lib/detection_emulation/telemetry_collector.ts`	`poll` + `one_shot` modes. AbortController. Wall-budget enforcement. Queries alerts by `kibana.alert.original_event.module: emulation` OR emulation tag via `bool.should`.
Confidence scorer	`lib/detection_emulation/confidence_scorer.ts`	`confidence = round(coverage * 0.6 + precision * 0.4, 2)` clamped [0, 1]. Pure.
Validation gate	`lib/detection_emulation/execution/validation_gate.ts`	Pre-dispatch gate that combines RBAC, allowlist, fanout, regex allowlist for free-form `execute`, and rate-limit acquire.
Gate primitives	`agent_builder/skills/detection_emulation/gate_checks.ts`	Reusable `withCommandGates` decorator extracted from per-family tool boilerplate. Composes feature-flag → auth → RBAC → allowlist → per-host rate → per-space rate → fanout cap → validation gate → audit in a single pipeline.
Emulation history SO	`lib/detection_emulation/emulation_history/{create,get,find,index}.ts` + `emulation_report_type.ts`	Hidden SO, write-once via `scenarioFingerprint` dedup, namespace-scoped. Model version `2` adds `actor` field with `data_backfill: { kind: 'user' }` for old rows.
`validateRule` route	`lib/detection_emulation/api/validate_rule/route.ts`	`POST /internal/detection_engine/emulation/validate_rule`. Eight-step pipeline. Typed 4xx errors. All strings i18n-translated.
Skill tools (six)	`agent_builder/skills/detection_emulation/{validate_rule_tool,get_emulation_history_tool,create_run_family_command_tool}.ts`	Factory-based per-family tools (`process` / `file` / `network` / `execution`) built via `createRunFamilyCommandTool`. Each runs through `withCommandGates`. `validateRule` and `getEmulationHistory` are standalone tools.
HITL primitives	`agent_builder/skills/detection_emulation/build_emulation_confirmation.ts` + per-family tools	Each per-family tool declares `confirmation: { askUser: 'once', getConfirmation }`; `validateRule` does an on-demand HITL prompt when `mode === 'real_execution'`. Skipped only in `executionMode === 'standalone'` (eval / A2A).
EmulationToolError	`agent_builder/skills/detection_emulation/emulation_tool_errors.ts`	Typed error builder for structured error codes (`feature_flag_disabled`, `endpoint_not_allowed`, etc.) with consistent shape across all tools.
Skill content	`agent_builder/skills/detection_emulation/detection_emulation_skill.ts`	Six tools registered. `referencedContent` array with MITRE ATT&CK overview. Section order: When-to-Use → Process → Examples → Guardrails → Response Format.
Advanced Settings	`common/experimental_features.ts`, `server/config.ts`, runtime config resolver	Allowlist + rate-limiter knobs surface in Stack Management → Advanced Settings. Operators can override per-space without restart.
kbn-evals suite	`packages/kbn-evals-suite-detection-emulation/evals/{validate_rule_dataset,validate_rule.spec}.ts`	Dataset name `security: detection-emulation-validate-rule`. Eight examples (success / failure / log-injection / distractor / `userDeclines` HITL rejection / `realExecBlocked` allowlist 403). Evaluators: `toolSelection`, `schemaCompliance`, `criteria`, plus a trajectory evaluator that scores `tool_sequence` from execution traces.
Shared utilities	`resolveCurrentUsername` moved to shared lib, `savedObjectsClient` threaded through gates	Aligns with patterns from andrew-goldstein's Workflows stack.
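The confidence scorer row gives a closed-form formula. A minimal sketch of it, assuming `coverage` and `precision` already arrive as fractions; `scoreConfidence` is a hypothetical name, and how the PR computes the two inputs is not shown here.

```typescript
/**
 * Sketch of the documented formula:
 *   confidence = round(coverage * 0.6 + precision * 0.4, 2), clamped to [0, 1].
 */
function scoreConfidence(coverage: number, precision: number): number {
  const raw = coverage * 0.6 + precision * 0.4;
  const clamped = Math.min(1, Math.max(0, raw));
  // Round to two decimal places.
  return Math.round(clamped * 100) / 100;
}
```

For example, coverage 1.0 with precision 0.5 scores 0.6 + 0.2 = 0.8; coverage is weighted more heavily, so missing an expected alert costs more than a noisy one.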

Pipeline (route execution order)

 1. Feature-flag gate     detectionEmulationRealExecution      (always required)
                          detectionEmulationLogInjection       (required if mode='log_injection')
 2. Runtime kill switch   xpack.securitySolution.detectionEmulation.realExecution.enabled
 3. Authentication        emulation actions are attributable
 4. RBAC                  per-command authz over the selected payload
 5. Schema gates          MAX_ENDPOINT_FANOUT (5) at Zod boundary
 6. Allowlist             default-deny; operator config drives allowedHosts
 7. Per-space rate        100 commands / space / hour
 8. Per-host rate         3 commands / host / hour (atomic acquire-with-rollback)
 9. Concurrency gate      ≤1 real_execution in flight per space
10. Validation gate       free-form `execute` matched against allowedExecuteCommandPatterns
11. HITL                  framework prompts user; skipped only in standalone executionMode
12. Scenario generator    rule MITRE tags → payload set → deterministic scenarioId
13. Per-payload dispatch  real_execution → execution/runner.ts (multi-EDR)
                          log_injection  → log_injection/executor.ts (synthetic ECS docs)
14. Audit                 actor.kind discriminator + via= suffix on response-action comment
15. Telemetry collector   poll Detection Engine alerts filtered by original_event.module + scenarioId
16. Confidence scorer     coverage * 0.6 + precision * 0.4
17. History write         persist detection-emulation-report SO (model version 2 with actor)
   ────────────────────   ValidationReport (HTTP 200) or typed error (4xx/5xx)
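Step 9 (at most one `real_execution` in flight per space) can be pictured as a per-space slot map. This is a minimal in-memory sketch that assumes a single Kibana node and estimates `Retry-After` from an expected run duration. `SpaceConcurrencyGate` and `expectedRunMs` are hypothetical; only the `inflightScenarioFingerprint` field name mirrors the PR's concurrency result.

```typescript
type ConcurrencyResult =
  | { ok: true; release: () => void }
  | { ok: false; inflightScenarioFingerprint: string; retryAfterSeconds: number };

// Hypothetical single-node gate: at most one real_execution scenario in flight per space.
class SpaceConcurrencyGate {
  private readonly inflight = new Map<string, { fingerprint: string; startedAt: number }>();
  private readonly expectedRunMs: number;

  constructor(expectedRunMs: number) {
    this.expectedRunMs = expectedRunMs;
  }

  tryEnter(spaceId: string, fingerprint: string, now: number): ConcurrencyResult {
    const current = this.inflight.get(spaceId);
    if (current) {
      // Estimate Retry-After from when the in-flight run is expected to finish (at least 1s).
      const remainingMs = Math.max(0, current.startedAt + this.expectedRunMs - now);
      return {
        ok: false,
        inflightScenarioFingerprint: current.fingerprint,
        retryAfterSeconds: Math.max(1, Math.ceil(remainingMs / 1000)),
      };
    }
    this.inflight.set(spaceId, { fingerprint, startedAt: now });
    return {
      ok: true,
      // Only the holder's own release frees the slot.
      release: () => {
        if (this.inflight.get(spaceId)?.fingerprint === fingerprint) {
          this.inflight.delete(spaceId);
        }
      },
    };
  }
}
```

An in-memory map only serializes runs within one process; a multi-node deployment would need shared state for the same guarantee.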

Failure-mode coverage

HTTP	`errorCode`	When
403	`feature_flag_disabled`	Mode requires a flag that's off, or runtime kill switch is set.
403	`endpoint_not_allowed`	endpointId not on the operator allowlist.
403	`command_not_allowed`	Free-form `execute` payload didn't match any `allowedExecuteCommandPatterns`.
403	`user_declined`	HITL prompt rejected by the user.
422	`endpoint_fanout_exceeded`	More than `MAX_ENDPOINT_FANOUT` (5) endpoints in one call.
422	`no_mitre_tags`	Rule has no MITRE techniques.
422	`no_supported_techniques`	Rule MITRE tags don't intersect Wave-1.
404	`rule_not_found`	ruleId doesn't resolve.
429	`rate_limit_exceeded`	Per-space (100/h) or per-host (3/h) bucket exhausted. Response includes `blockedEndpoints` + `Retry-After`.
429	`concurrency_exceeded`	Another `real_execution` scenario is in flight in this space. Response includes `inflight_scenario_fingerprint` + `Retry-After`.
200 + `caveats`	`wall_budget_exceeded`	Partial result; score over partial observations.
500	`es_bulk_error`	log_injection ES bulk write failure.
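The table above is effectively a lookup from `errorCode` to HTTP status. Below is a sketch of that translation. The status codes are copied from the table. `toHttpResponse` and `retryAfterSeconds` are hypothetical, and this is not the PR's `EmulationToolError` builder.

```typescript
// Status codes copied from the failure-mode table; wall_budget_exceeded is omitted
// because it is reported as a caveat on a 200 response, not as an error.
const ERROR_STATUS = {
  feature_flag_disabled: 403,
  endpoint_not_allowed: 403,
  command_not_allowed: 403,
  user_declined: 403,
  endpoint_fanout_exceeded: 422,
  no_mitre_tags: 422,
  no_supported_techniques: 422,
  rule_not_found: 404,
  rate_limit_exceeded: 429,
  concurrency_exceeded: 429,
  es_bulk_error: 500,
} as const;

type EmulationErrorCode = keyof typeof ERROR_STATUS;

interface EmulationErrorBody {
  errorCode: EmulationErrorCode;
  message: string;
  retryAfterSeconds?: number; // hypothetical field used to populate Retry-After
}

interface HttpErrorResponse {
  status: number;
  headers: Record<string, string>;
  body: EmulationErrorBody;
}

/** Hypothetical translation from a typed error to an HTTP response. */
function toHttpResponse(body: EmulationErrorBody): HttpErrorResponse {
  const status = ERROR_STATUS[body.errorCode];
  const headers: Record<string, string> = {};
  // Both 429 cases in the table promise a Retry-After header.
  if (status === 429 && body.retryAfterSeconds !== undefined) {
    headers['Retry-After'] = String(body.retryAfterSeconds);
  }
  return { status, headers, body };
}
```

Keying the map `as const` makes `EmulationErrorCode` a closed union, so a typo in an error code fails type-checking instead of silently becoming a 500.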

Key refactors (latest iteration)

Factory-based per-family tools: createRunFamilyCommandTool replaces copy-pasted per-family tool files. One factory, parameterized by family name and command schema.
Reusable gate primitives: withCommandGates extracted from per-family tools — the REST route now reuses the same gate pipeline.
Typed error builder: EmulationToolError.from(code, message) replaces ad-hoc error construction.
Dead code removal: removed unused imports, legacy helpers, and orphaned test fixtures.
resolveCurrentUsername shared: moved to shared lib for reuse across tools and routes.
savedObjectsClient threading: threaded through gate checks instead of resolving ad-hoc per tool.
Telemetry collector query fix: uses bool.should with minimum_should_match: 1 to match alerts by kibana.alert.original_event.module: emulation OR emulation tag — ensures confidence > 0 for log_injection mode.
Skill referencedContent schema compliance: corrected from object to array format per referencedContentSchema in type_definition.ts.
Idempotency cache: tool dispatch gates cache recent calls to prevent double-fire on retried LLM tool invocations.
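The telemetry-collector query fix depends on a detail of Elasticsearch `bool` semantics. Once a `bool` query has a `filter` or `must` clause, its `should` clauses become optional, because `minimum_should_match` defaults to 0. Setting it to 1 turns the two `should` clauses into a real OR. A sketch of such a query body follows. Apart from `kibana.alert.original_event.module`, the field names are assumptions: `kibana.alert.rule.uuid`, `@timestamp`, and the caller-supplied `emulationTagField`. `buildEmulationAlertQuery` is hypothetical.

```typescript
interface AlertQueryOptions {
  ruleId: string;
  scenarioId: string;
  sinceIso: string;
  emulationTagField: string; // assumption: where the emulation tag lands on the alert
}

/** Hypothetical builder for the alert search used by the telemetry collector. */
function buildEmulationAlertQuery(opts: AlertQueryOptions) {
  return {
    bool: {
      filter: [
        { term: { 'kibana.alert.rule.uuid': opts.ruleId } },
        { range: { '@timestamp': { gte: opts.sinceIso } } },
      ],
      should: [
        { term: { 'kibana.alert.original_event.module': 'emulation' } },
        { term: { [opts.emulationTagField]: opts.scenarioId } },
      ],
      // Without this, the filter clauses alone would match and the should clauses would only affect scoring.
      minimum_should_match: 1,
    },
  };
}
```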

Demo plan

Enable both experimental flags in config/kibana.dev.yml (full snippet in DEMO.md).
Configure a restrictive allowlist: xpack.securitySolution.detectionEmulation.allowlist.endpointIds: [<your-pilot-host>].
Pick a rule whose MITRE technique intersects Wave-1 (e.g. T1059.001 Windows PowerShell).
From Agent Builder UI: "Validate detection rule <ruleId> against endpoint <endpointId> using log injection."
The framework prompts for HITL confirmation. Approve.
Inspect the ValidationReport: confidence, coverage, precision, per-phase breakdown, history SO id.
Inspect the response-action comment for via=agent-builder/conv:<id>/run:<id> actor attribution.
Try fanning out to 6+ endpoints — schema rejects with endpoint_fanout_exceeded.
Try a second real_execution while the first is in flight — second is rejected with concurrency_exceeded + Retry-After.
Follow up: "Show recent emulation runs for rule <ruleId>." → getEmulationHistory returns paginated history with the actor field populated.

Full walkthrough: x-pack/solutions/security/plugins/security_solution/server/lib/detection_emulation/DEMO.md.

Test plan

Out of scope

Beyond Wave-1 payloads. Hard-capped at 15 entries by design. Adding more is a follow-up after operator feedback on the Wave-1 set.
Cross-space concurrency. The gate is per-space. If real_execution should be globally serialized, that's a follow-up keyed on operator demand.
UI for emulation history. The getEmulationHistory tool exposes the SO via the Agent Builder; a dedicated stack-management UI is intentionally out of scope.
Live-cluster eval baseline. The kbn-evals suite ships with deterministic mocks; a live-cluster baseline run + result publication is a separate follow-up because it requires connector + cluster setup outside of CI.

cla-checker-service · 2026-05-13T03:58:10Z

❌ Author of the following commits did not sign a Contributor Agreement:
5638057, de325db, d119178, 6f86b85, cfc20c1

Please, read and sign the above mentioned agreement if you want to contribute to this project

patrykkopycinski · 2026-05-13T11:12:51Z

/ci

kibanamachine · 2026-05-13T11:16:24Z

🤖 Prompt Changes Detected

Changes have been detected to one or more prompt files in the Elastic Assistant plugin.

Please remember to update the integrations repository with your prompt changes to ensure consistency across all deployments.

Next Steps:

Follow the documentation in x-pack/solutions/security/packages/security-ai-prompts/README.md to update the corresponding prompt files
Make the changes in the integrations repository
Test your changes in the integrations environment
Ensure prompt consistency across all deployments

This is an automated reminder to help maintain prompt consistency across repositories.

patrykkopycinski · 2026-05-13T11:26:02Z

/ci

patrykkopycinski · 2026-05-13T12:30:36Z

/ci

The post-cleanup CI build (b441998) on PR elastic#269019 surfaced four real breakages introduced by recent commits + the cleanup itself:

1. **TS typo** in validate_rule_tool.ts:515: concurrencyResult exposes `inflightScenarioFingerprint`, not `inflightFingerprint`. The route file already uses the right name; only the tool path had drifted.
2. **TS typing** in run_command_tools.test.ts: the parameterized `it.each(tools)` table mixes per-family schemas with different `command` literal unions. Destructuring `parameters` failed because one entry didn't have it, and `getConfirmation`'s `toolParams` intersected to `never` across the four families. Made `parameters` a uniform widened type and cast `getConfirmation` through `unknown` to a single shape. Runtime contract unchanged; all 62 tests still pass.
3. **TS-projects linter** rejected validate_rule.spec.ts as a stranded file (excluded from the security_solution tsconfigs but not part of any other TS project). The canonical fix, matching every other eval suite in the repo (kbn-evals-suite-pci-compliance, etc.), is to extract the spec + its dataset into a sibling devOnly `functional-tests` package: `@kbn/evals-suite-detection-emulation` under `x-pack/solutions/security/packages/`. This keeps the production plugin clean of the devOnly `@kbn/evals` reference and gives the spec a real owning tsconfig. CODEOWNERS updated to point at the new path.
4. **Moon project regen**: running `node scripts/regenerate_moon_projects.js --update` registered the new package in `package.json`, `tsconfig.base.json`, and `yarn.lock`. Auto-generated; included.

Verification (local):
- jest: 88/88 pass across run_command_tools.test.ts (62) + validate_rule_tool.test.ts (6) + concurrency_gate.test.ts (10) + validate_rule/route.test.ts (10).
- eslint --fix: clean on all touched files.
- type_check: clean on the new evals suite package (~70s) AND on the full security_solution plugin (~6m).

No behavioural changes; this is purely a CI-repair commit.
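The `never` in item 2 of the CI-repair note follows from how TypeScript checks a call on a union of function types. The argument must satisfy every member's parameter type at once. Below is a minimal reproduction using hypothetical family parameter types, which are not the PR's Zod-derived schemas. It is followed by the same widen-through-`unknown` cast the note describes.

```typescript
type ProcessParams = { command: 'kill-process' | 'suspend-process' };
type NetworkParams = { command: 'isolate' | 'release' };

const processConfirm = (p: ProcessParams): string => `Run ${p.command} on the selected host?`;
const networkConfirm = (p: NetworkParams): string => `Run ${p.command} on the selected host?`;

// Element type is the union of both function types.
const confirmations = [processConfirm, networkConfirm];

// confirmations[0]({ command: 'isolate' });
//   error: the argument must be ProcessParams & NetworkParams, and the two
//   `command` literal unions are disjoint, so no value satisfies both.

// Widen to one shape for the table-driven test; runtime behaviour is unchanged.
type AnyConfirmation = (p: { command: string }) => string;
const widened = confirmations.map((fn) => fn as unknown as AnyConfirmation);
```

The cast only relaxes what the test table may pass in. Each function still receives whatever `command` the test supplies at runtime.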

patrykkopycinski · 2026-05-13T12:44:56Z

/ci

patrykkopycinski · 2026-05-13T12:49:56Z

/ci

patrykkopycinski · 2026-05-13T18:22:18Z

/ci

patrykkopycinski · 2026-05-13T18:52:08Z

/ci

patrykkopycinski · 2026-05-13T19:20:51Z

/ci

patrykkopycinski · 2026-05-13T22:02:48Z

/ci

patrykkopycinski · 2026-05-14T13:07:27Z

/ci

kibanamachine · 2026-05-14T13:16:58Z

💔 Build Failed

Buildkite Build
Commit: 23aee62

Failed CI Steps

Metrics

Module Count

Fewer modules leads to a faster build time

id	before	after	diff
`securitySolution`	9408	9417	+9

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id	before	after	diff
`securitySolution`	12.1MB	12.1MB	+8.3KB

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id	before	after	diff
`securitySolution`	153.5KB	159.3KB	+5.8KB

Unknown metric groups

async chunk count

id	before	after	diff
`securitySolution`	113	112	-1

ESLint disabled in files

id	before	after	diff
`securitySolution`	106	107	+1

ESLint disabled line counts

id	before	after	diff
`securitySolution`	739	740	+1

Total ESLint disabled count

id	before	after	diff
`securitySolution`	845	847	+2

History


Adds a detection emulation feature directly to the security_solution plugin. Users can run, approve, and visualise emulation commands against detection alerts without needing a separate plugin.

What is added:
- common/detection_emulation: Zod schema for the run command input
- public/detections/components/emulation:
  - EmulationBadge: shows on alerts that carry an emulation id (kibana.alert.emulation.id)
  - EmulationFilter: toolbar filter on detection tables
  - RunEmulationModal: approval modal for a pending emulation command
- server/agent_builder/skills/detection_emulation:
  - In-tree agent skill plus an inline run-command tool
- server/lib/detection_emulation:
  - Rule binding saved object + alert tagging helpers
  - Feature flag, allowlist, audit logger, rate limiter, runner
  - REST route for executing emulation commands

Wire-up:
- Register `registerDetectionEmulationRoutes` from server/routes/index.ts
- Register `emulationRuleBindingType` in server/saved_objects.ts
- Register `getDetectionEmulationSkill` in agent_builder/skills/register_skills.ts (passes `core` + `config` threaded through from server/plugin.ts)
- Re-export `defineSkillType` from `@kbn/agent-builder-server` for skill authors
- Re-export `RunEmulationCommandInputSchema` from common/index.ts
- Add `DETECTION_ENGINE_EMULATION_*` URL constants
- Add CODEOWNERS entry for `server/lib/detection_emulation`

UI integration:
- additional_toolbar_controls.tsx: render <EmulationFilter> on detection tables
- render_cell_value.tsx: render <EmulationBadge> for alerts that carry an emulation id
- rule_details/index.tsx: render <RunEmulationModal> when an emulation approval is pending

Tests:
- Unit tests for the new skill and run-command tool
- Unit tests for emulation badge / filter / modal
- Integration test for the end-to-end emulation route + persistence

Notes for reviewers:
- All cross-package imports use canonical `@kbn/...` aliases. Intra-plugin imports remain relative per Kibana convention.
- This PR contains only production-ready changes.

Applies the full review pass against the in-tree detection emulation feature added in 9f9f073, closing the blocker, important, and nice-to-have findings surfaced during review.

Server / route
- B1/B3/N5/N6/N7: route enforces `experimentalFeatures.detectionEmulationRealExecution`, swaps the rate limiter to atomic acquire/release (release on dispatch failure), refuses to dispatch destructive actions without an authenticated caller (401 instead of falling back to `username='unknown'`), short-circuits double-submits via an in-memory idempotency cache keyed on (space, emulation, command, agentType, sorted endpointIds), and wires allowlist / rate-limiter / idempotency-cache config from `xpack.securitySolution.detectionEmulation.*`.
- I1/I3/I4: replaces legacy `tags: ['access:securitySolution']` with declarative `security.authz.requiredPrivileges`, stops echoing internal error messages to clients, and wraps every user-facing string in `i18n.translate`.
- I2/N4: introduces a typed runner error taxonomy (`UnsupportedAgentTypeError`, `UnsupportedCommandForAgentTypeError`, `MissingConnectorActionsError`) in its own module so the route can map cleanly to 4xx/5xx, and adds an exhaustiveness check on the dispatch switch.
- I5/I7: marks the `emulation-rule-binding` SO `hidden: true` / `hiddenFromHttpApis: true` and adds a `modelVersions` baseline; the runner now accepts a `ruleBindingLookup` so dispatched actions carry ruleId / ruleName via a new `createSavedObjectRuleBindingLookup` helper that uses the internal SO client.

Schema / contract
- I6: rewrites `RunEmulationCommandInputSchema` as a discriminated union on `command` with strict, command-specific `parameters` shapes. This closes the silent-passthrough hole where typos like `entityId` used to sail through the previous `z.record`. Models `kill-process` / `suspend-process` as a `pid` xor `entity_id` union and `memory-dump` as a `kernel | process(pid|entity_id)` union (z.union, since v4 forbids duplicate discriminator values).

Skill / agent-builder
- B5/I16: rewrites the skill content to match the actually-registered tool and command list, and gates skill registration on `detectionEmulationRealExecution`.
- Tool now returns `BuiltinSkillBoundedTool` (not `BuiltinToolDefinition`) so it satisfies the framework's `SkillBoundedTool` contract; basePath moved to the canonical `skills/security/endpoint`.

UI
- B4: removes the dead `RunEmulationModal` block from `rule_details/index.tsx`.
- I9: `EmulationFilter` subscribes to `filterManager.getUpdates$()` so the toggle stays in sync when filters are mutated elsewhere.
- I10/I11/I12: `RunEmulationModal` resets local state on `requestId`/`suggestion` change, disables Approve/Reject after click via `isSubmitting`, and parses modified args with a shell-style tokenizer instead of splitting on whitespace.
- I13: replaces the `@elastic/eui/src/...` deep import with a type derived from EUI's published `onChange` prop signature.
- I14: wraps `EmulationBadge` in `EuiToolTip` for keyboard / screen-reader users.

Cleanup
- I8/I17/N1/N2: deletes the unused `logInjection` flag, prunes dead `audit_logger` helpers, and slims the allowlist / rate-limiter APIs to the methods actually used.
- I15: adds CODEOWNERS entries for `common/detection_emulation`, `public/detections/components/emulation`, and the `agent_builder/skills/detection_emulation` directory.

Tests
- N8: `EmulationBadge` test asserts via `data-test-subj`, not classNames.
- Route, schema, skill, and component tests updated to cover the new gates (auth, idempotency, rate limit), the typed errors, the discriminated union, and the new tooltip / tokenizer behavior.
- Full pre-commit pass: `type_check.js` clean, eslint clean on all changed files, 123/123 jest specs green across schema, server, agent-builder skill, and component suites.
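I12 replaces whitespace splitting with a shell-style tokenizer so that quoted arguments survive intact. Below is a minimal sketch of such a tokenizer. `tokenizeArgs` is a hypothetical name, and its quoting rules are assumptions: single quotes are literal, double quotes allow `\"` and `\\`, and a backslash outside quotes escapes the next character. The PR's tokenizer may differ.

```typescript
/** Hypothetical shell-style tokenizer; throws on an unterminated quote. */
function tokenizeArgs(input: string): string[] {
  const tokens: string[] = [];
  let current = '';
  let inToken = false; // distinguishes an empty quoted token ("") from no token
  let quote: "'" | '"' | null = null;

  for (let i = 0; i < input.length; i++) {
    const ch = input[i];
    if (quote === "'") {
      // Inside single quotes everything is literal until the closing quote.
      if (ch === "'") quote = null;
      else current += ch;
    } else if (quote === '"') {
      if (ch === '"') quote = null;
      else if (ch === '\\' && (input[i + 1] === '"' || input[i + 1] === '\\')) current += input[++i];
      else current += ch;
    } else if (ch === "'" || ch === '"') {
      quote = ch === "'" ? "'" : '"';
      inToken = true;
    } else if (ch === '\\' && i + 1 < input.length) {
      // Unquoted backslash escapes the next character (e.g. a space).
      current += input[++i];
      inToken = true;
    } else if (/\s/.test(ch)) {
      if (inToken) {
        tokens.push(current);
        current = '';
        inToken = false;
      }
    } else {
      current += ch;
      inToken = true;
    }
  }
  if (quote !== null) throw new Error('Unterminated quote in arguments');
  if (inToken) tokens.push(current);
  return tokens;
}
```

With this tokenizer, `--name "hello world" -x` yields three arguments rather than the four that whitespace splitting would produce.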

Adds payloads/payloads.json with 12 entries covering ATT&CK techniques: T1059.001, T1059.003, T1059.004, T1218.005, T1218.011, T1053.005, T1547.001, T1057, T1003.001, T1070.004, T1071.001, T1112. Each entry is typed { techniqueId, name, agentTypes[], command, parameters, expectedSignals[] }. Payloads use self-cleaning shell commands where possible to minimise post-emulation artifacts. T1057 (process discovery) uses `running-processes` and lists all 4 supported agent types; all other entries use `execute` and are scoped to `endpoint` which is the only agent type with execute support wired today. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Adds payloads/index.ts exporting:
- EmulationPayload interface typed against ResponseActionAgentType and ResponseActionsApiCommandNames (import type, so no runtime coupling to common constants).
- payloadLibrary: readonly EmulationPayload[] loaded from payloads.json.
- PAYLOAD_LIBRARY_MAX_ENTRIES = 15 governance constant.
- findByTechniqueIds(ids): uses a Set for O(1) lookups; preserves library insertion order; deduplicates repeated IDs in the input.

Adds payloads/index.test.ts with 26 jest assertions covering:
- Hard-cap enforcement (toBeLessThanOrEqual PAYLOAD_LIBRARY_MAX_ENTRIES).
- Shape validation: non-empty techniqueId/name, valid agentTypes, valid commands, at least one expectedSignal, unique techniqueIds.
- Wave-1 technique coverage (it.each over all 12 required IDs).
- findByTechniqueIds edge cases: empty input, no-match, single match, multi-match, unknown+known mix, order preservation, deduplication.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
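The `findByTechniqueIds` contract (Set lookups, library order, deduplication) can be met with a single filter pass. A sketch under stated assumptions: `PayloadEntry` is a trimmed-down, hypothetical stand-in for `EmulationPayload`, and the library is passed in explicitly rather than read from `payloads.json`.

```typescript
// Trimmed-down, hypothetical payload shape; the PR's EmulationPayload has more fields.
interface PayloadEntry {
  techniqueId: string;
  name: string;
}

/**
 * Returns library entries whose techniqueId appears in `ids`, in library order.
 * The Set gives O(1) membership checks, and because the library is walked once,
 * repeated IDs in `ids` cannot produce duplicate results (given unique techniqueIds).
 */
function findByTechniqueIds(library: readonly PayloadEntry[], ids: readonly string[]): PayloadEntry[] {
  if (ids.length === 0) return [];
  const wanted = new Set(ids);
  return library.filter((entry) => wanted.has(entry.techniqueId));
}
```

Iterating the library instead of the requested IDs is what gives order preservation and deduplication for free. It relies on the "unique techniqueIds" shape check from the test suite.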

…type Adds emulation_report_type.ts with:
- emulationReportType (SavedObjectsType<EmulationReportAttributes>): hidden=true, hiddenFromHttpApis=true, namespaceType=multiple-isolated, stored in SECURITY_SOLUTION_SAVED_OBJECT_INDEX.
- EmulationReportAttributes interface covering all 14 fields from the spec: scenarioId, ruleId, scenarioFingerprint, mode, endpointIds, agentType, startedAt, completedAt, payloadIds, dispatchedActions[], score, perPhase[], operator, spaceId.
- modelVersions baseline '1' with forwardCompatibility + create schemas (unknowns: 'ignore' on forward compat to allow future additive fields).
- ES mappings: dynamic: false; score fields use float/integer; array fields (endpointIds, payloadIds, signals) mapped as keyword multi-value.

Wires emulationReportType into saved_objects.ts types[] and exports it from lib/detection_emulation/index.ts alongside the existing emulationRuleBindingType.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Adds detectionEmulationLogInjection: false alongside the existing detectionEmulationRealExecution flag. When true, the validateRule pipeline uses log injection (synthesised ECS documents) instead of dispatching real response actions to endpoints. Gating on a separate flag keeps the two dispatch modes independently toggleable and lets log injection ship before real execution is broadly available. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…n and validation keys Adds two new optional sub-objects under xpack.securitySolution.detectionEmulation:

detectionEmulation.logInjection:
- indexTemplateName (default: '.kibana-security-emulation-logs'): base name for the ILM-managed index template; runtime appends '<spaceId>-*' to form the full pattern.
- retentionDays (default: 7, min: 1): ILM delete phase for synthesised ECS documents.

detectionEmulation.validation:
- wallBudgetMsDefault (default: 60,000 ms, min: 1,000): default telemetry-collector timeout per validateRule run.
- wallBudgetMsMax (default: 300,000 ms, min: 1,000): hard ceiling for budget values accepted from API callers; requests above this are clamped, preventing runaway long-poll connections.

Both sub-objects follow the existing schema.maybe(schema.object({...})) pattern used by allowlist/rateLimiter/idempotencyCache: the whole group is optional, and code null-coalesces to baked-in defaults.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
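Because the whole `validation` group is optional, every read has to fall back to a baked-in default, and caller-supplied budgets are clamped to the ceiling. A minimal sketch of that resolution follows. The defaults come from the commit message above. `resolveWallBudgetMs` and `ValidationConfig` are hypothetical names, and the schema-level `min: 1,000` checks are assumed to happen at config validation, not here.

```typescript
// Hypothetical resolver mirroring the documented defaults; the real config plumbing differs.
interface ValidationConfig {
  wallBudgetMsDefault?: number;
  wallBudgetMsMax?: number;
}

const DEFAULT_WALL_BUDGET_MS = 60_000;
const DEFAULT_WALL_BUDGET_MAX_MS = 300_000;

function resolveWallBudgetMs(config: ValidationConfig | undefined, requestedMs?: number): number {
  // The whole group is optional, so null-coalesce each field to the baked-in default.
  const defaultMs = config?.wallBudgetMsDefault ?? DEFAULT_WALL_BUDGET_MS;
  const maxMs = config?.wallBudgetMsMax ?? DEFAULT_WALL_BUDGET_MAX_MS;
  // Requests above the ceiling are clamped rather than rejected.
  return Math.min(requestedMs ?? defaultMs, maxMs);
}
```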

…: Update smoke spec findings shape to match spec: canRead + indexCount Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… notification path for risk #3 in README Adds a new section covering the audit SO fields (actor.kind, scenarioFingerprint SHA-256), Kibana security audit log integration, and three SOC tooling consumption patterns (Kibana rule, Watcher, Filebeat/Fleet pipeline). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…de role definition guidance for risk #16 Documents how to run the discovery probe, the expected access surface for built-in ES roles (superuser + kibana_system as known residuals), and four least-privilege mitigations for operator-defined roles: no wildcard .kibana* grants, CCS pattern splitting, DLS match_none filter, and ES-layer audit logging. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…date risk HTML: hero counts, rows #3-#16, safeguards
- Hero: Medium 2→1 (row #9 demoted to Low by event.dataset stamp); safeguards 21→23
- Row #3: mitigated note references OOB notification path documented in README
- Row #4: permanent note updated (execute curatedOnly + allowedExecuteCommandPatterns)
- Row #9: Medium/Scheduled → Low/Mitigated (event.dataset + event.module stamp)
- Row #14: mitigated note updated; curatedOnly now covers upload (closed short-circuit)
- Row #16: Low/Scheduled → Low/Mitigated (discovery probe + README operator guidance)
- Active safeguards: updated curatedOnly bullet (execute+upload), added execute-regex gate bullet, added event.dataset stamp bullet; intro updated to "Twenty-three controls"
- Roadmap: #9 and #16 rows updated to note shipped vs follow-up split

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ixes to changed detection_emulation files Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…nst undefined Partial configs (e.g. from older tests or forward-compat reads) omit the new field; use `?? []` so the length check never throws on undefined. The required interface still enforces the field at the TS layer for new call-sites. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…e inspection in index_access smoke spec Rewrites the index_access smoke spec (Risk #16) to use `_security/role` definition inspection instead of temporary user creation + privilege checks. The original run_as / per-role-client approach requires the cluster to have a master node for write quorum (putUser is a write operation). Role definition inspection is fully read-only and works on any cluster state. Results match the expected access surface documented in the README: superuser + kibana_system have read access; all other built-in roles do not.

Other improvements:
- SerializeError helper includes HTTP status code for non-200 ES responses
- `create_index` field renamed to `createIndex` (camelCase, naming-convention)
- `fleet_server` 404 surfaces as "404: {}" for clarity (role absent in ES 9.5)
- Test runs in <200ms instead of timing out at 120s

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Trim validateRule tool description from ~1800 to ~400 tokens for OSS model compatibility (pipeline details already live in skill content)
- Shorten endpointIds .describe() to 2 concise sentences
- Align skill content tool references to use actual registered IDs (security.detection-emulation.validate-rule, .get-history, .run-process-command, etc.) instead of informal shorthand names
- Change agentType from z.literal('endpoint') to z.enum(['endpoint']) across all 5 tool schemas for smoother future vendor extension
- Add 3 additional distractor examples to the eval dataset (ES|QL question, threat hunting request, dashboard creation), bringing total distractors to 5 per skill-dev-plugin guidance

Centralizes ~20 manually-constructed error responses into a single `emulation_tool_errors.ts` module with type-safe factory methods for each error class (featureDisabled, authorizationError, rateLimitExceeded, invalidParameters, userDeclined, validationGateBlocked, scenarioFailure, concurrencyExceeded, executionError, etc.). Updates withCommandGates, validateRule, and all 4 per-family run*Command tools to use the shared builder instead of inline ToolResultType.error constructions. Removes redundant ToolResultType imports from per-family tools.

The REST route's idempotency cache prevented double-dispatch from network retries, but the Agent Builder tool dispatch path (via withCommandGates) lacked this protection. LLM retries or framework-level transient-error retries could fire a second response action. Threads the idempotencyCache from DetectionEmulationGuardrails into all 4 per-family tools → withCommandGates context. Checks the cache after the allowlist gate (matching REST ordering) and writes back on both success and error paths so replays get the cached result.
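The ordering described above can be sketched as follows. The cache is consulted only after the allowlist gate passes. The outcome is written back on both the success path and the error path, so a replayed tool call returns the cached result instead of dispatching a second response action. The `IdempotencyCache` class, the key format, and `dispatchWithIdempotency` are hypothetical names for illustration.

```ts
type Outcome<T> = { ok: true; value: T } | { ok: false; error: string };

// Hypothetical in-memory cache keyed by an idempotency key (e.g. a tool call id).
export class IdempotencyCache<T> {
  private readonly entries = new Map<string, Outcome<T>>();
  get(key: string): Outcome<T> | undefined {
    return this.entries.get(key);
  }
  set(key: string, outcome: Outcome<T>): void {
    this.entries.set(key, outcome);
  }
}

export async function dispatchWithIdempotency<T>(
  key: string,
  cache: IdempotencyCache<T>,
  isHostAllowed: () => boolean,
  dispatch: () => Promise<T>
): Promise<Outcome<T>> {
  // Allowlist gate first (matching the REST ordering); denied calls are not cached.
  if (!isHostAllowed()) return { ok: false, error: 'host_not_allowlisted' };

  const cached = cache.get(key);
  if (cached) return cached; // replay: no second response action

  let outcome: Outcome<T>;
  try {
    outcome = { ok: true, value: await dispatch() };
  } catch (e) {
    outcome = { ok: false, error: e instanceof Error ? e.message : String(e) };
  }
  cache.set(key, outcome); // write back on both success and error paths
  return outcome;
}
```

A production cache would also need a TTL and protection against two concurrent calls with the same key, both of which this sketch omits.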

Creates gate_checks.ts with protocol-agnostic gate functions:
- checkRealExecutionFeatureFlags / checkModeFeatureFlags
- checkValidation (curated-only + allowedScriptIds)
- checkRbac (per-command RBAC via EndpointAuthz)
- resolveEffectiveConfig (reads Advanced Settings per-space)
- checkAllowlist (host allowlist)
- acquireRateLimit (atomic per-space + per-host rate limit)
- checkAuth (authenticated caller check)

Each gate returns a typed GateResult<T> (ok/fail) with structured metadata. withCommandGates now composes these primitives instead of using inline logic, so each gate check has a single source of truth that both the tool dispatch and the REST route can share.
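The `GateResult<T>` pattern can be sketched as a discriminated union plus a small composer that stops at the first failing gate. Only the `GateResult<T>` name comes from the commit. The field names, `runGates`, and the example `allowlistGate` are assumptions, and the gates here are synchronous for brevity, although gates such as the rate limiter and config resolution are likely async in practice.

```ts
export type GateResult<T> =
  | { ok: true; value: T }
  | { ok: false; reason: string; metadata?: Record<string, unknown> };

export const pass = <T>(value: T): GateResult<T> => ({ ok: true, value });
export const fail = <T = never>(reason: string, metadata?: Record<string, unknown>): GateResult<T> => ({
  ok: false,
  reason,
  metadata,
});

// Runs gates in order; each gate sees the accumulated context and may enrich it.
export function runGates<C>(ctx: C, gates: Array<(ctx: C) => GateResult<C>>): GateResult<C> {
  let current = ctx;
  for (const gate of gates) {
    const result = gate(current);
    if (!result.ok) return result; // the first failure short-circuits the pipeline
    current = result.value;
  }
  return pass(current);
}

// Hypothetical gate: default-deny host allowlist.
interface Ctx {
  hostId: string;
  allowlist: Set<string>;
}
export const allowlistGate = (ctx: Ctx): GateResult<Ctx> =>
  ctx.allowlist.has(ctx.hostId) ? pass(ctx) : fail('host_not_allowlisted', { hostId: ctx.hostId });
```

Because each gate returns data rather than an HTTP response or a tool result, the same functions can be translated into either protocol at the boundary.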

Refactors the run_command REST route to import and call the shared gate check functions (checkRealExecutionFeatureFlags, checkAllowlistGate, acquireRateLimitGate) instead of duplicating the logic inline. The route still handles protocol translation (GateResult → HTTP response via siemResponse) and route-specific concerns (i18n messages, Kibana request context), but the gate logic itself is now single-sourced from gate_checks.ts.

Creates createRunFamilyCommandTool factory that builds the schema, confirmation, and handler from a FamilyToolConfig object. All four per-family tools (process, file, network, execution) are now config-only modules (~50 lines each) delegating to the factory. Eliminates ~400 lines of duplicated handler/schema/confirmation logic. Adding a new family (e.g. registry) is now a one-file, config-only addition.
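The config-driven factory idea can be sketched as follows. Each family supplies only data (its id, its allowed commands, and a confirmation-message builder), and one factory produces the tool. `FamilyToolConfig` and `createRunFamilyCommandTool` are named in the commit, but the fields, the returned `FamilyTool` shape, and the assignment of `kill-process` to the process family are assumptions. The real factory also builds a Zod schema and runs `withCommandGates`.

```ts
export interface FamilyToolConfig<C extends string> {
  family: string;
  commands: readonly C[];
  describeConfirmation: (command: C, endpointIds: string[]) => string;
}

export interface FamilyTool<C extends string> {
  id: string;
  validate: (input: { command: string; endpointIds: string[] }) => input is { command: C; endpointIds: string[] };
  confirmationMessage: (command: C, endpointIds: string[]) => string;
}

const MAX_ENDPOINTS = 5; // fanout cap from the PR description

export function createRunFamilyCommandTool<C extends string>(config: FamilyToolConfig<C>): FamilyTool<C> {
  return {
    id: `security.detection-emulation.run-${config.family}-command`,
    // Accept only this family's commands and 1..MAX_ENDPOINTS target endpoints.
    validate: (input): input is { command: C; endpointIds: string[] } =>
      (config.commands as readonly string[]).includes(input.command) &&
      input.endpointIds.length > 0 &&
      input.endpointIds.length <= MAX_ENDPOINTS,
    confirmationMessage: config.describeConfirmation,
  };
}
```

Under this design, a new family is one call such as `createRunFamilyCommandTool({ family: 'process', commands: ['kill-process'] as const, describeConfirmation: ... })`.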

Adds optional savedObjectsClient to CommandGatesContext and the factory handler destructure. When provided by the Agent Builder handler context, withCommandGates uses it directly for uiSettingsClient derivation rather than re-creating a scoped client via coreStart.savedObjects. This eliminates a redundant getScopedClient call (async hop) on every tool invocation while keeping backward compat (falls back to request-scoped creation when the field is absent).

Relocates resolve_current_user.ts from the skill-specific directory to server/lib/detection_emulation/ so it's importable from both the Agent Builder tool handlers and the REST routes without a cross-concern import path. Previously flagged in the code itself as "should be upstreamed" — this is the short-term path (shared within the plugin) while awaiting an export from @kbn/agent-builder-server.

- Remove unused DetectionEmulationFeatureFlags type import from gate_checks.ts
- Migrate get_emulation_history_tool.ts to use the shared toolError builder instead of inline ToolResultType.error construction

… Workflows stack
- Wire shared gate_checks into validate_rule_tool.ts (replaces ~80 lines of inline gates)
- Delete run_command REST route + tests (-1,108 lines); the tool is the single implementation
- Add traced logger (createTracedLogger) to createRunFamilyCommandTool factory
- Wrap async operations in withCommandGates with runStep for timing/error attribution
- Remove DETECTION_ENGINE_EMULATION_RUN_COMMAND_URL constant
- Add execution modules: traced_logger, pipeline_step_error, tool_factory_deps, validate_pre_execution

Patterns adopted from PRs elastic#260739, elastic#260744, elastic#260793, elastic#260811.

- Remove DEMO_GUIDE.md, openspec/, .playwright-mcp/ (not part of this PR)
- Remove dead functions: buildEmulationModeQuery, extractEmulationMetadata, isEmulationAlert
- Clean up corresponding test cases and unused imports

- Fix runtime crash: featureFlags undefined on log_injection path
- Move gate_checks.ts from agent_builder/skills/ to lib/execution/ (fixes circular dependency: lib/ was importing from agent_builder/)
- Remove `as any` casts for coreStart.security (unnecessary)
- Remove empty ValidateRuleToolDeps interface (use ToolFactoryDeps)
- Remove DEMO.md files and production-risk-analysis.html

…on Agent Builder skill

Three evaluators per example:
- toolSelection (createSkillInvocationEvaluator): APM trace check for SKILL.md filestore.read span; verifies skill activation
- schemaCompliance (createTraceBasedEvaluator): ES|QL query over traces asserting every validate-rule call includes ruleId + endpointIds
- criteria (DefaultEvaluators.criteria): per-example LLM judge

9 examples: 2 success paths (T1059.001, T1218.005), default mode, history-first flow, 3 failure modes, 2 distractors. Includes .eslintrc.js boundary-crossing exemption so *.spec.ts can import the devOnly @kbn/evals package without affecting the plugin build.

…rver/agent_builder/skills/detection_emulation/evals/validate_r Auto-committed by patryks-treadmill orchestrator. plan=detection-emulation-skill-epic-15974-orchestration-layer job=f23206c3-0fe4-461c-b545-e2f737b7f735 attempt=1

… orchestrator
- Extract runRuleExecutors from route.ts (already committed)
- Clean up unused imports in route.ts (SERVER_APP_ID, RuleExecutionStatusEnum, alertInstanceFactoryStub)
- Add optional rulePreviewDeps to OrchestratorOptions for rule preview validation
- Add Step 8 (Rule Preview Validation) to scenario orchestrator pipeline
- Add rulePreviewValidation to OrchestratorResult
- Update DESIGN.md with Section 9: Implementation Status & Gap Analysis

…tation

…e demo screenshot showing Agent Builder transcript with succes Auto-committed by patryks-treadmill orchestrator. plan=detection-emulation-skill-epic-15974-orchestration-layer job=49341029-00d6-4a17-af8c-300aa7d3f419 attempt=2

Brings the plugin-side eval files into parity with the canonical kbn-evals-suite-detection-emulation suite:

Dataset (validate_rule_dataset.ts):
- Add `tool_sequence` field to all examples (consumed by trajectory evaluator for LCS-based order scoring; `[]` for distractor examples so the evaluator returns 1.0 when no tools fired)
- Add `autoConfirm` field to the HITL `userDeclines` example
- Add HITL example: user declines real_execution prompt → `user_declined`
- Add 3 extra distractor examples: ES|QL question, threat hunting, dashboard creation (total 13 examples, up from 9)

Spec (validate_rule.spec.ts):
- Replace `p-retry` with `withRetry` from @kbn/evals (consistent with canonical suite; N5 tracking)
- Add HITL auto-resume loop in DetectionEmulationChatClient.converse: polls `response.prompts`, responds with the per-example `autoConfirm` policy, bounded by MAX_PROMPT_ROUNDS=5
- Add `createValidateRuleTrajectoryEvaluator` (createTrajectoryEvaluator with orderWeight=0.7 / coverageWeight=0.3) applied to every example
- Wire `autoConfirm` policy from `example.input.autoConfirm` into each `runScenario` call
- Register the 4 new evaluate() blocks matching the 4 new examples

Three required evaluators per spec §8 remain: toolSelection (renamed from createToolSelectionEvaluator → same createSkillInvocationEvaluator underneath), schemaCompliance, criteria. Trajectory is additive.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
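The trajectory score described above can be sketched as LCS-based order scoring blended 0.7/0.3 with coverage, returning 1.0 for an empty expected sequence when no tools fired. This is an illustrative reimplementation, not the `createTrajectoryEvaluator` code from @kbn/evals. Three details are assumptions: normalizing the LCS by the expected length, defining coverage as the fraction of expected tools that appear at all, and scoring a distractor 0 when any tool fires.

```ts
// Length of the longest common subsequence of two tool-id sequences (classic DP).
function lcsLength(a: readonly string[], b: readonly string[]): number {
  const dp: number[][] = Array.from({ length: a.length + 1 }, () => new Array<number>(b.length + 1).fill(0));
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      dp[i][j] = a[i - 1] === b[j - 1] ? dp[i - 1][j - 1] + 1 : Math.max(dp[i - 1][j], dp[i][j - 1]);
    }
  }
  return dp[a.length][b.length];
}

export function trajectoryScore(
  expected: readonly string[],
  actual: readonly string[],
  orderWeight = 0.7,
  coverageWeight = 0.3
): number {
  // Distractor case: nothing expected. Assumption: any tool call scores 0.
  if (expected.length === 0) return actual.length === 0 ? 1 : 0;
  const order = lcsLength(expected, actual) / expected.length;
  const actualSet = new Set(actual);
  const coverage = expected.filter((tool) => actualSet.has(tool)).length / expected.length;
  return orderWeight * order + coverageWeight * coverage;
}
```

Calling the right tools in the wrong order keeps full coverage but loses order credit. For example, the expected sequence `['get-history', 'validate-rule']` against the actual sequence `['validate-rule', 'get-history']` scores about 0.65.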

…skill

Adds DEMO.md alongside the existing README.md covering:
- Feature flag setup in kibana.dev.yml (detectionEmulationLogInjection, detectionEmulationRealExecution) and full optional runtime config keys
- Step-by-step Agent Builder UI walkthrough (happy path, history-first, real execution, and failure cases)
- Full ValidationReport response field reference with inline comments
- Typed error response table (error_type, HTTP equivalent, trigger)
- Dev Tools queries for inspecting injected log-injection documents
- Saved Objects Find API snippet for browsing emulation history
- Troubleshooting table for common failure modes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Implements full ES|QL query inversion using the @elastic/esql AST parser. Extracts field constraints from WHERE clauses to generate matching synthetic log documents.

Supported operators: ==, !=, >, >=, <, <=, LIKE, RLIKE, IN, IS NULL, IS NOT NULL, AND, OR, NOT.

For aggregating queries (STATS ... | WHERE threshold), only the first WHERE clause is inverted, because threshold filters operate on aggregation results, not document fields.
- New esql_inverter.ts module with extractEsqlConstraints()
- Wired into the query_inverter.ts dispatcher (language === 'esql')
- 26 unit tests covering all operators, edge cases, and real-world patterns
- Lucene remains the only graceful-degradation language
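Once constraints have been extracted from the WHERE clause, generating a matching document is the inverse step. The sketch below covers only that step and does not use the @elastic/esql parser. The `FieldConstraint` shape and `synthesizeDocument` are hypothetical. The sketch handles just a handful of the listed operators, picks the simplest satisfying value for each one, and ignores escaped LIKE wildcards.

```ts
type FieldConstraint =
  | { field: string; op: '=='; value: string | number }
  | { field: string; op: 'IN'; values: Array<string | number> }
  | { field: string; op: '>' | '>='; value: number }
  | { field: string; op: 'LIKE'; pattern: string }
  | { field: string; op: 'IS NOT NULL' };

// Picks one value that satisfies the constraint (deterministic, simplest choice).
function satisfy(c: FieldConstraint): unknown {
  switch (c.op) {
    case '==':
      return c.value;
    case 'IN':
      return c.values[0];
    case '>':
      return c.value + 1;
    case '>=':
      return c.value;
    case 'LIKE':
      // ES|QL LIKE wildcards: `*` (any sequence) and `?` (one char); fill them with literal text.
      return c.pattern.replace(/\*/g, 'x').replace(/\?/g, 'x');
    case 'IS NOT NULL':
      return 'present';
  }
}

// Builds a nested ECS-style document from dotted field names such as process.name.
export function synthesizeDocument(constraints: FieldConstraint[]): Record<string, unknown> {
  const doc: Record<string, unknown> = {};
  for (const c of constraints) {
    const path = c.field.split('.');
    let node = doc;
    for (const key of path.slice(0, -1)) {
      node = (node[key] ??= {}) as Record<string, unknown>;
    }
    node[path[path.length - 1]] = satisfy(c);
  }
  return doc;
}
```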

patrykkopycinski · 2026-05-28T21:59:31Z

Observation: `EmulationRunner` + `withCommandGates` duplicate `ResponseActionsClient` capabilities

While reviewing the shared infrastructure for the upcoming Endpoint Response Actions Skill (#17508), I noticed the emulation skill rebuilds several capabilities that BaseResponseActionsClient already provides on main. Consolidating would reduce ~300 lines and prevent the two paths from diverging.

What `BaseResponseActionsClient` already handles

Capability	Location	What it does
Command support check (per agent type)	`validateRequest()` → `isActionSupportedByAgentType()`	Checks command is supported for agent type + action mode (manual/automated)
Enterprise license gate	`validateRequest()` → `getLicenseService().isEnterprise()`	Blocks automated actions without Enterprise license
Space-scoped agent validation	`validateRequest()` → `fetchAgentPolicyInfo()`	Validates agents are in the active space
Audit trail	`writeActionRequestToEndpointIndex()`	Writes to `.logs-endpoint.actions-*` with full attribution
Cases attachment	`updateCases()`	Attaches action to Security cases
Telemetry	`sendActionSentTelemetry()` / `sendActionResponseTelemetry()`	Reports `ENDPOINT_RESPONSE_ACTION_SENT_EVENT`
Action expiration	`getActionRequestExpiration()`	Sets TTL on action requests
Dispatch (all 12 commands)	`EndpointActionsClient.isolate()`, `.execute()`, etc.	Typed dispatch via Fleet actions API

What the emulation skill re-implements

Gate in `withCommandGates` / `gate_checks.ts`	Overlap with client
`checkRbac()` — maps command → `RESPONSE_CONSOLE_ACTION_COMMANDS_TO_REQUIRED_AUTHZ` → `endpointAuthz[key]`	`validateRequest()` already checks `isActionSupportedByAgentType(agentType, command, actionType)`. The emulation RBAC gate adds a finer-grained check (per-console-command privilege), which the base client doesn't do — this is a genuine addition
`checkAuth()` — resolves username via `_security/_authenticate`	The client constructor takes `username` as a required param — the caller already resolved it. The emulation skill re-resolves because of the Task Manager `fakeRequest` issue, but for interactive Agent Builder calls the request already carries the user
`EmulationRunner.dispatch()` — exhaustive switch over all 12 command types	`EndpointActionsClient` has the identical switch — `.isolate()`, `.execute()`, `.killProcess()`, etc. The runner is a pass-through wrapper
`EmulationRunner.createResponseActionsClient()` — `getResponseActionsClient(agentType, opts)`	One-liner factory call, same as what any consumer does
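For reference, the finer-grained per-console-command check amounts to looking up the required authz key for a command and then reading that key on the caller's `EndpointAuthz`. The map below is an illustrative local stand-in for `RESPONSE_CONSOLE_ACTION_COMMANDS_TO_REQUIRED_AUTHZ`. Its authz key names are assumptions, not copied from Kibana, and `checkCommandAuthz` is a hypothetical name.

```ts
type ConsoleCommand = 'isolate' | 'release' | 'kill-process' | 'execute';
type AuthzKey = 'canIsolateHost' | 'canUnIsolateHost' | 'canKillProcess' | 'canWriteExecuteOperations';

// Illustrative stand-in for the real command → required-authz map (key names assumed).
const COMMAND_TO_REQUIRED_AUTHZ: Record<ConsoleCommand, AuthzKey> = {
  isolate: 'canIsolateHost',
  release: 'canUnIsolateHost',
  'kill-process': 'canKillProcess',
  execute: 'canWriteExecuteOperations',
};

type EndpointAuthzSubset = Record<AuthzKey, boolean>;

// Returns which privilege is missing so the error can name it.
export function checkCommandAuthz(command: ConsoleCommand, authz: EndpointAuthzSubset) {
  const key = COMMAND_TO_REQUIRED_AUTHZ[command];
  return authz[key] ? { ok: true as const } : { ok: false as const, missing: key };
}
```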

What's genuinely emulation-specific (should stay)

These are NOT in the base client and rightfully belong in the emulation layer:

EmulationAllowlist — operator-controlled host allowlist (Advanced Settings)
EmulationRateLimiter — per-space (100/h) + per-host (3/h) sliding windows
EmulationIdempotencyCache — dedup retried LLM tool calls
checkRealExecutionFeatureFlags() — emulation-specific feature flag + runtime kill switch
checkValidation() — curated-only mode, allowedExecuteCommandPatterns regex allowlist, allowedScriptIds
buildEmulationComment() — audit attribution with conversationId/runId/toolCallId/SHA-256 prompt hash
EmulationRunner.resolveRuleBinding() — rule context lookup for emulation actions
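The per-space and per-host limits with acquire-with-rollback can be sketched as two in-memory sliding windows. The space slot is taken first and released again if the host slot is denied, so a rejected call never consumes space budget. This is a single-process illustration with hypothetical names (`SlidingWindow`, `acquireEmulationSlot`), not the `EmulationRateLimiter` implementation. Atomicity here relies on the code being synchronous, and the sketch ignores coordination across multiple Kibana nodes.

```ts
const HOUR_MS = 60 * 60 * 1000;

export class SlidingWindow {
  private readonly hits = new Map<string, number[]>();
  private readonly limit: number;
  private readonly windowMs: number;
  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Records a hit for `key` if fewer than `limit` hits fall inside the window.
  tryAcquire(key: string, now: number): boolean {
    const recent = (this.hits.get(key) ?? []).filter((t) => now - t < this.windowMs);
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(now);
    this.hits.set(key, recent);
    return allowed;
  }

  // Rolls back a hit recorded at `now` for `key`.
  release(key: string, now: number): void {
    const recent = this.hits.get(key) ?? [];
    const idx = recent.lastIndexOf(now);
    if (idx !== -1) recent.splice(idx, 1);
  }
}

export const createEmulationLimiters = () => ({
  perSpace: new SlidingWindow(100, HOUR_MS),
  perHost: new SlidingWindow(3, HOUR_MS),
});

export function acquireEmulationSlot(
  limiters: { perSpace: SlidingWindow; perHost: SlidingWindow },
  spaceId: string,
  hostId: string,
  now: number = Date.now()
): 'ok' | 'space_limited' | 'host_limited' {
  if (!limiters.perSpace.tryAcquire(spaceId, now)) return 'space_limited';
  if (!limiters.perHost.tryAcquire(hostId, now)) {
    limiters.perSpace.release(spaceId, now); // rollback: the denied call must not use space budget
    return 'host_limited';
  }
  return 'ok';
}
```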

Suggested simplification

The EmulationRunner class could be reduced to a thin wrapper that:

Resolves the rule binding (emulation-specific)
Builds the emulation comment with actor attribution (emulation-specific)
Calls client.isolate() / client.execute() / etc. directly — no dispatch switch needed

The exhaustive dispatch() switch duplicates the typed interface that ResponseActionsClient already enforces. Today, a new command has to be added in both places; a single source of truth would be better.

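// Before: EmulationRunner.dispatch() has a 12-case switch
// After: direct client call; commandMethodMap is an illustrative, partial lookup from
// console command to ResponseActionsClient method name (e.g. 'kill-process' → 'killProcess')
const commandMethodMap = { isolate: 'isolate', execute: 'execute', 'kill-process': 'killProcess' } as const;
const client = getResponseActionsClient('endpoint', constructorOptions);
const actionDetails = await client[commandMethodMap[input.command]](request, options);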

The withCommandGates pipeline is valuable: the emulation-specific gates (allowlist, rate limiter, idempotency, feature flags, validation) are genuine additions. But the auth check, the dispatch, and the command-support part of the RBAC check could lean on the client directly rather than re-implementing them; only the finer-grained per-console-command privilege check needs to stay in the emulation layer.

This isn't blocking — the current implementation works and is well-tested. But when the Response Actions Skill (#17508) ships, it will use ResponseActionsClient directly (no runner, no custom dispatch switch). Having two patterns for the same underlying dispatch in the same codebase will cause confusion about which one to use for future response-action surfaces.

Context: this came up while planning the shared infrastructure between the detection emulation skill and the upcoming endpoint response actions skill. The response actions skill will be ~50 lines of tool handler code because it delegates everything to ResponseActionsClient + getActionDetailsById() directly.

patrykkopycinski force-pushed the ao/detection-emulation-skill-4de85a branch from e365164 to 065b6ed on May 13, 2026 09:10

patrykkopycinski force-pushed the ao/detection-emulation-skill-4de85a branch from 6747034 to 44329de on May 13, 2026 11:25

patrykkopycinski changed the title ~~[Security Solution] Detection Emulation Skill (Epic #15974) — substrate + orchestration layer~~ [Security Solution] Detection Emulation Skill (Epic #15974) — substrate + orchestration + production-opt-in readiness pack May 13, 2026

patrykkopycinski force-pushed the ao/detection-emulation-skill-4de85a branch from 9e4b13f to 2277d28 on May 15, 2026 08:14

patrykkopycinski force-pushed the ao/detection-emulation-skill-4de85a branch from 2277d28 to f0df0dd on May 25, 2026 18:08

patrykkopycinski and others added 7 commits May 27, 2026 11:56

patrykkopycinski and others added 17 commits May 27, 2026 11:56

ao(update-smoke-spec-findings-shape-to-match-spec-canread-indexcount): Update smoke spec findings shape to match spec: canRead + indexCount

c8d8214

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

ao(eslint-fix-detection-emulation-changed-files): Apply ESLint auto-fixes to changed detection_emulation files

e61e539

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

chore(detection-emulation): remove dead code

f2a8bc7


patrykkopycinski force-pushed the ao/detection-emulation-skill-4de85a branch from 2044677 to f8dd99c on May 27, 2026 19:58

patrykkopycinski and others added 11 commits May 27, 2026 22:07

chore(detection-emulation): remove unrelated files and dead code

c117b94


fix(detection-emulation): update test error message to match implementation

a434e27

docs: update DESIGN.md — ES|QL inverter gap closed, test counts updated

2b9bc73


patrykkopycinski commented May 13, 2026

Summary

Architecture

What's new

Pipeline (route execution order)

Failure-mode coverage

Key refactors (latest iteration)

Demo plan

Test plan

Out of scope

cla-checker-service Bot commented May 13, 2026

infra-vault-gh-plugin-prod Bot commented May 13, 2026

patrykkopycinski commented May 13, 2026

kibanamachine commented May 13, 2026

🤖 Prompt Changes Detected

Next Steps:

patrykkopycinski commented May 13, 2026

patrykkopycinski commented May 13, 2026

patrykkopycinski commented May 13, 2026

patrykkopycinski commented May 13, 2026

patrykkopycinski commented May 13, 2026

patrykkopycinski commented May 13, 2026

patrykkopycinski commented May 13, 2026

patrykkopycinski commented May 13, 2026

patrykkopycinski commented May 14, 2026

kibanamachine commented May 14, 2026

💔 Build Failed

Failed CI Steps

Metrics [docs]

Module Count

Async chunks

Page load bundle

async chunk count

ESLint disabled in files

ESLint disabled line counts

Total ESLint disabled count

History


patrykkopycinski commented May 28, 2026


2 participants
