[Spike][Security Solution] Detection Emulation Skill (Epic #15974) — substrate + orchestration + production-opt-in readiness pack #269019
Force-pushed from e365164 to 065b6ed
/ci
🤖 Prompt Changes Detected
Changes have been detected to one or more prompt files in the Elastic Assistant plugin. Please remember to update the integrations repository with your prompt changes to ensure consistency across all deployments. Next Steps:
This is an automated reminder to help maintain prompt consistency across repositories.
Force-pushed from 6747034 to 44329de
The post-cleanup CI build (b441998) on PR elastic#269019 surfaced four real breakages introduced by recent commits and by the cleanup itself:
1. **TS typo** in validate_rule_tool.ts:515: concurrencyResult exposes `inflightScenarioFingerprint`, not `inflightFingerprint`. The route file already uses the right name; only the tool path had drifted.
2. **TS typing** in run_command_tools.test.ts: the parameterized `it.each(tools)` table mixes per-family schemas with different `command` literal unions. Destructuring `parameters` failed because one entry didn't have it, and `getConfirmation`'s `toolParams` intersected to `never` across the four families. Made `parameters` a uniform widened type and cast `getConfirmation` through `unknown` to a single shape. The runtime contract is unchanged and all 62 tests still pass.
3. **TS-projects linter** rejected validate_rule.spec.ts as a stranded file (excluded from the security_solution tsconfigs but not part of any other TS project). The canonical fix, matching every other eval suite in the repo (kbn-evals-suite-pci-compliance, etc.), is to extract the spec + its dataset into a sibling devOnly `functional-tests` package: `@kbn/evals-suite-detection-emulation` under `x-pack/solutions/security/packages/`. This keeps the production plugin clean of the devOnly `@kbn/evals` reference and gives the spec a real owning tsconfig. CODEOWNERS updated to point at the new path.
4. **Moon project regen**: running `node scripts/regenerate_moon_projects.js --update` registered the new package in `package.json`, `tsconfig.base.json`, and `yarn.lock`. Auto-generated; included.
Verification (local):
- jest: 88/88 pass across run_command_tools.test.ts (62) + validate_rule_tool.test.ts (6) + concurrency_gate.test.ts (10) + validate_rule/route.test.ts (10).
- eslint --fix: clean on all touched files.
- type_check: clean on the new evals suite package (~70s) AND on the full security_solution plugin (~6m).
No behavioural changes; this is purely a CI-repair commit.
💔 Build Failed
Failed CI Steps
Metrics [docs]: Module Count, Async chunks, Page load bundle
Unknown metric groups: async chunk count, ESLint disabled in files, ESLint disabled line counts, Total ESLint disabled count
Force-pushed from 9e4b13f to 2277d28
Force-pushed from 2277d28 to f0df0dd
Adds a detection emulation feature directly to the security_solution
plugin. Users can run, approve, and visualise emulation commands against
detection alerts without needing a separate plugin.
What is added:
- common/detection_emulation: Zod schema for the run command input
- public/detections/components/emulation:
  - EmulationBadge: shows on alerts that carry an emulation id (kibana.alert.emulation.id)
  - EmulationFilter: toolbar filter on detection tables
  - RunEmulationModal: approval modal for a pending emulation command
- server/agent_builder/skills/detection_emulation:
  - In-tree agent skill plus an inline run-command tool
- server/lib/detection_emulation:
  - Rule binding saved object + alert tagging helpers
  - Feature flag, allowlist, audit logger, rate limiter, runner
  - REST route for executing emulation commands
Wire-up:
- Register `registerDetectionEmulationRoutes` from server/routes/index.ts
- Register `emulationRuleBindingType` in server/saved_objects.ts
- Register `getDetectionEmulationSkill` in
agent_builder/skills/register_skills.ts (passes `core` + `config`
threaded through from server/plugin.ts)
- Re-export `defineSkillType` from `@kbn/agent-builder-server` for skill
authors
- Re-export `RunEmulationCommandInputSchema` from common/index.ts
- Add `DETECTION_ENGINE_EMULATION_*` URL constants
- Add CODEOWNERS entry for `server/lib/detection_emulation`
UI integration:
- additional_toolbar_controls.tsx: render <EmulationFilter> on detection
tables
- render_cell_value.tsx: render <EmulationBadge> for alerts that carry an
emulation id
- rule_details/index.tsx: render <RunEmulationModal> when an emulation
approval is pending
Tests:
- Unit tests for the new skill and run-command tool
- Unit tests for emulation badge / filter / modal
- Integration test for the end-to-end emulation route + persistence
Notes for reviewers:
- All cross-package imports use canonical `@kbn/...` aliases. Intra-plugin
imports remain relative per Kibana convention.
- This PR contains only production-ready changes.
Applies the full review pass against the in-tree detection emulation feature added in 9f9f073, closing the blocker, important, and nice-to-have findings surfaced during review.
Server / route
- B1/B3/N5/N6/N7: the route enforces `experimentalFeatures.detectionEmulationRealExecution`; swaps the rate limiter to atomic acquire/release (releasing on dispatch failure); refuses to dispatch destructive actions without an authenticated caller (401 instead of falling back to `username='unknown'`); short-circuits double-submits via an in-memory idempotency cache keyed on (space, emulation, command, agentType, sorted endpointIds); and wires allowlist / rate-limiter / idempotency-cache config from `xpack.securitySolution.detectionEmulation.*`.
- I1/I3/I4: replaces legacy `tags: ['access:securitySolution']` with declarative `security.authz.requiredPrivileges`, stops echoing internal error messages to clients, and wraps every user-facing string in `i18n.translate`.
- I2/N4: introduces a typed runner error taxonomy (`UnsupportedAgentTypeError`, `UnsupportedCommandForAgentTypeError`, `MissingConnectorActionsError`) in its own module so the route can map cleanly to 4xx/5xx, and adds an exhaustiveness check on the dispatch switch.
- I5/I7: marks the `emulation-rule-binding` SO `hidden: true` / `hiddenFromHttpApis: true` and adds a `modelVersions` baseline; the runner now accepts a `ruleBindingLookup` so dispatched actions carry ruleId / ruleName via a new `createSavedObjectRuleBindingLookup` helper that uses the internal SO client.
Schema / contract
- I6: rewrites `RunEmulationCommandInputSchema` as a discriminated union on `command` with strict, command-specific `parameters` shapes. This closes the silent-passthrough hole where typos like `entityId` used to sail through the previous `z.record`. Models `kill-process` / `suspend-process` as a `pid` xor `entity_id` union and `memory-dump` as a `kernel | process(pid|entity_id)` union (z.union, since v4 forbids duplicate discriminator values).
Skill / agent-builder
- B5/I16: rewrites the skill content to match the actually-registered tool and command list, and gates skill registration on `detectionEmulationRealExecution`.
- The tool now returns `BuiltinSkillBoundedTool` (not `BuiltinToolDefinition`) so it satisfies the framework's `SkillBoundedTool` contract; basePath moved to the canonical `skills/security/endpoint`.
UI
- B4: removes the dead `RunEmulationModal` block from `rule_details/index.tsx`.
- I9: `EmulationFilter` subscribes to `filterManager.getUpdates$()` so the toggle stays in sync when filters are mutated elsewhere.
- I10/I11/I12: `RunEmulationModal` resets local state on `requestId`/`suggestion` change, disables Approve/Reject after click via `isSubmitting`, and parses modified args with a shell-style tokenizer instead of splitting on whitespace.
- I13: replaces the `@elastic/eui/src/...` deep import with a type derived from EUI's published `onChange` prop signature.
- I14: wraps `EmulationBadge` in `EuiToolTip` for keyboard / screen-reader users.
Cleanup
- I8/I17/N1/N2: deletes the unused `logInjection` flag, prunes dead `audit_logger` helpers, and slims the allowlist / rate-limiter APIs to the methods actually used.
- I15: adds CODEOWNERS entries for `common/detection_emulation`, `public/detections/components/emulation`, and the `agent_builder/skills/detection_emulation` directory.
Tests
- N8: `EmulationBadge` test asserts via `data-test-subj`, not classNames.
- Route, schema, skill, and component tests updated to cover the new gates (auth, idempotency, rate limit), the typed errors, the discriminated union, and the new tooltip / tokenizer behavior.
- Full pre-commit pass: `type_check.js` clean, eslint clean on all changed files, 123/123 jest specs green across schema, server, agent-builder skill, and component suites.
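The I10/I11/I12 item swaps whitespace splitting for a shell-style tokenizer, so a quoted argument such as a Windows path with spaces stays one token. Below is a minimal sketch of that technique under POSIX-like quoting rules; `tokenizeArgs` is a hypothetical name and the real `RunEmulationModal` tokenizer may handle escapes differently.

```typescript
// Hypothetical sketch; the real RunEmulationModal tokenizer may differ.
// POSIX-like rules: unquoted whitespace separates tokens, single quotes are
// literal, inside double quotes a backslash escapes only `"` or `\`, and
// outside quotes a backslash escapes the next character.
function tokenizeArgs(input: string): string[] {
  const tokens: string[] = [];
  let current = '';
  let inToken = false; // true once a token has started (even an empty "" one)
  let quote: '"' | "'" | null = null;

  for (let i = 0; i < input.length; i++) {
    const ch = input.charAt(i);
    if (quote === "'") {
      if (ch === "'") {
        quote = null; // closing quote ends the quoted run, not the token
      } else {
        current += ch;
      }
    } else if (quote === '"') {
      const next = input.charAt(i + 1);
      if (ch === '"') {
        quote = null;
      } else if (ch === '\\' && (next === '"' || next === '\\')) {
        current += next;
        i++;
      } else {
        current += ch; // keeps backslashes in paths like C:\Program Files
      }
    } else if (ch === "'" || ch === '"') {
      quote = ch === "'" ? "'" : '"';
      inToken = true;
    } else if (ch === '\\' && i + 1 < input.length) {
      current += input.charAt(i + 1);
      i++;
      inToken = true;
    } else if (/\s/.test(ch)) {
      if (inToken) {
        tokens.push(current);
        current = '';
        inToken = false;
      }
    } else {
      current += ch;
      inToken = true;
    }
  }

  if (quote !== null) throw new Error('Unterminated quote in arguments');
  if (inToken) tokens.push(current);
  return tokens;
}
```

Throwing on an unterminated quote (rather than guessing) lets the modal surface a validation error before anything is dispatched.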
Adds payloads/payloads.json with 12 entries covering ATT&CK techniques:
T1059.001, T1059.003, T1059.004, T1218.005, T1218.011, T1053.005,
T1547.001, T1057, T1003.001, T1070.004, T1071.001, T1112.
Each entry is typed { techniqueId, name, agentTypes[], command,
parameters, expectedSignals[] }. Payloads use self-cleaning shell
commands where possible to minimise post-emulation artifacts. T1057
(process discovery) uses `running-processes` and lists all 4 supported
agent types; all other entries use `execute` and are scoped to `endpoint`,
which is the only agent type with execute support wired today.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds payloads/index.ts exporting:
- EmulationPayload interface typed against ResponseActionAgentType and ResponseActionsApiCommandNames (import type; no runtime coupling to common constants).
- payloadLibrary: readonly EmulationPayload[] loaded from payloads.json.
- PAYLOAD_LIBRARY_MAX_ENTRIES = 15 governance constant.
- findByTechniqueIds(ids): uses a Set for O(1) lookups; preserves library insertion order; deduplicates repeated IDs in the input.
Adds payloads/index.test.ts with 26 jest assertions covering:
- Hard-cap enforcement (toBeLessThanOrEqual PAYLOAD_LIBRARY_MAX_ENTRIES).
- Shape validation: non-empty techniqueId/name, valid agentTypes, valid commands, at least one expectedSignal, unique techniqueIds.
- Wave-1 technique coverage (it.each over all 12 required IDs).
- findByTechniqueIds edge cases: empty input, no-match, single match, multi-match, unknown+known mix, order preservation, deduplication.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
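The three `findByTechniqueIds` properties (Set lookup, library order, input dedup) all fall out of filtering the library rather than mapping over the input. A minimal sketch, assuming a simplified `EmulationPayload` with plain `string` fields and the library passed as a parameter instead of read from the module-level `payloadLibrary`:

```typescript
// Simplified shape; the real interface types agentTypes/command against
// ResponseActionAgentType and ResponseActionsApiCommandNames.
interface EmulationPayload {
  techniqueId: string;
  name: string;
  agentTypes: string[];
  command: string;
  parameters: Record<string, unknown>;
  expectedSignals: string[];
}

function findByTechniqueIds(
  library: readonly EmulationPayload[],
  ids: readonly string[]
): EmulationPayload[] {
  // The Set gives O(1) membership checks and collapses duplicate input IDs.
  const wanted = new Set(ids);
  // Iterating the library (not the input) preserves library insertion order.
  // Because techniqueIds are unique in the library, nothing is returned twice.
  return library.filter((payload) => wanted.has(payload.techniqueId));
}
```

The no-duplicate guarantee depends on the "unique techniqueIds" shape check in the test suite; without it, the filter would return every library entry that shares a technique.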
…type
Adds emulation_report_type.ts with:
- emulationReportType (SavedObjectsType<EmulationReportAttributes>): hidden=true, hiddenFromHttpApis=true, namespaceType=multiple-isolated, stored in SECURITY_SOLUTION_SAVED_OBJECT_INDEX.
- EmulationReportAttributes interface covering all 14 fields from the spec: scenarioId, ruleId, scenarioFingerprint, mode, endpointIds, agentType, startedAt, completedAt, payloadIds, dispatchedActions[], score, perPhase[], operator, spaceId.
- modelVersions baseline '1' with forwardCompatibility + create schemas (unknowns: 'ignore' on forward compat to allow future additive fields).
- ES mappings: dynamic: false; score fields use float/integer; array fields (endpointIds, payloadIds, signals) mapped as keyword multi-value.
Wires emulationReportType into saved_objects.ts types[] and exports it from lib/detection_emulation/index.ts alongside the existing emulationRuleBindingType.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds detectionEmulationLogInjection: false alongside the existing detectionEmulationRealExecution flag. When true, the validateRule pipeline uses log injection (synthesised ECS documents) instead of dispatching real response actions to endpoints. Gating on a separate flag keeps the two dispatch modes independently toggleable and lets log injection ship before real execution is broadly available. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…n and validation keys
Adds two new optional sub-objects under xpack.securitySolution.detectionEmulation:
detectionEmulation.logInjection:
indexTemplateName (default: '.kibana-security-emulation-logs')
Base name for the ILM-managed index template; runtime appends
'<spaceId>-*' to form the full pattern.
retentionDays (default: 7, min: 1)
ILM delete phase for synthesised ECS documents.
detectionEmulation.validation:
wallBudgetMsDefault (default: 60 000 ms, min: 1 000)
Default telemetry-collector timeout per validateRule run.
wallBudgetMsMax (default: 300 000 ms, min: 1 000)
Hard ceiling for budget values accepted from API callers; requests
above this are clamped, preventing runaway long-poll connections.
Both sub-objects follow the existing schema.maybe(schema.object({...}))
pattern used by allowlist/rateLimiter/idempotencyCache — the whole group
is optional, code null-coalesces to baked-in defaults.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
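A minimal sketch of how the optional `validation` group could be resolved at request time: each key null-coalesces to its baked-in default, and a caller-supplied budget above `wallBudgetMsMax` is clamped rather than rejected. `resolveWallBudgetMs` and `ValidationConfig` are hypothetical names; the sketch does not apply the 1 000 ms schema minimum to request values, since the config above only states it for the config keys.

```typescript
// Hypothetical shape of xpack.securitySolution.detectionEmulation.validation.
interface ValidationConfig {
  wallBudgetMsDefault?: number;
  wallBudgetMsMax?: number;
}

// Baked-in defaults mirroring the documented config defaults.
const DEFAULT_WALL_BUDGET_MS = 60_000;
const DEFAULT_WALL_BUDGET_MAX_MS = 300_000;

function resolveWallBudgetMs(
  requestedMs: number | undefined,
  validation: ValidationConfig | null | undefined
): number {
  // The whole group is optional, so null-coalesce each key to its default.
  const maxMs = validation?.wallBudgetMsMax ?? DEFAULT_WALL_BUDGET_MAX_MS;
  const defaultMs = validation?.wallBudgetMsDefault ?? DEFAULT_WALL_BUDGET_MS;
  // Requests above the ceiling are clamped, bounding long-poll duration.
  return Math.min(requestedMs ?? defaultMs, maxMs);
}
```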
…: Update smoke spec findings shape to match spec: canRead + indexCount Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… notification path for risk #3 in README
Adds a new section covering the audit SO fields (actor.kind, scenarioFingerprint SHA-256), Kibana security audit log integration, and three SOC tooling consumption patterns (Kibana rule, Watcher, Filebeat/Fleet pipeline).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…de role definition guidance for risk #16
Documents how to run the discovery probe, the expected access surface for built-in ES roles (superuser + kibana_system as known residuals), and four least-privilege mitigations for operator-defined roles: no wildcard .kibana* grants, CCS pattern splitting, DLS match_none filter, and ES-layer audit logging.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…date risk HTML: hero counts, rows #3-#16, safeguards
- Hero: Medium 2→1 (row #9 demoted to Low by event.dataset stamp); safeguards 21→23
- Row #3: mitigated note references the OOB notification path documented in README
- Row #4: permanent note updated to cover execute curatedOnly + allowedExecuteCommandPatterns
- Row #9: Medium/Scheduled → Low/Mitigated (event.dataset + event.module stamp)
- Row #14: mitigated note updated; curatedOnly now covers upload (closed short-circuit)
- Row #16: Low/Scheduled → Low/Mitigated (discovery probe + README operator guidance)
- Active safeguards: updated curatedOnly bullet (execute+upload), added execute-regex gate bullet, added event.dataset stamp bullet; intro updated to "Twenty-three controls"
- Roadmap: #9 and #16 rows updated to note shipped vs follow-up split
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ixes to changed detection_emulation files Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…nst undefined
Partial configs (e.g. from older tests or forward-compat reads) omit the new field; use `?? []` so the length check never throws on undefined. The required interface still enforces the field at the TS layer for new call-sites.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…e inspection in index_access smoke spec
Rewrites the index_access smoke spec (Risk #16) to use `_security/role` definition inspection instead of temporary user creation + privilege checks. The original run_as / per-role-client approach requires the cluster to have a master node for write quorum (putUser is a write operation). Role definition inspection is fully read-only and works on any cluster state. Results match the expected access surface documented in the README: superuser + kibana_system have read access; all other built-in roles do not.
Other improvements:
- SerializeError helper includes the HTTP status code for non-200 ES responses
- `create_index` field renamed to `createIndex` (camelCase, naming-convention)
- `fleet_server` 404 surfaces as "404: {}" for clarity (role absent in ES 9.5)
- Test runs in <200ms instead of timing out at 120s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Trim validateRule tool description from ~1800 to ~400 tokens for OSS
model compatibility (pipeline details already live in skill content)
- Shorten endpointIds .describe() to 2 concise sentences
- Align skill content tool references to use actual registered IDs
(security.detection-emulation.validate-rule, .get-history,
.run-process-command, etc.) instead of informal shorthand names
- Change agentType from z.literal('endpoint') to z.enum(['endpoint'])
across all 5 tool schemas for smoother future vendor extension
- Add 3 additional distractor examples to the eval dataset (ES|QL
question, threat hunting request, dashboard creation) bringing total
distractors to 5 per skill-dev-plugin guidance
Centralizes ~20 manually-constructed error responses into a single `emulation_tool_errors.ts` module with type-safe factory methods for each error class (featureDisabled, authorizationError, rateLimitExceeded, invalidParameters, userDeclined, validationGateBlocked, scenarioFailure, concurrencyExceeded, executionError, etc.). Updates withCommandGates, validateRule, and all 4 per-family run*Command tools to use the shared builder instead of inline ToolResultType.error constructions. Removes redundant ToolResultType imports from per-family tools.
The REST route's idempotency cache prevented double-dispatch from network retries, but the Agent Builder tool dispatch path (via withCommandGates) lacked this protection. LLM retries or framework-level transient-error retries could fire a second response action. Threads the idempotencyCache from DetectionEmulationGuardrails into all 4 per-family tools → withCommandGates context. Checks the cache after the allowlist gate (matching REST ordering) and writes back on both success and error paths so replays get the cached result.
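A minimal sketch of the replay protection, assuming the key parts listed in the review-pass commit (space, emulation, command, agentType, sorted endpointIds) and a TTL-based in-memory cache. `buildIdempotencyKey`, `IdempotencyCache`, and `ToolOutcome` are hypothetical names, not the plugin's `EmulationIdempotencyCache` API. Error outcomes are stored too, matching the "writes back on both success and error paths" behaviour.

```typescript
interface IdempotencyKeyParts {
  spaceId: string;
  emulationId: string;
  command: string;
  agentType: string;
  endpointIds: readonly string[];
}

// Sorting endpointIds makes ['h2', 'h1'] and ['h1', 'h2'] map to one entry.
function buildIdempotencyKey(parts: IdempotencyKeyParts): string {
  const endpoints = parts.endpointIds.slice().sort();
  return JSON.stringify([parts.spaceId, parts.emulationId, parts.command, parts.agentType, endpoints]);
}

// Tool results are values, not exceptions, so failures can be cached as well.
type ToolOutcome = { ok: true; actionId: string } | { ok: false; error: string };

class IdempotencyCache<T> {
  private readonly entries = new Map<string, { value: T; expiresAt: number }>();
  private readonly ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  get(key: string, now: number = Date.now()): T | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= now) {
      this.entries.delete(key); // expired: treat as a miss
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: T, now: number = Date.now()): void {
    this.entries.set(key, { value, expiresAt: now + this.ttlMs });
  }
}
```

In the gate pipeline the lookup would sit right after the allowlist gate: a hit returns the cached outcome without dispatching, and a miss dispatches and then calls `set` with whatever outcome came back. A check-then-set cache like this does not cover two identical calls that are both still in flight; that needs a separate in-flight marker.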
Creates gate_checks.ts with protocol-agnostic gate functions:
- checkRealExecutionFeatureFlags / checkModeFeatureFlags
- checkValidation (curated-only + allowedScriptIds)
- checkRbac (per-command RBAC via EndpointAuthz)
- resolveEffectiveConfig (reads Advanced Settings per-space)
- checkAllowlist (host allowlist)
- acquireRateLimit (atomic per-space + per-host rate limit)
- checkAuth (authenticated caller check)
Each gate returns a typed GateResult<T> (ok/fail) with structured metadata. withCommandGates now composes from these primitives instead of inline logic, giving a single source of truth for each gate check that both the tool dispatch and the REST route can share.
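A minimal sketch of the `GateResult<T>` pattern: each gate returns either a value or a structured failure, and composition short-circuits on the first failure so the caller only translates one final result into its own protocol (a tool error or an HTTP response). Only the `GateResult<T>` name comes from the commit; `pass`, `fail`, `runGates`, the failure `reason` strings, and the gate order shown are hypothetical.

```typescript
type GateResult<T> =
  | { ok: true; value: T }
  | { ok: false; reason: string; metadata?: Record<string, unknown> };

function pass<T>(value: T): GateResult<T> {
  return { ok: true, value };
}

// GateResult<never> is assignable to any GateResult<T>.
function fail(reason: string, metadata?: Record<string, unknown>): GateResult<never> {
  return { ok: false, reason, metadata };
}

// Simplified gates in the spirit of checkAuth / checkAllowlist.
function checkAuth(username: string | undefined): GateResult<string> {
  return username ? pass(username) : fail('unauthenticated');
}

function checkAllowlist(endpointIds: readonly string[], allowlist: ReadonlySet<string>): GateResult<string[]> {
  const blocked = endpointIds.filter((id) => !allowlist.has(id));
  return blocked.length === 0 ? pass(endpointIds.slice()) : fail('host_not_allowlisted', { blocked });
}

interface GateContext {
  username?: string;
  endpointIds: string[];
  allowlist: ReadonlySet<string>;
}

// The first failing gate wins; narrowing on `ok` lets failures pass through untouched.
function runGates(ctx: GateContext): GateResult<{ username: string; endpointIds: string[] }> {
  const auth = checkAuth(ctx.username);
  if (!auth.ok) return auth;
  const allowed = checkAllowlist(ctx.endpointIds, ctx.allowlist);
  if (!allowed.ok) return allowed;
  return pass({ username: auth.value, endpointIds: allowed.value });
}
```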
Refactors the run_command REST route to import and call the shared gate check functions (checkRealExecutionFeatureFlags, checkAllowlistGate, acquireRateLimitGate) instead of duplicating the logic inline. The route still handles protocol translation (GateResult → HTTP response via siemResponse) and route-specific concerns (i18n messages, Kibana request context), but the gate logic itself is now single-sourced from gate_checks.ts.
Creates createRunFamilyCommandTool factory that builds the schema, confirmation, and handler from a FamilyToolConfig object. All four per-family tools (process, file, network, execution) are now config-only modules (~50 lines each) delegating to the factory. Eliminates ~400 lines of duplicated handler/schema/confirmation logic. Adding a new family (e.g. registry) is now a one-file, config-only addition.
Adds optional savedObjectsClient to CommandGatesContext and the factory handler destructure. When provided by the Agent Builder handler context, withCommandGates uses it directly for uiSettingsClient derivation rather than re-creating a scoped client via coreStart.savedObjects. This eliminates a redundant getScopedClient call (async hop) on every tool invocation while keeping backward compat (falls back to request-scoped creation when the field is absent).
Relocates resolve_current_user.ts from the skill-specific directory to server/lib/detection_emulation/ so it's importable from both the Agent Builder tool handlers and the REST routes without a cross-concern import path. Previously flagged in the code itself as "should be upstreamed" — this is the short-term path (shared within the plugin) while awaiting an export from @kbn/agent-builder-server.
- Remove unused DetectionEmulationFeatureFlags type import from gate_checks.ts
- Migrate get_emulation_history_tool.ts to use the shared toolError builder instead of inline ToolResultType.error construction
… Workflows stack
- Wire shared gate_checks into validate_rule_tool.ts (replaces ~80 lines of inline gates)
- Delete run_command REST route + tests (-1,108 lines); the tool is the single implementation
- Add traced logger (createTracedLogger) to the createRunFamilyCommandTool factory
- Wrap async operations in withCommandGates with runStep for timing/error attribution
- Remove DETECTION_ENGINE_EMULATION_RUN_COMMAND_URL constant
- Add execution modules: traced_logger, pipeline_step_error, tool_factory_deps, validate_pre_execution
Patterns adopted from PRs elastic#260739, elastic#260744, elastic#260793, elastic#260811.
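A minimal sketch of what "wrap async operations with runStep for timing/error attribution" can look like: the step name is attached to any failure and a duration is logged either way. The module names `runStep` and `pipeline_step_error` come from the commit; the `PipelineStepError` fields, the `StepLogger` type, and the log format below are assumptions.

```typescript
// Hypothetical shape; the real pipeline_step_error module may differ.
class PipelineStepError extends Error {
  readonly step: string;
  readonly originalError: unknown;

  constructor(step: string, originalError: unknown) {
    const detail = originalError instanceof Error ? originalError.message : String(originalError);
    super(`Step "${step}" failed: ${detail}`);
    this.name = 'PipelineStepError';
    this.step = step;
    this.originalError = originalError;
    Object.setPrototypeOf(this, PipelineStepError.prototype); // keep instanceof working on ES5 targets
  }
}

type StepLogger = (message: string) => void;

async function runStep<T>(step: string, fn: () => Promise<T>, log: StepLogger): Promise<T> {
  const startedAt = Date.now();
  try {
    const result = await fn();
    log(`[${step}] ok in ${Date.now() - startedAt}ms`);
    return result;
  } catch (err) {
    log(`[${step}] failed after ${Date.now() - startedAt}ms`);
    // Keep the innermost attribution if a nested step already wrapped the error.
    throw err instanceof PipelineStepError ? err : new PipelineStepError(step, err);
  }
}
```

Usage would be along the lines of `await runStep('resolveEffectiveConfig', () => resolveConfig(), logger)`, so a failure surfaces with the gate name instead of a bare stack trace.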
Force-pushed from 2044677 to f8dd99c
- Remove DEMO_GUIDE.md, openspec/, .playwright-mcp/ (not part of this PR)
- Remove dead functions: buildEmulationModeQuery, extractEmulationMetadata, isEmulationAlert
- Clean up corresponding test cases and unused imports
- Fix runtime crash: featureFlags undefined on the log_injection path
- Move gate_checks.ts from agent_builder/skills/ to lib/execution/ (fixes a circular dependency: lib/ was importing from agent_builder/)
- Remove `as any` casts for coreStart.security (unnecessary)
- Remove empty ValidateRuleToolDeps interface (use ToolFactoryDeps)
- Remove DEMO.md files and production-risk-analysis.html
…on Agent Builder skill
Three evaluators per example:
- toolSelection (createSkillInvocationEvaluator): APM trace check for the SKILL.md filestore.read span; verifies skill activation
- schemaCompliance (createTraceBasedEvaluator): ES|QL query over traces asserting every validate-rule call includes ruleId + endpointIds
- criteria (DefaultEvaluators.criteria): per-example LLM judge
9 examples: 2 success paths (T1059.001, T1218.005), default mode, history-first flow, 3 failure modes, 2 distractors.
Includes an .eslintrc.js boundary-crossing exemption so *.spec.ts can import the devOnly @kbn/evals package without affecting the plugin build.
…rver/agent_builder/skills/detection_emulation/evals/validate_r
Auto-committed by patryks-treadmill orchestrator. plan=detection-emulation-skill-epic-15974-orchestration-layer job=f23206c3-0fe4-461c-b545-e2f737b7f735 attempt=1
… orchestrator
- Extract runRuleExecutors from route.ts (already committed)
- Clean up unused imports in route.ts (SERVER_APP_ID, RuleExecutionStatusEnum, alertInstanceFactoryStub)
- Add optional rulePreviewDeps to OrchestratorOptions for rule preview validation
- Add Step 8 (Rule Preview Validation) to the scenario orchestrator pipeline
- Add rulePreviewValidation to OrchestratorResult
- Update DESIGN.md with Section 9: Implementation Status & Gap Analysis
…e demo screenshot showing Agent Builder transcript with succes
Auto-committed by patryks-treadmill orchestrator. plan=detection-emulation-skill-epic-15974-orchestration-layer job=49341029-00d6-4a17-af8c-300aa7d3f419 attempt=2
Brings the plugin-side eval files into parity with the canonical kbn-evals-suite-detection-emulation suite:
Dataset (validate_rule_dataset.ts):
- Add `tool_sequence` field to all examples (consumed by the trajectory evaluator for LCS-based order scoring; `[]` for distractor examples so the evaluator returns 1.0 when no tools fired)
- Add `autoConfirm` field to the HITL `userDeclines` example
- Add HITL example: user declines real_execution prompt → `user_declined`
- Add 3 extra distractor examples: ES|QL question, threat hunting, dashboard creation (total 13 examples, up from 9)
Spec (validate_rule.spec.ts):
- Replace `p-retry` with `withRetry` from @kbn/evals (consistent with canonical suite; N5 tracking)
- Add HITL auto-resume loop in DetectionEmulationChatClient.converse: polls `response.prompts`, responds with the per-example `autoConfirm` policy, bounded by MAX_PROMPT_ROUNDS=5
- Add `createValidateRuleTrajectoryEvaluator` (createTrajectoryEvaluator with orderWeight=0.7 / coverageWeight=0.3) applied to every example
- Wire `autoConfirm` policy from `example.input.autoConfirm` into each `runScenario` call
- Register the 4 new evaluate() blocks matching the 4 new examples
Three required evaluators per spec §8 remain: toolSelection (renamed from createToolSelectionEvaluator; same createSkillInvocationEvaluator underneath), schemaCompliance, criteria. Trajectory is additive.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
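One plausible reading of "LCS-based order scoring" with orderWeight=0.7 / coverageWeight=0.3 is sketched below: order is the longest common subsequence of expected and actual tool calls divided by the expected length, and coverage is the fraction of distinct expected tools that fired at all. This is an assumption about the scoring, not `createTrajectoryEvaluator`'s actual formula. In particular, returning 0 when `tool_sequence` is `[]` but tools did fire is a guess; the commit only states the 1.0 case.

```typescript
// Classic LCS length with a single rolling row: O(a.length * b.length) time.
function lcsLength(a: readonly string[], b: readonly string[]): number {
  const row = new Array<number>(b.length + 1).fill(0);
  for (let i = 1; i <= a.length; i++) {
    let diag = 0; // value of row[j - 1] from the previous i iteration
    for (let j = 1; j <= b.length; j++) {
      const prev = row[j];
      row[j] = a[i - 1] === b[j - 1] ? diag + 1 : Math.max(row[j], row[j - 1]);
      diag = prev;
    }
  }
  return row[b.length];
}

function trajectoryScore(
  expected: readonly string[],
  actual: readonly string[],
  orderWeight = 0.7,
  coverageWeight = 0.3
): number {
  // Distractors: no expected tools. Assumed: full marks only if nothing fired.
  if (expected.length === 0) return actual.length === 0 ? 1 : 0;
  const order = lcsLength(expected, actual) / expected.length;
  const fired = new Set(actual);
  const distinctExpected = Array.from(new Set(expected));
  const coverage = distinctExpected.filter((tool) => fired.has(tool)).length / distinctExpected.length;
  return orderWeight * order + coverageWeight * coverage;
}
```

With these weights, calling the right tools in the wrong order still earns the coverage share: `['a', 'b']` versus `['b', 'a']` scores 0.7 × 0.5 + 0.3 × 1 = 0.65.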
…skill
Adds DEMO.md alongside the existing README.md covering:
- Feature flag setup in kibana.dev.yml (detectionEmulationLogInjection, detectionEmulationRealExecution) and full optional runtime config keys
- Step-by-step Agent Builder UI walkthrough (happy path, history-first, real execution, and failure cases)
- Full ValidationReport response field reference with inline comments
- Typed error response table (error_type, HTTP equivalent, trigger)
- Dev Tools queries for inspecting injected log-injection documents
- Saved Objects Find API snippet for browsing emulation history
- Troubleshooting table for common failure modes
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements full ES|QL query inversion using the @elastic/esql AST parser. Extracts field constraints from WHERE clauses to generate matching synthetic log documents.
Supported operators: ==, !=, >, >=, <, <=, LIKE, RLIKE, IN, IS NULL, IS NOT NULL, AND, OR, NOT.
For aggregating queries (STATS ... | WHERE threshold), only the first WHERE clause is inverted; threshold filters operate on aggregation results, not document fields.
- New esql_inverter.ts module with extractEsqlConstraints()
- Wired into the query_inverter.ts dispatcher (language === 'esql')
- 26 unit tests covering all operators, edge cases, and real-world patterns
- Lucene remains as the only graceful-degradation language
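The "inversion" step turns each extracted constraint into a concrete field value that satisfies it. A minimal sketch for an AND-ed list of leaf constraints, assuming a flat `Constraint` shape (the real `extractEsqlConstraints()` output may look different) and skipping RLIKE, OR, and NOT, which need regex generation or branch selection:

```typescript
// Hypothetical constraint shape, not the real extractEsqlConstraints() output.
type Scalar = string | number;
type Op = '==' | '!=' | '>' | '>=' | '<' | '<=' | 'IN' | 'LIKE' | 'IS NULL' | 'IS NOT NULL';

interface Constraint {
  field: string;
  op: Op;
  value?: Scalar | Scalar[];
}

// Pick one value that satisfies a single leaf constraint.
function satisfyingValue(c: Constraint): Scalar | undefined {
  const v = Array.isArray(c.value) ? c.value[0] : c.value;
  switch (c.op) {
    case '==':
    case '>=':
    case '<=':
    case 'IN': // the first listed value satisfies the IN list
      return v;
    case '>':
      return Number(v) + 1; // assumes a numeric field
    case '<':
      return Number(v) - 1;
    case '!=':
      return typeof v === 'number' ? v + 1 : `not-${String(v)}`;
    case 'LIKE':
      // ES|QL LIKE wildcards: `*` (any run, here empty) and `?` (one char).
      return String(v).replace(/\*/g, '').replace(/\?/g, 'x');
    case 'IS NOT NULL':
      return 'emulated';
    case 'IS NULL':
      return undefined; // the field must be absent
  }
}

// Sketch limitation: a later constraint on the same field overwrites an earlier one.
function buildSyntheticDocument(constraints: readonly Constraint[]): Record<string, Scalar> {
  const doc: Record<string, Scalar> = {};
  for (const c of constraints) {
    const value = satisfyingValue(c);
    if (value === undefined) delete doc[c.field];
    else doc[c.field] = value;
  }
  return doc;
}
```

For `WHERE process.name == "powershell.exe" AND file.name LIKE "*.ps1"` this yields `{ "process.name": "powershell.exe", "file.name": ".ps1" }`, which matches the rule when the document is indexed.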
Observation:
| Capability | Location | What it does |
|---|---|---|
| RBAC (per-command) | `validateRequest()` → `isActionSupportedByAgentType()` | Checks the command is supported for the agent type + action mode (manual/automated) |
| Enterprise license gate | `validateRequest()` → `getLicenseService().isEnterprise()` | Blocks automated actions without an Enterprise license |
| Space-scoped agent validation | `validateRequest()` → `fetchAgentPolicyInfo()` | Validates agents are in the active space |
| Audit trail | `writeActionRequestToEndpointIndex()` | Writes to `.logs-endpoint.actions-*` with full attribution |
| Cases attachment | `updateCases()` | Attaches the action to Security cases |
| Telemetry | `sendActionSentTelemetry()` / `sendActionResponseTelemetry()` | Reports `ENDPOINT_RESPONSE_ACTION_SENT_EVENT` |
| Action expiration | `getActionRequestExpiration()` | Sets a TTL on action requests |
| Dispatch (all 12 commands) | `EndpointActionsClient.isolate()`, `.execute()`, etc. | Typed dispatch via the Fleet actions API |
What the emulation skill re-implements
| Gate in `withCommandGates` / `gate_checks.ts` | Overlap with client |
|---|---|
| `checkRbac()`: maps command → `RESPONSE_CONSOLE_ACTION_COMMANDS_TO_REQUIRED_AUTHZ` → `endpointAuthz[key]` | `validateRequest()` already checks `isActionSupportedByAgentType(agentType, command, actionType)`. The emulation RBAC gate adds a finer-grained check (per-console-command privilege) that the base client doesn't do, so this is a genuine addition |
| `checkAuth()`: resolves username via `_security/_authenticate` | The client constructor takes `username` as a required param, so the caller has already resolved it. The emulation skill re-resolves because of the Task Manager `fakeRequest` issue, but for interactive Agent Builder calls the request already carries the user |
| `EmulationRunner.dispatch()`: exhaustive switch over all 12 command types | `EndpointActionsClient` has the identical switch (`.isolate()`, `.execute()`, `.killProcess()`, etc.). The runner is a pass-through wrapper |
| `EmulationRunner.createResponseActionsClient()`: `getResponseActionsClient(agentType, opts)` | One-liner factory call, same as what any consumer does |
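The two-step lookup that `checkRbac()` performs (command → required authz key → `endpointAuthz[key]`) can be sketched roughly as below. The command subset, the `EndpointAuthzSketch` shape, the privilege names, and `checkRbacSketch` are assumptions for illustration, not the actual Kibana definitions.

```typescript
// Illustrative stand-ins only: the real RESPONSE_CONSOLE_ACTION_COMMANDS_TO_REQUIRED_AUTHZ
// map and authz type live in Kibana and cover more commands and privileges.
type ConsoleCommand = "isolate" | "release" | "kill-process" | "execute";

interface EndpointAuthzSketch {
  canIsolateHost: boolean;
  canUnIsolateHost: boolean;
  canKillProcess: boolean;
  canWriteExecuteOperations: boolean;
}

// Step 1: each console command names the single privilege it requires.
const commandToRequiredAuthz: Record<ConsoleCommand, keyof EndpointAuthzSketch> = {
  isolate: "canIsolateHost",
  release: "canUnIsolateHost",
  "kill-process": "canKillProcess",
  execute: "canWriteExecuteOperations",
};

// Step 2: read that privilege off the caller's resolved authz object.
function checkRbacSketch(
  command: ConsoleCommand,
  authz: EndpointAuthzSketch,
): { allowed: boolean; requiredAuthz: keyof EndpointAuthzSketch } {
  const requiredAuthz = commandToRequiredAuthz[command];
  return { allowed: authz[requiredAuthz], requiredAuthz };
}
```

Because the map is typed as a `Record` over the command union, the compiler flags any command that lacks a privilege entry.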
What's genuinely emulation-specific (should stay)
These are NOT in the base client and rightfully belong in the emulation layer:
- `EmulationAllowlist`: operator-controlled host allowlist (Advanced Settings)
- `EmulationRateLimiter`: per-space (100/h) + per-host (3/h) sliding windows
- `EmulationIdempotencyCache`: dedup of retried LLM tool calls
- `checkRealExecutionFeatureFlags()`: emulation-specific feature flag + runtime kill switch
- `checkValidation()`: curated-only mode, `allowedExecuteCommandPatterns` regex allowlist, `allowedScriptIds`
- `buildEmulationComment()`: audit attribution with `conversationId` / `runId` / `toolCallId` / SHA-256 prompt hash
- `EmulationRunner.resolveRuleBinding()`: rule context lookup for emulation actions
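To make the per-space (100/h) and per-host (3/h) sliding windows of `EmulationRateLimiter` concrete, here is a minimal in-memory sketch. It is not the PR's implementation: `SlidingWindow` and `tryAcquire` are hypothetical names, it assumes a single Kibana process, and it counts one per-space hit per request regardless of how many hosts the request targets.

```typescript
// Sketch of per-space + per-host sliding windows (in-memory, single process).
const HOUR_MS = 60 * 60 * 1000;

class SlidingWindow {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly hits = new Map<string, number[]>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Prune timestamps older than the window and return the survivors.
  private live(key: string, now: number): number[] {
    const recent = (this.hits.get(key) ?? []).filter((t) => now - t < this.windowMs);
    this.hits.set(key, recent);
    return recent;
  }

  hasRoom(key: string, now: number): boolean {
    return this.live(key, now).length < this.limit;
  }

  record(key: string, now: number): void {
    this.live(key, now).push(now);
  }
}

const perSpace = new SlidingWindow(100, HOUR_MS);
const perHost = new SlidingWindow(3, HOUR_MS);

// Check every window first and record only if the whole request fits,
// so a rejected call does not consume quota.
function tryAcquire(
  spaceId: string,
  hostIds: string[],
  now: number = Date.now(),
): { ok: true } | { ok: false; blockedEndpoints: string[] } {
  const blockedEndpoints = hostIds.filter((id) => !perHost.hasRoom(id, now));
  if (blockedEndpoints.length > 0 || !perSpace.hasRoom(spaceId, now)) {
    return { ok: false, blockedEndpoints };
  }
  perSpace.record(spaceId, now);
  for (const id of hostIds) perHost.record(id, now);
  return { ok: true };
}
```

Checking every window before recording means a rejected request leaves all counters untouched. A multi-instance Kibana deployment would need shared state rather than a process-local `Map`.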
Suggested simplification
The `EmulationRunner` class could be reduced to a thin wrapper that:
- Resolves the rule binding (emulation-specific)
- Builds the emulation comment with actor attribution (emulation-specific)
- Calls `client.isolate()` / `client.execute()` / etc. directly, with no dispatch switch needed
The exhaustive `dispatch()` switch duplicates the typed interface that `ResponseActionsClient` already enforces. Today a new command has to be added in both places; a single point of truth would be better.
```typescript
// Before: EmulationRunner.dispatch() has a 12-case switch
// After: direct client call; commandMethodMap maps each emulation command
// to the name of the matching ResponseActionsClient method
const client = getResponseActionsClient('endpoint', constructorOptions);
const actionDetails = await client[commandMethodMap[input.command]](request, options);
```

The `withCommandGates` pipeline is valuable: the emulation-specific gates (allowlist, rate limiter, idempotency, feature flags, validation) are genuine additions. But the RBAC check, auth check, and dispatch could lean on the client directly rather than re-implementing them.
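A minimal sketch of the single-point-of-truth idea behind the before/after snippet, assuming a trimmed-down client: `MiniActionsClient`, `EmulationCommand`, and `dispatchViaMap` are hypothetical names, and the real `ResponseActionsClient` methods take richer request and option arguments.

```typescript
// Trimmed, hypothetical client shape: the real ResponseActionsClient has more
// methods and richer request/option types.
interface MiniActionsClient {
  isolate(req: { endpoint_ids: string[] }): Promise<string>;
  release(req: { endpoint_ids: string[] }): Promise<string>;
  killProcess(req: { endpoint_ids: string[] }): Promise<string>;
}

type EmulationCommand = "isolate" | "release" | "kill-process";

// Typing the table as Record<EmulationCommand, ...> means a command added to the
// union without a table entry fails to compile.
const commandMethodMap: Record<EmulationCommand, keyof MiniActionsClient> = {
  isolate: "isolate",
  release: "release",
  "kill-process": "killProcess",
};

function dispatchViaMap(
  client: MiniActionsClient,
  command: EmulationCommand,
  req: { endpoint_ids: string[] },
): Promise<string> {
  // Indexing with the key union yields same-signature methods, so the call type-checks
  // and is still a member call (client keeps its `this`).
  return client[commandMethodMap[command]](req);
}
```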
This isn't blocking — the current implementation works and is well-tested. But when the Response Actions Skill (#17508) ships, it will use ResponseActionsClient directly (no runner, no custom dispatch switch). Having two patterns for the same underlying dispatch in the same codebase will cause confusion about which one to use for future response-action surfaces.
Context: this came up while planning the shared infrastructure between the detection emulation skill and the upcoming endpoint response actions skill. The response actions skill will be ~50 lines of tool handler code because it delegates everything to ResponseActionsClient + getActionDetailsById() directly.
Summary
Closes security-team#15974.
Lands the Detection Emulation Skill end-to-end on a single branch. An Agent Builder skill takes a candidate detection rule + target host(s), runs an emulated attack (real EDR action OR ECS log injection), polls the Detection Engine for the alerts the rule produces, and returns a `ValidationReport` with a confidence score.

Every action is gated by:

- … (`detectionEmulationRealExecution`, `detectionEmulationLogInjection`)
- … (`xpack.securitySolution.detectionEmulation.realExecution.enabled`) for fast disable without redeploy
- … `real_execution` scenario per Kibana space
- … `execute` payloads (`allowedExecuteCommandPatterns`)
- … (`conversationId` / `runId` / `toolCallId` / SHA-256 prompt hash) in the audit trail

93 files / +17,020 against `upstream/main`. Server-side orchestration only; existing UI surfaces unchanged. Allowlist + rate-limiter knobs are exposed as Stack Advanced Settings.

Architecture
What's new
- `lib/detection_emulation/execution/runner.ts`, `api/dispatch/route.ts`: … (`execute`, `runscript`, `kill-process`, `isolate`, `release`).
- `lib/detection_emulation/payloads/{payloads.json,index.ts}`
- `lib/detection_emulation/scenario_generator.ts`: … `scenarioId = sha256(...)`. Typed errors `no_mitre_tags`, `no_supported_techniques`.
- `lib/detection_emulation/log_injection/{generator,index_template,executor}.ts`: `.kibana-security-emulation-logs-<spaceId>-*` index template w/ 7-day ILM. Synthetic ECS docs stamped with `event.dataset`, `event.module: 'emulation'`, and tagged with `emulation: { mode, emulationId, scenarioId }`.
- `lib/detection_emulation/telemetry_collector.ts`: `poll` + `one_shot` modes. `AbortController`. Wall-budget enforcement. Queries alerts by `kibana.alert.original_event.module: emulation` OR emulation tag via `bool.should`.
- `lib/detection_emulation/confidence_scorer.ts`: `confidence = round(coverage * 0.6 + precision * 0.4, 2)`, clamped to [0, 1]. Pure.
- `lib/detection_emulation/execution/validation_gate.ts`: … `execute`, and rate-limit acquire.
- `agent_builder/skills/detection_emulation/gate_checks.ts`: `withCommandGates` decorator extracted from per-family tool boilerplate. Composes feature-flag → auth → RBAC → allowlist → per-host rate → per-space rate → fanout cap → validation gate → audit in a single pipeline.
- `lib/detection_emulation/emulation_history/{create,get,find,index}.ts` + `emulation_report_type.ts`: … `scenarioFingerprint` dedup, namespace-scoped. Model version `2` adds an `actor` field with `data_backfill: { kind: 'user' }` for old rows.
- `validateRule` route, `lib/detection_emulation/api/validate_rule/route.ts`: `POST /internal/detection_engine/emulation/validate_rule`. Eight-step pipeline. Typed 4xx errors. All strings i18n-translated.
- `agent_builder/skills/detection_emulation/{validate_rule_tool,get_emulation_history_tool,create_run_family_command_tool}.ts`: … (`process`/`file`/`network`/`execution`) built via `createRunFamilyCommandTool`. Each runs through `withCommandGates`. `validateRule` and `getEmulationHistory` are standalone tools.
- `agent_builder/skills/detection_emulation/build_emulation_confirmation.ts` + per-family tools: `confirmation: { askUser: 'once', getConfirmation }`; `validateRule` does an on-demand HITL prompt when `mode === 'real_execution'`. Skipped only in `executionMode === 'standalone'` (eval / A2A).
- `agent_builder/skills/detection_emulation/emulation_tool_errors.ts`: … (`feature_flag_disabled`, `endpoint_not_allowed`, etc.) with consistent shape across all tools.
- `agent_builder/skills/detection_emulation/detection_emulation_skill.ts`: `referencedContent` array with MITRE ATT&CK overview. Section order: When-to-Use → Process → Examples → Guardrails → Response Format.
- `common/experimental_features.ts`, `server/config.ts`, runtime config resolver
- `packages/kbn-evals-suite-detection-emulation/evals/{validate_rule_dataset,validate_rule.spec}.ts`: … `security: detection-emulation-validate-rule`. Eight examples (success / failure / log-injection / distractor / `userDeclinesHITL` rejection / `realExecBlocked` allowlist 403). Evaluators: `toolSelection`, `schemaCompliance`, `criteria`, plus a trajectory evaluator that scores `tool_sequence` from execution traces.
- … `resolveCurrentUsername` moved to shared lib, `savedObjectsClient` threaded through gates

Pipeline (route execution order)
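The gate ordering described for `gate_checks.ts` can be approximated as an ordered array of checks that stops at the first failure. This is a hedged sketch, not `withCommandGates` itself: `GateContext`, `runGates`, and the `rbac_denied` code are hypothetical, and the auth, rate-limit, and audit stages are left out to keep it short. The other error codes are the ones listed under failure-mode coverage.

```typescript
// Ordered, short-circuiting gate pipeline (subset of the documented order).
type GateResult = { ok: true } | { ok: false; errorCode: string };

interface GateContext {
  featureFlagEnabled: boolean;
  hasPrivilege: boolean;
  allowlistedHosts: Set<string>;
  endpointIds: string[];
  command: string;
  executePayload?: string;
  allowedExecuteCommandPatterns: RegExp[];
}

const MAX_ENDPOINT_FANOUT = 5;
const pass: GateResult = { ok: true };
const fail = (errorCode: string): GateResult => ({ ok: false, errorCode });

const gates: Array<{ name: string; check: (c: GateContext) => GateResult }> = [
  { name: "feature-flag", check: (c) => (c.featureFlagEnabled ? pass : fail("feature_flag_disabled")) },
  // "rbac_denied" is a placeholder; the real RBAC gate may report differently.
  { name: "rbac", check: (c) => (c.hasPrivilege ? pass : fail("rbac_denied")) },
  {
    name: "allowlist",
    check: (c) => (c.endpointIds.every((id) => c.allowlistedHosts.has(id)) ? pass : fail("endpoint_not_allowed")),
  },
  {
    name: "fanout-cap",
    check: (c) => (c.endpointIds.length <= MAX_ENDPOINT_FANOUT ? pass : fail("endpoint_fanout_exceeded")),
  },
  {
    name: "validation",
    // Only `execute` payloads are matched against the regex allowlist.
    check: (c) =>
      c.command !== "execute" || c.allowedExecuteCommandPatterns.some((re) => re.test(c.executePayload ?? ""))
        ? pass
        : fail("command_not_allowed"),
  },
];

// Run gates in order and stop at the first failure, reporting which gate tripped.
function runGates(ctx: GateContext): { ok: true } | { ok: false; gate: string; errorCode: string } {
  for (const gate of gates) {
    const result = gate.check(ctx);
    if (!result.ok) return { ok: false, gate: gate.name, errorCode: result.errorCode };
  }
  return { ok: true };
}
```

Keeping the order in data rather than in nested calls makes precedence easy to test: with the feature flag off, the caller sees `feature_flag_disabled` even when RBAC would also fail.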
Failure-mode coverage
| `errorCode` | Notes |
|---|---|
| `feature_flag_disabled` | |
| `endpoint_not_allowed` | |
| `command_not_allowed` | `execute` payload didn't match any `allowedExecuteCommandPatterns`. |
| `user_declined` | |
| `endpoint_fanout_exceeded` | … `MAX_ENDPOINT_FANOUT` (5) endpoints in one call. |
| `no_mitre_tags` | |
| `no_supported_techniques` | |
| `rule_not_found` | |
| `rate_limit_exceeded` | … `blockedEndpoints` + `Retry-After`. |
| `concurrency_exceeded` | … `real_execution` scenario is in flight in this space. Response includes `inflight_scenario_fingerprint` + `Retry-After`. |

`caveats`: `wall_budget_exceeded`, `es_bulk_error`

Key refactors (latest iteration)
- `createRunFamilyCommandTool` replaces copy-pasted per-family tool files. One factory, parameterized by family name and command schema.
- `withCommandGates` extracted from per-family tools; the REST route now reuses the same gate pipeline.
- `EmulationToolError.from(code, message)` replaces ad-hoc error construction.
- `resolveCurrentUsername` shared: moved to shared lib for reuse across tools and routes.
- `savedObjectsClient` threading: threaded through gate checks instead of resolving ad-hoc per tool.
- … `bool.should` with `minimum_should_match: 1` to match alerts by `kibana.alert.original_event.module: emulation` OR emulation tag, which ensures confidence > 0 for `log_injection` mode.
- `referencedContent` schema compliance: corrected from object to array format per `referencedContentSchema` in `type_definition.ts`.

Demo plan
1. … `config/kibana.dev.yml` (full snippet in `DEMO.md`).
2. … `xpack.securitySolution.detectionEmulation.allowlist.endpointIds: [<your-pilot-host>]`.
3. … (T1059.001 Windows PowerShell).
4. "… `<ruleId>` against endpoint `<endpointId>` using log injection."
5. … `ValidationReport`: `confidence`, `coverage`, `precision`, per-phase breakdown, history SO id.
6. … `via=agent-builder/conv:<id>/run:<id>` actor attribution.
7. … `endpoint_fanout_exceeded`.
8. … `real_execution` while the first is in flight; the second is rejected with `concurrency_exceeded` + `Retry-After`.
9. "… `<ruleId>`." → `getEmulationHistory` returns paginated history with the `actor` field populated.

Full walkthrough: `x-pack/solutions/security/plugins/security_solution/server/lib/detection_emulation/DEMO.md`.

Test plan
- … `security_solution`: `node scripts/jest --testPathPattern='detection_emulation|emulation_report_type|validation_gate'`
- … `detectionEmulationLogInjection` defaults to `false`
- … `actor.kind` (user vs agent-builder) with conversation/run/tool ids and prompt hash
- … `MAX_ENDPOINT_FANOUT = 5` at the Zod boundary
- … `real_execution` per space (1)
- … `execute` payloads gated by `allowedExecuteCommandPatterns` regex allowlist
- … (`detectionEmulation.realExecution.enabled`) returns 403 when disabled
- … `standalone` `executionMode`
- … `original_event.module: emulation` → telemetry collector finds alert → confidence > 0
- … (`detection_emulation.integration.test.ts`) with `EMULATION_SMOKE_ES_URL` against a live ES (manual; documented in README)

Out of scope
- … `getEmulationHistory` tool exposes the SO via the Agent Builder; a dedicated stack-management UI is intentionally out of scope.
- … `kbn-evals` suite ships with deterministic mocks; a live-cluster baseline run + result publication is a separate follow-up because it requires connector + cluster setup outside of CI.