
[Nightshift] Add panels, flyouts, and detail components #271281

Draft
smith wants to merge 116 commits into elastic:main from smith:nightshift-panels-flyouts

@smith smith commented May 26, 2026

Summary

Ports the presentation layer from the kbn-sigevents package (PR #264555 branch) into kbn-nightshift on main:

  • NightshiftOverview: main panel with critical/warning/healthy states, impacted cards, and metric widgets
  • Event detail flyouts: SignificantEventDetailBody, SignificantEventDetailHeader, LowerPriorityEvents, OtherPromotedEvents with expandable flyout views
  • Supporting UI: CriticalityDonut, DependencyChainMap (ReactFlow), InfoPanel, RootCausePanel, RecommendationsPlanPanel, StatusHeader/Banner, ImpactedCard, MetadataIconCard
  • Data hooks: useFetchLatestSignificantEvent, useFetchSystemOverview, useFlyoutFocusManagement
  • 149 tests across 18 test suites, plus Storybook stories for all components

Agent builder chat features are excluded. The AiButton "Remediate" buttons remain as callback props (onRemediate) that consumers can wire to whatever backend they choose.

The SignificantEventDocument type (previously imported from @kbn/observability-agent-builder-plugin) is now defined locally in types/significant_event_document.ts.

Test plan

  • tsc --noEmit passes with no nightshift-specific errors
  • All 149 jest tests pass (jest --config kbn-nightshift/jest.config.js)
  • Pre-commit lint checks pass
  • Verify Storybook renders (yarn storybook kbn-nightshift)
  • Verify the nightshift page loads in the browser at /app/observability/nightshift

…hift

Port the presentation layer from the kbn-sigevents package (PR elastic#264555)
into kbn-nightshift on main. This includes the overview panel, event
detail flyouts, all supporting UI components, data-fetching hooks,
tests, and Storybook stories. Agent builder chat features are excluded —
the AiButton remediate callbacks remain as wirable props without any
agent builder dependency.
@smith smith requested review from a team as code owners May 26, 2026 14:13
@smith smith marked this pull request as draft May 26, 2026 14:13


kibanamachine commented May 26, 2026

💔 Build Failed

Failed CI Steps

Metrics [docs]

Module Count

Fewer modules lead to a faster build time

| id | before | after | diff |
|---|---|---|---|
| observability | 1817 | 2051 | +234 |

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

| id | before | after | diff |
|---|---|---|---|
| observability | 2.1MB | 2.3MB | ⚠️ +279.3KB |

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

| id | before | after | diff |
|---|---|---|---|
| observability | 102.1KB | 102.1KB | +18.0B |


smith and others added 22 commits May 27, 2026 11:54
The View Details button on the main significant event card was not
opening a flyout. Add internal flyout state management to
NightshiftOverview (matching the pattern used by OtherPromotedEvents)
and wire up the NightshiftPage to pass real data from
useFetchLatestSignificantEvent.
Closes elastic#259251

## Summary

- Pins Console HTTP method completion ordering with explicit `sortText`
values so Monaco does not fall back to alphabetical label sorting.
- Keeps the existing method set unchanged while ordering `GET` first and
`DELETE` last.

## Root Cause

- Monaco uses `sortText` for completion ordering and falls back to the
item label when `sortText` is missing, which can put `DELETE` before
safer/default verbs.

## Fix

- Assign stable `sortText` values to method completion items based on
the intended canonical order.
- Add a focused unit test that sorts method suggestions the same way and
verifies `GET` is first and `DELETE` is last.
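
The ordering fix can be illustrated with a small standalone sketch. It does not use the real Monaco types: `MethodSuggestion` is a hypothetical stand-in for a completion item, and the method list is an assumption (the commit only states that `GET` comes first and `DELETE` last). The comparator approximates the behaviour described in the root cause, comparing `sortText` and falling back to the label.

```typescript
// Hypothetical stand-in for a completion item; only the fields used here.
interface MethodSuggestion {
  label: string;
  sortText?: string;
}

// Assumed canonical order. The source only guarantees GET first, DELETE last.
const METHOD_ORDER: string[] = ['GET', 'PUT', 'POST', 'PATCH', 'HEAD', 'DELETE'];

// Zero-padded index so that plain string comparison follows the intended order.
function withSortText(methods: string[]): MethodSuggestion[] {
  return methods.map((label, index) => ({
    label,
    sortText: ('0' + index).slice(-2),
  }));
}

// Approximates the editor ordering: use sortText, fall back to the label.
function sortLikeEditor(items: MethodSuggestion[]): MethodSuggestion[] {
  return [...items].sort((a, b) => {
    const keyA = a.sortText !== undefined ? a.sortText : a.label;
    const keyB = b.sortText !== undefined ? b.sortText : b.label;
    return keyA < keyB ? -1 : keyA > keyB ? 1 : 0;
  });
}

// Without sortText the labels sort alphabetically, so DELETE comes first.
const unpinned = sortLikeEditor(METHOD_ORDER.map((label) => ({ label })));
// With sortText the canonical order is preserved.
const pinned = sortLikeEditor(withSortText(METHOD_ORDER));
```

Zero-padding matters once there are ten or more items, because `'10'` sorts before `'2'` as a string.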

## Before
<img width="723" height="466" alt="image"
src="https://github.com/user-attachments/assets/faf244b2-4207-483b-acbc-32b148441b18"
/>

## After
<img width="725" height="437" alt="image"
src="https://github.com/user-attachments/assets/782c0c60-6052-4c28-80bc-f45403fa1383"
/>

## Test Plan

- `node scripts/jest
--config=src/platform/plugins/shared/console/jest.config.js
src/platform/plugins/shared/console/public/application/containers/editor/monaco_editor_actions_provider.test.ts`
— passed.
- `node scripts/check_changes.ts` — passed.

## Release Note

- Fixes Console autocomplete so `GET` is shown before `DELETE` when
suggesting HTTP methods on an empty request line.

Assisted with Cursor using GPT-5.5

Made with [Cursor](https://cursor.com)

Co-authored-by: Cursor <cursoragent@cursor.com>
## Summary

Limits the flaky test runner to 50 runs per config. We should be mindful
of CI costs related to flaky test investigation, and 50 runs is more
than enough to confirm a fix.
…tic#270775)

## Summary

Fixes the flaky `fullscreen.spec.ts` Scout test ("should interact with
metrics in fullscreen mode").

**Two failure modes were identified:**

1. **Chrome header interception** (~65 failures, Apr 3–18): Already
fixed by PR elastic#264932 which added `chromeHeader.waitFor({ state: 'hidden'
})` to `toggleFullscreen()`. No recurrence since.

2. **`viewDetails` action not found** (2 failures, May 20): After
`clearSearch()`, the grid re-renders with the full unfiltered metric
set. The test immediately opened the context menu without waiting for
the grid to settle — the Lens embeddable hadn't finished re-mounting and
registering its actions.

**Fix:** Add `await
expect(metricsExperience.pagination.container).toBeVisible()` after
`clearSearch()` to wait for the grid to finish re-rendering before
opening the context menu. This matches the pattern used in
`grid.navigation.spec.ts`.

Closes elastic#261199

## Test plan

- [ ] Run `node scripts/scout run-tests --arch stateful --domain classic
--testFiles
src/platform/plugins/shared/discover/test/scout/ui/parallel_tests/metrics_experience/fullscreen.spec.ts`
locally
- [ ] Verify CI passes on stateful and serverless targets

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
* Introduces a new CodeQL rule to alert on API routes that use either
`@kbn/config-schema` or `zod` in a way that allows for strings of
unbounded length. This is a companion to our existing CodeQL rule that
alerts on unbounded arrays.
* Introduces a [file exclusion
pattern](https://github.com/legrego/kibana/blob/87f43b9bc3bc1a78b718cbb845861de32e53c3e7/.github/codeql/custom-queries/dos/KibanaDoSExclusions.qll)
for both of our DoS CodeQL rules to allow us to more systematically omit
files from alerting on these findings, if it is clear that their usage
of these validation schemas is not for the purpose of API route
validation.

---------

Co-authored-by: Elena Shostak <165678770+elena-shostak@users.noreply.github.com>
…r named OAS components (elastic#270983)

## Summary

Closes elastic#263711

Adds `meta: { id: '...' }` to every `schema.object()` call used as a
request body or response body in the maintenance window plugin. This
causes the OAS generator to emit named `$ref` components (e.g.
`Kibana_HTTP_APIs_maintenance_window_response`) instead of inlining the
full schema at every endpoint that uses it.

## What changed and why

### Schema files (10 files)

`meta: { id }` was added **only to body/response schemas** — never to
path params or query params, which would throw a runtime error.

All IDs use a `maintenance_window_` prefix to avoid collisions with
existing named components from the alerting plugin (`r_rule_response`,
`alerts_filter_query`, `schedule_request` already exist there).

> **Note on internal schemas:** `meta: { id }` was intentionally not
added to internal route schemas (`/internal/maintenance_window/*`). The
OAS capture script filters exclusively for `access: 'public'` routes, so
internal schemas never appear in the generated `kibana.yaml` /
`kibana.serverless.yaml` — adding IDs there would have no effect.

> **Note on diff size:** The diff looks larger than the actual change.
To add a second argument `{ meta: { id } }` to `schema.object()`, the
single-argument form `schema.object({...})` must be restructured into
two arguments, which re-indents everything inside. The only real
additions are the 13 `meta: { id }` lines — one per schema object.


**External API schemas** (`/api/maintenance_window`):
| File | ID added |
|------|----------|
| `external/request/create/schemas/v1.ts` | `new_maintenance_window` |
| `external/request/update/schemas/v1.ts` | `update_maintenance_window` |
| `external/request/find/schemas/v1.ts` | `find_maintenance_windows_response` |
| `external/response/schemas/v1.ts` | `maintenance_window_response` |
| Nested scope object (create/update/response) | `maintenance_window_scope` |
| `schedule/schema/v1.ts` (request) | `maintenance_window_schedule_request`, `maintenance_window_schedule_recurring_request` |
| `schedule/schema/v1.ts` (response) | `maintenance_window_schedule_response`, `maintenance_window_schedule_recurring_response` |


**Shared schemas** (used by both external and internal routes):
| File | ID added |
|------|----------|
| `r_rule/request/schemas/v1.ts` | `maintenance_window_r_rule_request` |
| `r_rule/response/schemas/v1.ts` | `maintenance_window_r_rule_response` |
| `alerts_filter_query/schemas/v1.ts` | `maintenance_window_alerts_filter_query` |

### OAS output files (2 files)

`oas_docs/output/kibana.yaml` and
`oas_docs/output/kibana.serverless.yaml` were regenerated using the same
capture command CI uses:

```sh
node scripts/capture_oas_snapshot.js \
  --include-path /api/status \
  --include-path /api/alerting/rule/ \
  --include-path /api/alerting/rules \
  --include-path /api/actions \
  --include-path /api/security/role \
  --include-path /api/spaces \
  --include-path /api/streams \
  --include-path /api/fleet \
  --include-path /api/saved_objects \
  --include-path /api/maintenance_window \
  --include-path /api/agent_builder \
  --include-path /api/workflows \
  --include-path /api/dashboards \
  --include-path /api/visualizations \
  --include-path /api/security/entity_store

cd oas_docs && make api-docs
```

The output now references named components like `$ref:
'#/components/schemas/Kibana_HTTP_APIs_maintenance_window_response'`
instead of inlining the full schema object at every endpoint.

## Checklist

- [x] `meta: { id }` added only to `schema.object()` body/response
schemas (not path/query params)
- [x] All IDs prefixed with `maintenance_window_` to avoid collisions
with alerting plugin components
- [x] OAS output files regenerated with `make api-docs`
- [x] New named components verified in `kibana.yaml` and
`kibana.serverless.yaml`

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…tic#270108)

## Summary

closes elastic/streams-program#1339

This PR converts the remaining DSL-based read paths in the streams
plugin (`FeatureClient`, `QueryClient`, `InsightClient`, and the
significant-events alerts reader) over to ES|QL via
`storageClient.esql`. It is the last step of a broader effort to
standardise local-index reads on ES|QL inside streams; earlier PRs
migrated the other read paths in the same area.

### What changes

- **New `IStorageClient.esql` method** on `kbn-storage-adapter`. Calling
it goes through the same read pipeline as `search` / `get`: mapping
bootstrap (`ensureMappingsBeforeReading`), graceful empty results when
the index doesn't exist yet, and optional `maybeMigrateSource` on the
`_source` column. All ES|QL reads in this PR go through it.
- **`FeatureClient` and `QueryClient`** — every DSL `search` / `get`
call used for listing, fetching, filtering, and keyword-search of
knowledge indicators (features + queries) is replaced with an equivalent
ES|QL query. Behaviour is preserved; the only externally observable
change is that the queries are now expressible as ES|QL.
- **Significant-events sparkline reader
(`readSignificantEventsFromAlertsIndices`)** — previously a
`date_histogram` aggregation against `.alerts-streams.alerts-default`,
now a single ES|QL `STATS COUNT(*) BY rule_uuid, BUCKET(@timestamp, ?)`
with client-side gap-filling so empty buckets still appear as zeros in
the sparkline. The legacy `change_points` field on the response is kept
as an empty stub for backwards compat with the existing consumer schema;
its removal is tracked separately.
- **Insight generation (`collectQueryData`)** — now issues two parallel
ES|QL queries per rule (one for the total count, one for sample
`_source` rows) instead of a DSL `search` with aggregations. Time bounds
are passed as ISO-timestamp named params rather than relative `now-15m`
expressions, which is more predictable when the request and execution
clocks differ.
- **`InsightClient` reads (`get`, `list`, `bulk` validation)** —
migrated to `storageClient.esql`. One small behaviour fix:
`list().total` now reflects the actual number of returned insights.
Previously it was always `0` in production because the DSL search used
`track_total_hits: false`.
- **Shared helpers extracted** — `fillBucketGaps`, `parseBucketSize`,
and `ESQL_UNITS` were duplicated between the alerts reader and
`preview_significant_events.ts`. They now live in a new
`sig_events/helpers/` module with unit tests.
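
As an illustration of the client-side gap-filling mentioned above, here is a minimal sketch. It is not the real `fillBucketGaps` helper: the function name, the `Bucket` shape, and the signature are assumptions, and it presumes bucket timestamps are aligned to `from` in steps of `intervalMs`.

```typescript
// Hypothetical bucket shape: bucket start in epoch milliseconds plus a count.
interface Bucket {
  timestamp: number;
  count: number;
}

// Sketch only; the signature of the real helper is not shown in the source.
function fillGapsSketch(
  buckets: Bucket[],
  from: number,
  to: number,
  intervalMs: number
): Bucket[] {
  if (intervalMs <= 0) {
    throw new Error('intervalMs must be positive');
  }
  // Index the sparse buckets returned by the query by their start time.
  const counts = new Map<number, number>();
  for (const bucket of buckets) {
    counts.set(bucket.timestamp, bucket.count);
  }
  // Walk the full range and emit a zero for every bucket the query omitted.
  const filled: Bucket[] = [];
  for (let t = from; t < to; t += intervalMs) {
    const count = counts.get(t);
    filled.push({ timestamp: t, count: count === undefined ? 0 : count });
  }
  return filled;
}

// Two non-empty buckets with one missing minute in between.
const sparkline = fillGapsSketch(
  [
    { timestamp: 0, count: 2 },
    { timestamp: 120000, count: 5 },
  ],
  0,
  180000,
  60000
);
```

Because a `STATS ... BY BUCKET(...)` query returns rows only for buckets that contain documents, the consumer needs this kind of pass to draw a continuous sparkline.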

### What is intentionally not changing

Four hybrid/semantic search methods (`findFeaturesBySemantic`,
`findFeaturesByHybrid`, `findQueriesBySemantic`, `findQueriesByHybrid`)
remain on DSL. They rely on vector-search clauses that don't have a
clean ES|QL equivalent today, and are deferred to follow-up issues:
[streams-program
elastic#1338](elastic/streams-program#1338),
[streams-program
elastic#1340](elastic/streams-program#1340).

### Storage Adapter
This PR adds `storageClient.esql` for ES|QL-based reads while
keeping the storage adapter’s existing boundary: callers can define read
semantics, but the adapter owns the backing storage index.

To avoid introducing a raw cross-index escape hatch, the new method
validates the ES|QL query before execution. It parses the query,
requires a `FROM` command, and ensures every `FROM` index source targets
the adapter’s own storage index. Invalid or out-of-scope queries fail
before mapping bootstrap or Elasticsearch execution.
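
The scoping rule can be sketched as follows. This is a deliberately naive, string-based illustration: the commit says the adapter parses the query, whereas this sketch splits on `|` and would mishandle quoted strings, comments, and other syntax. The function names and the index name `my-storage-index` are hypothetical.

```typescript
// Naive sketch of the rule "a FROM command is required and every FROM source
// must be the adapter's own index". Not the real parser-based validation.
function assertQueryTargetsOwnIndex(query: string, storageIndex: string): void {
  const commands = query.split('|').map((part) => part.trim());
  const fromCommands = commands.filter((command) => /^from\s/i.test(command));
  if (fromCommands.length === 0) {
    throw new Error('ES|QL query must contain a FROM command');
  }
  for (const command of fromCommands) {
    const sources = command
      .replace(/^from\s+/i, '') // drop the FROM keyword
      .replace(/\s+metadata\s.*$/i, '') // drop a trailing METADATA clause
      .split(',')
      .map((source) => source.trim());
    for (const source of sources) {
      if (source !== storageIndex) {
        throw new Error(`Index source "${source}" is outside the storage index`);
      }
    }
  }
}

// Convenience wrapper that turns the assertion into a boolean.
function isQueryInScope(query: string, storageIndex: string): boolean {
  try {
    assertQueryTargetsOwnIndex(query, storageIndex);
    return true;
  } catch (error) {
    return false;
  }
}
```

Running the check before any mapping bootstrap or Elasticsearch call means a rejected query has no side effects.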

---

*The PR was developed with Claude Code*

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com>
…er in attacks discovery page (elastic#269352)

## Summary

[See screenshot
here](https://elastic.slack.com/archives/C08U04SUN49/p1778687779903229)

- Changes the default popover position to `upCenter` so that taller
action items remain visible

<img width="1617" height="713" alt="image"
src="https://github.com/user-attachments/assets/edd4a77e-412b-43a0-9dd9-b93e4bdc2435"
/>


### Checklist

Check that the PR satisfies the following conditions.

Reviewers should verify this PR satisfies this list as well.

- [ ] Review the [backport
guidelines](https://docs.google.com/document/d/1VyN5k91e5OVumlc0Gb9RPa3h1ewuPE705nRtioPiTvY/edit?usp=sharing)
and apply applicable `backport:*` labels.
## Summary

Adds a two-step agent skill workflow for end-to-end Security Solution
bug fixing at
`x-pack/solutions/security/plugins/security_solution/.agents/skills/`.

This PR delivers the skill logic only. The skills are designed for
interactive CLI sessions (Claude Code or Cursor) where a human drives
each step. A follow-up to automate the workflow via GitHub label
triggers is described at the bottom.

---

## Current limitations

### Serverless not supported (follow-up PR)

This skill supports **stateful (ECH) environments only**. When the agent
encounters a ticket for a serverless deployment it stops immediately and
tells the user it cannot proceed — serverless reproduction requires a
different server setup (different runner, different auth provider) that
is not covered here. Serverless support is planned as a separate
follow-up PR.

---

## Writing tickets the skill can act on

The skill extracts reproduction steps, affected paths, feature flags,
and deployment type directly from the GitHub issue. A ticket missing or
vague on any of these will cause the agent to stop and ask for
clarification — or worse, silently reproduce the wrong state. The
checklist below is what the agent validates before moving past Phase 0.

**Reproduction steps** — a specific navigation path with exact user
actions, not a summary:

| ✅ Good | ❌ Not enough |
|---|---|
| "Go to **Rules → Create rule → Select Threshold → scroll to Suppress alerts**" | "Go to the threshold rule form" |
| "Click the rule row, switch to the **Exceptions** tab, add an exception with no conditions" | "Open a rule and add an exception" |

**Current behavior** — a concrete observable symptom (error text,
missing element, wrong value, network failure):

| ✅ Good | ❌ Not enough |
|---|---|
| "The 'Optional' badge is absent next to 'Suppress alerts'" | "Something looks wrong with the rule form" |
| "`POST /api/detection_engine/rules` returns 400 with `[value.index_pattern]: expected value of type [string]`" | "Rule creation fails" |

**Expected behavior** — what the correct state looks like:

| ✅ Good | ❌ Not enough |
|---|---|
| "'Suppress alerts' should show 'Optional', as it does on Custom Query and EQL rule types" | "It should work correctly" |

**Feature flags** — if the bug only appears behind an experimental flag,
list the flag name explicitly. Omitting this causes the agent to fail to
reproduce or reproduce a different code path entirely:

```
xpack.securitySolution.enableExperimental: [assistantModelEvaluation]
feature_flags.overrides.some.flag: true
```

**Deployment type** — state explicitly whether the bug is on a stateful
or serverless deployment. If omitted and the bug turns out to be
serverless-only, the skill will discover this in Phase 0 and stop — but
only after the Scout server has already started booting.

Tickets that include all five items above get a `high`-confidence
analysis and proceed to browser reproduction without interruption.
Tickets missing reproduction steps or current behavior get `low`
confidence and the agent asks for clarification before booting the
server.
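
The confidence rule in the paragraph above can be written down as a small decision function. This is a sketch under assumptions: the `TicketChecklist` shape and the `'unspecified'` result are hypothetical, and the text only defines the `high` and `low` cases, so every other combination is reported as unspecified here.

```typescript
// Hypothetical checklist; each flag records whether the ticket states that item.
interface TicketChecklist {
  hasReproductionSteps: boolean;
  hasCurrentBehavior: boolean;
  hasExpectedBehavior: boolean;
  hasFeatureFlagInfo: boolean;
  hasDeploymentType: boolean;
}

// 'unspecified' is an assumption for cases the text does not cover.
type Confidence = 'high' | 'low' | 'unspecified';

function ticketConfidence(ticket: TicketChecklist): Confidence {
  // Missing reproduction steps or current behavior: the agent asks for clarification.
  if (!ticket.hasReproductionSteps || !ticket.hasCurrentBehavior) {
    return 'low';
  }
  // All five items present: proceed to browser reproduction.
  if (ticket.hasExpectedBehavior && ticket.hasFeatureFlagInfo && ticket.hasDeploymentType) {
    return 'high';
  }
  return 'unspecified';
}

const completeTicket: TicketChecklist = {
  hasReproductionSteps: true,
  hasCurrentBehavior: true,
  hasExpectedBehavior: true,
  hasFeatureFlagInfo: true,
  hasDeploymentType: true,
};
```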

---

## How to use

These skills live inside the Security Solution plugin directory, not at
the repo root, so Claude Code does not auto-discover them. There are two
ways to use them:

### Option A — Explicit invocation (no setup needed)

Ask your agent (Claude Code or Cursor) to read the skill file directly:

**Step 1 — reproduce:**
> "Read and follow
`x-pack/solutions/security/plugins/security_solution/.agents/skills/bug-reproduce/SKILL.md`
for issue #NUMBER"

**Step 2 — fix** (after reviewing the reproduction report):
> "Read and follow
`x-pack/solutions/security/plugins/security_solution/.agents/skills/bug-fix/SKILL.md`"

### Option B — Symlinks for auto-discovery

**Claude Code** — symlink into `~/.claude/skills/` (personal skills
directory). Run once:

```bash
mkdir -p ~/.claude/skills
ln -s /path/to/kibana/x-pack/solutions/security/plugins/security_solution/.agents/skills/bug-fixer ~/.claude/skills/bug-fixer
ln -s /path/to/kibana/x-pack/solutions/security/plugins/security_solution/.agents/skills/bug-reproduce ~/.claude/skills/bug-reproduce
ln -s /path/to/kibana/x-pack/solutions/security/plugins/security_solution/.agents/skills/bug-fix ~/.claude/skills/bug-fix
```

Replace `/path/to/kibana` with the absolute path to your local Kibana
clone. Absolute paths are required here since the symlinks live outside
the repo tree.

**Cursor** — symlink into `.agents/skills/` at the repo root. Run once
from the Kibana repo root:

```bash
ln -s ../../x-pack/solutions/security/plugins/security_solution/.agents/skills/bug-fixer .agents/skills/bug-fixer
ln -s ../../x-pack/solutions/security/plugins/security_solution/.agents/skills/bug-reproduce .agents/skills/bug-reproduce
ln -s ../../x-pack/solutions/security/plugins/security_solution/.agents/skills/bug-fix .agents/skills/bug-fix
```

After adding symlinks, start a new session — skills are loaded at
session start and won't appear in an already-running session.

Once set up, you can invoke them with:
> `/bug-reproduce #NUMBER`
> `/bug-fix`

> These symlinks are a local developer setup step — do not commit them.
The skills themselves stay in the Security Solution plugin directory so
they remain co-located with the code they operate on.

Full workflow overview and prerequisites: `bug-fixer/SKILL.md` at the
same path.

---

## Human interaction points

The workflow has two mandatory human checkpoints. Everything else is
fully automated.

| Checkpoint | When | What the human does |
|---|---|---|
| **1: Reproduction report** | End of Phase 3 | Read the report; reply to confirm the bug was reproduced correctly. The agent writes `user_acknowledged: yes` only after this reply. |
| **2: Fix plan approval** | Phase 4 Step 1 | Read the Root Cause Analysis and Fix Plan; explicitly approve before any code is written. |

Two further interactions are optional:
- **PR creation** (Phase 6) — agent asks whether to open a draft PR; you
say yes or skip
- **Review comments** (Phase 7) — triggered by reviewer activity, not
the agent

---

## Skills

| File | Purpose |
|---|---|
| `bug-fixer/SKILL.md` | Entry point: explains the two-step workflow and exact invocation commands |
| `bug-reproduce/SKILL.md` | Step 1: ticket analysis, Scout server, browser reproduction, diagnostics |
| `bug-fix/SKILL.md` | Step 2: fix plan approval, TDD fix, verification, optional draft PR |
| `bug-fixer/KNOWLEDGE.md` | Cross-session knowledge base, updated after each fixing session |
| `bug-fixer/references/classification-guide.md` | Bug patterns, test layer decision rules, fix strategies |
| `bug-fixer/references/fix-workflow.md` | Root cause analysis template, SELF-CHECK questions |
| `bug-fixer/references/baseline-failures.md` | 10 documented agent failures used to harden the skill rules |
| `bug-fixer/references/knowledge-update.md` | Protocol for adding entries to KNOWLEDGE.md |
| `bug-fixer/references/troubleshooting.md` | Known environment conditions (SAML redirect, AI Agent modal) |

---

## Design decisions and the challenges they solve

### Split into two separate skill invocations

The most important architectural decision in this PR.

**The problem:** In testing, the agent repeatedly skipped mandatory
phases (browser reproduction, fix plan approval) when it identified what
looked like an obvious fix from code analysis alone. We went through
several rounds of increasingly strong language — "protocol violation",
"certainty before reproduction is a red flag", explicit self-checks at
every phase boundary — and the agent bypassed them every time. Textual
instructions cannot reliably override an LLM's drive to reach the answer
efficiently.

**The solution:** Reproduction (`bug-reproduce`) and fix (`bug-fix`) are
now separate skill invocations. The fix agent starts cold — it only sees
`analysis.json` and `reproduction-report.md` on disk. It has no memory
of the analysis phase and cannot reason "I already know the fix from
code reading." The mandatory stops are enforced by the conversation
boundary, not by agent self-discipline.

We first tried a "two-turn" restructure of a single orchestrator (adding
explicit "your turn ends here" instructions), which also failed. The
split into separate skills is the structural solution.

### Scout server starts at Phase 0, not Phase 1

**The problem:** The Scout server takes 5+ minutes to boot. Starting it
at Phase 1 (after ticket analysis) wasted the entire analysis time.

**The solution:** `bug-reproduce` kicks off `node scripts/scout.js
start-server ... &` at the very beginning of Phase 0. All ticket
analysis, subagent research, and code reading happen while the server
boots. Phase 1 becomes a checkpoint: if no feature flags are needed, the
agent waits for the server to become ready; if flags are required, it stops
the server and restarts it with `config_sets/bug_fixer/kibana.yml`.

### Scout server instead of plain dev server

`bug-reproduce` uses `node scripts/scout.js start-server` (port 5620)
rather than `node scripts/kibana --dev`. The Scout server sets up the
`cloud-basic` auth provider required for
`auth_provider_hint=cloud-basic` login — the plain dev server does not.

### Parallel subagents for research phases

Both `bug-reproduce` (Phase 0) and `bug-fix` (Phase 4 Step 1) dispatch
multiple subagents in parallel rather than reading sources sequentially
in the main session. This is used in two distinct situations:

**Phase 0 — during server boot:**
While the Scout server is warming up (5+ minutes), the main agent
dispatches subagents in parallel to read each `similar_issue`, review
each `related_pr` diff, run closed-issue searches, and study
`affected_paths` source files. None of these tasks depend on each other,
so they all run simultaneously. By the time the server is ready, the
research is done.

**Phase 4 Step 1 — root cause analysis:**
Before presenting the fix plan, the agent dispatches subagents to review
prior fix patterns, map the full impact scope, search codebase
conventions, find all call sites, and locate existing tests. Again these
are independent tasks that benefit from parallelism.

**Why subagents rather than sequential reads in the main session:**

- **Context window preservation** — PR diffs and source files are large.
Reading them sequentially in the main session would fill the context
window with raw content, crowding out the conversation history and skill
instructions. Subagents read the content, synthesise it, and return only
a summary.
- **Context isolation** — Each subagent starts with a clean slate. It
cannot be biased by the main session's prior analysis or the agent's
forming hypothesis about the root cause. This is especially important
for the fix phase: a subagent reviewing a similar PR diff won't be
anchored to the main agent's pre-existing suspicion.
- **Parallelism** — Independent research tasks complete in the time of
the slowest one rather than the sum of all.

Subagents are not used for phases that require user interaction (Phase 3
reproduction report, Phase 4 plan approval, Phase 6 PR confirmation) —
those are interactive stops that must happen in the main session.

**Cursor limitation:** The `Agent` tool that spawns isolated parallel
subagents is specific to Claude Code. Cursor has no equivalent. When the
skill runs in Cursor, the agent falls back to reading those sources
sequentially in the main session — the workflow still completes
correctly, but without parallelism, without context isolation between
research tasks, and with large file contents accumulating in the main
context window rather than being summarised and discarded. Browser
reproduction (Phase 3) and all fix phases work identically in Cursor
since `cursor-ide-browser` is built in.

### Phase gates hardened against code-analysis shortcuts

Beyond the architectural split, several in-skill gates were added after
testing revealed specific bypass patterns:

- **Sequential execution preamble** in `bug-reproduce`: "Phase 0
analysis tells you where to look. Phase 3 browser reproduction tells you
what is actually broken. These are not the same thing."
- **Phase 2 hard gate**: server must return `available` at
`localhost:5620` before any environment setup begins.
- **Phase 3 reframe**: "Have I opened a browser and followed the
reproduction steps? If no, do that now before reading any further."
Names the exact failure mode: source code reading is not reproduction.
- **Phase 4 pre-check** in `bug-fix`: verify `reproduction-report.md`
has `status: reproduced` and `user_acknowledged: yes` before reading any
source file for fixing purposes. "The more obvious the bug seems from
code analysis, the more important this check is."
- **Pre-test self-check**: before creating any test file, verify an
explicit approval message exists in the conversation. "No exceptions for
bugs that seem obvious."

### `user_acknowledged` field protocol

The reproduction report includes a `user_acknowledged` field that must
be `yes` before fix work begins. Testing revealed agents would
self-write this field before the user replied. The skill now explicitly
states: "This field must only be written after a real user reply — never
pre-emptively. Writing it before the user responds is a protocol
violation."
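
The Phase 4 pre-check amounts to a gate on two report fields. The sketch below assumes the report stores them as plain `key: value` lines; the actual layout of `reproduction-report.md` is not shown in this PR, and both function names are hypothetical.

```typescript
// Reads the first "key: value" line matching the given key (assumed format).
function readReportField(report: string, key: string): string | undefined {
  for (const line of report.split('\n')) {
    const separator = line.indexOf(':');
    if (separator === -1) {
      continue;
    }
    if (line.slice(0, separator).trim() === key) {
      return line.slice(separator + 1).trim();
    }
  }
  return undefined;
}

// Fix work may start only when the bug was reproduced and the user confirmed it.
function canStartFix(report: string): boolean {
  return (
    readReportField(report, 'status') === 'reproduced' &&
    readReportField(report, 'user_acknowledged') === 'yes'
  );
}

const acknowledgedReport = 'status: reproduced\nuser_acknowledged: yes';
const pendingReport = 'status: reproduced\nuser_acknowledged: no';
```

A check like this only verifies what is on disk. It cannot tell whether the agent wrote `user_acknowledged: yes` pre-emptively, which is why the skill text forbids that explicitly.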

### Scout skill invocation: explicit syntax and Security Solution-specific reviewer

`bug-fix` Phase 4 Step 2 specifies the exact `Skill("name")` call syntax
for both scout skills, with full file paths to disambiguate between the
repo-root generic reviewer and the Security Solution-specific one:

1. `Skill("scout-create-scaffold")` —
`.agents/skills/scout-create-scaffold/SKILL.md`
2. `Skill("security-scout-best-practices-reviewer")` —
`x-pack/solutions/security/plugins/security_solution/.agents/skills/scout-best-practices-reviewer/SKILL.md`

The Security Solution reviewer internally runs the general
`scout-best-practices-reviewer` first — agents do not invoke it
separately.

### Skill improvement prompts: agents surface rule gaps after each session

Both `bug-fix` and `bug-reproduce` end with a `## Skill Improvement`
section. After every session the agent checks for: new rationalizations
not covered by the Red Flags table, ambiguous phase rules, missing fix
strategies, test layer gaps, and undocumented environment conditions. If
any are found, it prompts the user before editing any skill file. This
mirrors the pattern from `security-scout-best-practices-reviewer`.

### Baseline failures and pressure scenario testing

The skill rules were validated using the TDD-for-documentation cycle
from `superpowers:writing-skills`:

- **RED phase**: three pressure scenarios run *without* the skill — one
failure found (plan approval: agent went into advisory mode instead of
presenting a formal plan ending with "Do you approve this plan as
written?")
- **GREEN phase**: same scenarios run *with* the skill — all three pass;
the plan approval failure is fixed; agents cite specific Red Flag
entries when refusing shortcuts
- **10 documented failures** in `baseline-failures.md` covering real
agent behaviour observed across four fixing sessions

---

## Potential follow-up: GitHub label-triggered automation

The skills in this PR are designed for interactive sessions. A natural
next step is a label-triggered workflow where adding `ai-reproduce` to a
GitHub issue kicks off reproduction automatically, and `ai-fix`
implements the fix — with the issue comments replacing the interactive
checkpoints.

### What would need to be built

**New infrastructure:**
- A GitHub Actions workflow triggered by label events (`ai-reproduce`,
`ai-fix`)
- A self-hosted runner with a full Kibana dev environment, Scout server,
and Playwright available — standard GitHub-hosted runners cannot boot
Kibana
- `claude` CLI invocation that passes the label event as context to the
agent

**Changes to the current skills:**
The two mandatory human checkpoints would need to be replaced:

| Current checkpoint | Automated equivalent |
|---|---|
| Human reads reproduction report and replies | Agent posts report as issue comment; `ai-reproduce` label counts as acknowledgment |
| Human explicitly approves fix plan | Agent posts fix plan as issue comment; `ai-fix` label counts as approval |

This requires a small change to `bug-fix` Phase 4 Step 1 — making plan
approval conditional on whether the session is interactive or
label-triggered — and a corresponding change to how `user_acknowledged`
is set in `bug-reproduce` Phase 3.

The core skill logic (Phases 0–7) stays unchanged. The label-triggered
mode is additive, not a rewrite.

---

## Test plan

- [ ] Ask agent to read and follow `bug-reproduce/SKILL.md` for a known
Security Solution issue — verify Scout server starts at Phase 0
- [ ] Verify agent presents reproduction report and stops without
proceeding to the fix
- [ ] Ask agent to read and follow `bug-fix/SKILL.md` — verify it reads
`analysis.json` + `reproduction-report.md` before doing anything else
- [ ] Verify fix plan is presented and agent waits for explicit approval
before writing any code
- [ ] Run `bug-fix` without reproduction files — verify agent says "read
and follow bug-reproduce first"
- [ ] Verify `user_acknowledged` is not written before a real user reply
- [ ] Verify agent invokes `security-scout-best-practices-reviewer` (not
generic) for Scout tests

🤖 Generated with [Claude Code](https://claude.ai/claude-code)

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…1212)

## Summary

- Adds a validated `collapse` query option to workflow execution
listing.
- Forwards collapse to Elasticsearch while preserving existing filters,
sorting, and pagination.
- Covers route, service, and search-helper plumbing with focused tests.

## References

Closes elastic/security-team#17562

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
… auth.authenticator references (elastic#270771)

## Summary

  Resolves elastic/ingest-dev#7714.

When the OTLP Input integration is configured with bearer token
authentication via Fleet, the Elastic Agent enters a `DEGRADED` state.
The root cause is that Fleet's per-stream OTel config generator suffixes
extension keys for cross-stream uniqueness (e.g. `bearertokenauth` →
`bearertokenauth/<input-id>-<stream-id>`), but the matching *references*
to those extensions were left as bare names.

  Two reference sites were broken:

**1. `service.extensions` array**: spread verbatim from the stream
config, causing:
`invalid configuration: service::extensions: references extension "bearertokenauth" which is not configured`
Fixed by adding `addSuffixToOtelcolServiceExtensions` and applying it
when building the per-stream `service` block.

**2. `auth.authenticator` inside component bodies**: the OTLP
receiver's protocol blocks (e.g.
`receivers.otlp.protocols.grpc.auth.authenticator: bearertokenauth`)
still pointed to the bare name, causing:
`failed to resolve authenticator "bearertokenauth": authenticator not found`
Fixed by adding `rewriteOtelcolExtensionReferences`, a recursive walker
that rewrites `auth: { authenticator }` values using a per-stream
`originalToSuffixedExtensionIds` map. Only references matching
extensions declared in the same stream are rewritten; external or
pre-suffixed references like `beatsauth/<outputId>` are left untouched.
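
The two fixes above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Fleet's actual code: the function names `suffixServiceExtensions` and `rewriteAuthenticatorRefs`, the `IdMap` type, and the config shapes are hypothetical stand-ins for `addSuffixToOtelcolServiceExtensions`, `rewriteOtelcolExtensionReferences`, and `originalToSuffixedExtensionIds`.

```typescript
// Hypothetical sketch: names and config shapes are illustrative,
// not Fleet's actual implementation.
type IdMap = Map<string, string>; // original extension ID -> suffixed ID

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Rewrites entries of a `service.extensions` array that appear in the map.
function suffixServiceExtensions(extensions: string[], idMap: IdMap): string[] {
  return extensions.map((id) => idMap.get(id) ?? id);
}

// Returns a copy of `node` in which every `auth: { authenticator }` value
// found in the map is replaced by its suffixed ID. Unknown IDs are kept.
function rewriteAuthenticatorRefs(node: unknown, idMap: IdMap): unknown {
  if (Array.isArray(node)) {
    return node.map((item) => rewriteAuthenticatorRefs(item, idMap));
  }
  if (!isRecord(node)) {
    return node;
  }
  const result: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(node)) {
    const name = key === 'auth' && isRecord(value) ? value['authenticator'] : undefined;
    if (isRecord(value) && typeof name === 'string') {
      result[key] = { ...value, authenticator: idMap.get(name) ?? name };
    } else {
      result[key] = rewriteAuthenticatorRefs(value, idMap);
    }
  }
  return result;
}
```

Because the lookup falls back to the original name, a reference such as `beatsauth/<outputId>` that is not declared in the same stream passes through unchanged.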

## Testing

- Unit tests cover both fix paths and a negative case (external
authenticator references are preserved).
- `node scripts/jest .../otel_collector.test.ts`: 59 tests pass.

To test manually, add an OTLP package policy, then verify that the
extension ID is rewritten correctly and that the agent works as expected.

<img width="721" height="97" alt="Screenshot 2026-05-22 at 3 17 21 PM"
src="https://github.com/user-attachments/assets/37c53b38-f8c6-4918-932e-6044beeed763"
/>
<img width="683" height="247" alt="Screenshot 2026-05-22 at 3 17 12 PM"
src="https://github.com/user-attachments/assets/7dd1f9d7-89e5-42f5-b525-4a8c4e6ce7c4"
/>
<img width="730" height="155" alt="Screenshot 2026-05-22 at 3 16 59 PM"
src="https://github.com/user-attachments/assets/4eccb99c-4bb1-4eed-8277-6d60a3ca3d26"
/>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
elastic#271144)

## Summary

Applies the same changes to each Discover FTR config to cut CI runtime:

- Add one `await esArchiver.loadIfNeeded('X')` in the index file's
`before` hook.
- Delete the per-child `loadIfNeeded('X')` calls.
- Delete any `esArchiver.unload('X')` in the index `after` hook and in
children.

Because the servers are stopped once an FTR config finishes, unloading
the data beforehand only wastes time.

Some numbers:

- `loadIfNeeded` calls eliminated per CI run: 43 
- `esArchiver.unload(...)` calls removed: **57**
- Total esArchiver ops eliminated per CI run: ~99
…c#270571)

## Problem

When a watchlist is created with entity sources, or when a new entity
source is added to an existing watchlist, the watchlist index stays
empty until the scheduled background task runs (roughly every hour).
This means users have to wait or manually trigger a sync.

## Solution

This change adds a fire-and-forget sync call at the end of both create
routes. The sync runs in the background after the HTTP response is
returned so it does not affect response time. If the sync fails, the
error is logged as a warning but the watchlist creation still succeeds.
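
A minimal sketch of the fire-and-forget pattern described above, assuming a simplified handler. The `CreateDeps` interface, `handleCreateWatchlist`, and the synchronous `createWatchlist` stub are hypothetical; the real route handlers are asynchronous and have different signatures.

```typescript
// Hypothetical sketch: the interfaces and handler shape are assumptions,
// not the actual entity analytics route code.
interface SketchLogger {
  info(message: string): void;
  warn(message: string): void;
}

interface CreateDeps {
  createWatchlist(body: { name: string }): { id: string };
  syncWatchlist(id: string): Promise<void>;
  logger: SketchLogger;
}

function handleCreateWatchlist(
  body: { name: string },
  deps: CreateDeps
): { status: number; id: string } {
  const { id } = deps.createWatchlist(body);

  // Fire-and-forget: start the sync without awaiting it, so the response
  // below is returned regardless of how long the sync takes.
  void deps
    .syncWatchlist(id)
    .then(() => deps.logger.info(`Background sync completed for watchlist ${id}`))
    .catch((error: unknown) => {
      // A failed sync is only logged; it never fails the create request.
      const message = error instanceof Error ? error.message : String(error);
      deps.logger.warn(`Background sync failed for watchlist ${id}: ${message}`);
    });

  return { status: 200, id };
}
```

Attaching `.catch` to the un-awaited promise matters here: without it, a rejected sync would surface as an unhandled promise rejection instead of a warning.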

## Manual Testing

1. Start Kibana with a valid Elasticsearch cluster
2. Call POST /api/entity_analytics/watchlists with a body that includes
entitySources
3. Check server logs for the message "Background sync completed for
watchlist"
4. Verify the watchlist index is populated without waiting for the
scheduled task
5. Test the failure path by mocking syncWatchlist to throw and
confirming the API still returns 200

Closes elastic/security-team#17406

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
Closes elastic/search-team#14414
Part of elastic/search-team#14205

## Summary

Enables users to create skills in chat with the help of an agent. Uses
the attachment UI, providing actions to users for previewing and saving
skills. Allows agents to iterate on drafts and make user requested
changes to produce multiple versions.

<img width="825" height="655" alt="image"
src="https://github.com/user-attachments/assets/1b1779fa-b627-4680-8422-bef22abfb81a"
/>
<img width="819" height="612" alt="image"
src="https://github.com/user-attachments/assets/c6e88b46-c2a9-46e4-80be-8828dec44d32"
/>

### Release note
Adds ability to create skills directly in agent builder chat.

### Checklist

Check the PR satisfies following conditions. 

Reviewers should verify this PR satisfies this list as well.

- [ ] Any text added follows [EUI's writing
guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses
sentence case text and includes [i18n
support](https://github.com/elastic/kibana/blob/main/src/platform/packages/shared/kbn-i18n/README.md)
- [ ]
[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)
was added for features that require explanation or tutorials
- [ ] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
- [ ] If a plugin configuration key changed, check if it needs to be
allowlisted in the cloud and added to the [docker
list](https://github.com/elastic/kibana/blob/main/src/dev/build/tasks/os_packages/docker_generator/resources/base/bin/kibana-docker)
- [ ] This was checked for breaking HTTP API changes, and any breaking
changes have been approved by the breaking-change committee. The
`release_note:breaking` label should be applied in these situations.
- [ ] [Flaky Test
Runner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was
used on any tests changed
- [ ] The PR description includes the appropriate Release Notes section,
and the correct `release_note:*` label is applied per the
[guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)
- [ ] Review the [backport
guidelines](https://docs.google.com/document/d/1VyN5k91e5OVumlc0Gb9RPa3h1ewuPE705nRtioPiTvY/edit?usp=sharing)
and apply applicable `backport:*` labels.

### Identify risks

Does this PR introduce any risks? For example, consider risks like hard
to test bugs, performance regression, potential of data loss.

Describe the risk, its severity, and mitigation for each identified
risk. Invite stakeholders and evaluate how to proceed before merging.

- [ ] [See some risk
examples](https://github.com/elastic/kibana/blob/main/RISK_MATRIX.mdx)
- [ ] ...

---------

Co-authored-by: Zachary Parikh <zachary.parikh@elastic.co>
Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Co-authored-by: Ryan Keairns <contactryank@gmail.com>
Co-authored-by: pgayvallet <pierre.gayvallet@elastic.co>
…9822)

## Summary

This PR migrates osquery to the V2 unified registry. No change to UI or
any existing behavior. Added api integration tests for legacy and
unified input.

**How to test**
Feature flag: `xpack.cases.attachments.enabled`
Flag on: attachment created as `cases-attachments` SO
Flag off: attachment created as `cases-comments` SO
Osquery added during flag on is hidden when turning the feature flag
off.


https://github.com/user-attachments/assets/6e637173-a4fa-4a7a-b9f6-f16bb29c9675


### Checklist

- [x] Any text added follows [EUI's writing
guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses
sentence case text and includes [i18n
support](https://github.com/elastic/kibana/blob/main/src/platform/packages/shared/kbn-i18n/README.md)
- [ ]
[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)
was added for features that require explanation or tutorials
- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
- [ ] If a plugin configuration key changed, check if it needs to be
allowlisted in the cloud and added to the [docker
list](https://github.com/elastic/kibana/blob/main/src/dev/build/tasks/os_packages/docker_generator/resources/base/bin/kibana-docker)
- [ ] This was checked for breaking HTTP API changes, and any breaking
changes have been approved by the breaking-change committee. The
`release_note:breaking` label should be applied in these situations.
- [ ] [Flaky Test
Runner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was
used on any tests changed
- [x] The PR description includes the appropriate Release Notes section,
and the correct `release_note:*` label is applied per the
[guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)
- [x] Review the [backport
guidelines](https://docs.google.com/document/d/1VyN5k91e5OVumlc0Gb9RPa3h1ewuPE705nRtioPiTvY/edit?usp=sharing)
and apply applicable `backport:*` labels.
)

## Summary

Negated indices (with a leading `-`) are not part of the query, so
there is no need to check them. Checking them also breaks the
Elasticsearch `_has_privileges` check.
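
As a rough illustration of the idea, negated patterns can be filtered out before the privilege check is built. The helper name `indicesToCheck` is hypothetical and the real change may be structured differently.

```typescript
// Hypothetical sketch: drops negated index patterns (leading `-`) so that
// only the remaining patterns are sent to the privileges check.
function indicesToCheck(patterns: string[]): string[] {
  return patterns
    .map((pattern) => pattern.trim())
    .filter((pattern) => pattern.length > 0 && !pattern.startsWith('-'));
}

// Example: only the two non-negated patterns remain.
const checked = indicesToCheck(['logs-*', '-logs-internal-*', 'metrics-*']);
```
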
## Summary

- Replace the OS checkbox group (with platform icons) on the pack query
flyout and saved-query form with an `EuiComboBox` multiselect labelled
"Operating systems". The combobox is required (no clear-all X,
validation error if emptied) and new queries seed `DEFAULT_PLATFORM` so
all three pills are pre-selected.
- In the pack queries table: remove the "Query" column, rename
"Platform" to "Operating systems", and render each OS as a hollow
`EuiBadge` pill instead of an icon.

<img width="1726" height="1327" alt="Screenshot 2026-05-19 at 3 51
25 PM"
src="https://github.com/user-attachments/assets/b6871c61-ccdf-4ecd-97af-2a05d529b6c8"
/>
<img width="1726" height="1323" alt="Screenshot 2026-05-19 at 3 51
42 PM"
src="https://github.com/user-attachments/assets/b53ce3c2-91a2-46ea-8814-e1104e7229ed"
/>
<img width="1724" height="1276" alt="Screenshot 2026-05-19 at 3 52
11 PM"
src="https://github.com/user-attachments/assets/964687b6-216e-4cc3-8382-7bee2305f1e1"
/>


## Test plan

- [ ] Pack edit → Attach next query: all three OS pills pre-selected, no
clear-X
- [ ] Remove all pills + click Save: validation error blocks submit
- [ ] Edit a query saved with empty platform: reopens with three pills
- [ ] Saved query form: same multiselect behaviour
- [ ] Pack edit queries table: no Query column; "Operating systems"
header; pill badges

Closes elastic/security-team#17022

---------

Co-authored-by: Tomasz Ciecierski <tomasz.ciecierski@elastic.co>
…endpoint navigation links with the correct capabilities (elastic#257966)

## Summary
If a user creates a role with "All" base privileges in the Kibana
privileges section, we expect the user to only have limited access to
the Endpoint Management section. Only global artifact management and
endpoint exceptions should be accessible. Full access requires explicit
enabling of the security sub-feature sections. Although the Privilege
Summary correctly shows that the user does not have access to pages like
the Endpoint List, policies and artifacts, when you click on the side
navigation panel or navigate to Security > Manage, these links were
still visible to the user.

This PR fixes how the base `ALL` and `READ` privileges handle SIEMV5
(Endpoint Management) features and correctly aligns the visibility of
the links to the Privilege summary. The API already correctly handled
restricting privileges, so this adjustment only affects the UI. The
`CUSTOMIZE` base privilege already works as expected.

- [x] Adds `excludeBasePrivileges` to the security sub-features
definition
- [x] Only Endpoint exceptions and global artifact management UI
features show when base privilege is `ALL`
- [x] Only Endpoint exceptions UI features show when base privilege is
`READ`

### To Test:
- Create a role in Stack Management and only adjust the Kibana
privileges section where the Space is `*All Spaces` and the Base
Privilege (Define Privileges section) is All.
- Sign in as a user with that role and observe that no asset management
links are visible.
- API Error toasts are expected

### Screenshots
<img width="769" height="675" alt="image"
src="https://github.com/user-attachments/assets/68097911-b9cd-4c40-bb4f-97b74b085fc7"
/>

Endpoint exceptions is set to READ when base privilege is READ
<img width="498" height="786" alt="image"
src="https://github.com/user-attachments/assets/2a1e06e2-97b2-416a-b6d7-dc837ef6b398"
/>

Side navigation only shows Artifacts > Endpoint exceptions and no other
links
<img width="1308" height="921" alt="image"
src="https://github.com/user-attachments/assets/6bd4aaf9-88a1-484e-ad43-73896e652efa"
/>

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
Update serverless to auto enable WL feature flag

---------

Co-authored-by: Ying Mao <ying.mao@elastic.co>
…art-section errors (elastic#270371)

Closes: elastic#265117

Adds observability for non-render errors raised by the Metrics
Experience chart section (`useFetchMetricsData`, `useLensProps`,
`useMetricSourceKind`) through **two complementary sinks**:

- **APM (`@elastic/apm-rum`)** — primary sink. Every
fetch/build/classification failure that lands at one of the three call
sites is captured via `apm.captureError` with structured correlation
labels, and the surrounding transaction is marked failed via a
`chart-section-non-render-error` span (mirrors `lens/data_loader.ts`).
- **`@kbn/logging`** — wired into the package via
`ExternalServices.logger`. Currently surfaces APM transport failures
(the inner-catch around `apm.captureError`) tagged
`error_type=APMReportingFailure`. The plumbing (provider,
`useReportChartSectionError` hook, `log_labels.ts` inventory,
`logger_utils.ts` adapter) is in place so future code in this package
can emit structured logs against the same vocabulary without
re-introducing ad-hoc `console.error` calls.

Together this gives operators server-side aggregation (APM) for the
common error paths and a grep-able log signal for the rare cases where
APM itself fails.
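
The two-sink flow can be sketched as below. This is an assumption-laden outline rather than the package's code: `ErrorSink` and `LoggerLike` are hypothetical interfaces standing in for the APM agent and the `@kbn/logging` logger, and the real `reportChartSectionError` may take different arguments.

```typescript
// Hypothetical sketch: `ErrorSink` stands in for the APM agent and
// `LoggerLike` for the package logger; both shapes are assumptions.
interface ErrorSink {
  capture(error: Error, labels: Record<string, string>): void;
}

interface LoggerLike {
  error(message: string, labels: Record<string, string>): void;
}

function reportChartSectionError(
  error: Error,
  source: string,
  sink: ErrorSink,
  logger: LoggerLike
): void {
  try {
    // Primary sink: report the error with a correlation label.
    sink.capture(error, { chart_section_source: source });
  } catch (reportingError) {
    // Fallback sink: only reached when reporting itself throws.
    const reason =
      reportingError instanceof Error ? reportingError.message : String(reportingError);
    logger.error(`Failed to report chart section error: ${reason}`, {
      error_type: 'APMReportingFailure',
      chart_section_source: source,
    });
  }
}
```

The logger stays silent on the happy path, which matches the description above: it only surfaces failures of the reporting transport itself.
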
### How to test

Add this inside `fetchSourceKind` in

`src/platform/packages/shared/kbn-unified-chart-section-viewer/src/components/flyout/hooks/use_metric_source_kind.ts`:

```ts
// `SMOKE_TEST` is widened to `boolean` so TS keeps the rest of the function
// reachable (otherwise dead-code analysis loses narrowing on `item`).
const SMOKE_TEST = true as boolean;
if (SMOKE_TEST) {
  throw new Error(`APM smoke test: useMetricSourceKind fetch for "${name}"`);
}
```

Then open Discover with a metrics data view and open a metric flyout.
That mounts `useMetricSourceKind` and exercises the
`reportChartSectionError` → `apm.captureError` path.

**APM check** — service `kibana-frontend`, Errors tab, KQL
`labels.chart_section_source : "useMetricSourceKind"`:

<img width="1307" height="820" alt="image"
src="https://github.com/user-attachments/assets/959f808d-78b8-4be4-ba81-7bb11b07334c"
/>

**Logger check**: the package logger only runs in the
`catch (reportingError)` fallback of `reportChartSectionError`. To
exercise it, force `apm.captureError` to throw from the DevTools console:

```js
const original = window.elasticApm.captureError;
window.elasticApm.captureError = () => { throw new Error('forced APM failure'); };
// reproduce the action; then restore:
window.elasticApm.captureError = original;
```

You should see an `ERROR` entry in the browser console with context
`metrics-data-source-profile` and labels
`{ error_type: 'APMReportingFailure', chart_section_source: 'useMetricSourceKind' }`:

<img width="1490" height="214" alt="image"
src="https://github.com/user-attachments/assets/1938921d-930f-4320-8483-37463ded065b"
/>

### Checklist

Check the PR satisfies following conditions. 

Reviewers should verify this PR satisfies this list as well.

- [ ] Any text added follows [EUI's writing
guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses
sentence case text and includes [i18n
support](https://github.com/elastic/kibana/blob/main/src/platform/packages/shared/kbn-i18n/README.md)
- [ ]
[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)
was added for features that require explanation or tutorials
- [ ] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
- [ ] If a plugin configuration key changed, check if it needs to be
allowlisted in the cloud and added to the [docker
list](https://github.com/elastic/kibana/blob/main/src/dev/build/tasks/os_packages/docker_generator/resources/base/bin/kibana-docker)
- [ ] This was checked for breaking HTTP API changes, and any breaking
changes have been approved by the breaking-change committee. The
`release_note:breaking` label should be applied in these situations.
- [ ] [Flaky Test
Runner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was
used on any tests changed
- [ ] The PR description includes the appropriate Release Notes section,
and the correct `release_note:*` label is applied per the
[guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)
- [ ] Review the [backport
guidelines](https://docs.google.com/document/d/1VyN5k91e5OVumlc0Gb9RPa3h1ewuPE705nRtioPiTvY/edit?usp=sharing)
and apply applicable `backport:*` labels.

### Identify risks

Does this PR introduce any risks? For example, consider risks like hard
to test bugs, performance regression, potential of data loss.

Describe the risk, its severity, and mitigation for each identified
risk. Invite stakeholders and evaluate how to proceed before merging.

- [ ] [See some risk
examples](https://github.com/elastic/kibana/blob/main/RISK_MATRIX.mdx)
- [ ] ...

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
…ution public methods (elastic#269308)

## Summary

Refactor on top of elastic#268467 which introduced a `refresh` option on
`ResolutionClient.linkEntities`/`unlinkEntities`.

**What changed:**
- Replaces ES-vocabulary `refresh` with domain-named `awaitVisibility`
(default `false`) on the API
- Two-layer naming convention: the infra layer (`bulkUpdateEntityDocs`)
keeps ES vocabulary with a corrected default of `false`; the domain
layer exposes `awaitVisibility` and translates internally
- UI route handlers (`link`, `unlink`) pass `{ awaitVisibility: true }`
to get read-your-writes semantics after a user-triggered operation
- Background maintainers drop the now-redundant explicit
`{ refresh: false }`, since the new default covers it
**Why:** `refresh` leaks Elasticsearch vocabulary into the domain layer
and requires callers to know what `'wait_for'` means. `awaitVisibility`
is self-documenting and hides the translation detail.
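
A minimal sketch of the two-layer translation, assuming `awaitVisibility: true` maps to the Elasticsearch `refresh: 'wait_for'` value. The helper name `toEsRefresh` and the option interface are hypothetical.

```typescript
// Hypothetical sketch: the domain layer exposes `awaitVisibility`, and the
// translation to the Elasticsearch `refresh` value happens in one place.
type EsRefresh = boolean | 'wait_for';

interface ResolutionWriteOptions {
  awaitVisibility?: boolean; // defaults to false
}

function toEsRefresh({ awaitVisibility = false }: ResolutionWriteOptions = {}): EsRefresh {
  // Assumption: "await visibility" maps to `wait_for`, i.e. the call returns
  // once a refresh has made the write visible to search.
  return awaitVisibility ? 'wait_for' : false;
}

// UI route handlers opt in; background callers rely on the default.
const uiRefresh = toEsRefresh({ awaitVisibility: true });
const backgroundRefresh = toEsRefresh();
```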

### Checklist

- [x] Any text added follows [EUI's writing
guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses
sentence case text and includes [i18n
support](https://github.com/elastic/kibana/blob/main/src/platform/packages/shared/kbn-i18n/README.md)
- [x]
[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)
was added for features that require explanation or tutorials
- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
- [x] If a plugin configuration key changed, check if it needs to be
allowlisted in the cloud and added to the [docker
list](https://github.com/elastic/kibana/blob/main/src/dev/build/tasks/os_packages/docker_generator/resources/base/bin/kibana-docker)
- [x] This was checked for breaking HTTP API changes, and any breaking
changes have been approved by the breaking-change committee. The
`release_note:breaking` label should be applied in these situations.
- [x] [Flaky Test
Runner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was
used on any tests changed
- [x] The PR description includes the appropriate Release Notes section,
and the correct `release_note:*` label is applied per the
[guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)
- [x] Review the [backport
guidelines](https://docs.google.com/document/d/1VyN5k91e5OVumlc0Gb9RPa3h1ewuPE705nRtioPiTvY/edit?usp=sharing)
and apply applicable `backport:*` labels.

### Identify risks

Low risk — pure internal rename refactor within the `entity_store`
plugin.

- No public API surface changed (all call sites are plugin-internal)
- No behavior change: same Elasticsearch semantics, same effective
defaults
- No deployment-mode divergence (stateful/serverless unaffected)
Dosant and others added 23 commits May 27, 2026 11:54
## Summary

Part of elastic/kibana-team#3344
Extracts the app header infrastructure from the Chrome Next integration
work in [elastic#259318](elastic#259318) into a
focused PR.


This adds:

- `@kbn/app-header` shared package with inline and Chrome-owned app
header rendering APIs.
- `chrome.next.appHeader.set()` plus internal state, lifecycle cleanup,
mocks, and layout wiring.
- Chrome-owned app header rendering in the Chrome Next project layout.
- Focused hardening for content detection, registration cleanup, legacy
badge fallback, and public type exports.
- Package README and targeted unit coverage for the new app-header
behavior.

This intentionally does not migrate any apps yet and does not pull in
unrelated Chrome Next slices such as side nav, user menu, feedback
handlers, or broader help menu changes.

## Context

The original integration branch includes app migrations and additional
Chrome Next features. This PR extracts only the app-header foundation so
it can be reviewed and merged independently before route-by-route
adoption.

Follow-up created:
[elastic#271295](elastic#271295) to make the
static “Add integrations” action access-aware.

## Risk

Low to medium. The new APIs are behind Chrome Next behavior and
currently have no app adopters in this PR, but the changes touch shared
Chrome layout state. Risk is mitigated with focused unit coverage and
existing Chrome validation checks.

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
…astic#271380)

## Summary

Adds an event lifecycle endpoint and timeline UI so users can trace the
full chain of detections → discoveries → verdicts → event versions when
clicking a significant event. Also introduces search and filter controls
on the events tab.

### Lifecycle
- New `GET /internal/sig_events/events/{id}/lifecycle` endpoint that
walks the event chain via `previous_event_id`, collects related
discoveries and verdicts in parallel, and deduplicates detections
- Flyout with event details, root cause, recommendations, evidences, and
a chronological lifecycle timeline
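
The chain walk and deduplication could look roughly like this. It is a sketch under assumptions: events are looked up in an in-memory map rather than fetched from Elasticsearch, and `SigEvent`, `walkEventChain`, and `dedupeById` are hypothetical names.

```typescript
// Hypothetical sketch: follows `previous_event_id` links from the clicked
// event back to the first version, using an in-memory lookup.
interface SigEvent {
  id: string;
  previous_event_id?: string;
}

function walkEventChain(
  startId: string,
  byId: Map<string, SigEvent>,
  maxDepth = 100
): SigEvent[] {
  const chain: SigEvent[] = [];
  const seen = new Set<string>();
  let currentId: string | undefined = startId;
  // Stop on a missing link, a cycle, or when the depth limit is reached.
  while (currentId !== undefined && !seen.has(currentId) && chain.length < maxDepth) {
    const event: SigEvent | undefined = byId.get(currentId);
    if (event === undefined) {
      break;
    }
    seen.add(currentId);
    chain.push(event);
    currentId = event.previous_event_id;
  }
  return chain;
}

// Keeps the first occurrence of each ID, e.g. for detections shared by
// several event versions.
function dedupeById<T extends { id: string }>(items: T[]): T[] {
  const unique = new Map<string, T>();
  for (const item of items) {
    if (!unique.has(item.id)) {
      unique.set(item.id, item);
    }
  }
  return Array.from(unique.values());
}
```

The `seen` set and `maxDepth` guard are defensive additions so that a malformed chain cannot loop forever.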

### Filters & search
- Added verdict, impact, and stream filter popovers to the events tab
- Added debounced text search
- Route accepts array-based query params for multi-select filters


https://github.com/user-attachments/assets/2a11830b-f726-45b2-b110-10810ccf63cf

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
## Summary

Registers the three KI workflows (features identification, queries
generation, onboarding) as managed workflows via the
`workflows_extensions` plugin, and delegates memory generation to Task
Manager.

### Managed workflow registration

- Adds three YAML workflow definitions under
`kbn-workflows/managed/definitions/streams_ki/`
- Registers the `streams` plugin as a managed workflow owner during
`setup()`
- Installs all three workflows as global (`spaceId: '*'`) during
`start()` with parallel installs
- Workflow IDs use the reserved `system-` prefix:
`system-streams-ki-features-identification`,
`system-streams-ki-queries-generation`, `system-streams-ki-onboarding`

### Memory generation endpoint

Changes `POST /internal/streams/{streamName}/memory/_generate` to
delegate to Task Manager:
- Returns `{ acknowledged: true }` immediately after scheduling a
`streams_memory_generation` task
- Uses the same singleton task pattern as the onboarding task, with
persistence, retry, and abort handling provided by Task Manager
- Eliminates request-scoped `inferenceClient` lifecycle concerns (the
task runner uses `fakeRequest` with a persisted API key)

## Test plan

- [x] Kibana starts and installs the three managed workflows without
errors
- [x] Managed workflows are accessible at
`/app/workflows/system-streams-ki-onboarding` (and the other two IDs).
They are not listed in the Workflows UI by default since the list
filters out managed workflows
- [x] Testing a managed workflow from the editor resolves child managed
workflows at runtime



https://github.com/user-attachments/assets/d54011a3-5014-445d-a38c-47a2fa9ea5bb

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
## Summary

`get_trace_change_points` uses a `fixed_interval` date histogram tied to
APM's rollup interval (minimum 1m). Elasticsearch's `change_point`
aggregation requires at least 22 buckets, but a 15-minute window at 1m
resolution only produces ~7–15 buckets, causing every result to return
indeterminable: not enough buckets.

This is consistently triggered by the investigation skill passing the
screen context time range (often `now-15m`) to all tool calls.

### Fix

Enforce a 30-minute floor on the effective start time in the handler. If
the requested window is shorter than 30 minutes, `start` is silently
extended back to `end - 30m`. At 1m resolution, 30 minutes guarantees ≥22
buckets. Longer windows are unaffected.
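
The floor can be expressed as a small helper. This is a sketch with hypothetical names (`MIN_WINDOW_MS`, `applyMinimumWindow`) working on epoch milliseconds; the actual handler may parse and represent time ranges differently.

```typescript
// Hypothetical sketch: enforces a 30-minute minimum window by moving the
// start time back when the requested range is too short.
const MIN_WINDOW_MS = 30 * 60 * 1000;

function applyMinimumWindow(startMs: number, endMs: number): number {
  // Keep the requested start if the window is already long enough,
  // otherwise extend it back to `end - 30m`.
  return Math.min(startMs, endMs - MIN_WINDOW_MS);
}

// Number of 1-minute buckets a window produces at 1m resolution.
function bucketCountAtOneMinute(startMs: number, endMs: number): number {
  return Math.floor((endMs - startMs) / 60_000);
}
```

With a 15-minute request the effective window becomes 30 minutes, which yields 30 one-minute buckets and clears the 22-bucket minimum.
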
## Summary

Contributes to elastic/docs-content#6591 by
adding the 9.4.2 Kibana release notes.

Preview -
https://docs-v3-preview.elastic.dev/elastic/kibana/pull/270302/release-notes

---------

Co-authored-by: Florent LB <florent.leborgne@elastic.co>
## Summary

Contributes to elastic/docs-content#6591 by
adding the 9.3.5 Kibana release notes.

Preview -
https://docs-v3-preview.elastic.dev/elastic/kibana/pull/270299/release-notes#kibana-9.3.5-release-notes

---------

Co-authored-by: Florent LB <florent.leborgne@elastic.co>
…se notes (elastic#270301)

## Summary

Contributes to
elastic/docs-content-internal#1223.

Updates known issue entry about how upgrading to 9.3.x fails when a rule
action contains oversized content. The workaround details have been
updated and resolution information has been added. Observability and
Security known issue release notes being updated via
elastic/docs-content#6645.

Preview -
https://docs-v3-preview.elastic.dev/elastic/kibana/pull/270301/release-notes/known-issues
…Builder:experimentalFeatures (elastic#270501)

## Summary

- Adds an `experimental` flag to `UiSettingsParams` as a mutually
exclusive alternative to `technicalPreview`. TypeScript enforces that a
setting can carry at most one maturity badge.
- Introduces a new `FieldTitleExperimentalBadge` component that renders
"Experimental" (instead of "Technical preview") in the Advanced Settings
UI, wired through `FieldDefinition` and `getFieldDefinition`.
- Switches `agentBuilder:experimentalFeatures` from `technicalPreview:
true` to `experimental: true` to align with updated Elastic terminology
guidelines ([Slack
thread](https://elastic.slack.com/archives/C0A2RUHDJCB/p1779223108141119)).

## Details

The existing `technicalPreview` field on `UiSettingsParams` was a plain
`interface` property. To enforce mutual exclusivity with the new
`experimental` field, `UiSettingsParams` is now a discriminated union:
one branch carries `technicalPreview` with `experimental?: never`, and
the other carries `experimental` with `technicalPreview?: never`.
TypeScript will error at compile time if both are set.
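
A simplified sketch of the mutually exclusive flags, assuming both are plain booleans. The type names and `getMaturityBadge` are hypothetical, and the real `UiSettingsParams` carries many more fields.

```typescript
// Hypothetical sketch: each branch forbids the other flag via `?: never`.
interface BaseSettingParams {
  name: string;
}

type MaturityFlags =
  | { technicalPreview?: boolean; experimental?: never }
  | { experimental?: boolean; technicalPreview?: never };

type SettingParams = BaseSettingParams & MaturityFlags;

// Picks the badge label to render; at most one can apply by construction.
function getMaturityBadge(
  setting: SettingParams
): 'Experimental' | 'Technical preview' | undefined {
  if (setting.experimental) {
    return 'Experimental';
  }
  if (setting.technicalPreview) {
    return 'Technical preview';
  }
  return undefined;
}

const experimentalSetting: SettingParams = {
  name: 'agentBuilder:experimentalFeatures',
  experimental: true,
};

// Setting both flags does not type-check:
// const invalid: SettingParams = { name: 'x', experimental: true, technicalPreview: true };
```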

The new `experimental_badge.tsx` lives alongside the existing
`technical_preview_badge.tsx` in
`kbn-management/settings/components/field_row/title/`. `title.tsx`
renders whichever badge is applicable (at most one, by type constraint).

## Screenshots

### Current
<img width="1910" height="184" alt="Screenshot 2026-05-19 at 3 34 58 PM"
src="https://github.com/user-attachments/assets/745136b2-5b97-483e-93e4-bd1044346155"
/>


### Updated
<img width="941" height="130" alt="Screenshot 2026-05-21 at 2 47 42 PM"
src="https://github.com/user-attachments/assets/d06423e5-3e91-4517-9231-fa83cde8b7e9"
/>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
…ing (elastic#270446)

**Epic:** elastic/security-team#12367
(internal)
**Resolves: elastic#262502**

## Summary

Instruments Security Solution's `DetectionRulesClient` (DRC) and some
API routes directly with rule changes history functionality. It also
involved AF's `RulesClient` instrumentation streamlining to facilitate
the implementation.

## Details

Instrumentation boils down to passing the rule changes history context
information along the chain Security Solution API endpoint
-> `DRC` -> `RulesClient` -> `@kbn/changes-history` package. Two
parameters are passed from the DRC:

- change tracking `action`
It should reflect a domain-specific change action. Some methods have a
clear action (at least for now), like `delete` or `bulkDelete`, while
for others, like `create` or `update`, the action depends on the
upstream context. For example, Security Solution uses
`RulesClient.create()` for prebuilt rules management, which introduces
domain-specific actions such as prebuilt rule installation and upgrade.
- `metadata`
  Rule change tracking action related metadata.
  - `metadata.bulkCount`
Performance optimization in the consumer code like chunking makes it
impossible to capture the real number of rules the bulk operation is
applied to. Consumer code may pass `bulkCount` when it's necessary.
Besides that `bulkCount` is supported by some non-bulk methods as they
don't have bulk counterparts. For non-bulk methods with bulk
counterparts `bulkCount` isn't exposed.
  - `metadata.originalRuleSoId`
    Rule's Saved Object identifier saved upon rule duplication.
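
To illustrate why `bulkCount` has to come from the caller, here is a minimal sketch in which a bulk operation is split into chunks and every chunk is told the overall total. `chunk`, `processInChunks`, and the `ChangeTrackingMetadata` shape are hypothetical names used for illustration only; they are not the code in this PR.

```typescript
// Hypothetical metadata shape; only `bulkCount` comes from the description above.
interface ChangeTrackingMetadata {
  bulkCount?: number;
}

// Split a list into chunks of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Process rule IDs chunk by chunk. Each handler call sees only its own chunk,
// so the caller supplies the overall total as `bulkCount`.
function processInChunks(
  ruleIds: string[],
  chunkSize: number,
  handleChunk: (ids: string[], metadata: ChangeTrackingMetadata) => void
): void {
  const metadata: ChangeTrackingMetadata = { bulkCount: ruleIds.length };
  chunk(ruleIds, chunkSize).forEach((ids) => handleChunk(ids, metadata));
}

// Usage: five rules processed in chunks of two.
const seen: Array<{ chunkLength: number; bulkCount?: number }> = [];
processInChunks(['a', 'b', 'c', 'd', 'e'], 2, (ids, metadata) => {
  seen.push({ chunkLength: ids.length, bulkCount: metadata.bulkCount });
});
```

Without the explicit total, the last handler call would only know about its single remaining rule.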

### Changes

**Alerting plugin / `@kbn/alerting-types`**
- `RuleChangeTracking` made generic (`RuleChangeTracking<ChangeAction
extends string = string>`) so consumers can restrict the `action` field
to their own enum without wrapping the type.
- `create_rule` and `update_rule` wired to accept `changeTracking?:
RuleChangeTracking` and log the action via `logRuleChanges`.
- `bulk_delete_rules` and `bulk_edit_rules` accept `changeTracking?:
Omit<RuleChangeTracking, 'action'>` — the action is implicit for these
operations; `bulkCount` is provided by the caller to track totals across
processing chunks.

**Security Solution common**
- New `common/detection_engine/rule_management/rule_change_tracking.ts`
introduces `SecurityRuleChangeTrackingAction` enum (`ruleInstall`,
`ruleUpgrade`, `ruleDuplicate`, `ruleImport`, `ruleRevert`) and
`SecurityRuleChangeTracking` type alias.

**Detection Rules Client**
- `IDetectionRulesClient` interface: all mutating methods accept
optional `SecurityRuleChangeTracking`.
- Each method passes `changeTracking` through to the underlying
`RulesClient` call. Methods with a fixed semantic (`importRule` →
`ruleImport`, `upgradePrebuiltRule` → `ruleUpgrade`,
`revertPrebuiltRule` → `ruleRevert`) always inject the correct default
action, allowing callers to supply `bulkCount` without overriding the
action.
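
A rough sketch of the fixed-action behavior described above, assuming the generic `RuleChangeTracking` shape quoted earlier. The `metadata` fields, the string union standing in for the enum, and the `withFixedAction` helper are assumptions made for illustration, not the actual implementation.

```typescript
// Generic signature quoted from the description; the `metadata` fields are assumptions.
interface RuleChangeTracking<ChangeAction extends string = string> {
  action: ChangeAction;
  metadata?: { bulkCount?: number; originalRuleSoId?: string };
}

// Stand-in for the SecurityRuleChangeTrackingAction enum members listed above.
type SecurityAction =
  | 'ruleInstall'
  | 'ruleUpgrade'
  | 'ruleDuplicate'
  | 'ruleImport'
  | 'ruleRevert';

type SecurityRuleChangeTracking = RuleChangeTracking<SecurityAction>;

// Hypothetical helper: a method with fixed semantics keeps whatever metadata
// the caller passed but always sets the action itself.
function withFixedAction(
  action: SecurityAction,
  changeTracking?: Partial<SecurityRuleChangeTracking>
): SecurityRuleChangeTracking {
  return { ...changeTracking, action };
}

// Usage: the caller supplies bulkCount; the import path decides the action.
const tracked = withFixedAction('ruleImport', {
  action: 'ruleInstall',
  metadata: { bulkCount: 3 },
});
```

In this sketch the method's own action wins over any action supplied by the caller, while `bulkCount` passes through untouched.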

**Security Solution API routes / handlers**
- `PUT /api/detection_engine/rules/_import` — passes `changeTracking: {
action: ruleImport }` to the DRC.
- `PUT /internal/detection_engine/prebuilt_rules/installation/_perform`
— passes `changeTracking: { action: ruleInstall, bulkCount }`.
- `PUT /internal/detection_engine/prebuilt_rules/upgrade/_perform` —
passes `changeTracking: { action: ruleUpgrade, bulkCount }`.
- `PUT /internal/detection_engine/prebuilt_rules/revert` — passes
`changeTracking: { action: ruleRevert }`.
- `PUT /api/detection_engine/rules/prepackaged` (legacy) — passes
`changeTracking: { action: ruleInstall, bulkCount }`.
- Integration paths (endpoint security and promotion rule installation)
pass `changeTracking: { action: ruleInstall, bulkCount }`
programmatically.

## How to test

This change is a no-op without explicit opt-in. To exercise the new code
paths locally:

1. Set
[FLAGS.FEATURE_ENABLED](https://github.com/elastic/kibana/blob/main/x-pack/platform/packages/shared/kbn-change-history/src/constants.ts#L31)
to `true` in the **@kbn/change-history** package
2. Enable feature flags
```yaml
xpack.alerting.ruleChangeTracking.enabled: true

xpack.securitySolution.enableExperimental:
  - ruleChangesHistoryEnabled
```

3. Make changes
   - 3a. Install one or more prebuilt rules (`PUT
     /internal/detection_engine/prebuilt_rules/installation/_perform`). Open
     a freshly installed rule and verify the changes history shows an entry
     with action `rule_install`.
   - 3b. Upgrade one or more prebuilt rules (`PUT
     /internal/detection_engine/prebuilt_rules/upgrade/_perform`). Verify
     changes history shows `rule_upgrade`.
   - 3c. Revert a customized prebuilt rule (`PUT
     /internal/detection_engine/prebuilt_rules/revert`). Verify changes
     history shows `rule_revert`.
   - 3d. Import a rule ndjson file via **Manage Rules → Import** (`PUT
     /api/detection_engine/rules/_import`). Verify changes history shows
     `rule_import`.

4. Make a request to `GET /internal/detection_engine/rules/_history` to
explore the change history for each rule you changed above
```bash
curl -H 'Content-Type: application/json' -H 'kbn-xsrf: kibana' -H "elastic-api-version: 1" -H "x-elastic-internal-origin: true" -u elastic:changeme 'http://localhost:5601/kbn/internal/detection_engine/rules/<rule_so_id>/history'
```
- Verify **FTR integration tests** added under
`x-pack/solutions/security/test/security_solution_api_integration/test_suites/detections_response/rules_management/rule_management/trial_license_complete_tier/change_tracking.ts`
pass.

### Identify risks

- Low risk: all `changeTracking` parameters are optional and additive.
Existing behavior is fully preserved when the parameter is omitted.
…c#270426)

## Summary

Closes elastic/search-team#14522

Adds per-provider EARS feature flagging via a two-tier system:
- **Stable providers** (Microsoft, Slack): enabled whenever
`xpack.actions.auth.ears.enabled: true`
- **Experimental providers** (Google): only enabled when *both*
`ears.enabled: true` **and** `ears.enableExperimental: true`

This allows us to ship EARS for verified OAuth providers while keeping
unverified ones (Google, pending app verification) available only for
internal dogfooding.

### How it works

- Each connector spec's EARS auth type entry can declare `experimental:
true` (Google Calendar, Gmail, Google Drive do this)
- A new `xpack.actions.auth.ears.enableExperimental` boolean config
controls whether experimental EARS providers are available
- The filtering happens at schema generation time
(`generateSecretsSchemaFromSpec`), so both the UI and API are gated
- Existing EARS connectors for experimental providers show as disabled
in the connectors table when `enableExperimental` is off
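
The two-tier gate can be summarized as a small predicate. This is a sketch under assumptions: `EarsConfig`, `EarsAuthType`, and `isEarsAvailable` are hypothetical names that mirror the config keys and the `experimental` flag described above, not the helpers added in this PR.

```typescript
// Hypothetical config shape mirroring `xpack.actions.auth.ears.*`.
interface EarsConfig {
  enabled: boolean;
  enableExperimental: boolean;
}

// Hypothetical stand-in for a connector spec's EARS auth type entry.
interface EarsAuthType {
  experimental?: boolean;
}

// A provider's EARS option is available only if the global gate is on and,
// for experimental providers, the experimental gate is on as well.
function isEarsAvailable(config: EarsConfig, authType: EarsAuthType): boolean {
  if (!config.enabled) {
    return false;
  }
  return authType.experimental === true ? config.enableExperimental : true;
}

// Usage: a stable provider and an experimental one under the same config.
const config: EarsConfig = { enabled: true, enableExperimental: false };
const stableAvailable = isEarsAvailable(config, {});
const experimentalAvailable = isEarsAvailable(config, { experimental: true });
```

Because the stable case ignores `enableExperimental` entirely, removing `experimental: true` from a spec is enough to promote a provider.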

### Promotion flow

When Google's OAuth app verification completes:
1. Remove `experimental: true` from the 3 Google specs (one-line diff
each)
2. **No deployment config changes needed** — Google EARS "just works"
for everyone

### Config

```yaml
# kibana.yml
xpack.actions.auth.ears:
  enabled: true                # global EARS gate (existing)
  enableExperimental: true     # opt-in for unverified providers (new)
```

### Changes

| Area | What |
|------|------|
| `kbn-connector-specs` | `AuthTypeDef.experimental` flag, filtering in
`generateSecretsSchemaFromSpec`, `isEarsExperimentalConnector` helper |
| `actions` plugin | `ears.enableExperimental` config,
`isEarsExperimentalEnabled()` utility, exposed to browser |
| `stack_connectors` | Thread `isEarsExperimentalEnabled` through
client-side schema generation |
| `agent_builder` | Per-provider disabled check in connectors table |
| `triggers_actions_ui` | Per-provider disabled check in connectors list
|
| Google specs | `experimental: true` on EARS auth type
(google_calendar, gmail, google_drive) |

## Test plan

- [ ] With `ears.enabled: true` and no `enableExperimental`:
Microsoft/Slack connectors show EARS option, Google connectors do not
- [ ] With `ears.enabled: true` and `enableExperimental: true`: all
connectors show EARS option
- [ ] With `ears.enabled: false`: no connectors show EARS option
regardless of `enableExperimental`
- [ ] Creating a Google EARS connector via API fails when
`enableExperimental` is off
- [ ] Previously created Google EARS connectors show as disabled when
`enableExperimental` is turned off
- [ ] Unit tests pass: `node scripts/jest
src/platform/packages/shared/kbn-connector-specs/`
- [ ] Unit tests pass: `node scripts/jest
x-pack/platform/plugins/shared/actions/server/`

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
)

## Summary

> [!NOTE]
> This only contains the server side changes for the entity behavior
feature. UI changes to come in subsequent PRs.

Introduces a new entity maintainer (`ml-anomaly-detection-jobs`) that
maintains the `entity.behaviors.anomaly_job_ids` field for an entity in
the entity store. This maintainer runs every 24 hours and looks back 90
days to capture all of the entity's anomalous behavior in that window.

During each run, the maintainer:
- Iterates over user and host entities from the entity store in batches
- For each batch, fetches anomaly records from security ML jobs for the
last 90 days and above the configured threshold minimum
- If anomaly records exist for an entity, its entity store entry is
updated to include the anomaly job ID.
- If anomaly records exist for an entity, additional supporting details
are queried and stored in a details data stream
`.entity_analytics.ml-ad-jobs-latest-${namespace}`
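
The per-batch step can be sketched with in-memory data. Everything here is a simplification for illustration: `AnomalyRecord` is a hypothetical, trimmed-down record, `collectAnomalyJobIds` is not the maintainer's real function, and treating the threshold as inclusive is an assumption.

```typescript
// Hypothetical, simplified anomaly record; real ML records carry many more fields.
interface AnomalyRecord {
  entityId: string;
  jobId: string;
  recordScore: number;
  timestamp: number; // epoch milliseconds
}

const LOOKBACK_MS = 90 * 24 * 60 * 60 * 1000; // 90 days

// Collect, per entity in the batch, the distinct job IDs that produced
// anomalies inside the lookback window and at or above the score threshold
// (the inclusive comparison is an assumption).
function collectAnomalyJobIds(
  batchEntityIds: string[],
  records: AnomalyRecord[],
  now: number,
  minScore: number
): Record<string, string[]> {
  const result: Record<string, string[]> = {};
  records.forEach((r) => {
    const inBatch = batchEntityIds.indexOf(r.entityId) !== -1;
    const inWindow = r.timestamp >= now - LOOKBACK_MS && r.timestamp <= now;
    if (!inBatch || !inWindow || r.recordScore < minScore) {
      return;
    }
    const jobIds = result[r.entityId] || (result[r.entityId] = []);
    if (jobIds.indexOf(r.jobId) === -1) {
      jobIds.push(r.jobId);
    }
  });
  return result;
}
```

Entities with no qualifying records do not appear in the result, which matches the "if anomaly records exist" condition in the list above.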

The additional details that are fetched are job dependent:

For jobs that use the `rare` function (for example, rare country login),
only the anomalous value is stored in the anomaly record (for example,
`Iran`). In order to determine the baseline behavior for the entity, we
use the ML job configuration to aggregate against the source index (for
example, an aggregation to determine where an entity commonly logs in).

For other job types that are metric or count functions (for example,
high number of failed logins), the record document contains the typical
value and the anomalous value so we already have the baseline behavior.

For all job types, we grab the latest 3 anomalous documents. This is to
support the "Raw Evidence" portion of the expanded section in the
initial UI mockups. Note that the exact format of these documents may
change as we finalize the mockups but since this feature is behind a
feature flag, it should be ok to merge and finalize later.

<img width="463" height="374" alt="Screenshot 2026-05-18 at 4 12 06 PM"
src="https://github.com/user-attachments/assets/19af8c51-650d-40b2-8b9e-548daee8ac5e"
/>

## To Verify

1. Modify the default lookback period of the entity store logs
extraction task (because we're populating historical data)

```
--- a/x-pack/solutions/security/plugins/entity_store/server/domain/saved_objects/global_state/constants.ts
+++ b/x-pack/solutions/security/plugins/entity_store/server/domain/saved_objects/global_state/constants.ts
@@ -10,14 +10,14 @@ import { z } from '@kbn/zod/v4';
 export const DEFAULT_HISTORY_SNAPSHOT_FREQUENCY = '24h';

 export const LOG_EXTRACTION_DELAY_DEFAULT = '1m';
-export const LOG_EXTRACTION_LOOKBACK_PERIOD_DEFAULT = '3h';
+export const LOG_EXTRACTION_LOOKBACK_PERIOD_DEFAULT = '30d';
 export const LOG_EXTRACTION_FREQUENCY_DEFAULT = '1m';
 // Max amount of entities to extract in one ESQL query
 export const LOG_EXTRACTION_DOCS_LIMIT_DEFAULT = 10000;
 // Max raw log documents per logs to be processed in a query (inside elastic search)
 export const LOG_EXTRACTION_MAX_LOGS_PER_PAGE_DEFAULT = 40000;
 export const LOG_EXTRACTION_TIMEOUT_DEFAULT = '59s';
-export const LOG_EXTRACTION_MAX_TIME_WINDOW_SIZE_DEFAULT = '15m';
+export const LOG_EXTRACTION_MAX_TIME_WINDOW_SIZE_DEFAULT = '1d';
 // Max total raw log documents to process per task run; 0 = no cap
```

2. Start ES and Kibana with the following feature flags:

```
uiSettings.overrides:
  securitySolution:entityStoreEnableV2: true

xpack.securitySolution.enableExperimental:
  - entityAnalyticsEntityStoreV2
  - entityAnalyticsWatchlistEnabled
  - entityAnalyticsNewHomePageEnabled
  - leadGenerationEnabled
  - entityAnalyticsMlJobBehaviorMaintainer    # new feature flag for this PR
```

3. Use this script to populate some data:
https://gist.github.com/ymao1/d35d356f090e23c746055446cc21fba0. Note:
you may need to modify the Kibana URL if you're using a different base
path or SSL.

You will also need to download these scripts, which are referenced by
the above script.
- Rare region data:
https://gist.github.com/ymao1/3f8d1214928b5c27aa505a20b7f2425d
- High login count:
https://gist.github.com/ymao1/fbbdbcf7552455fd155ee52ffcddf67a

4. Verify the maintainer is started in Dev Tools

```
GET kbn:/internal/security/entity_store/entity_maintainers?apiVersion=2
```

Response should include the new `ml-anomaly-detection-jobs` maintainer
and the status should be `started`

```
{
  "maintainers": [
    {
      "id": "ml-anomaly-detection-jobs",
      "taskStatus": "started",
      "interval": "1d",
      "description": "Entity Analytics ML Anomaly Detection Maintainer",
      "nextRunAt": "2026-05-19T12:30:27.957Z",
      "minLicense": "platinum",
      "customState": {},
      "runs": 1,
      "lastSuccessTimestamp": "2026-05-18T12:30:30.117Z",
      "lastErrorTimestamp": null
    }
  ]
}
```

5. Manually run the maintainer

```
POST kbn:/internal/security/entity_store/entity_maintainers/run/ml-anomaly-detection-jobs?apiVersion=2
```

You should see this info log when the maintainer is done:

```
[2026-05-19T17:45:00.929-04:00][INFO ][plugins.securitySolution.ml-anomaly-detection-jobs-default] Maintainer run completed in 2570ms
```

6. After the maintainer runs, you should see some entities populated
with behavior data

```
GET .entities.v2.latest.security_default-00001/_search
{
    "query": {
        "bool": {
            "filter": [
              {
                "exists": {
                  "field": "entity.behaviors.anomaly_job_ids"
                }
              }
            ]
        }
    }
}
```

and you should see entries in the details index

```
GET .entity_analytics.ml-ad-jobs-latest-default/_search
```

---------

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
…stic#271250)

## Summary

Applies the same changes to each ML FTR config to cut CI runtime:

- Add one `await esArchiver.loadIfNeeded('X')` in the index file's
`before` hook.
- Delete the per-child `loadIfNeeded('X')` calls.
- Delete any `esArchiver.unload('X')` in the index after hook and in
children.

Since the servers are stopped once an FTR config finishes, the time
spent unloading the data is wasted.

Some numbers:

- `loadIfNeeded` calls eliminated per CI run: **46**
- `esArchiver.unload(...)` calls removed: **27**
- Total esArchiver ops eliminated per CI run: ~**73**

Since each FTR config gets its own fresh ES+Kibana instance, none of the
top-level `afterAll` hooks have any effect on subsequent configs. This PR
saves time by removing them and related calls:
- `ml.securityUI.logout()` — browser session is killed with the server
anyway
- `ml.securityCommon.cleanMlUsers/Roles()` — ES security objects
destroyed with the server
- `ml.testResources.resetKibanaTimeZone()` — Kibana instance destroyed
with the server
- `esNode.unload()` (anomaly_detection_jobs group1–4), also redundant
(same reasoning as our earlier work)
…c#271425)

## Summary

Fix typo in the smoke-tests evaluation suite path.

Details:
- Smoke tests were added as part of
elastic#271249
- Due to a last-minute refactor (renaming the suite directory), this
change slipped through the cracks; auto-merge on CI success happened
because the suite never ran. Root cause: the pipeline's
`readEvalsSuiteMetadata()` function silently filters out any suite whose
config file doesn't exist in the git tree. So despite the
`evals:smoke-tests` label being present and the defaultModelGroups being
correctly configured, the suite was being dropped before label matching
ever happened.
## Summary

After elastic#260835, the `i18n.locale` setting was deprecated and replaced
by `i18n.defaultLocale`. This PR updates `kibana.yml` to reference
the current setting name.


### Checklist

Check the PR satisfies following conditions. 

Reviewers should verify this PR satisfies this list as well.

- [X] Any text added follows [EUI's writing
guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses
sentence case text and includes [i18n
support](https://github.com/elastic/kibana/blob/main/src/platform/packages/shared/kbn-i18n/README.md)
- ~~[ ]
[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)
was added for features that require explanation or tutorials~~
- ~~[ ] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios~~
- [X] If a plugin configuration key changed, check if it needs to be
allowlisted in the cloud and added to the [docker
list](https://github.com/elastic/kibana/blob/main/src/dev/build/tasks/os_packages/docker_generator/resources/base/bin/kibana-docker)
- ~~[ ] This was checked for breaking HTTP API changes, and any breaking
changes have been approved by the breaking-change committee. The
`release_note:breaking` label should be applied in these situations.~~
- ~~[ ] [Flaky Test
Runner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was
used on any tests changed~~
- [x] The PR description includes the appropriate Release Notes section,
and the correct `release_note:*` label is applied per the
[guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)
- ~~[ ] Review the [backport
guidelines](https://docs.google.com/document/d/1VyN5k91e5OVumlc0Gb9RPa3h1ewuPE705nRtioPiTvY/edit?usp=sharing)
and apply applicable `backport:*` labels.~~

---------

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
)

## What & Why

Improves the YAML template editor experience across three areas:

1. **Visual diff highlighting** — Adds gutter decorations so users can
see which lines changed since the last save (similar to the Workflows
editor). Fixes the "unsaved changes" badge incorrectly appearing after
saving and re-opening a template, caused by stale local storage drafts
not being cleared on save and a race condition where `form.reset`
overwrote draft values.

2. **Better schema validation errors** — Converts the field-level
`oneOf`/`anyOf` union in the generated JSON Schema into `if`/`then`
chains keyed on the `control` property. This makes Monaco YAML produce
contextual errors (e.g., "type must be 'long' | 'integer' | ...")
instead of the confusing "control must be INPUT_TEXT | SELECT_BASIC |
...". Also fixes `addDiscriminatorEnumHints` to correctly extract values
from unions of literals (not just `const`), so all valid type options
for `INPUT_NUMBER` fields appear in autocomplete.

3. **Server-side definition validation on save** — The POST/PUT/PATCH
template routes previously only checked that the YAML was syntactically
parseable (`yaml.load()`), allowing semantically invalid templates
(e.g., `type: keyword` on an `INPUT_NUMBER` field) to be persisted. Now
validates the parsed YAML against `ParsedTemplateDefinitionSchema`
before saving and returns a `400 Bad Request` with specific Zod
validation issues if the definition is invalid.
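
As a sketch of the schema change in item 2, the function below turns a discriminated union into `if`/`then` pairs keyed on `control`. `FieldVariant` and `toIfThenChain` are hypothetical names, the `allOf` wrapper is one possible shape rather than the one this PR necessarily generates, and the per-control `type` constraints in the usage are illustrative.

```typescript
type JsonSchema = { [key: string]: unknown };

interface FieldVariant {
  control: string; // discriminator value, e.g. 'INPUT_NUMBER'
  schema: JsonSchema; // constraints that apply when `control` matches
}

// Build `if`/`then` pairs keyed on `control`, so a validator only reports
// errors from the branch whose `control` value matches the document.
function toIfThenChain(variants: FieldVariant[]): JsonSchema {
  return {
    type: 'object',
    required: ['control'],
    properties: { control: { enum: variants.map((v) => v.control) } },
    allOf: variants.map((v) => ({
      if: { properties: { control: { const: v.control } }, required: ['control'] },
      then: v.schema,
    })),
  };
}

// Usage with illustrative constraints for two controls.
const schema = toIfThenChain([
  { control: 'INPUT_TEXT', schema: { properties: { type: { const: 'keyword' } } } },
  { control: 'INPUT_NUMBER', schema: { properties: { type: { enum: ['long', 'integer'] } } } },
]);
```

With a plain `oneOf`, a validator cannot tell which branch the author intended, which is why the error tends to land on `control` instead of `type`.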

Additionally, improves autocomplete tooltip labels for the `fields`
property to show descriptive field type names (e.g., "Text Input",
"Select") instead of generic "object".

## How to Test

**Diff highlighting & unsaved changes badge:**
1. Start Kibana, go to Cases > Templates, open an existing template for
editing.
2. Change a line (e.g., the `name` field) — verify a yellow gutter
marker appears on the changed line and the "Unsaved changes" badge shows
in the header.
3. Navigate away (back to templates list), then navigate back — verify
the draft is preserved and diff/badge still show.
4. Click Save — verify you're redirected to the list. Re-open the same
template — verify no badge and no gutter markers.
5. Go to Create Template, make edits, save — verify re-opening Create
Template starts fresh with the example definition (no badge).

**Schema validation & autocomplete:**
6. In the YAML editor, type `fields:` and trigger autocomplete on a
field entry — verify the suggestion labels show descriptive names (e.g.,
"Text input", "Select") instead of "object".
7. Add a field with `control: INPUT_NUMBER` and `type: integer` — verify
no validation error appears.
8. Change `type: integer` to `type: keyword` — verify the error message
references `type` (not `control`), listing valid numeric types.

**Server-side validation:**
9. Attempt to save a template with an invalid field definition (e.g.,
`type: keyword` with `control: INPUT_NUMBER`) — verify the save fails
with a toast error (400 Bad Request) and the invalid template is not
persisted.

---------

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
)

Closes elastic#260667

## Summary

Aligns Metrics in Discover `METRICS_INFO` failures with main Discover
search errors by replacing the custom `MetricsInfoError` component with
Discover’s shared `ErrorCallout` (via a `ChartSectionSearchError`
wrapper).

ES|QL error handling is centralized under `src/common/errors/` so
Metrics and Traces can reuse the same path, including HTTP 200 responses
with an embedded Elasticsearch error body. Discover injects
`showErrorDialog` and `esqlReferenceHref` from the metrics profile
wrapper—the same pattern as `discover_layout.tsx` after
[elastic#261332](elastic#261332).


### Changes

- **Error handling**
  - Moved `esql_response_error` to `src/common/errors/` and improved
    `formatErrorCause` (all `root_cause` entries, `caused_by` fallback)
  - Added `normalizeChartSectionSearchError`
  - Updated `execute_esql_query` and `report_chart_section_error` imports
    to the shared module
- **UI**
  - Added `ChartSectionSearchError` wrapping `@kbn/discover-utils`
    `ErrorCallout`
  - Metrics Experience Grid now renders `ChartSectionSearchError` on
    `| METRICS_INFO` failure
  - Removed `metrics_info_error.tsx`
- **Discover host wiring**
  - `chart_section.tsx` passes `chartSectionSearchError` with
    `core.notifications.showErrorDialog` and
    `docLinks.links.query.queryESQL` (same behaviour as main Discover
    `ErrorCallout`)
  - Added `ChartSectionSearchErrorHostProps` to `UnifiedMetricsGridProps`
- **i18n**
  - Removed unused `metricsExperience.metricsInfoError.*` keys
  - Added `metricsExperience.chartSectionError.title`

### Expected Results

We're now able to see Discover's error component on a METRICS_INFO call
error (Error description is custom for the demonstration)
<img width="1584" height="709" alt="image"
src="https://github.com/user-attachments/assets/81b7f380-8b07-4d36-b732-de7742c661f7"
/>

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
)

## Summary

Implements part of elastic/rna-program#430

This PR introduces `isEsqlUserError`: a small predicate that returns
`true` for `ResponseError` with status 400 or 404, and applies it in two
places:

- **Main rule query** (`QueryService.executeQueryStream`): on a user
error, wraps the thrown error with `createTaskRunError(...,
TaskErrorSource.USER)` before rethrowing. Non-user errors (5xx, network,
cancellation, Arrow parse errors) are rethrown unchanged.
- **Recovery query** (`CreateRecoveryEventsStep`): same wrapping applied
to the `recovery_policy.type === 'query'` execution path.
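
A minimal sketch of the predicate, assuming the error exposes `name` and `statusCode` the way the Elasticsearch client's `ResponseError` does. `ResponseErrorLike` is a local stand-in, and `classify` is a hypothetical helper that only marks the error source; the real code wraps the error with `createTaskRunError` instead.

```typescript
// Minimal stand-in for the Elasticsearch client's ResponseError; only the
// fields used here are modelled.
interface ResponseErrorLike {
  name: 'ResponseError';
  statusCode?: number;
}

function isResponseError(error: unknown): error is ResponseErrorLike {
  return (
    typeof error === 'object' &&
    error !== null &&
    (error as { name?: unknown }).name === 'ResponseError'
  );
}

// True only for 400 and 404 responses, which are attributed to the user's query.
function isEsqlUserError(error: unknown): boolean {
  return isResponseError(error) && (error.statusCode === 400 || error.statusCode === 404);
}

// Hypothetical helper: tag user errors and leave everything else untouched.
function classify(error: unknown): { error: unknown; source: 'user' | 'unclassified' } {
  return { error, source: isEsqlUserError(error) ? 'user' : 'unclassified' };
}
```

A 5xx response, a network failure, or a parse error falls through as `unclassified`, mirroring the "rethrown unchanged" behavior described above.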

---------

Co-authored-by: Christos Nasikas <xristosnasikas@gmail.com>
## Summary
I noticed that we often approve the auto-generated PR, but we sometimes
forget to enable merge/auto-merge, so the PR sits there and never
merges. When that happens, the next weeks’ PRs don’t get generated, and
the whole weekly chain stalls. Enabling auto-merge at PR creation time
prevents this from being blocked by a missed manual step.
This change updates
`.buildkite/scripts/steps/console_definitions_sync.sh` so that after
creating the Console definitions sync PR it:

* Automatically enables auto-merge (squash) for that PR via
`gh pr merge --auto --squash`.
* Logs a warning if enabling auto-merge fails, without failing the step.
…70607)

Adds an inline tool, `get_endpoint_artifacts`, for the Agent Builder
Automatic Troubleshooting skill. This tool allows the agent to retrieve
endpoint-specific exception list items such as endpoint exceptions,
trusted apps, blocklists, etc. The tool has a summary mode and a detail
mode to help prevent context explosion from artifacts.

To support user-scoped artifact fetching, a new
`getScopedEndpointArtifactClient` was also added to the endpoint app
context service, as the existing `getExceptionListsClient` is not
user-scoped.

Also includes minor tweaks to the skill instructions to better handle
endpoint artifacts.
## Summary

- Updates the `search` rollback fixtures for model version 13
(`10.13.0.json`) to use `{ "$match": "uuid" }` for Discover session tab
IDs instead of hardcoded UUIDs.
- MV13 tab IDs are generated via `uuidv5(savedObjectId, …)` during
migration, but rollback tests bulk-create documents without fixed IDs,
so the tab ID changes on every CI run and caused false fixture
mismatches on unrelated PRs.
- Adds a **Saved object fixtures** section to `.github/CODEOWNERS`,
assigning each `__fixtures__/<type>/` folder to the team that owns the
corresponding registered SO type (derived from the registering plugin's
`kibana.jsonc` owner or more specific CODEOWNERS paths).
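
To show the idea behind the `{ "$match": "uuid" }` placeholder, here is a sketch of a comparison that accepts any UUID-shaped string where the fixture holds the placeholder. `matchesFixture` and `isUuidMatcher` are hypothetical; the actual rollback test comparison may work differently.

```typescript
const UUID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// True when the fixture value is exactly `{ "$match": "uuid" }`.
function isUuidMatcher(value: unknown): boolean {
  return (
    typeof value === 'object' &&
    value !== null &&
    !Array.isArray(value) &&
    Object.keys(value).length === 1 &&
    (value as { $match?: unknown }).$match === 'uuid'
  );
}

// Deep comparison where a uuid placeholder in the fixture matches any
// UUID-shaped string in the actual document.
function matchesFixture(expected: unknown, actual: unknown): boolean {
  if (isUuidMatcher(expected)) {
    return typeof actual === 'string' && UUID_PATTERN.test(actual);
  }
  if (Array.isArray(expected)) {
    if (!Array.isArray(actual) || expected.length !== actual.length) {
      return false;
    }
    const actualItems: unknown[] = actual;
    return expected.every((item, i) => matchesFixture(item, actualItems[i]));
  }
  if (typeof expected === 'object' && expected !== null) {
    if (typeof actual !== 'object' || actual === null || Array.isArray(actual)) {
      return false;
    }
    const e = expected as { [key: string]: unknown };
    const a = actual as { [key: string]: unknown };
    const keys = Object.keys(e);
    return keys.length === Object.keys(a).length && keys.every((k) => matchesFixture(e[k], a[k]));
  }
  return expected === actual;
}
```

A fixture written this way stays stable even though the generated tab ID differs on every run.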

## Test plan

- [x] `node scripts/check_changes.ts`
- [ ] CI: **Check changes in Saved Objects** rollback tests for `search`
pass when MV13 is in scope

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Davis McPhee <davismcphee@hotmail.com>
…format (elastic#270927)

## Summary

Updates `docs/extend/saved-objects/validate.md` to reflect the
structured error reporting introduced in elastic#268469 and the additional
validation rules added in subsequent PRs.

### What changed

**Format**: The CI check now posts a structured PR comment
(`**[rule-id]** Message. _Fix:_ …`) instead of raw `❌` terminal output.
The troubleshooting intro is updated to explain the new format and show
how to reproduce findings locally.

**New rules documented** (were missing from the original section):

| Rule ID | Introduced in |
|---|---|
| `existing-type/schema-breaking-changes` | elastic#268630 |
| `existing-type/schema-undiffable-legacy-hash` | elastic#268630 |
| `existing-type/new-mappings-not-in-model-version` | elastic#268630 |
| `existing-type/keyword-missing-ignore-above` | elastic#268630 |
| `existing-type/invalid-name-title-field-type` | elastic#268630 |
| `new-type/missing-initial-model-version` | elastic#268469 |
| `new-type/legacy-migrations` | elastic#268469 |
| `new-type/keyword-missing-ignore-above` | elastic#270541 |
| `new-type/invalid-name-title-field-type` | elastic#270541 |
| `model-version/mappings-not-in-schema` | elastic#268630 |
| `model-version/mapping-index-false` | elastic#268630 |
| `model-version/mapping-enabled-false` | elastic#268630 |
| `model-version/fixture-missing` | elastic#270541 |
| `model-version/fixture-invalid` | elastic#270541 |
| `documents/fixture-mismatch` | elastic#270541 |

**Structure**: Rules are now grouped into categories (existing-type /
new-type / model-version / documents / removed-type) with stable anchor
IDs on every rule heading, so `([docs](link))` references in PR comments
land directly on the right entry.

## Test plan

- [ ] Visual review of rendered markdown in the Elastic docs preview


Made with [Cursor](https://cursor.com)

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
This package is used by search playground

Upgrades `ai` from 5.0.102 → 5.0.190 and `@ai-sdk/langchain` from
1.0.102 → 1.0.190, which transitively bumps `@ai-sdk/provider-utils`
from 3.0.17 to 3.0.25. Adds a yarn resolution to force the updated
version for all transitive consumers (including @arizeai/phoenix-client
which pins ai@^5.0.38).

## Testing
- Typecheck passes
- Playground unit tests passing
- confirm no older version is present: `find node_modules -path
"*/provider-utils/package.json" -exec grep '"version"' {} \; -print`
- Manually navigated search-playground in affected stack version,
verified no breaking functionality changes:
	- Connected an index
	- Asked a question and confirmed api call returns a stream-event
	- Asked a follow-up to verify conversation history is maintained

## Backport
- 9.3, 9.2 and 8.19 have the same format and automated backport should
work fine
- 9.1 has a different patch version (5.0.108) and manual backport will
be created if necessary

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
…es.yml (elastic#271321)

## Summary

Mirrors the index privilege changes from
[elasticsearch-controller#1777](elastic/elasticsearch-controller#1777)
(merged 2026-05-22 by @ymao1) into the Kibana serverless roles file.

Two changes:

- **Viewer role**: adds `read` on `.entity_analytics.entity-leads*` and
`.entity_analytics.watchlists.*` (watchlists + entity leads visibility
for read-only users)
- **Asset-criticality write roles**: adds `view_index_metadata` on
`.entities.v2.latest.security_*` for all roles that already have `write`
on `.asset-criticality.asset-criticality-*`. Affected: `editor`,
`platform_engineer`, `t2_analyst`, `t3_analyst`,
`threat_intelligence_analyst`, `rule_author`,
`endpoint_operations_analyst`, `endpoint_policy_manager`.

Context: @simitt flagged the requirement to mirror controller changes
into this file during controller PR review. The mismatch is not enforced
at runtime but the file header explicitly states it should stay in sync.


Made with [Cursor](https://cursor.com)

Co-authored-by: Cursor <cursoragent@cursor.com>
@github-actions
Contributor

@smith, this PR increases one or more page-load bundle sizes by 15% or more:

| Plugin | Before (bytes) | After (bytes) | Change |
|--------|----------------|---------------|--------|
| agentBuilderPlatform | 8,737 | 15,544 | +77.9% |

Large bundle size increases can affect page load performance. Consider whether dependencies can be lazy-loaded or code split to reduce the bundle.

See the bundle optimization guide for tips.

@smith smith added the ci:project-deploy-observability Create an Observability project label May 29, 2026
@github-actions
Contributor

🤖 GitHub comments

Expand to view the GitHub comments

Just comment with:

  • /oblt-deploy : Deploy a Kibana instance using the Observability test environments.
  • run docs-build : Re-trigger the docs validation. (use unformatted text in the comment!)

@smith smith added Feature:SigEvents Significant events feature, related to streams and rules/alerts (RnA) backport:skip This PR does not require backporting release_note:skip Skip the PR/issue when compiling release notes labels May 29, 2026

Labels

backport:skip This PR does not require backporting ci:project-deploy-observability Create an Observability project Feature:SigEvents Significant events feature, related to streams and rules/alerts (RnA) release_note:skip Skip the PR/issue when compiling release notes
