Commit 58a2bbb

Merge branch 'main' into fix/cspm-gcp-flaky-tests
2 parents a27c157 + ba92b19 commit 58a2bbb

156 files changed

Lines changed: 3443 additions & 709 deletions

.agents/skills/flaky-test-investigator/SKILL.md

Lines changed: 2 additions & 22 deletions
Original file line number | Diff line number | Diff line change
@@ -42,8 +42,6 @@ For every failure, try to retrieve:
4242
- **Server logs** (`kibana.log`, `elasticsearch.log` when present). Cross-reference the failure timestamp with any errors in the logs — a server-side 500 or unexpected warning is strong evidence the failure is a product bug, not a test bug.
4343
- **Full session trace** when the framework supports it (Scout / Playwright). Lets you scrub through every step, locator query, network call, and DOM snapshot.
4444

45-
How to actually find and download each artifact type is framework-specific — see "Retrieve failure artifacts" below.
46-
4745
Things to specifically check in the artifacts before forming a root-cause hypothesis:
4846

4947
- **Did the expected element render at all?** If yes and the selector missed it → flaky selector (Tier 2 fix territory). If no → real rendering / race / data issue (Tier 1 territory).
@@ -53,27 +51,9 @@ Things to specifically check in the artifacts before forming a root-cause hypoth
5351

5452
If artifacts are not available (expired, not uploaded, no `read_artifacts` token), say so in the report rather than fabricating a hypothesis. "Screenshot would have resolved this; not available" is a valid open question.
5553

56-
### Retrieve failure artifacts
57-
58-
The standard recipe is **list → filter by path → download by ID**, always scoped to the failed job's UUID. Two Buildkite gotchas to know about first:
59-
60-
- **Failed-attempt jobs are hidden by default.** `/builds/<n>` returns only the latest attempt; append `?include_retried_jobs=true` to find the original failing job (the one cited in `failed-test` comments). `retried` and `retried_in_job_id` link the two.
61-
- **Per-job artifacts use a different endpoint than build-wide artifacts.** If a build retried to green, failure artifacts only live on the failed job's listing (`bk artifacts list <build> -p <pipeline> --job-uuid <jobId>`). Don't conclude "no screenshot uploaded" until you've checked there.
62-
63-
**Scout** (`@kbn/scout-reporting`, not standard Playwright output — `playwright-report/`, `trace.zip`, and video are NOT published):
64-
65-
- `.scout/reports/scout-playwright-test-failures-<runId>/test-failures-summary.json` — maps test name → HTML report. Start here.
66-
- `.scout/reports/scout-playwright-test-failures-<runId>/<testId>.html` — self-contained: error, stdout, embedded screenshot. Usually sufficient on its own.
67-
- `.scout/reports/scout-playwright-test-failures-<runId>/scout-failures-<runId>.ndjson` — one record per failure (`id` = `<testId>`, `owner`, `location`, `error.*`) for programmatic use.
68-
- `**/.scout/test-artifacts/<test-slug>/test-failed-<N>.png` — plain Playwright screenshot; the PNG doesn't carry `<testId>`, so correlate via spec path.
69-
70-
**FTR** (a single content `<hash>` links every artifact for one failure):
71-
72-
- `target/test_failures/<jobId>_<hash>.{json,log,html}`: `.json` is the source of truth; full Kibana/ES stdout lives in `system-out` (there is no separate `kibana.log`). Pull this first.
73-
- `<test-root>/screenshots/failure/*-<hash>.png` and `<test-root>/failure_debug/html/*-<hash>.html` — UI tests only; fetch only when the failure is UI-side.
74-
- `.es/*.log` — transport/cluster-shaped failures.
54+
### List failure artifacts
7555

76-
`target/test_failures/` is shared with Scout; filter by `.jobName` (e.g. `FTR Configs #90` vs `Scout Lane #12`) to keep only FTR. On Cloud FTR pipelines the layout differs: one self-contained HTML per failure at `<config-path-with-underscores>-<unix-timestamp>/html/<contentHash>.html` — no `target/test_failures/`, screenshot, or DOM artifacts.
56+
`bk artifacts list <build> -p <pipeline> --job-uuid <jobId> --json` returns a JSON listing of every artifact uploaded for the failing job. Pass `--job-uuid <jobId>` for the failed attempt (without it, `bk` only returns the latest attempt and hides retried failures). If a build retried to green, failure artifacts only live on the failed job's listing; don't conclude "no screenshot" until you've scoped to the right job UUID.
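Once the listing is scoped to the failed job, what remains is filtering it by path. The following is a minimal TypeScript sketch of that filter step. The `ArtifactEntry` shape (`id`, `path`) and the sample entries are assumptions made for illustration; the exact JSON schema emitted by `bk artifacts list --json` is not shown here and may use different field names.

```typescript
// Hypothetical shape of one entry in the JSON listing; the real schema
// returned by `bk artifacts list --json` may differ.
interface ArtifactEntry {
  id: string;
  path: string;
}

// Keep only the artifacts whose path matches the given pattern.
function filterArtifactsByPath(artifacts: ArtifactEntry[], pattern: RegExp): ArtifactEntry[] {
  return artifacts.filter((artifact) => pattern.test(artifact.path));
}

// Illustrative sample data, not real build output.
const sampleListing: ArtifactEntry[] = [
  { id: 'a1', path: 'reports/summary.json' },
  { id: 'a2', path: 'screenshots/failure-1.png' },
  { id: 'a3', path: 'logs/server.log' },
];

// Collect the IDs of screenshot artifacts so they can be downloaded next.
const screenshotIds = filterArtifactsByPath(sampleListing, /\.png$/).map((a) => a.id);
console.log(screenshotIds);
```

An empty result from this filter only means "no screenshot" if the listing was taken from the failed job's UUID, as noted above.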
7757

7858
### Understand the scope
7959

.github/CODEOWNERS

Lines changed: 2 additions & 0 deletions
@@ -350,6 +350,7 @@ src/core/packages/user-profile/server-mocks @elastic/kibana-core
350350
src/core/packages/user-settings/server @elastic/kibana-security
351351
src/core/packages/user-settings/server-internal @elastic/kibana-security
352352
src/core/packages/user-settings/server-mocks @elastic/kibana-security
353+
src/core/packages/user-settings/types @elastic/kibana-security
353354
src/core/packages/user-storage/browser @elastic/appex-sharedux
354355
src/core/packages/user-storage/browser-internal @elastic/appex-sharedux
355356
src/core/packages/user-storage/browser-mocks @elastic/appex-sharedux
@@ -3543,6 +3544,7 @@ x-pack/solutions/observability/plugins/synthetics/server/saved_objects/synthetic
35433544
/src/cli/ @elastic/kibana-operations
35443545
/src/cli_keystore/ @elastic/kibana-operations
35453546
/.github/workflows/ @elastic/kibana-operations
3547+
/.github/workflows/failed-test-investigator.md @elastic/kibana-operations @elastic/appex-qa
35463548
/.github/aw/ @elastic/kibana-operations
35473549
/.buildkite/ @elastic/kibana-operations
35483550
/moon.yml @elastic/kibana-operations

.github/workflows/failed-test-investigator.lock.yml

Lines changed: 45 additions & 20 deletions
Some generated files are not rendered by default.

.github/workflows/failed-test-investigator.md

Lines changed: 57 additions & 22 deletions
@@ -27,6 +27,8 @@ concurrency:
2727

2828
env:
2929
ISSUE_NUMBER: &issue_number ${{ github.event.issue.number || github.event.inputs.issue_number }}
30+
# Lets the agent omit `-o elastic` on every `bk` invocation (see https://buildkite.com/docs/pipelines/configure/environment-variables)
31+
BUILDKITE_ORGANIZATION_SLUG: elastic
3032

3133
engine:
3234
id: claude
@@ -53,13 +55,36 @@ network:
5355
- defaults
5456
- buildkite.com
5557
- '*.buildkite.com'
58+
- buildkiteartifacts.com
5659
- ci-stats.kibana.dev
5760
- github.com
5861
- api.github.com
5962
- chatgpt.com
6063
- elastic.litellm-prod.ai
6164
sandbox:
6265
agent: awf # Migrated from deprecated network setting
66+
steps:
67+
- name: Install Buildkite CLI and export BUILDKITE_API_TOKEN
68+
env:
69+
BK_VERSION: 3.44.0
70+
BK_SHA256: 88867c0b983ad2afe1efc26f0df6b46b5673577c1aea95eba76992636fb9abe9
71+
OPS_BUILDKITE_TOKEN: ${{ secrets.OPS_BUILDKITE_TOKEN }}
72+
run: |
73+
set -euo pipefail
74+
tmp="$(mktemp -d)"
75+
url="https://github.com/buildkite/cli/releases/download/v${BK_VERSION}/bk_${BK_VERSION}_linux_amd64.tar.gz"
76+
curl -fsSL --retry 3 --retry-delay 2 "${url}" -o "${tmp}/bk.tgz"
77+
echo "${BK_SHA256} ${tmp}/bk.tgz" | sha256sum -c -
78+
tar -xzf "${tmp}/bk.tgz" -C "${tmp}" --strip-components=1 "bk_${BK_VERSION}_linux_amd64/bk"
79+
install -d "${RUNNER_TEMP}/gh-aw/mcp-cli/bin"
80+
install -m 0755 "${tmp}/bk" "${RUNNER_TEMP}/gh-aw/mcp-cli/bin/bk"
81+
"${RUNNER_TEMP}/gh-aw/mcp-cli/bin/bk" --version
82+
if [ -z "${OPS_BUILDKITE_TOKEN:-}" ]; then
83+
echo "::error::OPS_BUILDKITE_TOKEN secret is not set" >&2
84+
exit 1
85+
fi
86+
echo "BUILDKITE_API_TOKEN=${OPS_BUILDKITE_TOKEN}" >> "${GITHUB_ENV}"
87+
6388
safe-outputs:
6489
noop:
6590
report-as-issue: false
@@ -89,7 +114,7 @@ Investigate a failed-test issue, classify the failure, and propose a fix when ap
89114

90115
## Investigate
91116

92-
Investigate the test failure(s) using the `flaky-test-investigator` skill.
117+
Investigate the test failure(s) using the `flaky-test-investigator` skill. Use all of the data at your disposal to reach a conclusion (source code, logs, failure screenshots, etc.).
93118

94119
Every conclusion must cite specific evidence. Do not guess.
95120

@@ -139,32 +164,42 @@ No other side-effects beyond posting the comment and updating the label.
139164

140165
## Comment format
141166

142-
Post exactly one comment. Keep the visible portion very short and easy to read:
167+
Post exactly one comment on the issue. Keep it concise, actionable, and prioritize the most critical findings at the very top. Adapt the sections below to best fit the specific failure. **Use `####` for all subsections** (e.g., `#### Proposed Fix`, `#### Root Cause`).
168+
169+
Do not create standalone sections for "what the test does," "evidence," "where the test ran," or "failure screenshot". Integrate these details into the sections below if they add value. Also, do not mention why the `ai:auto-flaky-fix` isn't added.
170+
171+
### 1. The TL;DR (Required)
172+
173+
Start with a clear heading, essential metadata, and a brief summary of the failure, followed by a horizontal rule.
174+
175+
```
176+
## {Classification}: {One-line description of what broke}
177+
178+
## **Classification:** {type} | **Confidence:** {level} | **Introduced by:** {commit/PR if known}
179+
180+
**Summary:** One or two sentences explaining the exact failure point.
181+
```
182+
183+
### 2. Proposed fix (required)
143184

144-
1. **One-line bold headline** stating the result kind and one identifying detail.
145-
2. **Diagnosis** (≤5 concise bullet points): what broke and where, the most likely root cause.
146-
3. **Next steps** (≤5 concise bullet points).
185+
Provide the most direct path to resolution immediately after the summary.
147186

148-
Put the full `flaky-test-investigator` skill output inside a collapsed `<details><summary>Investigation details</summary> ... </details>` block (not in the visible portion). Open the block with a `#### Findings` subsection containing exactly these four bullets in this order — downstream tooling parses them, so preserve keys, casing, and `` - `key`: value `` shape. These bullets must live **inside `<details>`**, never in the visible portion:
187+
- **Single file:** lead directly with the suggested code diff or specific action.
188+
- **Multiple files:** use a brief table to list affected files, followed by the necessary changes.
189+
- **No concrete fix:** clearly state what additional evidence or investigation is needed to propose one.
149190

150-
- `classification`: `test-design` | `test-environment` | `application` | `external` | `inconclusive`
151-
- `confidence`: `high` | `medium` | `low`
152-
- `test.type`: `scout` (if `scout-playwright` label) | `ftr` | `jest` | `unknown`
153-
- `test.file`: repo-relative path, or `unknown`
191+
### 3. Root Cause & Evidence (required)
154192

155-
The skill's "Reporting" subsections should also be inside the collapsible section:
193+
Explain _why_ the failure occurred, citing specific evidence. Choose the format that best fits the complexity of the bug:
156194

157-
- What the test does
158-
- What failed and when
159-
- Where it ran
160-
- Root cause hypothesis
161-
- Evidence
162-
- Failure screenshot
163-
- Recommended next step
164-
- Open questions
195+
- Use concise paragraphs with inline Markdown links pointing to specific code lines, commits, or files.
196+
- Use an ASCII timeline diagram for race conditions, multi-component bugs, or complex state leaks.
197+
- Fold relevant evidence (like missing `data-test-subj` attributes, failing network calls, or screenshot descriptions) directly into this narrative.
165198

166-
Blank lines around `</summary>` and `</details>` are required for the inner markdown to render.
199+
### 4. Additional context (optional)
167200

168-
End the comment with this footer line (verbatim, on its own line after the `</details>` block):
201+
Include the following only if they provide high-value, actionable signal:
169202

170-
`<sup>AI-generated, share feedback in [#appex-qa](https://elastic.slack.com/archives/C04HT4P1YS3)</sup>`
203+
- **Ruled out:** a brief note on alternative hypotheses that were investigated and dismissed.
204+
- **Verification:** specific steps to reproduce the failure or confirm the fix.
205+
- **Open questions:** unresolved design or environmental issues blocking a definitive fix ("a screenshot would have helped troubleshoot this" is a valid open question).

package.json

Lines changed: 1 addition & 0 deletions
@@ -537,6 +537,7 @@
537537
"@kbn/core-user-settings-server": "link:src/core/packages/user-settings/server",
538538
"@kbn/core-user-settings-server-internal": "link:src/core/packages/user-settings/server-internal",
539539
"@kbn/core-user-settings-server-mocks": "link:src/core/packages/user-settings/server-mocks",
540+
"@kbn/core-user-settings-types": "link:src/core/packages/user-settings/types",
540541
"@kbn/core-user-storage-browser": "link:src/core/packages/user-storage/browser",
541542
"@kbn/core-user-storage-browser-internal": "link:src/core/packages/user-storage/browser-internal",
542543
"@kbn/core-user-storage-common": "link:src/core/packages/user-storage/common",

src/core/packages/chrome/app-menu/core-chrome-app-menu-components/src/components/app_menu_action_button.tsx

Lines changed: 2 additions & 0 deletions
@@ -177,6 +177,7 @@ export const AppMenuActionButton = (props: AppMenuActionButtonProps) => {
177177
popoverWidth={popoverWidth}
178178
popoverTestId={popoverTestId}
179179
anchorPosition="downRight"
180+
repositionToCrossAxis={false}
180181
onClose={onPopoverClose}
181182
onCloseOverflowButton={onCloseOverflowButton}
182183
/>
@@ -195,6 +196,7 @@ export const AppMenuActionButton = (props: AppMenuActionButtonProps) => {
195196
popoverWidth={popoverWidth}
196197
popoverTestId={popoverTestId}
197198
anchorPosition="downRight"
199+
repositionToCrossAxis={false}
198200
onClose={onPopoverClose}
199201
onCloseOverflowButton={onCloseOverflowButton}
200202
/>

src/core/packages/chrome/app-menu/core-chrome-app-menu-components/src/components/app_menu_popover.tsx

Lines changed: 3 additions & 0 deletions
@@ -30,6 +30,7 @@ interface AppMenuContextMenuProps {
3030
switchConfig?: AppMenuSwitch;
3131
popoverTestId?: string;
3232
anchorPosition?: PopoverAnchorPosition;
33+
repositionToCrossAxis?: boolean;
3334
onClose: () => void;
3435
onCloseOverflowButton?: () => void;
3536
}
@@ -47,6 +48,7 @@ export const AppMenuPopover = ({
4748
switchConfig,
4849
popoverTestId = 'app-menu-popover',
4950
anchorPosition = 'downLeft',
51+
repositionToCrossAxis,
5052
onClose,
5153
onCloseOverflowButton,
5254
}: AppMenuContextMenuProps) => {
@@ -100,6 +102,7 @@ export const AppMenuPopover = ({
100102
hasArrow={false}
101103
anchorPosition={anchorPosition}
102104
aria-label={title || content}
105+
repositionToCrossAxis={repositionToCrossAxis}
103106
>
104107
<EuiContextMenu initialPanelId={0} panels={panels} css={{ minWidth: 180 }} />
105108
</EuiPopover>

src/core/packages/user-profile/common/src/user_profile.ts

Lines changed: 23 additions & 2 deletions
@@ -60,9 +60,30 @@ export interface UserProfileUserInfo {
6060
}
6161

6262
/**
63-
* Placeholder for data stored in user profile.
63+
* Placeholder for data stored in the user profile.
64+
* Services that store data in the user profile should specify that data by augmenting this type in their implementation,
65+
* like so:
66+
*
67+
* @example
68+
* ```ts
69+
* declare module '@kbn/core-user-profile-common' {
70+
* interface UserProfileData {
71+
* myService: {
72+
* myData: string;
73+
* };
74+
* }
75+
* }
76+
* ```
77+
* With this augmentation in place, the value returned by `getCurrent` is typed according to the declared shape:
78+
*
79+
* ```ts
80+
* const userProfile = await userProfileService.getCurrent();
81+
* // accessing 'myService.myData' is now typed as 'string'
82+
* console.log(userProfile.data.myService.myData);
83+
* ```
6484
*/
65-
export type UserProfileData = Record<string, unknown>;
85+
// eslint-disable-next-line @typescript-eslint/no-empty-interface -- See the comment above for an explanation.
86+
export interface UserProfileData {}
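The switch from a `Record<string, unknown>` type alias to an empty interface matters because TypeScript merges repeated interface declarations, whereas a type alias cannot be redeclared. The sketch below shows that merging within a single file. Both declarations are local stand-ins: a real augmentation would sit inside `declare module '@kbn/core-user-profile-common'`, as in the doc comment above. `myService` is taken from that comment and made optional here for illustration, and `readMyData` is a hypothetical helper.

```typescript
// Stand-in for the empty interface exported by the package.
// eslint-disable-next-line @typescript-eslint/no-empty-interface
interface UserProfileData {}

// Stand-in for a consumer's augmentation. TypeScript merges this with the
// declaration above, so `UserProfileData` now has a `myService` member.
interface UserProfileData {
  myService?: {
    myData: string;
  };
}

// Hypothetical helper: reads the augmented field with full type information.
function readMyData(data: UserProfileData): string | undefined {
  return data.myService?.myData;
}

const withData: UserProfileData = { myService: { myData: 'hello' } };
const withoutData: UserProfileData = {};

console.log(readMyData(withData));
console.log(readMyData(withoutData));
```

Because each service contributes its own key, several packages can augment the same interface independently without a central type listing every field.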
6687

6788
/**
6889
* Type of the user profile labels structure (currently

src/core/packages/user-settings/server-internal/src/user_settings_service.test.ts

Lines changed: 69 additions & 10 deletions
@@ -10,7 +10,7 @@
1010
import { mockCoreContext } from '@kbn/core-base-server-mocks';
1111
import { httpServerMock } from '@kbn/core-http-server-mocks';
1212
import { userProfileServiceMock } from '@kbn/core-user-profile-server-mocks';
13-
import type { UserProfileWithSecurity } from '@kbn/core-user-profile-common';
13+
import type { UserProfileWithSecurity, UserProfileData } from '@kbn/core-user-profile-common';
1414
import { UserSettingsService } from './user_settings_service';
1515

1616
describe('#setup', () => {
@@ -26,18 +26,18 @@ describe('#setup', () => {
2626
};
2727
});
2828

29-
const createUserProfile = (darkMode: string | undefined): UserProfileWithSecurity => {
29+
const createUserProfile = (
30+
userSettings: Partial<NonNullable<UserProfileData['userSettings']>>
31+
): UserProfileWithSecurity => {
3032
return {
3133
data: {
32-
userSettings: {
33-
darkMode,
34-
},
34+
userSettings,
3535
},
3636
} as unknown as UserProfileWithSecurity;
3737
};
3838

3939
it('fetches userSettings when client is set and returns `true` when `darkMode` is set to `dark`', async () => {
40-
startDeps.userProfile.getCurrent.mockResolvedValue(createUserProfile('dark'));
40+
startDeps.userProfile.getCurrent.mockResolvedValue(createUserProfile({ darkMode: 'dark' }));
4141

4242
const { getUserSettingDarkMode } = service.setup();
4343
service.start(startDeps);
@@ -54,7 +54,7 @@ describe('#setup', () => {
5454
});
5555

5656
it('fetches userSettings when client is set and returns `false` when `darkMode` is set to `light`', async () => {
57-
startDeps.userProfile.getCurrent.mockResolvedValue(createUserProfile('light'));
57+
startDeps.userProfile.getCurrent.mockResolvedValue(createUserProfile({ darkMode: 'light' }));
5858

5959
const { getUserSettingDarkMode } = service.setup();
6060
service.start(startDeps);
@@ -71,7 +71,7 @@ describe('#setup', () => {
7171
});
7272

7373
it('fetches userSettings when client is set and returns `system` when `darkMode` is set to `system`', async () => {
74-
startDeps.userProfile.getCurrent.mockResolvedValue(createUserProfile('system'));
74+
startDeps.userProfile.getCurrent.mockResolvedValue(createUserProfile({ darkMode: 'system' }));
7575

7676
const { getUserSettingDarkMode } = service.setup();
7777
service.start(startDeps);
@@ -88,7 +88,7 @@ describe('#setup', () => {
8888
});
8989

9090
it('fetches userSettings when client is set and returns `undefined` when `darkMode` is set to `` (the default value)', async () => {
91-
startDeps.userProfile.getCurrent.mockResolvedValue(createUserProfile(''));
91+
startDeps.userProfile.getCurrent.mockResolvedValue(createUserProfile({ darkMode: undefined }));
9292

9393
const { getUserSettingDarkMode } = service.setup();
9494
service.start(startDeps);
@@ -105,7 +105,9 @@ describe('#setup', () => {
105105
});
106106

107107
it('fetches userSettings when client is set and returns `undefined` when `darkMode` is set to `space_default`', async () => {
108-
startDeps.userProfile.getCurrent.mockResolvedValue(createUserProfile('space_default'));
108+
startDeps.userProfile.getCurrent.mockResolvedValue(
109+
createUserProfile({ darkMode: 'space_default' })
110+
);
109111

110112
const { getUserSettingDarkMode } = service.setup();
111113
service.start(startDeps);
@@ -121,6 +123,63 @@ describe('#setup', () => {
121123
});
122124
});
123125

126+
it('fetches userSettings when client is set and returns `true` when `rememberSelectedSpace` is set to `true`', async () => {
127+
startDeps.userProfile.getCurrent.mockResolvedValue(
128+
createUserProfile({ rememberSelectedSpace: true })
129+
);
130+
131+
const { getUserSettingRememberSelectedSpace } = service.setup();
132+
service.start(startDeps);
133+
134+
const kibanaRequest = httpServerMock.createKibanaRequest();
135+
const rememberSelectedSpace = await getUserSettingRememberSelectedSpace(kibanaRequest);
136+
137+
expect(rememberSelectedSpace).toEqual(true);
138+
expect(startDeps.userProfile.getCurrent).toHaveBeenCalledTimes(1);
139+
expect(startDeps.userProfile.getCurrent).toHaveBeenCalledWith({
140+
request: kibanaRequest,
141+
dataPath: 'userSettings',
142+
});
143+
});
144+
145+
it('fetches userSettings when client is set and returns `false` when `rememberSelectedSpace` is set to `false`', async () => {
146+
startDeps.userProfile.getCurrent.mockResolvedValue(
147+
createUserProfile({ rememberSelectedSpace: false })
148+
);
149+
150+
const { getUserSettingRememberSelectedSpace } = service.setup();
151+
service.start(startDeps);
152+
153+
const kibanaRequest = httpServerMock.createKibanaRequest();
154+
const rememberSelectedSpace = await getUserSettingRememberSelectedSpace(kibanaRequest);
155+
156+
expect(rememberSelectedSpace).toEqual(false);
157+
expect(startDeps.userProfile.getCurrent).toHaveBeenCalledTimes(1);
158+
expect(startDeps.userProfile.getCurrent).toHaveBeenCalledWith({
159+
request: kibanaRequest,
160+
dataPath: 'userSettings',
161+
});
162+
});
163+
164+
it('fetches userSettings when client is set and returns `true` when `rememberSelectedSpace` is not set', async () => {
165+
startDeps.userProfile.getCurrent.mockResolvedValue(
166+
createUserProfile({ rememberSelectedSpace: undefined })
167+
);
168+
169+
const { getUserSettingRememberSelectedSpace } = service.setup();
170+
service.start(startDeps);
171+
172+
const kibanaRequest = httpServerMock.createKibanaRequest();
173+
const rememberSelectedSpace = await getUserSettingRememberSelectedSpace(kibanaRequest);
174+
175+
expect(rememberSelectedSpace).toEqual(true);
176+
expect(startDeps.userProfile.getCurrent).toHaveBeenCalledTimes(1);
177+
expect(startDeps.userProfile.getCurrent).toHaveBeenCalledWith({
178+
request: kibanaRequest,
179+
dataPath: 'userSettings',
180+
});
181+
});
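Taken together, the test cases in this file pin down how the two settings are resolved. The sketch below restates those expectations as plain functions. It is inferred from the assertions only and is not the actual `UserSettingsService` implementation; `UserSettings`, `resolveDarkMode`, and `resolveRememberSelectedSpace` are hypothetical names.

```typescript
// Hypothetical shape of the stored settings, inferred from the test inputs.
interface UserSettings {
  darkMode?: string;
  rememberSelectedSpace?: boolean;
}

// Mirrors the darkMode expectations in the tests:
// 'dark' -> true, 'light' -> false, 'system' -> 'system', anything else -> undefined.
function resolveDarkMode(settings?: UserSettings): boolean | 'system' | undefined {
  switch (settings?.darkMode) {
    case 'dark':
      return true;
    case 'light':
      return false;
    case 'system':
      return 'system';
    default:
      return undefined;
  }
}

// Mirrors the rememberSelectedSpace expectations: an unset value falls back to true.
function resolveRememberSelectedSpace(settings?: UserSettings): boolean {
  return settings?.rememberSelectedSpace ?? true;
}

console.log(resolveDarkMode({ darkMode: 'space_default' }));
console.log(resolveRememberSelectedSpace({ rememberSelectedSpace: undefined }));
```

Using `??` rather than `||` for the default keeps an explicit `false` intact, which is the distinction the `false` and "not set" test cases exercise.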
182+
124183
it('does not fetch userSettings when client is not set, returns `undefined`, and logs a debug statement', async () => {
125184
const { getUserSettingDarkMode } = service.setup();
126185
