Commit eb74355

[Flaky Test Fixer] Use flaky-test-investigator skill in Failed Test Investigator workflow (elastic#271443)
This PR updates the existing Failed Test Investigator workflow to use the [newly introduced](elastic#269071) `flaky-test-investigator` skill. The skill is being picked up correctly: see sample output [here](elastic/sdh-kibana-automated-issue-triage#18 (comment)).

### Local testing

You can test the workflow by invoking it locally:

```
gh aw trial ./.github/workflows/failed-test-investigator.md \
  --logical-repo elastic/kibana \
  --trigger-context elastic#270988
```

The command above creates a private repository and invokes the workflow for you. Before the workflow can run, add a `LITELLM_API_KEY` secret to the private/test repository that `gh aw trial` creates (generate the key via LiteLLM, for example).
1 parent c3fa2fb commit eb74355

1 file changed

Lines changed: 22 additions & 54 deletions

.github/workflows/failed-test-investigator.md

@@ -87,20 +87,9 @@ Investigate a failed-test issue, classify the failure, and propose a fix when ap
 - **`issues` trigger**: use the triggering issue (non-PR, labeled `failed-test`).
 - **`workflow_dispatch`**: use issue `${{ github.event.inputs.issue_number }}`. Fetch it explicitly before analysis, and post the final comment there.

-## Where did the test run?
-
-The test's **target** (e.g. `local-stateful-classic`, `cloud-serverless-security_complete`) tells you where it ran:
-
-- **`cloud-*`** — ran against a real Elastic Cloud project (serverless) or deployment (stateful). Pipeline names: `appex-qa-{serverless|stateful}-kibana-{ftr|scout}-tests`.
-- **`local-*`** — ran on the agent's local machine. `kibana-on-merge` and `kibana-pull-request` are local (no Elastic Cloud API calls), so the environment is more stable and less prone to network/env flakiness.
-
 ## Investigate

-1. Read the issue title, body, labels, and all comments.
-2. Parse test metadata if present: location (test file path), config path, code owners, target.
-3. Look at all the failures reported in the issue. The very same test could have been failing with different error messages, for different reasons, on different pipelines, and on different branches.
-4. Inspect the relevant test file and nearby helpers/fixtures. For Scout, start from the reported location; otherwise infer from the title.
-5. Check recent git history and blame on the test file and related product code.
+Investigate the test failure(s) using the `flaky-test-investigator` skill.

 Every conclusion must cite specific evidence. Do not guess.

@@ -150,53 +139,32 @@ No other side-effects beyond posting the comment and updating the label.

 ## Comment format

-Post exactly one comment with two main parts:
-
-- **Visible section**: a very concise summary that would inform a developer with a quick glance. Highlight main findings. Keep it high-signal and to the point.
-- **Collapsed `<details>` section**: full long-form context for the downstream auto-fix agent (and any human who wants to audit the call).
-
-The visible section is a _distillation_ of the collapsed one. Do not repeat content verbatim across both: the visible bullets summarize, the collapsed block holds the full evidence the summary was derived from.
-
-### Visible (top), in this order:
-
-1. **One-line bold headline** stating the result kind and one identifying detail. Consistent with `classification` but not templated. Example: `**Likely test-design fix** — missing waitForAlertsToPopulate() in building_block_alerts.spec.ts`.
-
-2. **A 3–5 sentence prose paragraph** (no headings, no bullets) covering: what broke and where (name the test file/name), the most likely root cause, and any evidence-backed author attribution with `@username` so they get notified on first read.
-
-3. **One-line action hint**: the proposed fix, recommended action, or missing evidence. Skip if the paragraph already covers it.
-
-4. **Findings bullets** — exactly these four, in this order, with one concrete value each. Downstream tooling parses these directly; preserve keys, casing, and `` - `key`: value `` shape:
-
-- `classification`: `test-design` | `test-environment` | `application` | `external` | `inconclusive`
-- `confidence`: `high` | `medium` | `low`
-- `test.type`: `scout` (if `scout-playwright` label) | `ftr` | `jest` | `unknown`
-- `test.file`: repo-relative path, or `unknown`
-
-5. **Suspected root cause** — 2–4 short bullets, each tied to a specific piece of evidence. Skip the section entirely when `classification` is `external` or `inconclusive` and there is nothing concrete to assert.
-
-6. **Key references** — at most 3 Markdown links: the failing test file, the failing CI run, and the implicated commit (when one exists). Skip any of the three that are not applicable; skip the section entirely when none apply.
-
-### Collapsed (`<details>`):
-
-This section is the full context for agents and humans to dive deep into the findings. Verify all information. Wrap it in a single `<details>` block. The blank lines around `</summary>` and `</details>` are required for the inner markdown to render.
-
-```
-<details>
-<summary>See full details</summary>
+Post exactly one comment. Keep the visible portion very short and easy to read:

-#### Full root-cause analysis
+1. **One-line bold headline** stating the result kind and one identifying detail.
+2. **Diagnosis** (≤5 concise bullet points): what broke and where, the most likely root cause.
+3. **Next steps** (≤5 concise bullet points).

-The long-form version of the visible "Suspected root cause" bullets. Walk through the evidence chain step by step. Cite the specific log lines, stack frames, blame results, or related PRs that led to the conclusion.
+Put the full `flaky-test-investigator` skill output inside a collapsed `<details><summary>Investigation details</summary> ... </details>` block (not in the visible portion). Open the block with a `#### Findings` subsection containing exactly these four bullets in this order — downstream tooling parses them, so preserve keys, casing, and `` - `key`: value `` shape. These bullets must live **inside `<details>`**, never in the visible portion:

-#### Evidence used
+- `classification`: `test-design` | `test-environment` | `application` | `external` | `inconclusive`
+- `confidence`: `high` | `medium` | `low`
+- `test.type`: `scout` (if `scout-playwright` label) | `ftr` | `jest` | `unknown`
+- `test.file`: repo-relative path, or `unknown`

-A complete list of the evidence consulted: issue comments, file paths, commits, CI runs, blame output, related PRs. Each item should be a Markdown link, not a bare path or SHA.
+The skill's "Reporting" subsections should also be inside the collapsible section:

-#### Suggested patch
+- What the test does
+- What failed and when
+- Where it ran
+- Root cause hypothesis
+- Evidence
+- Failure screenshot
+- Recommended next step
+- Open questions

-Only when justified by the evidence: a small diff-style snippet showing the suggested edit. Include the exact file, function, assertion, wait condition, fixture, selector, API, or behavior to change. Omit this section entirely when no defensible patch can be proposed.
+Blank lines around `</summary>` and `</details>` are required for the inner markdown to render.

-</details>
-```
+End the comment with this footer line (verbatim, on its own line after the `</details>` block):

-Use `####` headings inside the details block (not `###`) so they nest below the comment's own structure. Any of the three subsections may be omitted when there is nothing meaningful to put in it.
+`<sup>AI-generated, share feedback in [#appex-qa](https://elastic.slack.com/archives/C04HT4P1YS3)</sup>`
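
The prompt says downstream tooling parses the four Findings bullets but does not show that tooling. As a rough illustration of why the keys, casing, order, and `` - `key`: value `` shape matter, here is a minimal Python sketch of what such a parser might look like. The `parse_findings` helper, the regular expression, the tolerance for values with or without backticks, and the sample file path are all assumptions made for this example; they are not taken from the actual Kibana tooling.

```python
import re

# Keys in the order the workflow prompt requires.
EXPECTED_KEYS = ["classification", "confidence", "test.type", "test.file"]

# Allowed values, copied from the Findings bullets in the prompt.
# `test.file` is free-form (a repo-relative path or `unknown`), so it is not listed.
ALLOWED = {
    "classification": {"test-design", "test-environment", "application",
                       "external", "inconclusive"},
    "confidence": {"high", "medium", "low"},
    "test.type": {"scout", "ftr", "jest", "unknown"},
}

# Matches "- `key`: value"; the value may or may not be wrapped in backticks
# (an assumption, since the prompt does not say which form the agent emits).
BULLET = re.compile(r"^- `([^`]+)`: `?([^`\n]+?)`?\s*$")


def parse_findings(comment: str) -> dict[str, str]:
    """Hypothetical helper: extract and validate the four Findings bullets."""
    findings: dict[str, str] = {}
    for line in comment.splitlines():
        match = BULLET.match(line.strip())
        if match and match.group(1) in EXPECTED_KEYS:
            findings[match.group(1)] = match.group(2)
    # Dicts preserve insertion order, so this also checks bullet order.
    if list(findings) != EXPECTED_KEYS:
        raise ValueError(f"expected keys {EXPECTED_KEYS}, got {list(findings)}")
    for key, allowed in ALLOWED.items():
        if findings[key] not in allowed:
            raise ValueError(f"unexpected value for {key}: {findings[key]!r}")
    return findings


# Illustrative comment body; the file path is made up for this example.
sample = "\n".join([
    "**Likely test-design fix**",
    "",
    "<details>",
    "<summary>Investigation details</summary>",
    "",
    "#### Findings",
    "",
    "- `classification`: `test-design`",
    "- `confidence`: `medium`",
    "- `test.type`: `scout`",
    "- `test.file`: `x-pack/example/sample.spec.ts`",
    "",
    "</details>",
])

print(parse_findings(sample))
```

A parser this strict rejects a comment in which a key is renamed, re-cased, reordered, or missing, which is presumably why the prompt asks the agent to preserve the bullet shape exactly.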
