feat(ci): mark FTR retry green when previously-failing tests recover by TamerlanG · Pull Request #269605 · elastic/kibana

TamerlanG · 2026-05-16T13:15:15Z

Why

Flaky tests are a recurring source of CI noise. When a test fails once but passes on the automatic retry, the whole thing is clearly a flake — yet today the retry still counts any failure (even a different test's flakiness) and keeps the step red. This wastes engineer time investigating failures that aren't real.

What this does

On the first automatic retry of a failing FTR step, instead of failing on any test failure, we now ask: did every test that failed in attempt 1 explicitly pass in attempt 2?

If yes → the original failures recovered → CI goes green, even if a different unrelated test happened to be flaky on retry.
If no → at least one original failure did not pass → CI stays red.

How it works

At the end of attempt 1, the failing test names are read from the JUnit XML and stored in Buildkite metadata.
At the end of attempt 2, each previously-failing test is checked for an explicit pass in the retry JUnit output — not just absence of failure. This guards against three false-green scenarios:
- Runner crash — the JUnit directory is empty so no failures appear, but nothing passed either
- beforeAll hook failure — tests are reported as <skipped/> rather than failed, so they look "recovered" when they weren't
- Stale XML files — on persistent-workspace agents, attempt-1 XML files can persist alongside attempt-2 files

If all previously-failing tests explicitly passed, the step exit code is overridden to 0.
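
The explicit-pass rule described above can be sketched in shell. This is a minimal illustration only, not the PR's implementation (which lives in TypeScript in the CLI package): `all_recovered` and its two one-name-per-line input files are hypothetical, and treating an empty list of previous failures as "not recovered" is an assumption of this sketch.

```shell
# Sketch only: all_recovered is a hypothetical helper, not code from the PR.
# $1: file listing tests that failed in attempt 1 (one name per line)
# $2: file listing tests that explicitly passed in attempt 2 (one name per line)
all_recovered() {
  prev_file="$1"
  passed_file="$2"

  # Assumption of this sketch: with no recorded failures there is nothing
  # to verify, so the step is not promoted to green.
  [ -s "$prev_file" ] || return 1

  while IFS= read -r name || [ -n "$name" ]; do
    [ -n "$name" ] || continue
    # -F: fixed string, -x: match the whole line, -q: no output.
    # A test that is absent, skipped, or failed is not in the passed list.
    grep -Fxq -- "$name" "$passed_file" 2>/dev/null || return 1
  done < "$prev_file"

  return 0
}
```

With an empty passed list (the runner-crash case) or a passed list that omits a skipped test (the beforeAll case), the function returns non-zero, so the step would stay red.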

A CLI (scripts/ftr_check_retry_result.js) in @kbn/failed-test-reporter-cli handles the XML parsing and verification logic.

Bail behaviour

--bail has been removed from all FTR runs. Without it, every config runs to completion on both attempt 1 and the retry, so the smart-retry check always has a full picture of what's still failing rather than potentially missing the original test because a different failure caused an early exit.

Adds `retry_result_checker.ts` to `@kbn/failed-test-reporter-cli` with two CLI commands exposed via `scripts/ftr_check_retry_result.js`:
- `list-failures <junit-dir>`: prints the name of every failed test found in the JUnit XML files under a directory, one per line.
- `check-intersection --junit-dir <dir> --prev-failures-file <file>`: compares the current attempt's failures against a saved list from a previous attempt; exits 0 if the intersection is empty (all previously-failing tests recovered), exits 1 otherwise.

Also refactors `failed_tests_reporter_cli.ts` to export `runFailedTestsReporterCli()` rather than executing on import, so the package can export both CLIs from a single entry point without one triggering when the other is called.

On the first automatic retry of a failing FTR step, the step is now marked green if every test that failed in attempt 1 passes in attempt 2, even if a different (previously-passing) test happens to fail on retry, which would indicate a separate flake unrelated to the original failure.

How it works:
- End of attempt 1: JUnit XML is parsed and the failing test names are stored in Buildkite metadata.
- End of attempt 2: the stored names are retrieved and intersected with the current attempt's failures. An empty intersection overrides the exit code to 0.

Skipped for flaky-test-runner runs (KIBANA_FLAKY_TEST_RUNNER_CONFIG).

Known limitation: if --bail causes attempt 2 to stop on a different test before reaching the originally-failing test, the intersection will appear empty and the step will be marked green even though the original failing test was never verified.
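
The intersection check in this commit message, and the limitation it mentions, can be illustrated with a small shell sketch. `intersection_is_empty` and the one-name-per-line input files are hypothetical; the PR's own check is the `check-intersection` CLI command, which this does not reproduce.

```shell
# Sketch only: intersection_is_empty is a hypothetical helper.
# $1: file listing tests that failed in attempt 1 (one name per line)
# $2: file listing tests that failed in attempt 2 (one name per line)
# Returns 0 when no previously-failing test failed again.
intersection_is_empty() {
  prev_file="$1"
  current_failures_file="$2"

  while IFS= read -r name || [ -n "$name" ]; do
    [ -n "$name" ] || continue
    if grep -Fxq -- "$name" "$current_failures_file" 2>/dev/null; then
      return 1  # a previously-failing test failed again
    fi
  done < "$prev_file"

  # Note: this is also reached when attempt 2 recorded no failures at all,
  # for example because the original test never ran.
  return 0
}
```

Because the function only looks at failures, an attempt 2 that never reached the original test looks identical to a genuine recovery, which is the gap the known limitation describes.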

Plants a deliberately-flaky test pair inside the unused_urls_task FTR config to validate the retry intersection logic introduced in this PR.

Setup (relies on FTR's --bail and BUILDKITE_RETRY_COUNT):
- Attempt 1: TEST_A fails, --bail stops the run. JUnit records TEST_A.
- Attempt 2: TEST_A passes (recovered). TEST_B now fails, --bail stops. JUnit records TEST_B.

Stored prev failures: {TEST_A}. Current failures: {TEST_B}. Intersection is empty → ftr_configs.sh overrides exit code to 0 and the step turns green.

Expected outcome in CI: red attempt 1, red attempt 2 internally, but the step ends green because no previously-failing test failed again.

DELETE before merging.

TamerlanG · 2026-05-16T16:31:36Z

Seems to work according to this build.

I added a mock FTR suite in 54726fb

TamerlanG · 2026-05-16T16:32:35Z

I think one other improvement I could make is to use job annotations to show this information better. I'll play around with that idea next, maybe in some other PR.

infra-vault-gh-plugin-prod · 2026-05-16T16:32:42Z

Pinging @elastic/kibana-operations (Team:Operations)

github-actions

Nice approach overall — the intersection logic is small and well-tested, and routing the new CLI through @kbn/failed-test-reporter-cli keeps the JUnit parsing in one place. Two concrete concerns left inline:

- The retry_validation_delete_before_merge.ts fixture and its loadTestFile wiring are still in the diff. The PR is no longer in draft, so this needs to come out (or be put behind an env flag) before merge.
- An empty failure-intersection on attempt 2 can mean recovered or never ran (the --bail open question, plus the FTR-crashed/missing-junit-dir case). Suggest tightening the green check to require that previously-failing tests are actually observed in attempt 2's JUnit (passed or failed) before promoting to green; that resolves the open question without dropping --bail.

Low-priority nits:

- `node scripts/ftr_check_retry_result list-failures "$junitDir" 2>/dev/null || true` silently swallows any error from the script. Acceptable as a fail-open, but worth at least logging to stderr (without failing the step) so we can spot regressions in the helper itself.
- Consider asserting in ftr_configs.sh that target/junit/$JOB actually exists / has at least one XML before declaring recovery, as a complementary safeguard to the executed-set check above.
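
Both nits can be sketched in shell. The helper names `run_fail_open` and `has_junit_xml` are hypothetical and do not appear in the PR; this is one possible shape under those assumptions, not the change that was eventually made.

```shell
# Sketch only: both helpers are hypothetical.

# Runs the given command and prints its stdout. On failure it logs a warning
# to stderr instead of discarding the error, and still returns 0 (fail-open).
run_fail_open() {
  if output="$("$@")"; then
    if [ -n "$output" ]; then
      printf '%s\n' "$output"
    fi
  else
    status=$?
    echo "warning: '$*' exited with status $status; continuing" >&2
  fi
  return 0
}

# Succeeds only if the directory exists and holds at least one .xml file.
has_junit_xml() {
  [ -d "$1" ] || return 1
  [ -n "$(find "$1" -type f -name '*.xml' 2>/dev/null | head -n 1)" ]
}
```

A caller could wrap the `list-failures` invocation in `run_fail_open` so that a broken helper shows up in the step log, and check `has_junit_xml` on the JUnit directory before declaring recovery.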

TamerlanG · 2026-05-18T09:20:21Z

Update: we're on Buildkite agent version 3.109.1, which doesn't have the job annotation feature, so I created a PR to update it. Depending on how that PR plays out, I'll see whether I can add the job annotations. (If not, the annotation can live at the build level, but I'd prefer to avoid that.)

TamerlanG · 2026-05-21T20:35:06Z

/ci

TamerlanG · 2026-05-26T13:14:31Z

Confirmed that the job annotations work; I had to bump the Buildkite agent for Kibana to the latest version.

Extract inline XML diff logic into collect_config_failures(), remove the redundant failed-configs summary from write_job_annotation() (the table rows already show still-failing/new-failure/recovered per config), and simplify the failedConfigs concatenation.

Replace the intersection-of-failures check with collectPassedTestNames, which requires each previously-failing test to appear as an explicit pass in the retry JUnit output. This closes three false-green gaps: runner crash (empty JUnit dir), beforeAll hook failure (tests reported as skipped rather than failed), and stale XML files from attempt 1 persisting on persistent-workspace agents.

…ify recovered message
- Wrap the Scout upload in set +e so a non-zero exit code does not abort the config loop; log the exit code and continue rather than silently swallowing it
- Log a clear message when smart-retry is inactive on a third-or-later manual retry
- Update the "recovered on retry" annotation to note that new failures on retry are not counted against recovery

Extract smart-retry logic into ftr_smart_retry.sh (store_failing_tests / apply_smart_retry) and annotation helpers into ftr_job_annotation.sh (collect_config_failures / write_job_annotation). The main script is now a thin orchestrator that sources both and reads as a linear narrative.

…e CLI

Add snapshot/list-new-failures subcommands to ftr_check_retry_result so the per-config JUnit attribution diff lives in TypeScript rather than bash (find/sort/comm/cp plumbing). Add --prev-failures-stdin to check-intersection so the temp-file handshake in apply_smart_retry becomes a plain pipe. The bash loop now reads as two CLI calls per config instead of managing temp files and directory diffs inline.

github-actions

Prior-review concerns from this PR have been addressed in the current diff: the temporary FTR fixture is gone, the silent-green hole is closed by requiring explicit passes via collectPassedTestNames, and the stale attempt-1 XML scenario is handled (and covered by tests in retry_result_checker.test.ts). Two small cleanup notes inline — dead retry_recovered shell variable + stale header comment, and the now-unused computeIntersection helper. Non-blocking.

Generated by Claude Reviewer for issue #269605 · ● 9.7M

…etry_recovered
- Drop computeIntersection function and its tests (unused in production code)
- Initialize retry_recovered=false in ftr_configs.sh so the variable is bound under set -u before ftr_smart_retry.sh sets it on recovery
- Revert Scout reporter error handling to original form

github-actions

One concrete regression risk from the recent move of FAILED_TESTS_KEY into ftr_smart_retry.sh at module scope — inline. Other prior-review concerns (temp fixture, silent-green hole, stale XMLs, computeIntersection) look resolved in the current diff; the previously-flagged dead retry_recovered variable was kept intentionally per commit c3712f999 so I won't re-flag.

github-actions · 2026-05-28T09:31:21Z

+# Reads/writes globals: exitCode, failedConfigs,
+# FAILED_CONFIGS_KEY, JOB, BUILDKITE_RETRY_COUNT.
+
+FAILED_TESTS_KEY="${BUILDKITE_STEP_ID}${FTR_CONFIG_GROUP_KEY}_failed_tests"


Correctness/regression: this line runs at source-time (top of ftr_configs.sh line 6) which is before the defensive FTR_CONFIG_GROUP_KEY=${FTR_CONFIG_GROUP_KEY:-} assignment on line 9 of ftr_configs.sh. Because ftr_configs.sh enables set -euo pipefail before sourcing, expanding ${FTR_CONFIG_GROUP_KEY} (no :- default) under set -u will abort the entire step with FTR_CONFIG_GROUP_KEY: unbound variable in any run where that env var is not set on the agent — exactly the case the existing FTR_CONFIG_GROUP_KEY=${FTR_CONFIG_GROUP_KEY:-} line was written to guard against. The standard "FTR Configs" steps in .buildkite/pipeline-utils/ci-stats/pick_test_group_run_order/steps.ts:82 set it explicitly, but other callers (e.g. parallel-job paths handled at lines 10–13 of ftr_configs.sh) and local runs do not.

Easiest fix: add a :- default so source-time evaluation is safe.

Suggested change

-FAILED_TESTS_KEY="${BUILDKITE_STEP_ID}${FTR_CONFIG_GROUP_KEY}_failed_tests"
+FAILED_TESTS_KEY="${BUILDKITE_STEP_ID:-}${FTR_CONFIG_GROUP_KEY:-}_failed_tests"
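
The difference between the two expansions can be reproduced outside CI. The sketch below is illustrative: the helper names `build_key_unsafe` and `build_key_safe` are hypothetical, and each expansion runs in a child `sh` so that the failing case cannot abort the calling script.

```shell
# Sketch only: helper names are hypothetical; the key layout mirrors the
# reviewed line. Both variables are unset inside the child shell on purpose.

# Expands the variables without a default; under set -u the child shell
# aborts with a non-zero status and prints nothing to stdout.
build_key_unsafe() {
  sh -c 'set -u
unset BUILDKITE_STEP_ID FTR_CONFIG_GROUP_KEY
printf "%s\n" "${BUILDKITE_STEP_ID}${FTR_CONFIG_GROUP_KEY}_failed_tests"'
}

# Expands the variables with an empty :- default; safe under set -u.
build_key_safe() {
  sh -c 'set -u
unset BUILDKITE_STEP_ID FTR_CONFIG_GROUP_KEY
printf "%s\n" "${BUILDKITE_STEP_ID:-}${FTR_CONFIG_GROUP_KEY:-}_failed_tests"'
}
```

With both variables unset, `build_key_safe` prints `_failed_tests`, while `build_key_unsafe` exits non-zero with the shell's unbound-variable error on stderr.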

tylersmalley · 2026-05-29T13:35:42Z

Apologies, I think I am going back on this. But let's put this behind an ENV flag FTR_SMART_RETRY_ENABLED. That way we can test specific commits, as well as disable the functionality if we find an issue without reverting the entire change.

One fear is about what happens if we experience an issue with ES that causes every test to time out. I think we need to figure out some guardrails for this and possibly even add an exit code to avoid a retry at all.
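
A gate on the requested FTR_SMART_RETRY_ENABLED flag might look like the sketch below. The helper name `smart_retry_enabled` is hypothetical, and accepting only the literal value `true` is an assumption; the thread does not say which values the real script accepts, only that the feature is off by default.

```shell
# Sketch only: smart_retry_enabled is a hypothetical helper.
# Assumption: the feature is on only when the flag is exactly "true";
# unset or any other value leaves it off.
smart_retry_enabled() {
  case "${FTR_SMART_RETRY_ENABLED:-}" in
    true) return 0 ;;
    *) return 1 ;;
  esac
}
```

The `:-` default keeps the check safe under `set -u`, and defaulting to off means a single env var change can disable the behaviour without reverting the change.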

github-actions

One concrete concern on the new BAIL_ARG gating in ftr_configs.sh — left inline.

kibanamachine · 2026-05-31T19:00:22Z

💔 Build Failed

Buildkite Build
Commit: be25fc6
Build duration: 48 mins

Failed CI Steps

Test Failures

[job] [logs] FTR Configs #109 / unused_urls_task retry-validation TEST_A: intentionally fails on attempt 1, passes on attempt 2
[job] [logs] FTR Configs #109 / unused_urls_task retry-validation TEST_B: passes on attempt 1, intentionally fails on attempt 2

Metrics [docs]

✅ unchanged

History

💔 Build #450658 failed bad92a9
💛 Build #450371 was flaky ac19c1a
💛 Build #449467 was flaky fcdfba7
💛 Build #449073 was flaky 69c8326
💛 Build #448989 was flaky c3712f9

TamerlanG · 2026-06-01T07:21:32Z

The latest build failed because the feature is turned off by default, and the branch still has a test that deliberately fails on the retry (attempt 2).

TamerlanG added 3 commits May 16, 2026 15:06

chore: remove old comments that references file that doesn't exist an…

5201524

…ymore

TamerlanG added the Team:Operations (Kibana-Operations Team), release_note:skip (Skip the PR/issue when compiling release notes), and backport:skip (This PR does not require backporting) labels May 16, 2026

TamerlanG marked this pull request as ready for review May 16, 2026 16:32

TamerlanG requested review from a team as code owners May 16, 2026 16:32

TamerlanG and others added 2 commits May 16, 2026 18:32

Merge branch 'main' into ftr/smart-retry

8403a73

Revert "test(ci): TEMP add retry-validation fixture — DELETE BEFORE M…

1be5cd8

…ERGE" This reverts commit 54726fb.

TamerlanG removed the request for review from a team May 16, 2026 16:37

TamerlanG and others added 3 commits May 18, 2026 09:51

Merge branch 'main' into ftr/smart-retry

f64246a

[CI] Add job annotation to FTR configs summary

27b3524

Surface per-config status, failures, and retry recovery on the job detail page via a job-scoped Buildkite annotation, so that retry/pass/fail outcomes aren't buried in the step logs.

Reapply "test(ci): TEMP add retry-validation fixture — DELETE BEFORE …

443e1cf

…MERGE" This reverts commit 1be5cd8.

TamerlanG added the reviewer:claude (PR review and comments with Claude) label May 18, 2026

github-actions Bot reviewed May 18, 2026

Comment thread src/platform/test/api_integration/apis/unused_urls_task/index.ts

Comment thread packages/kbn-failed-test-reporter-cli/failed_tests_reporter/retry_result_checker.ts

Merge branch 'main' into ftr/smart-retry

093dc39

TamerlanG added 4 commits May 21, 2026 22:35

Merge branch 'main' into ftr/smart-retry

bb84df5

Merge branch 'main' into ftr/smart-retry

221df45

Merge branch 'main' into ftr/smart-retry

0cadf4a

Merge branch 'main' into ftr/smart-retry

5f661e0

TamerlanG and others added 10 commits May 26, 2026 16:29

remove view logs link from job annotation

b137ca2

show failing test names per config in job annotation

ad1ae9a

Revert "Reapply "test(ci): TEMP add retry-validation fixture — DELETE…

7f2c559

… BEFORE MERGE"" This reverts commit 443e1cf.

Merge branch 'main' into ftr/smart-retry

65bc3fd

chore(ci): remove job annotation from smart-retry PR

84f0b90

Moving annotation logic (write_job_annotation, per-config status table, per-config failing test names, snapshot/list-new-failures CLI subcommands) to a separate PR so this branch focuses purely on the retry mechanism.

TamerlanG mentioned this pull request May 27, 2026

feat(ci): per-config failing-test names in FTR job annotation #271518

Draft

2 tasks

github-actions Bot reviewed May 27, 2026

Comment thread .buildkite/scripts/steps/test/ftr_smart_retry.sh

Comment thread packages/kbn-failed-test-reporter-cli/failed_tests_reporter/retry_result_checker.ts Outdated

TamerlanG added 8 commits May 27, 2026 16:46

Reapply "test(ci): TEMP add retry-validation fixture — DELETE BEFORE …

69c8326

…MERGE" This reverts commit 1be5cd8.

Revert "chore: remove old comments that references file that doesn't …

c8a62ec

…exist anymore" This reverts commit 5201524.

Revert "Reapply "test(ci): TEMP add retry-validation fixture — DELETE…

2193bfb

… BEFORE MERGE"" This reverts commit 69c8326.

bring back verbose version

a1c1565

revert comments

d51172c

refactor(ci): move FAILED_TESTS_KEY and retry_recovered into ftr_smar…

3cfcdfb

…t_retry.sh Both variables are only used within ftr_smart_retry.sh, so they belong there.

add whitesapce

6483777

github-actions Bot reviewed May 28, 2026

Reapply "Reapply "test(ci): TEMP add retry-validation fixture — DELET…

fcdfba7

…E BEFORE MERGE"" This reverts commit 2193bfb.

TamerlanG and others added 3 commits May 29, 2026 16:37

Merge branch 'main' into ftr/smart-retry

ac19c1a

put this all behind an env flag

0ba5cbb

put bail behind a env variable too

bad92a9

github-actions Bot reviewed May 30, 2026

Comment thread .buildkite/scripts/steps/test/ftr_configs.sh Outdated

Update .buildkite/scripts/steps/test/ftr_configs.sh

be25fc6

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
