feat(ci): mark FTR retry green when previously-failing tests recover #269605
Open
TamerlanG wants to merge 40 commits into elastic:main from TamerlanG:ftr/smart-retry
Commits
5201524 chore: remove old comments that references file that doesn't exist an… (TamerlanG)
29689fd feat(ci): add ftr retry result checker to kbn-failed-test-reporter-cli (TamerlanG)
f89d58a feat(ci): mark FTR retry green when previously-failing tests recover (TamerlanG)
54726fb test(ci): TEMP add retry-validation fixture — DELETE BEFORE MERGE (TamerlanG)
8403a73 Merge branch 'main' into ftr/smart-retry (TamerlanG)
1be5cd8 Revert "test(ci): TEMP add retry-validation fixture — DELETE BEFORE M… (TamerlanG)
f64246a Merge branch 'main' into ftr/smart-retry (TamerlanG)
27b3524 [CI] Add job annotation to FTR configs summary (TamerlanG)
443e1cf Reapply "test(ci): TEMP add retry-validation fixture — DELETE BEFORE … (TamerlanG)
093dc39 Merge branch 'main' into ftr/smart-retry (TamerlanG)
bb84df5 Merge branch 'main' into ftr/smart-retry (TamerlanG)
221df45 Merge branch 'main' into ftr/smart-retry (TamerlanG)
0cadf4a Merge branch 'main' into ftr/smart-retry (TamerlanG)
5f661e0 Merge branch 'main' into ftr/smart-retry (TamerlanG)
2522c6c Merge branch 'main' into ftr/smart-retry (TamerlanG)
11841f7 improve job annotation (TamerlanG)
6931aeb remove bail (TamerlanG)
b137ca2 remove view logs link from job annotation (TamerlanG)
ad1ae9a show failing test names per config in job annotation (TamerlanG)
7f2c559 Revert "Reapply "test(ci): TEMP add retry-validation fixture — DELETE… (TamerlanG)
b9aa8a4 refactor(ci): simplify ftr_configs.sh annotation and failure extraction (TamerlanG)
8ab9a58 feat(ci): verify explicit passes on retry instead of absence of failure (TamerlanG)
924635c fix(ci): guard scout reporter error, log smart-retry inactivity, clar… (TamerlanG)
989bd71 refactor(ci): split ftr_configs.sh into focused helper files (TamerlanG)
8238bd3 refactor(ci): move XML diff dance and temp-file plumbing into the Nod… (TamerlanG)
65bc3fd Merge branch 'main' into ftr/smart-retry (TamerlanG)
84f0b90 chore(ci): remove job annotation from smart-retry PR (TamerlanG)
c3712f9 refactor(ci): remove dead computeIntersection export and initialize r… (TamerlanG)
69c8326 Reapply "test(ci): TEMP add retry-validation fixture — DELETE BEFORE … (TamerlanG)
c8a62ec Revert "chore: remove old comments that references file that doesn't … (TamerlanG)
2193bfb Revert "Reapply "test(ci): TEMP add retry-validation fixture — DELETE… (TamerlanG)
a1c1565 bring back verbose version (TamerlanG)
d51172c revert comments (TamerlanG)
3cfcdfb refactor(ci): move FAILED_TESTS_KEY and retry_recovered into ftr_smar… (TamerlanG)
6483777 add whitesapce (TamerlanG)
fcdfba7 Reapply "Reapply "test(ci): TEMP add retry-validation fixture — DELET… (TamerlanG)
ac19c1a Merge branch 'main' into ftr/smart-retry (TamerlanG)
0ba5cbb put this all behind an env flag (TamerlanG)
bad92a9 put bail behind a env variable too (TamerlanG)
be25fc6 Update .buildkite/scripts/steps/test/ftr_configs.sh (TamerlanG)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,61 @@ | ||
| # Sourced by ftr_configs.sh — do not execute directly. | ||
| # Reads/writes globals: exitCode, failedConfigs, | ||
| # FAILED_CONFIGS_KEY, JOB, BUILDKITE_RETRY_COUNT. | ||
| | ||
| FAILED_TESTS_KEY="${BUILDKITE_STEP_ID}${FTR_CONFIG_GROUP_KEY}_failed_tests" | ||
| retry_recovered=false | ||
| | ||
| # Called after attempt 1: stores failing test names so the retry can verify recovery. | ||
| store_failing_tests() { | ||
| [[ -n "${KIBANA_FLAKY_TEST_RUNNER_CONFIG:-}" ]] && return | ||
| [[ "${BUILDKITE_RETRY_COUNT:-0}" != "0" ]] && return | ||
| [[ "$exitCode" == "0" ]] && return | ||
| | ||
| local junitDir="target/junit/$JOB" | ||
| [[ -d "$junitDir" ]] || return | ||
| | ||
| local failedTestNames | ||
| failedTestNames=$(node scripts/ftr_check_retry_result list-failures "$junitDir" 2>/dev/null || true) | ||
| if [[ "$failedTestNames" ]]; then | ||
| buildkite-agent meta-data set "$FAILED_TESTS_KEY" "$failedTestNames" | ||
| echo "Stored $(echo "$failedTestNames" | wc -l | tr -d ' ') previously-failing test name(s) for retry evaluation" | ||
| fi | ||
| } | ||
| | ||
| # Called after attempt 2: marks the step green if all previously-failing tests explicitly passed. | ||
| # On a third-or-later manual retry, logs that smart-retry is inactive. | ||
| apply_smart_retry() { | ||
| [[ -n "${KIBANA_FLAKY_TEST_RUNNER_CONFIG:-}" ]] && return | ||
| [[ "$exitCode" == "0" ]] && return | ||
| | ||
| local retryCount="${BUILDKITE_RETRY_COUNT:-0}" | ||
| | ||
| if [[ "$retryCount" -ge "2" ]]; then | ||
| echo "--- [smart-retry] inactive on attempt $((retryCount + 1)) — only applies to the first automatic retry" | ||
| return | ||
| fi | ||
| | ||
| [[ "$retryCount" != "1" ]] && return | ||
| | ||
| local prevFailedTests | ||
| prevFailedTests=$(buildkite-agent meta-data get "$FAILED_TESTS_KEY" --default '' 2>/dev/null || true) | ||
| [[ "$prevFailedTests" ]] || return | ||
| | ||
| local junitDir="target/junit/$JOB" | ||
| | ||
| local intersectionCode | ||
| set +e | ||
| printf '%s' "$prevFailedTests" | node scripts/ftr_check_retry_result check-intersection \ | ||
| --junit-dir "$junitDir" \ | ||
| --prev-failures-stdin | ||
| intersectionCode=$? | ||
| set -e | ||
| | ||
| if [[ "$intersectionCode" == "0" ]]; then | ||
| echo "--- [smart-retry] All previously-failing tests recovered on retry — marking step green" | ||
| exitCode=0 | ||
| failedConfigs="" | ||
| retry_recovered=true | ||
| buildkite-agent meta-data set "$FAILED_CONFIGS_KEY" "" 2>/dev/null || true | ||
| fi | ||
| } | ||
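The diff delegates the actual recovery decision to `node scripts/ftr_check_retry_result check-intersection`, whose implementation is not shown here. As a rough illustration of the rule named in the commit title "verify explicit passes on retry instead of absence of failure", the sketch below assumes newline-separated test names and uses a hypothetical `all_recovered` helper; it is not the PR's Node implementation, which reads JUnit output.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for the check-intersection step (assumption, not the PR's code).
# $1: newline-separated names of tests that failed on attempt 1
# $2: newline-separated names of tests that explicitly passed on the retry
# Returns 0 only if every previously-failing test appears in the passed list.
all_recovered() {
  local prevFailed="$1" passedOnRetry="$2" name
  [[ -n "$prevFailed" ]] || return 1   # nothing stored: do not claim recovery
  while IFS= read -r name; do
    [[ -n "$name" ]] || continue
    # -F fixed string, -x whole-line match, -q quiet
    grep -Fxq -- "$name" <<<"$passedOnRetry" || return 1
  done <<<"$prevFailed"
  return 0
}

# Made-up test names for illustration.
prev=$'suite A test 1\nsuite B test 2'

if all_recovered "$prev" $'suite A test 1\nsuite B test 2\nsuite C test 3'; then
  full_result="recovered"
else
  full_result="still-failing"
fi

# 'suite B test 2' has no explicit pass on the retry, so its mere absence
# from a failure list is not treated as recovery.
if all_recovered "$prev" $'suite A test 1'; then
  partial_result="recovered"
else
  partial_result="still-failing"
fi

echo "$full_result $partial_result"
```

Requiring an explicit pass means a test that was skipped or never ran on the retry keeps the step red, which is the conservative choice for a mechanism that can turn a failed step green.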
Correctness/regression: this line runs at source-time (top of `ftr_configs.sh`, line 6), which is before the defensive `FTR_CONFIG_GROUP_KEY=${FTR_CONFIG_GROUP_KEY:-}` assignment on line 9 of `ftr_configs.sh`. Because `ftr_configs.sh` enables `set -euo pipefail` before sourcing, expanding `${FTR_CONFIG_GROUP_KEY}` (no `:-` default) under `set -u` will abort the entire step with `FTR_CONFIG_GROUP_KEY: unbound variable` in any run where that env var is not set on the agent, which is exactly the case the existing `FTR_CONFIG_GROUP_KEY=${FTR_CONFIG_GROUP_KEY:-}` line was written to guard against. The standard "FTR Configs" steps in `.buildkite/pipeline-utils/ci-stats/pick_test_group_run_order/steps.ts:82` set it explicitly, but other callers (e.g. parallel-job paths handled at lines 10–13 of `ftr_configs.sh`) and local runs do not.

Easiest fix: add a `:-` default so source-time evaluation is safe.
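One way the suggested change might look is sketched below. This is an illustration under assumptions rather than the change applied in the PR: the step ID value is made up, and the unsafe expansion is run in a subshell only to show the `set -u` failure without aborting the script.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Simulate an agent where the group key is not set (assumption for the demo).
unset FTR_CONFIG_GROUP_KEY
BUILDKITE_STEP_ID="step-123"   # made-up value; Buildkite normally provides this

# Without a default, the expansion fails under 'set -u'. Run it in a subshell
# so the failure is observed instead of terminating this script.
if (set -u; : "${FTR_CONFIG_GROUP_KEY}") 2>/dev/null; then
  unsafe_result="ok"
else
  unsafe_result="unbound"
fi

# With ':-', an unset variable expands to the empty string, so the assignment
# is safe even when it is evaluated at source-time.
FAILED_TESTS_KEY="${BUILDKITE_STEP_ID}${FTR_CONFIG_GROUP_KEY:-}_failed_tests"

echo "unsafe expansion: $unsafe_result"
echo "safe key: $FAILED_TESTS_KEY"
```

With the default in place, the key simply omits the group segment when `FTR_CONFIG_GROUP_KEY` is unset, so the order in which the helper file is sourced no longer matters for this line.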