Add retry mechanism for failing integration tests#1879

Merged
sondavidb merged 2 commits into
awslabs:mainfrom
sondavidb:retry-failing-integration-tests
Mar 3, 2026

Conversation

@sondavidb
Contributor

Issue #, if available:

Description of changes:
Our integration tests have become flakier for networking reasons. I decided to add a wrapper script to catch failing tests and retry only the ones that have failed.

This isn't perfect: the -run flag in go test takes an unanchored regex, so if the failing test name is contained in another test's name (for example as a prefix), it will run more tests than necessary (e.g. -run TestConvert would also run TestConvertWithForceRecreateZtocs and TestConvertAndPush). However, this is still better than what we have currently, so I think it's fine for now.
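
For readers who want a concrete picture of the approach described above, here is a minimal sketch of a retry wrapper. It is not the contents of the script added in this PR: the function names `failed_tests` and `retry_failed` are hypothetical, and it assumes the tests run with `go test -v`, which reports each top-level failure on a line starting with `--- FAIL: `.

```shell
# Sketch only; not the script from this PR. Function names are hypothetical.

# Print the unique top-level test names that failed, reading `go test -v`
# output on stdin. Top-level failures start at column 0
# ("--- FAIL: TestName (0.00s)"); indented subtest failures are skipped.
failed_tests() {
  awk '/^--- FAIL: / { print $3 }' | sort -u
}

# retry_failed MAX_ATTEMPTS CMD [ARGS...]
# Runs CMD once as given. While it fails and attempts remain, reruns it with
# `-run NAMES` appended, where NAMES is the failed test names joined by "|".
# Returns 0 as soon as a run passes, otherwise the last exit status.
retry_failed() {
  local max_attempts="$1" attempt=1 output status names
  shift
  output=$("$@" 2>&1) && status=0 || status=$?
  printf '%s\n' "$output"
  while [ "$status" -ne 0 ] && [ "$attempt" -lt "$max_attempts" ]; do
    names=$(printf '%s\n' "$output" | failed_tests | paste -sd '|' -)
    # No parsable failures (for example a build error): nothing to narrow.
    [ -n "$names" ] || return "$status"
    attempt=$((attempt + 1))
    echo "attempt $attempt/$max_attempts: retrying $names" >&2
    output=$("$@" -run "$names" 2>&1) && status=0 || status=$?
    printf '%s\n' "$output"
  done
  return "$status"
}
```

A call might look like `retry_failed 3 go test -v ./...`, where the package path is only a placeholder. Because the names are joined into an unanchored pattern, this sketch has the same over-matching limitation described above.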

Testing performed:
Ran ./scripts/integration-with-retries 3 locally and ensured it passed.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@sondavidb sondavidb requested a review from a team as a code owner February 28, 2026 03:24
@github-actions github-actions Bot added the github_actions Pull requests that update GitHub Actions code label Feb 28, 2026
@sondavidb sondavidb force-pushed the retry-failing-integration-tests branch 3 times, most recently from 5a24a34 to 9aed58d Compare March 1, 2026 05:06
Before this change we would have to empty the covdir before re-running
any integration tests or unit tests with coverage, which made rerunning
them in CI difficult.

Signed-off-by: David Son <davbson@amazon.com>
@sondavidb sondavidb force-pushed the retry-failing-integration-tests branch from 9aed58d to 600de95 Compare March 1, 2026 05:56
@sondavidb
Contributor Author

Comment thread scripts/integration-with-retries.sh Outdated
@Swapnanil-Gupta
Contributor

e.g. -run TestConvert would also run TestConvertWithForceRecreateZtocs and TestConvertAndPush

Can we use -run '^TestConvert$' to match only TestConvert?
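
The anchoring suggested here can be sketched as a small helper that wraps the failed test names in `^(...)$`. The helper name `anchored_run_pattern` is hypothetical, and whether the merged script does this is not stated in this thread. It assumes top-level test names only, which are Go identifiers and so need no regex escaping.

```shell
# Sketch only; the helper name is hypothetical.
# Build an anchored -run pattern from top-level test names passed as arguments.
# anchored_run_pattern TestConvert TestFoo  ->  ^(TestConvert|TestFoo)$
anchored_run_pattern() {
  # With no names there is nothing to match; signal failure to the caller.
  [ "$#" -gt 0 ] || return 1
  # "$*" joins the arguments with the first character of IFS.
  local IFS='|'
  printf '^(%s)$\n' "$*"
}
```

For example, `go test -run "$(anchored_run_pattern TestConvert)" ./...` (package path is a placeholder) would select TestConvert but not TestConvertAndPush.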

@sondavidb sondavidb force-pushed the retry-failing-integration-tests branch from 600de95 to 102045f Compare March 2, 2026 20:03
@Shubhranshu153
Contributor

Tests seem to fail.

Signed-off-by: David Son <davbson@amazon.com>
@sondavidb sondavidb force-pushed the retry-failing-integration-tests branch from 102045f to 551542c Compare March 2, 2026 21:31
@sondavidb
Contributor Author

Tests seem to fail.

Wrong version 🤦 correct one should run them now

@sondavidb
Contributor Author

Nice, we got a failure that got caught: https://github.com/awslabs/soci-snapshotter/actions/runs/22596642133/job/65468271427?pr=1879

Wish we could do something about these network errors though. Would be nice only to fail on those.

@sondavidb sondavidb merged commit c76cb8b into awslabs:main Mar 3, 2026
40 checks passed
@sondavidb sondavidb deleted the retry-failing-integration-tests branch March 3, 2026 03:54

Labels

github_actions Pull requests that update GitHub Actions code


3 participants