-
Notifications
You must be signed in to change notification settings - Fork 596
.github/workflows: Naive unit test sharding #12780
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR refactors the GitHub Actions unit test workflow to introduce test sharding for improved parallelization and performance. The workflow now splits tests into "fast" and "slow" shards that run concurrently, with an aggregation step to ensure all tests pass.
Key changes:
- Introduced matrix-based test sharding (fast/slow shards) to parallelize test execution
- Replaced
make unit-with-coveragewith directgotestsuminvocations for granular control - Added coverage artifact uploading and aggregation step to track test results from both shards
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
51e435d to
7998018
Compare
|
~50% reduction. The fast matrix job is still the longest job though. Proper implementation requires dynamic sharding based on historical runtime. Existing problem though, same thing the e2e suite sharding approach lacks. |
Signed-off-by: timflannagan <[email protected]>
7998018 to
91abc6e
Compare
| uses: actions/setup-go@v5 | ||
| with: | ||
| go-version-file: go.mod | ||
| cache: true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think the default is already true
| - name: Run unit tests (${{ matrix.shard.name }}) | ||
| run: | | ||
| if [ "${{ matrix.shard.name }}" = "fast" ]; then | ||
| PACKAGES=$(go list ./... | grep -v -e 'internal/kgateway/translator/gateway$' -e 'internal/kgateway/controller$' -e 'internal/kgateway/agentgatewaysyncer$' -e 'internal/sds/pkg/run$' -e 'internal/kgateway/setup$' | tr '\n' ' ') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: seems a little error-prone to have to define this list in 2 places, wonder if we could store them in an env or something
| fi | ||
| - name: Validate Test Coverage | ||
| shell: bash | ||
| run: make validate-test-coverage || true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is || true needed here?
Description
Attempt to shard the unit test workflow. Right now we're hovering around ~10-12m for the unit test workflow which is very heavy for this type of suite. Implement some naive sharding that mirrors how the e2e suite is currently sharded. Ideally, dynamic sharding based on historical runtime is the medium/long term solution here.
Change Type
/kind cleanup
Changelog
Additional Notes