@timflannagan
Member

@timflannagan timflannagan commented Oct 30, 2025

Description

Attempt to shard the unit test workflow. The workflow currently takes roughly 10-12m, which is very heavy for this type of suite. This implements naive sharding that mirrors how the e2e suite is currently sharded. Dynamic sharding based on historical runtime is the ideal medium/long-term solution.

Change Type

/kind cleanup

Changelog

NONE

Additional Notes

Copilot AI review requested due to automatic review settings October 30, 2025 04:04
@github-actions github-actions bot added kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. release-note-none labels Oct 30, 2025
Contributor

Copilot AI left a comment

Pull Request Overview

This PR refactors the GitHub Actions unit test workflow to introduce test sharding for improved parallelization and performance. The workflow now splits tests into "fast" and "slow" shards that run concurrently, with an aggregation step to ensure all tests pass.

Key changes:

  • Introduced matrix-based test sharding (fast/slow shards) to parallelize test execution
  • Replaced make unit-with-coverage with direct gotestsum invocations for granular control
  • Added coverage artifact uploading and aggregation step to track test results from both shards

@timflannagan timflannagan force-pushed the chore/fast-unit branch 4 times, most recently from 51e435d to 7998018 Compare October 30, 2025 04:32
@timflannagan
Member Author

~50% reduction in runtime. The fast matrix job is still the longest job, though. A proper implementation requires dynamic sharding based on historical runtime. This is an existing problem, though; the e2e suite's sharding approach has the same gap.
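
To make the "dynamic sharding based on historical runtime" idea concrete, here is a minimal sketch of one possible approach: sort packages by past runtime and greedily place each one on the currently lightest shard. This is not what the PR implements. The `assign_shards` helper, the package names, and the timings are all hypothetical, and collecting the timings (for example from earlier test runs) is assumed to happen elsewhere.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical input: one "<package> <seconds>" pair per line.
# These names and timings are made up for illustration only.
timings='pkg/a 120
pkg/b 90
pkg/c 60
pkg/d 45
pkg/e 30'

# assign_shards N: reads "<package> <seconds>" lines on stdin and prints
# "<shard-index> <package>" lines. Packages are handled slowest first, and
# each one goes to the shard with the smallest accumulated runtime so far.
assign_shards() {
  sort -k2,2nr | awk -v n="$1" '
    {
      min = 0
      for (i = 1; i < n; i++) if (load[i] < load[min]) min = i
      load[min] += $2
      print min, $1
    }'
}

assignment=$(printf '%s\n' "$timings" | assign_shards 2)
printf '%s\n' "$assignment"
```

With the made-up timings above and two shards, pkg/a and pkg/d land on shard 0 (165s total) and pkg/b, pkg/c, and pkg/e on shard 1 (180s total). A greedy split like this is not guaranteed to be optimal, but it keeps the shards much closer in runtime than a fixed fast/slow list.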

uses: actions/setup-go@v5
with:
  go-version-file: go.mod
  cache: true
Contributor

I think the default for cache is already true

- name: Run unit tests (${{ matrix.shard.name }})
  run: |
    if [ "${{ matrix.shard.name }}" = "fast" ]; then
      PACKAGES=$(go list ./... | grep -v -e 'internal/kgateway/translator/gateway$' -e 'internal/kgateway/controller$' -e 'internal/kgateway/agentgatewaysyncer$' -e 'internal/sds/pkg/run$' -e 'internal/kgateway/setup$' | tr '\n' ' ')
Contributor

nit: seems a little error-prone to have to define this list in 2 places, wonder if we could store them in an env or something
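
One way the suggestion above could look, as a rough sketch: keep the slow-package patterns in a single variable and derive both shards from it. The five patterns are copied from the grep in the step under review; the `SLOW_PATTERNS` variable, the `select_packages` helper, and the `example.com/m` module path are hypothetical, and the sketch assumes the slow shard is meant to be exactly the packages the fast shard excludes.

```shell
#!/usr/bin/env bash
set -eu

# Single source of truth for the slow packages: one grep pattern per line.
# (grep treats a newline-separated pattern list as alternatives.)
SLOW_PATTERNS='internal/kgateway/translator/gateway$
internal/kgateway/controller$
internal/kgateway/agentgatewaysyncer$
internal/sds/pkg/run$
internal/kgateway/setup$'

# select_packages SHARD: reads package import paths on stdin and prints the
# ones that belong to SHARD ("fast" or "slow").
# grep exits 1 when nothing matches, so that case is tolerated here.
select_packages() {
  case "$1" in
    slow) grep -e "$SLOW_PATTERNS" || true ;;     # only the slow packages
    fast) grep -v -e "$SLOW_PATTERNS" || true ;;  # everything else
    *) echo "unknown shard: $1" >&2; return 1 ;;
  esac
}

# Made-up package list standing in for the output of `go list ./...`.
sample='example.com/m/internal/kgateway/controller
example.com/m/internal/kgateway/controller/util
example.com/m/internal/sds/pkg/run
example.com/m/pkg/utils'

fast=$(printf '%s\n' "$sample" | select_packages fast)
slow=$(printf '%s\n' "$sample" | select_packages slow)
printf 'fast:\n%s\nslow:\n%s\n' "$fast" "$slow"
```

In the workflow, the same helper could presumably be fed from `go list ./...` and selected by the matrix shard name, so adding or removing a slow package would touch only one list.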

    fi
- name: Validate Test Coverage
  shell: bash
  run: make validate-test-coverage || true
Contributor

why is || true needed here?
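
For context on this question, a small sketch of what `|| true` does to a step's exit status, and one possible alternative that keeps the failure visible without failing the job. The `validate` function is a hypothetical stand-in for `make validate-test-coverage` failing; whether the real target fails on sharded coverage is not something this thread confirms.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical stand-in for `make validate-test-coverage` exiting non-zero.
validate() {
  echo "coverage below threshold" >&2
  return 1
}

# With `|| true`, the failure is swallowed: the overall status is always 0,
# so the step can never fail and the problem never shows up in CI.
validate || true
masked=$?

# Alternative: capture the exit code, then decide what to do with it.
status=0
validate || status=$?
if [ "$status" -ne 0 ]; then
  # GitHub Actions renders this workflow command as a warning annotation.
  echo "::warning::coverage validation failed with exit code $status"
fi

echo "masked=$masked status=$status"
```

Here `masked` ends up as 0 and `status` as 1. If the validation is only expected to fail temporarily, surfacing it as a warning makes it easier to notice when it should start gating again.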
