Skip to content

Conversation

eljobe
Copy link
Member

@eljobe eljobe commented Sep 18, 2025

This PR changes the structure of our .github directory pretty significantly and makes it possible to use a merge queue with the master branch.

After these changes are merged, we'll enable and require the merge queue for all future PRs merged to master.

The main thing to observe is that the ci.yml and merge.yml largely share the same structure and import reusable workflow files to get their work done. The other main difference between the two is what's needed for the can_proceed job to succeed. This is why some checks are required only for the final merge, but all of the checks are still run for each PR.

This PR also introduces a yamllint config and cleans up all of the workflow files to pass its formatting rules.

There are a few other cleanups that might best by understood by reading the commit bodies in the PR.

Fixes: NIT-3067

eljobe and others added 22 commits September 11, 2025 11:39
These tests only run when the files in certain directories have changed.
This is mostly for trying out the workflows along with realistic settings in
GitHub for a `main` branch.
These can be called from higher-level workflows to reuse the code for pull
requset, merge_queue and nightly scheduled checks.
The standard GitHub action runners are hitting up against the free limit too
often.
Otherwise, the contracts haven't been built yet and the linter can't find all
the generated sources it needs to build the go code.
This makes it possible to avoid rebuilding things that are already built.
Making the long-running workflows wait on the fast-running build and lint job
only slows them both down and reduces parallelism.
The previous version had a bug in how the after-next-release was being negated.
The merge queue events actually happen on the merge branches which are never
going to be named `master` or `main`.
The previous workflow files wouldn't really work for merge_group triggered
workflow runs because the event type was different.
The PR has labels, but the branch that the merge_queue runs on doesn't have
labels, and the event doesn't have a pull_request concept.

So, this change just makes the action succeed uncondiontally.
This Pull Request is just to test that the workflows on the main branch are
still healthy.

I will probably merge it and some other harmless changes to the main branch, and
then open a PR to revert them all.
For now, we're not going to try the switch from `master` to main as part of
adding the merge queue. It just makes the change too big. I'll get these changes
merged into the workflows on `master` and then I can enable the merge queue.
@eljobe
Copy link
Member Author

eljobe commented Sep 18, 2025

Just as a few notes to reviewers who are already noticing some differences:

I renamed the CI / go-tests / Full Go tests matrix tests. This is why the Required check of "Go Tests (default)" is not being satisfied even though that test has run on this PR.

Also, Run Arbitrator tests is just skipped if certain files aren't changed in the repository. This is something we can move to Repository Rules in a future change. But, for now, that Required check is simply not going to be satisfied for this PR.

Before actually merging this PR, I will disable those two missing checks in the Branch Protection rules, and then immediately after merging this PR, I will enable the Use merge queue on default Repository Ruleset which includes the correct Required checks for the new workflows.

Finally, there are several more changes that I'd like to make incrementally to help with maintainability of the workflows and reduction of usage of our CI resources:

  • Implement Piotr's suggestion for canceling existing workflow runs on the same PR branch when a new commit is pushed to the branch. NIT-3843
  • Move the currently unshared actions from nightly-ci.yml into the _go_tests.yml and parameterize the array of strings for the "matrix" run so that each of ci.yml, merge.yml and nightly-ci.yml can select the matrix targets that make sense in each of those contexts. NIT-3873
  • Have the go (and ideally Rust) test suite runs also output junit.xml compatible test reports which can be uploaded to our CodeCov account to enable us to analyze test flakiness and reduce the length of some of our longest-running tests. NIT-3874
  • Try to extract the shortest-running set of tests which still give us good coverage of the code, and require only those tests for can_proceed in ci.yml while still running larger tests in the merge.yml workflow. NIT-3875

eljobe and others added 4 commits September 20, 2025 08:42
Local actions cannot be used before the repository which holds them is checked
out.
There are two problems with "just" using the make lint command from CI.

  1. We need to take care of installing the golangci-lint binary.
	2. The GitHub action is much more efficiently integrated with the go caches.

We may want to split out the non-go linters into a separate make action at some
point soon, but it's not important right now.
@eljobe eljobe assigned pmikolajczyk41 and unassigned amsanghi Sep 20, 2025
eljobe and others added 5 commits September 21, 2025 12:51
This allows us to confirm that a merge queue set up on the main branch would
actually work as expected.
This way, if the Arbitrator or Bold Legacy jobs are skipped (because the files
didn't change,) the can_proceed job will still succeed.
The reason is just to trigger the merge queue on the main branch merge.
Copy link

codecov bot commented Sep 21, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 16.76%. Comparing base (f7b2fb8) to head (35df5f0).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3655      +/-   ##
==========================================
+ Coverage   16.71%   16.76%   +0.04%     
==========================================
  Files         377      377              
  Lines       57724    57724              
==========================================
+ Hits         9651     9675      +24     
+ Misses      46488    46468      -20     
+ Partials     1585     1581       -4     
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

pmikolajczyk41
pmikolajczyk41 previously approved these changes Sep 22, 2025
Copy link
Member

@pmikolajczyk41 pmikolajczyk41 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my perspective, this all looks great. The PR brings in some huge improvements, both semantically and in terms of refactoring.

Everything seems to be in order, but the only way to truly review CI changes is to test them on a variety of different PRs. While I suspect there may be some edge cases that don't yet behave as expected, I don't see any immediate reason to hold this back. I'd love to see it merged and tested in practice as soon as we can.

@eljobe eljobe enabled auto-merge (squash) September 22, 2025 10:01
pmikolajczyk41
pmikolajczyk41 previously approved these changes Sep 22, 2025
Copy link
Member

@joshuacolvin0 joshuacolvin0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@eljobe eljobe disabled auto-merge September 22, 2025 17:36
@eljobe eljobe merged commit a3ae864 into master Sep 22, 2025
33 of 34 checks passed
@eljobe eljobe deleted the main branch September 22, 2025 18:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants