CI : Split build into two jobs #287

Merged
murraystevenson merged 7 commits into GafferHQ:main from ericmehl:ciBatches
Jan 27, 2026

@ericmehl ericmehl commented Jan 7, 2026

This is the first PR in the effort to merge the Windows dependencies (on 10_maintenance_vs2022) with main. Splitting the full build into multiple jobs is only needed for Windows, because its build exceeds the six-hour time limit that GitHub Actions places on a single job.

I investigated more user-friendly schemes for determining which projects are built in each job. One was a parameter such as batch = 1/2, where we would parse the fraction to determine which projects to include in the job. But build times vary widely between projects, so building just the first half of the projects (as determined by a simple list of project names) meant the second half could still exceed the time limit, because projects later in the dependency sequence generally take longer than the first few.
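To illustrate why a count-based split falls short, here is a minimal Python sketch. The `batch_projects` helper, the project names and the build times are all hypothetical and invented for illustration; they are not taken from the build script or from real CI timings.

```python
# Hypothetical sketch: split a project list using a "batch = 1/2" style spec.
JOB_LIMIT_MINUTES = 6 * 60


def batch_projects(projects, batch):
    """Return the projects selected by a spec such as "1/2" (batch 1 of 2)."""
    index, count = (int(part) for part in batch.split("/"))
    size = -(-len(projects) // count)  # ceiling division
    return projects[(index - 1) * size : index * size]


# Made-up build times in minutes, listed in dependency order.
build_minutes = {"a": 10, "b": 20, "c": 30, "d": 150, "e": 200, "f": 120}
projects = list(build_minutes)

first = batch_projects(projects, "1/2")
second = batch_projects(projects, "2/2")
first_total = sum(build_minutes[name] for name in first)
second_total = sum(build_minutes[name] for name in second)

print(first, first_total)
print(second, second_total, second_total > JOB_LIMIT_MINUTES)
```

With these made-up numbers both batches hold three projects, yet the second batch alone still runs past the limit, which is the problem described above.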

The solution here is maybe not great since it requires manually specifying the projects for all but the final job. But as long as the last job has an empty projectsString parameter, we don't risk forgetting a project that may be added later.
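The selection rule described here could look roughly like the following. This is a sketch only: the `projects_for_job` function, the space-separated format of `projectsString` and the `already_built` set are assumptions made for illustration, not the actual workflow or build script logic.

```python
def projects_for_job(all_projects, projects_string, already_built):
    """Pick the projects for one job.

    A non-empty `projects_string` names the projects explicitly. An empty
    one means "everything that has not been built yet", so a project added
    to `all_projects` later is still picked up by the final job.
    """
    if projects_string.strip():
        return projects_string.split()
    return [name for name in all_projects if name not in already_built]


all_projects = ["a", "b", "c", "d"]  # hypothetical project names
first_job = projects_for_job(all_projects, "a b", set())
final_job = projects_for_job(all_projects, "", set(first_job))
print(first_job, final_job)
```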

@murraystevenson murraystevenson self-requested a review January 8, 2026 20:29

@murraystevenson murraystevenson left a comment

Thanks Eric! This will definitely be very useful for avoiding the time limit on all platforms.

One thing I do wonder about is the behaviour of the second part of the build failing to run if any of the steps fail in the first part, as shown by our currently broken macOS CI. I do intend to fix the macOS CI once we've moved a bit further through the VFX Platform 2025 update (we'll need a Cortex release before I can do so properly), but even so I think it would be helpful when updating dependencies in the future if, for example, the Linux build was able to complete even though another build failed.

I wonder if we could just have the second part always run via if: ${{ always() }} and have the jobs that failed in the first part fail again in the second as their intermediate artifact wouldn't be available for download? This would be a bit wasteful if all jobs in the first part failed, but I'm not sure if there's a conditional that would identify that?

@murraystevenson

We've merged #290 which gets macOS CI up and running again, so it'd be worth rebasing this PR onto latest main to pick up those changes.

@ericmehl ericmehl force-pushed the ciBatches branch 7 times, most recently from 2e163f8 to 9d3ba52 Compare January 20, 2026 16:24
@ericmehl

I have all the platforms successfully building now.

  • 6da9200 : Removes the timestamps from build names. This ensures that the second build job uses the same build directory as the first job, which is important for some CMake projects that hard-code the path for finding a module.
  • 51cb87e : Perhaps not strictly necessary for this PR. It prevents substituting build variables in publicVariables config entries. That helps keep the hash / digest consistent if you are moving build directories but want to reuse the existing project builds.
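The digest point in the second commit can be sketched as follows. Everything here is hypothetical: the `config_digest` function, the `{buildDir}` placeholder syntax, the `EXAMPLE_ROOT` entry and the use of SHA-256 over JSON are assumptions chosen for illustration, not how the build script actually computes its hash.

```python
import hashlib
import json


def config_digest(config, variables, substitute_public_variables):
    """Hash a project config, optionally expanding build variables first."""
    entries = dict(config)
    if substitute_public_variables:
        entries["publicVariables"] = {
            key: value.format(**variables)
            for key, value in config["publicVariables"].items()
        }
    serialised = json.dumps(entries, sort_keys=True)
    return hashlib.sha256(serialised.encode("utf-8")).hexdigest()


# Hypothetical config entry and two different build directories.
config = {"publicVariables": {"EXAMPLE_ROOT": "{buildDir}/example"}}
first_job = {"buildDir": "/tmp/build-a"}
second_job = {"buildDir": "/tmp/build-b"}

# Without substitution the digest does not depend on the build directory.
unchanged = config_digest(config, first_job, False) == config_digest(config, second_job, False)
# With substitution the build directory leaks into the digest.
changed = config_digest(config, first_job, True) != config_digest(config, second_job, True)
print(unchanged, changed)
```

Under these assumptions, leaving the variables unexpanded lets a relocated build directory reuse project builds whose digest was recorded elsewhere.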

The main change is the point at which the job matrix for building multiple platforms is applied. Previously I was splitting the platforms at the inner, buildDependencyBatch level. This produced job one containing the Linux, macOS and Windows builds, followed by job two with the same platforms.

It meant that we were always building the same projects for each platform, which is not ideal. macOS is pretty fast to build, Linux is fast enough to finish before the GitHub time limit, and Windows needs the second build step.

So now I've moved the matrix to the main level, so we can specify which projects each platform builds. This was partly a workaround for not being able to get the second build job working on Linux, and partly to facilitate Windows builds, which will have successful projects added to the list as they are ready.

b291e2d accomplishes that.

Since we're building Linux and macOS in a single step, there wouldn't be anything to validate the two-step approach unless Windows is working as well. The last few commits are my attempt at a minimal build for Windows to demonstrate the two-job build process.

If we think it's a bit of scope creep to be worrying about Windows build details in this PR, I can drop those commits and probably just comment out the Windows parts of the CI build for now.

@murraystevenson murraystevenson left a comment

Thanks for the update Eric! I've noted a few more comments inline, but it looks like we're quite close to getting this merged.

@murraystevenson murraystevenson left a comment

Thanks Eric, one last question inline. Feel free to squash down the fixups and revert commits while you're at it.

Comment on lines +35 to +38
strategy:

  # Don't cancel other jobs in the build matrix if one job fails.
  fail-fast: false

Contributor

Should this have been moved back into main.yml with the rest of the matrix in b291e2d, or have you tested that this percolates up from the reusable workflow into the calling matrix? Either way, it would probably be clearer to have fail-fast: false set back in main.yml?

Contributor Author

That sounds good, squashed down into 6a2dbc4.

@ericmehl

Thanks @murraystevenson! I squashed down and moved the fail-fast directive to main.yml. I also cleaned up the extra lines in main.yml, so the diff is a little cleaner now, with fewer spurious removals.

Should be ready for merging!

These are needed for parsing the config files before any projects are built.
Windows doesn't recognize Python files as executable by themselves and needs to be told how to run the script.
Even though we're not building anything on Windows, the build script checks for the compiler, which requires this step.
Windows will not allow removing the directory if it is still the current directory.
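Two of the Windows notes above (running a Python script through the interpreter, and leaving a directory before removing it) can be sketched in Python. The helper names `run_python_script` and `remove_directory` are hypothetical and written for illustration; the commits themselves may well handle these cases in the workflow steps rather than in code like this.

```python
import os
import shutil
import subprocess
import sys


def run_python_script(script, *args):
    """Run a script through the current interpreter rather than relying on
    the file being treated as an executable in its own right."""
    return subprocess.run(
        [sys.executable, script, *args],
        check=True, capture_output=True, text=True,
    )


def remove_directory(path):
    """Remove `path`, stepping out of it first if we are currently inside it."""
    path = os.path.realpath(path)
    cwd = os.path.realpath(os.getcwd())
    try:
        inside = os.path.commonpath([cwd, path]) == path
    except ValueError:  # for example, paths on different Windows drives
        inside = False
    if inside:
        os.chdir(os.path.dirname(path))
    shutil.rmtree(path)
```

Changing to the parent directory before calling `shutil.rmtree` is harmless on Linux and macOS, so the same helper could be used on every platform.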
@ericmehl

I rebased the latest push on the new macOS changes, bringing the change from that branch's main.yml to buildDependenciesBatch.yml. The merge conflict is resolved now.

@murraystevenson

Thanks Eric, merging.

@murraystevenson murraystevenson merged commit befc27e into GafferHQ:main Jan 27, 2026
3 checks passed