#1605 adds a mechanism to use cached .lake contents when building a Docker image, in an attempt to speed up the build.
This works as intended when building the image on my machine, but it seems fundamentally broken on GitHub's CI runners, given their ephemeral nature. See #1605 for details on the workarounds I tried to get caching to work, to no avail. Thus, if we want our builds to actually benefit from proper caching of incremental builds, we should maintain our own CI runner (as we do for the evaluation) and use it to build the Docker images -- if we do so, the cache will be kept locally, and everything will be fine.
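For reference, a minimal sketch of what a self-hosted build job could look like (the `self-hosted`/`docker` runner labels and the image name are placeholders for illustration, not our actual workflow; the checkout, buildx, and build-push actions are the standard ones):

```yaml
# Sketch of a build job on our own runner -- not the actual workflow file.
name: Build Docker image (self-hosted)

on:
  push:
    branches: [main]

jobs:
  docker:
    # Hypothetical labels for a runner we maintain ourselves;
    # BuildKit's local cache then persists on that machine between runs.
    runs-on: [self-hosted, docker]
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/build-push-action@v5
        with:
          push: true
          tags: ghcr.io/<org>/<image>:latest   # placeholder image name
```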
Previously (with the aforementioned evaluation CI), we've run into issues with self-hosted runners filling up /tmp and failing. I conjecture that this is purely because we were running things on bare metal. If we use Docker, however, garbage collection seems to be performed automatically by default, so this should be a non-issue. (In fact, I want to move the evaluation CI to run inside Docker precisely because the isolation should fix the reliability issues we've had, but that's a separate issue.)
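For completeness, BuildKit's build-cache garbage collection can also be tuned explicitly in the daemon config on the runner host (`/etc/docker/daemon.json`); the `20GB` limit below is just an illustrative value, not a recommendation:

```json
{
  "builder": {
    "gc": {
      "enabled": true,
      "defaultKeepStorage": "20GB"
    }
  }
}
```

With this in place, old build cache gets pruned automatically once the limit is exceeded, which should keep the runner's disk from filling up the way /tmp did on the bare-metal setup.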
Alternatively, we could outsource the runner maintenance to namespace.so. They have much beefier machines available, and their documentation suggests they have sensible support for local caching, including proper support for cache mounts! 😍
> A common strategy to improve your build performance is the adoption of build cache mounts, allowing you to persist cache data across container builds and reduce build times significantly.
>
> With Namespace, your builds run in a shared, remote, high-performance builder by default - unlike on GitHub runners, where your docker builds run inside the runner. This sharing means that you’ll see more cache hits across different builds, and hence faster builds.
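To make that cache (whether on namespace.so or on our own runner) actually pay off for the Lean build, the Dockerfile would use a BuildKit cache mount along these lines; the paths and build command are assumptions about our setup for illustration, not the exact contents of #1605:

```dockerfile
# syntax=docker/dockerfile:1
FROM ubuntu:22.04 AS build          # placeholder base image

# ... install elan / the Lean toolchain here ...

WORKDIR /workspace
COPY . .

# Persist the .lake build artifacts in a BuildKit cache mount (assumed path),
# so incremental builds are reused across image builds on the same builder.
RUN --mount=type=cache,target=/workspace/.lake \
    lake build
```

Note that the contents of a cache mount are not part of the resulting image layer; if the built artifacts themselves need to end up in the image, they have to be copied out of the mount within the same `RUN` step.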
I did some testing in #1613: without caching, a fresh build of our Docker image for the default build target took about 5.5 to 6 minutes, including the time needed to upload the image to ghcr.io. This is about as fast as the current GH-runner non-Docker incremental build for this recent CIRCT commit to main, and even a decent bit faster than the incremental build for a larger change, such as the lemma -> theorem refactor, which took 10 minutes to build. It is of course much faster than the current Docker build, which takes about 20 to 30 minutes when running via a GH runner.
As mentioned, that is without caching, so the speedup is purely the result of having much beefier machines (these tests were run with the default profile, which has 4 vCPUs and 16 GB RAM). Given the aforementioned docs, caching should work as well, making further builds that much faster.
With some back-of-the-napkin math, I guesstimate that using the namespace.so runner just to build the core Docker image (and keeping everything else running on GH Action runners) would cost us somewhere between $25 and $50 per month on the pay-as-you-go plan.
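Roughly, that estimate comes from something like the following; the build frequency and the per-minute rate are illustrative assumptions, and the actual numbers should be checked against namespace.so's pricing page:

```
assumed: ~5 image builds/day x ~6 min/build ≈ 30 builder-minutes/day ≈ 900/month
assumed: a few cents per builder-minute on the pay-as-you-go plan
      => roughly $25-$50/month, depending on the rate and how often we build
```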