Design questions around caching in HLB #58

Open
@hinshun

Description

There are three levels of caching in HLB: cache mounts, FS caching, and build cache.

Cache mounts

A cache mount is a persistent, read-write mount stored by the BuildKit backend. It is typically used for compiler and package manager caches.

Some caveats:

  • The right sharing mode depends on the underlying compiler or package manager. Is concurrent usage safe? It's hard to know which mode to pick.
  • Ever-growing size: how do we prune this? How many projects can we use this with? If it's a single cache key for every project, would using FS caching be better because we have more control?
  • In a cloud cluster environment, this is not very reliable because you may not hit the same buildkitd.

fs default() {
    image "golang:alpine"
    run "go build xyz" with option {
        mount fs { scratch; } "/root/.cache/go-build" with option {
            cache "someCacheKey" "shared"
        }
    }
}

  1. I'm not sure whether the scratch fs defined after mount is ever used when there is a cache option. Need to investigate this.
  2. Instead of nesting it as an option::mount cache, can we define it as an option::run cacheMount for UX? Initially it was designed as a mount option because of LLB, but we can change that.
  3. In the Dockerfile frontend, I recall that cache keys are defined for the user, so it's possible we can move the cache key into an option on cacheMount. Need to investigate this.
  4. Does the BuildKit build cache export cache mounts as well? Need to investigate this.
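
On the ever-growing size caveat above, one alternative to a single cache key for every project is to scope the key per project. The Go sketch below is only an illustration under assumptions: `cacheKey` is a hypothetical helper, not part of HLB or BuildKit, and keying on the lockfile contents is one possible policy among several. It shows how a key could change when a project's dependencies change, so that stale keys become candidates for pruning.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// cacheKey is a hypothetical helper (not an HLB or BuildKit API). It scopes
// a cache mount key to one project and one version of that project's lockfile.
func cacheKey(project string, lockfile []byte) string {
	sum := sha256.Sum256(lockfile)
	// 12 hex characters keep the key readable; the length is an arbitrary choice.
	return fmt.Sprintf("%s-%s", project, hex.EncodeToString(sum[:])[:12])
}

func main() {
	// The lockfile contents here are made-up placeholders.
	fmt.Println(cacheKey("web", []byte("left-pad@1.3.0\n")))
	fmt.Println(cacheKey("web", []byte("left-pad@1.3.1\n")))
	fmt.Println(cacheKey("api", []byte("left-pad@1.3.0\n")))
}
```

The trade-off is that every dependency change starts from an empty cache, which may defeat the purpose for caches such as go-build that stay useful across changes.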

FS caching

Rather than a language or backend feature, this is more of a pattern emerging from HLB usage. You can export filesystems to various destinations (an image to Docker Hub, a push to a remote git repo, a publish to an HTTP server like Artifactory), and you can also mount from these remote sources as a "starting point" or a "primed mount" to speed up an operation.

When running npm install, the current pattern is:

  • Mount option::mount cache for npm cache directories. (Likely safe to share across multiple projects.)
  • Mount fs { scratch; } "/src/node_modules" because you can also prime the node modules. (Only safe to use for a single project.)

And then you can push the node modules mount as an image, then remount it for subsequent runs:

fs default(fs src) {
    image "node:alpine"
    run "npm install" with option {
        dir "/src"
        mount src "/src"
        mount fs { scratch; } "/root/.npm" with option {
            cache "npm-config-cache" "shared"
        }
        mount fs { image "my-node-modules:latest"; } "/src/node_modules" as nodeModules
    }
}

fs snapshotModules() {
    nodeModules
    dockerPush "my-node-modules:latest"
}

Perhaps users will then set up a cron CI job to run the snapshotModules target once in a while, or on a merge to master.

Build cache

Build cache import/export is a native feature of BuildKit. See: https://github.com/moby/buildkit#export-cache

This is the basic behavior you get when you run a second build through HLB on the same BuildKit backend. Unchanged inputs produce the same outputs, so those steps don't need to run a second time. However, when you point to a new BuildKit backend, this cache is lost. Build cache import/export lets you publish the build cache for a particular build to disk (local), inline (embedded in the image, if you are creating a Docker image), or to an OCI image (not an executable image, just a data container for the build cache).
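
To make that behavior concrete, here is a toy Go model. It is a sketch under assumptions, not how BuildKit implements its cache: `backend`, `solve`, `exportCache`, and `importCache` are hypothetical names, and a real cache key covers much more than a base image and a command string. It shows that results are memoized by a digest of the inputs, that a fresh backend has to re-run the step, and that importing an exported cache avoids the re-run.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// backend is a toy stand-in for one buildkitd instance (hypothetical type).
// Results are memoized by a digest of the step's inputs.
type backend struct {
	results map[string]string
	runs    int // number of steps actually executed
}

func newBackend() *backend { return &backend{results: map[string]string{}} }

// solve executes a step only when no result exists for the same inputs.
func (b *backend) solve(base, command string) string {
	sum := sha256.Sum256([]byte(base + "\x00" + command))
	key := hex.EncodeToString(sum[:])
	if out, ok := b.results[key]; ok {
		return out // cache hit: nothing is executed
	}
	b.runs++
	out := "output-of-" + key[:8]
	b.results[key] = out
	return out
}

// exportCache returns a copy of the memoized results, standing in for
// publishing the build cache to disk or to a registry.
func (b *backend) exportCache() map[string]string {
	exported := make(map[string]string, len(b.results))
	for k, v := range b.results {
		exported[k] = v
	}
	return exported
}

// importCache seeds a backend with previously exported results.
func (b *backend) importCache(cache map[string]string) {
	for k, v := range cache {
		b.results[k] = v
	}
}

func main() {
	first := newBackend()
	first.solve("golang:alpine", "go build xyz")
	first.solve("golang:alpine", "go build xyz") // second build hits the cache

	cold := newBackend() // a new backend starts with nothing
	cold.solve("golang:alpine", "go build xyz")

	warm := newBackend() // a new backend that imports the exported cache
	warm.importCache(first.exportCache())
	warm.solve("golang:alpine", "go build xyz")

	fmt.Println(first.runs, cold.runs, warm.runs)
}
```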

There are two main angles we can tackle this from:

  • Backend solution: We develop and maintain infrastructure around BuildKit to distribute build caches between buildkitd nodes.
  • Frontend solution: We expose build cache import/export in HLB.

Build cache import/export is per-solve, and we already do multiple solves when executing HLB.

If we want to implement this on the frontend side, we'll need to somehow inform the HLB compiler that this section is using this particular cache. Ideally we want to provide a fs { ... } to be agnostic to the source (whether it's local or an image), but that will require an upstream change.

Here is an example that somewhat fulfills the requirements. I think the UX is terrible, but perhaps it can serve as a starting point for discussion:

fs default() {
    cacheContext fs {
        image "openllb/my-remote-build-cache:latest"
    } fs {
        image "alpine"
        run "running with cache context"
    }
}

