Design questions around caching in HLB #58

There are three levels of caching in HLB: cache mounts, FS caching, and build cache.

Cache mounts
---

A cache mount is a persistent, read-writable mount stored by the BuildKit backend. It is typically used for compiler and package manager caches.

Some caveats:
- It has sharing modes (`shared`, `private`, `locked`), and the right one depends on the underlying compiler/package manager. Is concurrent usage safe? It's complicated to know which mode to pick.
- Its size grows forever, so how do we prune it? How many projects can share it? If it's a single cache key for every project, would `FS caching` be better because it gives us more control?
- In a cloud cluster environment, it is not very reliable, because subsequent builds may not land on the same `buildkitd`.

```hlb
fs default() {
    image "golang:alpine"
    run "go build xyz" with option {
        mount fs { scratch; } "/root/.cache/go-build" with option {
            cache "someCacheKey" "shared"
        }
    }
}
```
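To make the sharing-mode and per-project questions concrete, here is a minimal sketch that uses the same `cache` option with a project-scoped key and a stricter sharing mode. The key `myproject-go-build` is a hypothetical name; `locked` and `private` are BuildKit's other cache sharing modes besides `shared`. Whether `locked` is actually the right choice for the Go build cache is exactly the open question in the first caveat.

```hlb
fs default() {
    image "golang:alpine"
    run "go build xyz" with option {
        # Hypothetical project-scoped key, so projects no longer
        # share one ever-growing cache.
        mount fs { scratch; } "/root/.cache/go-build" with option {
            # "locked": concurrent builds using this key wait for each other.
            # "private" would instead give each concurrent writer its own copy.
            cache "myproject-go-build" "locked"
        }
    }
}
```

The trade-off is that per-project keys lose cross-project reuse, which is part of why `FS caching` may give more control.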

1. I'm not sure whether the `scratch` fs passed to `mount` is ever used when there is a `cache` option. Need to investigate this.
2. Instead of nesting it as an `option::mount cache`, can we define it as an `option::run cacheMount` for better UX? It was initially designed as a mount option because of how LLB models it, but we can change that.
3. I recall that the Dockerfile frontend defines cache keys for the user (the `id` of a cache mount defaults to its target path), so it's possible we can make the cache key an optional argument to `cacheMount`. Need to investigate this.
4. Does the BuildKit build cache export cache mounts as well? Need to investigate this.
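A sketch of how the Go build above might read if items 2 and 3 were adopted. `cacheMount` is the hypothetical run option proposed in item 2 and does not exist today; leaving out the cache key assumes the key would be derived for the user, as item 3 suggests the Dockerfile frontend does.

```hlb
fs default() {
    image "golang:alpine"
    run "go build xyz" with option {
        # Hypothetical run option (item 2): no `fs { scratch; }` and no
        # nested `mount ... with option { cache ... }`.
        # No cache key given; assumed to be derived from the target path (item 3).
        cacheMount "/root/.cache/go-build" "shared"
    }
}
```

This would also sidestep item 1, since there would be no unused `scratch` fs to reason about.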

FS caching
---

Rather than a language or backend feature, this is more of a pattern emerging from HLB usage. You can export filesystems to various destinations (an image to DockerHub, a push to a remote git repo, an upload to an HTTP server like Artifactory), and you can also mount these remote sources as a "starting point" or "primed mount" to speed up an operation.

When running `npm install`, the current pattern is:
- Use an `option::mount cache` for the `npm` cache directory (likely safe to share across multiple projects).
- Mount `fs { scratch; } "/src/node_modules"` so the node modules can be primed as well (only safe to use for a single project).

You can then push the `node_modules` mount as an image and remount it on subsequent runs:

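```hlb
fs default(fs src) {
    image "node:alpine"
    run "npm install" with option {
        dir "/src"
        mount src "/src"
        # npm's own cache: likely safe to share across projects.
        mount fs { scratch; } "/root/.npm" with option {
            cache "npm-config-cache" "shared"
        }
        # Prime node_modules from a previously pushed snapshot (single project only).
        mount fs { image "my-node-modules:latest"; } "/src/node_modules" as nodeModules
    }
}

# Push the updated node_modules mount so later runs can start from it.
fs snapshotModules() {
    nodeModules
    dockerPush "my-node-modules:latest"
}
```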

Users might then set up a cron CI job that runs the `snapshotModules` target periodically, or on every merge to `master`.

Build cache
---

Build cache import/export is a native feature of BuildKit. See: https://github.com/moby/buildkit#export-cache

When you run a second build through HLB on the same BuildKit backend, you get basic build caching for free: unchanged inputs produce the same outputs, so those steps don't need to run a second time. However, this cache is lost when you point to a new BuildKit backend. Build cache import/export allows you to publish the build cache for a particular build to disk (local), inline (if you are creating a Docker image, the cache is embedded in it), or to an OCI image (not an executable image, just a data container for the build cache).

There are two main angles we can tackle this from:
- Backend solution: We develop and maintain infrastructure around BuildKit to distribute build caches between `buildkitd` nodes.
- Frontend solution: We expose build cache import/export in HLB.

Build cache import/export is per-solve, and we already do multiple solves when executing HLB. 

If we want to implement this on the frontend side, we'll need some way to tell the HLB compiler that a given section uses a particular cache. Ideally, we would accept an `fs { ... }` so that the cache is agnostic to its source (whether it's `local` or `image`), but that will require an upstream change.

Here is an example that partially fulfills the requirements. I think its UX is terrible, but perhaps it can serve as a starting point for discussion:

```hlb
fs default() {
    cacheContext fs {
        image "openllb/my-remote-build-cache:latest";
    } fs {
        image "alpine"
        run "running with cache context"
    }  
}
```
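Since the goal is to be agnostic to where the cache comes from, the same hypothetical `cacheContext` could in principle take a `local` directory instead of an image. This is only a sketch: `cacheContext` is the proposal above rather than an existing builtin, the path `./build-cache` is made up, and supporting a non-image `fs` here is the part that would need the upstream change.

```hlb
fs default() {
    # Hypothetical: import the build cache from a local directory,
    # e.g. one written by BuildKit's local cache exporter.
    cacheContext fs {
        local "./build-cache"
    } fs {
        image "alpine"
        run "running with cache context"
    }
}
```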
