A Model/GPU kernel cache container packaging utility inspired by WASM.
- Build container images containing GPU Kernel/Model caches.
- Extract a cache from an OCI image
- Compatible with docker or buildah
- Client API for retrieving and extracting images
- Artifact and image signing via cosign (indirectly)
- Cache artifact signing with Cosign
- Container image signing support with Cosign
- Go 1.25.0 or later
sudo dnf install gpgme-devel
sudo dnf install btrfs-progs-devel
Build the binary:
make build
After the binary is built, it can be found in an arch-specific directory,
something like ./_output/bin/linux_amd64/mcv. To install the binary in
the local ~/go/bin directory, run (make sure ~/go/bin is in $PATH):
make install
Below is the mcv usage:
$ mcv -h
A Model cache container image management utility
Usage:
mcv [flags]
Flags:
-b, --baremetal Run baremetal preflight checks
-c, --create Create OCI image
-d, --dir string A Cache Directory
-e, --extract Extract a cache from an OCI image
-h, --help help for mcv
-i, --image string OCI image name
-l, --log-level string Set the logging verbosity level:
debug, info, warning or error
--no-gpu Allow kernel extraction without GPU
present (for testing purposes)
NOTE: The create option is a work in progress. For now, to create an OCI image containing a GPU kernel cache directory, please follow the instructions in spec-compat.md.
The Cache Image specification defines how to bundle caches as container images. A compatible cache image consists of a cache directory for a Triton kernel or vLLM model. The details can be found in spec-compat.md.
MCV supports both legacy (triton cache) and new (binary cache) vLLM formats:
- vLLM Triton Cache Format (legacy) - Stores triton_cache/ and inductor_cache/ inside rank directories
- vLLM Binary Cache Format (new) - Stores prefix directories (e.g., backbone/) inside rank directories
For detailed information about vLLM binary cache support, see: vllm-binary-cache.md
To extract the Triton Cache for the 01-vector-add.py tutorial from Triton, run the following:
mcv -e -i quay.io/gkm/vector-add-cache:rocm
Img fetched successfully!!!!!!!!
Img Digest: sha256:b6d7703261642df0bf95175a64a01548eb4baf265c5755c30ede0fea03cd5d97
Img Size: 525
bash-4.4#
This will extract the cache directory from the
quay.io/gkm/vector-add-cache:rocm container image and copy it to
~/.triton/cache/.
To create an OCI image for a Triton cache using docker, run the following:
mcv -c -i quay.io/gkm/vector-add-cache:rocm -d example/vector-add-cache-rocm
INFO[2025-05-28 11:09:33] baremetalFlag false
INFO[2025-05-28 11:09:33] Using docker to build the image
INFO[2025-05-28 11:09:33] Wrote manifest to /tmp/.mcv/io.triton.manifest/manifest.json
INFO[2025-05-28 11:09:33] Dockerfile generated successfully at /tmp/.mcv/Dockerfile
{"stream":"Step 1/7 : FROM scratch"}
{"stream":"\n"}
{"stream":" ---\u003e \n"}
{"stream":"Step 2/7 : LABEL org.opencontainers.image.title=vector-add-cache"}
{"stream":"\n"}
{"stream":" ---\u003e Running in fe2ead6429b6\n"}
{"stream":" ---\u003e 729d29d1eab1\n"}
{"stream":"Step 3/7 : COPY \"io.triton.cache/.\" ./io.triton.cache/"}
{"stream":"\n"}
{"stream":" ---\u003e e806e7b6def5\n"}
{"stream":"Step 4/7 : COPY \"io.triton.manifest/manifest.json\" ./io.triton.manifest/manifest.json"}
{"stream":"\n"}
{"stream":" ---\u003e 68cbd692cef2\n"}
{"stream":"Step 5/7 : LABEL cache.triton.image/cache-size-bytes=80415"}
{"stream":"\n"}
{"stream":" ---\u003e Running in 3f80c2250345\n"}
{"stream":" ---\u003e 7eb9a67e385e\n"}
{"stream":"Step 6/7 : LABEL cache.triton.image/entry-count=1"}
{"stream":"\n"}
{"stream":" ---\u003e Running in da8a4e1461d0\n"}
{"stream":" ---\u003e 9f0b331cc5be\n"}
{"stream":"Step 7/7 : LABEL cache.triton.image/summary={\"targets\":[{\"backend\":\"hip\",\"arch\":\"gfx90a\",\"warp_size\":64}]}"}
{"stream":"\n"}
{"stream":" ---\u003e Running in 4772e3c43256\n"}
{"stream":" ---\u003e c144a34c9296\n"}
{"aux":{"ID":"sha256:c144a34c9296b2a7ec322e041bf9fb8e29111ba9d071ef5e6e31618fb2e528e4"}}
{"stream":"Successfully built c144a34c9296\n"}
{"stream":"Successfully tagged quay.io/gkm/vector-add-cache:rocm\n"}
INFO[2025-05-28 11:09:34] Temporary directories successfully deleted.
INFO[2025-05-28 11:09:34] Docker image built successfully
INFO[2025-05-28 11:09:34] OCI image created successfully.
To see the new image:
docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
quay.io/tkm/vector-add-cache latest 32572653bbbd 5 minutes ago 0B
To inspect the docker image with Skopeo:
skopeo inspect docker-daemon:quay.io/gkm/vector-add-cache:rocm
{
"Name": "quay.io/tkm/vector-add-cache",
"Digest": "sha256:326c155c3d6de9d285b0280d141cb75f1fd48475b410e19f4aef0b395b400ced",
"RepoTags": [],
"Created": "2025-05-28T15:09:34.032834736Z",
"DockerVersion": "28.1.1",
"Labels": {
"cache.triton.image/cache-size-bytes": "80415",
"cache.triton.image/entry-count": "1",
"cache.triton.image/summary": "{\"targets\":[{\"backend\":\"hip\",\"arch\":\"gfx90a\",\"warp_size\":64}]}",
"org.opencontainers.image.title": "vector-add-cache"
},
"Architecture": "amd64",
"Os": "linux",
"Layers": [
"sha256:fe1632cee6d6de159c5c36233c73fbbaa9196af69d771fe016ae4b3a0b6ea698",
"sha256:a53fd74714b8956ec2d2e02c4c262e6800bc45b0a8f5e339923ea3baa2d1f1ff"
],
"LayersData": [
{
"MIMEType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
"Digest": "sha256:fe1632cee6d6de159c5c36233c73fbbaa9196af69d771fe016ae4b3a0b6ea698",
"Size": 91648,
"Annotations": null
},
{
"MIMEType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
"Digest": "sha256:a53fd74714b8956ec2d2e02c4c262e6800bc45b0a8f5e339923ea3baa2d1f1ff",
"Size": 2560,
"Annotations": null
}
],
"Env": [
"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
]
}
Note: If buildah is installed it will be favoured to build the image. The build output is shown below.
mcv -c -i quay.io/gkm/vector-add-cache:rocm -d example/vector-add-cache-rocm
INFO[2025-05-28 12:23:04] baremetalFlag false
INFO[2025-05-28 12:23:04] Using buildah to build the image
INFO[2025-05-28 12:23:04] Wrote manifest to /tmp/buildah-manifest-dir-2780945232/manifest.json
INFO[2025-05-28 12:23:04] Image built! baadff55392c0ada6f0d358c255d63ca770fb20b87429a732480e00bbf8d044b
INFO[2025-05-28 12:23:04] Temporary directories successfully deleted.
INFO[2025-05-28 12:23:04] OCI image created successfully.
To inspect the buildah image with Skopeo:
skopeo inspect containers-storage:quay.io/gkm/vector-add-cache:rocm
{
"Name": "quay.io/tkm/vector-add-cache",
"Digest": "sha256:3f8c7b3aeeffd9ee3f673486f3bc681a7f9ed39e21242628e6845755191d6bd4",
"RepoTags": [],
"Created": "2025-05-28T15:45:17.379786001Z",
"DockerVersion": "",
"Labels": {
"cache.triton.image/cache-size-bytes": "80415",
"cache.triton.image/entry-count": "1",
"cache.triton.image/summary": "{\"targets\":[{\"backend\":\"hip\",\"arch\":\"gfx90a\",\"warp_size\":64}]}"
},
"Architecture": "amd64",
"Os": "linux",
"Layers": [
"sha256:ef89050f71ecc3dc925f14c12d2fd406c067f78987eed36a1176b19499c8ea20"
],
"LayersData": [
{
"MIMEType": "application/vnd.oci.image.layer.v1.tar",
"Digest": "sha256:ef89050f71ecc3dc925f14c12d2fd406c067f78987eed36a1176b19499c8ea20",
"Size": 93184,
"Annotations": null
}
],
"Env": null
}
To inspect the image labels specifically, run:
skopeo inspect containers-storage:quay.io/gkm/vector-add-cache:rocm | jq -r '.Labels["cache.triton.image/summary"]' | jq .
{
"targets": [
{
"backend": "hip",
"arch": "gfx90a",
"warp_size": 64
}
]
}
To create an OCI image for a vLLM cache, run the following:
mcv -c -i quay.io/gkm/cache-examples:vllm-example -d example/vllm-cache
INFO[2025-09-03 09:04:15] Hardware accelerator(s) detected (2). GPU support enabled.
INFO[2025-09-03 09:04:15] Using buildah to build the image
INFO[2025-09-03 09:04:23] Detected cache components: [vllm]
INFO[2025-09-03 09:04:24] Image built! 8218fac0225882a7de7a1f11f32aff25df2936f1f12b08c0c26ab30897d19c5a
INFO[2025-09-03 09:04:24] OCI image created successfully.
To inspect the image, including its labels, run:
skopeo inspect containers-storage:quay.io/gkm/cache-examples:vllm-example
{
"Name": "quay.io/gkm/cache-examples",
"Digest": "sha256:9e731d58adccd608cb18dcefe259acd30ffe976d5e98208a4158ce22c0b5d1e2",
"RepoTags": [],
"Created": "2026-02-10T12:04:38.260317569Z",
"DockerVersion": "",
"Labels": {
"cache.vllm.image/cache-size-bytes": "2269180",
"cache.vllm.image/entry-count": "1",
"cache.vllm.image/summary": "{\"targets\":[{\"backend\":\"hip\",\"arch\":\"gfx90a\",\"warp_size\":64}]}"
},
"Architecture": "amd64",
"Os": "linux",
"Layers": [
"sha256:440b5cbd3b76dc17a6012e17fc56341d4894b88ab7a85b12c5e2f6f7c4b80661"
],
"LayersData": [
{
"MIMEType": "application/vnd.oci.image.layer.v1.tar+gzip",
"Digest": "sha256:440b5cbd3b76dc17a6012e17fc56341d4894b88ab7a85b12c5e2f6f7c4b80661",
"Size": 250291,
"Annotations": null
}
],
"Env": null
}
To extract the vLLM cache, run the following:
mcv -e -i quay.io/gkm/cache-examples:vllm-example
INFO[2025-09-03 09:06:00] Hardware accelerator(s) detected (2). GPU support enabled.
INFO[2025-09-03 09:06:02] Preflight GPU compatibility check passed.
INFO[2025-09-03 09:06:02] Preflight completed matched="[0 1]" unmatched="[]"
INFO[2025-09-03 09:06:04] Extracting cache to directory: /home/fedora/.cache/vllm
Use Sigstore Cosign to sign mcv-built images.
- Install Cosign
go install github.com/sigstore/cosign/v2/cmd/cosign@latest
- Sign an image
cosign sign -y quay.io/tkm/vector-add-cache@sha256:<digest>
Generating ephemeral keys...
Retrieving signed certificate...
The sigstore service, hosted by sigstore a Series of LF Projects,
LLC, is provided pursuant to the Hosted Project Tools Terms of
Use, available at
https://lfprojects.org/policies/hosted-project-tools-terms-of-use/.
Note that if your submission includes personal data associated with
this signed artifact, it will be part of an immutable record.
This may include the email address associated with the account with
which you authenticate your contractual Agreement.
This information will be used for signing this artifact and will be
stored in public transparency logs and cannot be removed later, and
is subject to the Immutable Record notice at
https://lfprojects.org/policies/hosted-project-tools-immutable-records/.
By typing 'y', you attest that (1) you are not submitting the personal
data of any other person; and (2) you understand and agree to the
statement and the Agreement terms at the URLs listed above.
Your browser will now be opened to:
...
Cosign will prompt you to authenticate and display legal terms regarding transparency logs.
- Confirm and Finish
- Ephemeral keys will be generated
- Signature will be pushed to the registry
- You'll see a success message including the transparency log index
Upon successful completion, you will see an output similar to:
Successfully verified SCT...
tlog entry created with index: 215011903
Pushing signature to: quay.io/gkm/cache-examples
An example snippet of how to use the client API to extract a cache from a container image is shown below.
import (
"github.com/redhat-et/GKM/mcv/pkg/client"
)
package main
import (
"github.com/redhat-et/GKM/mcv/pkg/client"
)
func main() {
err := client.ExtractCache(client.Options{
ImageName: "quay.io/gkm/cache-examples:vector-add-cache-cuda",
CacheDir: "/tmp/testcache",
LogLevel: "debug",
EnableBaremetal: nil, // or false if explicitly desired
})
if err != nil {
panic(err)
}
}
You can also use the MCV client API to retrieve details about the system's available GPUs:
package main
import (
"encoding/json"
"fmt"
"log"
"github.com/redhat-et/GKM/mcv/pkg/client"
)
func main() {
stub := false
gpus, err := client.GetSystemGPUInfo(client.HwOptions{EnableStub: &stub})
if err != nil && gpus == nil {
log.Fatalf("Error retrieving GPU info: %v", err)
}
output, err := json.MarshalIndent(gpus, "", " ")
if err != nil {
log.Fatalf("Failed to format GPU info: %v", err)
}
fmt.Println("Detected GPU Devices:")
fmt.Println(string(output))
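}
The client API can also run a preflight check that reports which GPUs are compatible with an image:
package main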
import (
"fmt"
"log"
"github.com/redhat-et/GKM/mcv/pkg/client"
)
func main() {
matched, unmatched, err := client.PreflightCheck(
"quay.io/gkm/cache-examples:vector-add-cache-cuda")
if err != nil {
log.Fatalf("Preflight check failed: %v", err)
}
fmt.Printf("Compatible GPUs: %d\n", len(matched))
for i, gpu := range matched {
fmt.Printf(" MATCH %d: Backend=%s, Arch=%s, WarpSize=%d, "+
"PTX=%s\n", i, gpu.Backend, gpu.Arch, gpu.WarpSize,
gpu.PTXVersion)
}
fmt.Printf("Incompatible GPUs: %d\n", len(unmatched))
for i, gpu := range unmatched {
fmt.Printf(" NO-MATCH %d: Backend=%s, Arch=%s, WarpSize=%d, "+
"PTX=%s\n", i, gpu.Backend, gpu.Arch, gpu.WarpSize,
gpu.PTXVersion)
}
}
MCV supports running in environments without GPUs by using a static device configuration. This is useful for testing or CI environments.
Run MCV with the --stub flag. It will use the static config and behave as
if those devices are present.
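To make the idea of checking an image against a fixed set of devices more concrete, the Go sketch below parses a summary label of the form shown in the Skopeo output (a JSON object with a `targets` array) and compares it with a hard-coded device list. It is an illustration under stated assumptions, not MCV's implementation: the `Device` type, `parseSummary`, `matchDevices`, and the exact-equality matching rule are all hypothetical, and the second device's values are made up for contrast.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Target mirrors one entry of the "targets" array in the summary label.
type Target struct {
	Backend  string `json:"backend"`
	Arch     string `json:"arch"`
	WarpSize int    `json:"warp_size"`
}

// Summary is the decoded value of a cache.*.image/summary label.
type Summary struct {
	Targets []Target `json:"targets"`
}

// Device is a hypothetical stand-in for a detected or statically configured GPU.
type Device struct {
	ID       int
	Backend  string
	Arch     string
	WarpSize int
}

// parseSummary decodes the JSON string stored in the summary label.
func parseSummary(label string) (Summary, error) {
	var s Summary
	err := json.Unmarshal([]byte(label), &s)
	return s, err
}

// matchDevices returns the IDs of devices that equal at least one target
// (matched) and the IDs of those that equal none (unmatched).
// Assumption: a match requires backend, arch and warp size to be identical.
func matchDevices(s Summary, devices []Device) (matched, unmatched []int) {
	for _, d := range devices {
		ok := false
		for _, t := range s.Targets {
			if t.Backend == d.Backend && t.Arch == d.Arch && t.WarpSize == d.WarpSize {
				ok = true
				break
			}
		}
		if ok {
			matched = append(matched, d.ID)
		} else {
			unmatched = append(unmatched, d.ID)
		}
	}
	return matched, unmatched
}

func main() {
	// Label value copied from the inspect output of the example image.
	label := `{"targets":[{"backend":"hip","arch":"gfx90a","warp_size":64}]}`
	summary, err := parseSummary(label)
	if err != nil {
		fmt.Println("invalid summary label:", err)
		return
	}

	// A static device list, standing in for real hardware detection.
	devices := []Device{
		{ID: 0, Backend: "hip", Arch: "gfx90a", WarpSize: 64},
		{ID: 1, Backend: "cuda", Arch: "sm_80", WarpSize: 32}, // illustrative values
	}

	matched, unmatched := matchDevices(summary, devices)
	fmt.Printf("matched=%v unmatched=%v\n", matched, unmatched)
}
```

The real preflight check may compare more fields (the client API example above also prints a PTX version), so this sketch should be read as the general shape of the comparison only.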
MCV provides a container image called quay.io/gkm/mcv. This image can be
used to wrap a vLLM/Triton cache in an OCI container image that can then be
pushed to a container registry (without having to install mcv locally). This
image can also be used as part of a GitHub workflow.
To use docker on the host with an MCV image, you need to mount the cache directory to the container and run the following command:
docker run --rm -it --privileged \
-v <path-to-cache>/example:/example \
quay.io/gkm/mcv bash -lc '
/mcv -c -i quay.io/gkm/vector-add-cache:rocm \
-d /example/vector-add-cache-rocm &&
buildah push containers-storage:quay.io/gkm/vector-add-cache:rocm \
docker-archive:/example/vector-add-cache-rocm.tar:quay.io/gkm/vector-add-cache:rocm
'
INFO[2025-09-11 16:46:54] Setting log level: info
INFO[2025-09-11 16:46:54] Using buildah to build the image
INFO[2025-09-11 16:46:54] Detected cache components: [triton]
INFO[2025-09-11 16:46:55] Image built! 8ce4bc2e98abfa8c0a5a6f6046c1c7bc8ac09805ecb029427a995dc2897828f8
INFO[2025-09-11 16:46:55] OCI image created successfully.
Getting image source signatures
Copying blob 24b82d6fef87 done
Copying config 8ce4bc2e98 done
Writing manifest to image destination
Storing signatures
Then on host:
docker load -i <path-to-cache>/example/vector-add-cache-rocm.tar
24b82d6fef87: Loading layer 93.18kB/93.18kB
The image quay.io/gkm/vector-add-cache:rocm already exists, renaming
the old one with ID sha256:5dc90b88f536e44e186c5a076afbb7a54389aed6f0ddfa21365ae2c7f79cb21d to empty string
Loaded image: quay.io/gkm/vector-add-cache:rocm
Check the images:
docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
quay.io/gkm/vector-add-cache rocm 8ce4bc2e98ab 15 seconds ago 80.7kB
To use podman on the host with an MCV image, you need to mount the cache directory to the container and run the following command:
podman run --rm -it --privileged \
-v <path-to-cache>/example:/example \
quay.io/gkm/mcv bash -lc '
/mcv -c -i quay.io/gkm/vector-add-cache:rocm \
-d /example/vector-add-cache-rocm &&
buildah push containers-storage:quay.io/gkm/vector-add-cache:rocm \
oci-archive:/example/vector-add-cache-rocm.oci:quay.io/gkm/vector-add-cache:rocm
'
Then on host:
podman load -i <path-to-cache>/example/vector-add-cache-rocm.oci
Check the images:
podman images
REPOSITORY TAG IMAGE ID CREATED SIZE
quay.io/gkm/vector-add-cache rocm b1bc2ae6bef1 25 seconds ago 94.7 kB