redhat-et/GKM

GPU-kernel-manager-operator

Description

GPU Kernel Manager (GKM) is a Kubernetes Operator that propagates GPU Kernel Caches across Kubernetes Nodes, speeding up the startup time of pods in Kubernetes clusters using GPU Kernels.

But what is a GPU Kernel Cache?

A GPU Kernel is the binary that is ultimately loaded on a GPU for execution. Many frameworks, such as PyTorch, run through multiple stages when compiling a GPU Kernel. One of the last steps is Just-in-Time (JIT) compilation, where code is compiled into GPU-specific machine code at runtime rather than ahead of time. This allows for runtime optimizations based on the detected hardware and workload, creating highly specialized kernels that can be faster than pre-compiled, general-purpose ones. It produces hardware-specific kernels at the cost of runtime compilation time. However, once the JIT compilation has completed, the generated binary, along with the pre-stage artifacts, is present in a local directory known as the GPU Kernel Cache.
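As a concrete illustration, Triton (the JIT compiler PyTorch uses for custom GPU kernels) keeps its compiled kernels in a local cache directory that can be redirected with the TRITON_CACHE_DIR environment variable; the script name below is hypothetical:

```shell
# Point Triton's JIT cache at a known directory before running the workload
# (Triton honors TRITON_CACHE_DIR; the default is ~/.triton/cache).
export TRITON_CACHE_DIR=/tmp/vector-add-cache
python vector_add.py   # first run JIT-compiles and populates the cache
```

After the first run, the contents of that directory are the GPU Kernel Cache that GKM distributes.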

This is where GKM comes in. The directory containing the GPU Kernel Cache can be packaged into an OCI Image using the utilities developed in MCV and pushed to an image repository. GKM pulls the generated OCI Image down from the repository and extracts it into a PersistentVolumeClaim (PVC). The PVC can then be volume mounted as a directory in a workload pod. As long as the node has the same GPU as the one the extracted GPU Kernel Cache was generated on, the workload is none the wiser and skips the JIT compilation, decreasing the pod start-up time by up to half.

Getting Started

If you already know what GKM is and how it works and just want to deploy it, visit the Getting Started Guide.

GKM Overview

To use GKM, the first step is to generate an OCI Image that contains a GPU Kernel Cache. GKM includes a local tool called MCV, which can be used to package the GPU Kernel Cache into an OCI Image and extract it back out. The extraction is handled automatically by GKM, but currently the generation is manual. To generate, cd into the project directory and provide the mcv tool with the OCI Image name and the directory where the GPU Kernel Cache is located. GKM ships some sample GPU Kernel Caches in the ./mcv/example/ directory that can be used to experiment with GKM. The example below also pushes the OCI Image to a repository and uses cosign to sign it. Signing the image is optional but recommended in production. See MCV for details on packaging OCI Images and on cosign.

cd $USER_SRC_DIR/GKM/
mcv -c -i quay.io/$QUAY_USER/vector-add-cache:rocm -d ./mcv/example/vector-add-cache-rocm
podman push quay.io/$QUAY_USER/vector-add-cache:rocm
cosign sign -y quay.io/$QUAY_USER/vector-add-cache@sha256:<digest>
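If the image was signed, consumers can verify the signature before deploying it. A minimal sketch using cosign's key-based flow (the cosign.pub path assumes a key pair created earlier with cosign generate-key-pair; keyless signatures are verified with --certificate-identity instead):

```shell
# Verify the signature recorded for the pushed image against the public key.
cosign verify --key cosign.pub quay.io/$QUAY_USER/vector-add-cache:rocm
```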

Next, create a GKMCache CR that will reference the desired OCI Image (there are several more sample GKMCache and ClusterCache CRs in ./examples/):

apiVersion: gkm.io/v1alpha1
kind: GKMCache
metadata:
  name: vector-add-cache-rocm-v2-rox
  namespace: gkm-test-ns-rox-1
spec:
  image: quay.io/$QUAY_USER/vector-add-cache:rocm

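The CR can then be applied and watched like any other resource. A sketch, assuming the manifest above is saved as gkmcache.yaml and the CRD is registered under the gkmcache resource name:

```shell
kubectl apply -f gkmcache.yaml
kubectl get gkmcache -n gkm-test-ns-rox-1
# The resulting PVC carries the same name as the GKMCache CR.
kubectl get pvc vector-add-cache-rocm-v2-rox -n gkm-test-ns-rox-1
```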
GKM will extract the OCI Image into a PVC that can then be mounted as a volume in a Pod. Below is a sample Pod spec. Notice that the PVC claimName matches the name of the GKMCache CR.

kind: Pod
apiVersion: v1
metadata:
  name: gkm-test-ns-rox-pod-1
  namespace: gkm-test-ns-rox-1
spec:
  containers:
    - name: test
      image: quay.io/fedora/fedora-minimal
      imagePullPolicy: IfNotPresent
      command: [sleep, 365d]
      volumeMounts:
        - name: kernel-volume
          mountPath: /cache
  volumes:
    - name: kernel-volume
      persistentVolumeClaim:
        claimName: vector-add-cache-rocm-v2-rox

When the pod comes up, the extracted GPU Kernel Cache will be in the directory specified by the Volume Mount. Adjust the mountPath: as needed by the application running in the pod.
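A quick way to confirm the cache landed where the application expects it is to list the mount from inside the running pod (the exact file layout under /cache depends on the framework that produced the cache):

```shell
kubectl exec -n gkm-test-ns-rox-1 gkm-test-ns-rox-pod-1 -- ls -R /cache
```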

This is a simple example. See Deployment Options for details on how to customize the deployment to meet your needs.

Documentation

Below are links to documents that go into more depth on different GKM-related topics:

  • Getting Started Guide: Prerequisites, build instructions, and a description of how to deploy GKM.
  • Deployment Options: Details on the GKM Custom Resource Definitions and how to tailor the different optional fields for different environments.
  • MCV Overview: Overview of Model Cache Vault (MCV): prerequisites, build instructions, and a usage guide. Also describes how to use cosign to sign OCI Images for supply-chain security.

Below are links to documents on more advanced topics:

  • GKM Architecture: GKM design document. NOTE: still a work in progress; it references the original design, which used a CSI Driver to mount the GPU Kernel Cache, rather than the new design that leverages PVCs.
  • Security: The GKM security model.
  • Kyverno Integration: Image signature verification with Kyverno, a third-party admission controller used by GKM to verify image signatures.
  • Webhook Configuration: GKM webhook configuration details.
