The GPU Kernel Manager (GKM) enforces a multi-layered security model to ensure that only trusted, signed GPU kernel cache images are used, that workloads consume validated resources, and that system components maintain strong ownership boundaries.
To guarantee the integrity and provenance of kernel cache images, GKM uses a cosign-based signature verification pipeline enforced via Kubernetes admission and controller logic.
Note: Integration with projects like Kyverno will be considered in the future when it supports image digest mutation for Custom Resource Objects.
| Component | Responsibilities |
|---|---|
| Admission Webhook (Combined) | - Verifies signatures - Resolves image tag to digest - Mutates trusted annotation - Validates annotation tampering |
| GKM Operator | - Promotes trusted digest to .status.resolvedDigest- Sets .status.lastUpdated |
| GKM Agent | - Watches CR gkm.io/resolvedDigest annotations. - Pulls image by digest - Validates compatibility against GPU Hardware - Updates GKMCacheNode status |
User creates or updates a GKMCache or ClusterGKMCache with a new spec.image
(e.g., quay.io/cache:latest)
This webhook operation is broken down into two phases:
-
Phase 1 - Mutation (on CREATE or UPDATE):
-
Resolves the tag to a content digest (
sha256:...). -
Verifies the image signature using cosign.
-
Adds a protected annotation to the CR:
metadata: annotations: gkm.io/resolvedDigest: sha256:abc123...
Note: This annotation is considered trusted system state. The Validating Admission Webhook must be configured to reject any attempt by a user to add, modify, or remove this annotation. Only the GKM webhook or controller is authorized to manage this field.
-
-
Phase 2 - Validation: Denies the request if:
-
The image is not signed or verification fails
-
The user attempts to:
- Modify the
gkm.io/resolvedDigestannotation directly
Note: Reminder only trusted service accounts (the webhook itself) can mutate this annotation.
- Modify the
-
The GKM Operator:
-
Watches
GKMCacheandClusterGKMCacheCRs for new or changedspec.imageor annotations. -
When it finds a trusted annotation, it updates the CR
status:status: resolvedDigest: sha256:abc123... lastUpdated: "2025-07-24T14:22:00Z"
-
The operator does not reverify the signature.
Note: The operator treats the
gkm.io/resolvedDigestannotation as the trusted source of truth and promotes it to.status.resolvedDigest. This enables status tracking, audit, and observability. The annotation is used for faster downstream consumption by agents, while.statusserves as a system record.
The GKM Agent (on each node):
- Also watches
GKMCacheandClusterGKMCacheCRs for new or changedspec.imageor annotations. - When it finds a trusted annotation, it:
- Pulls the image by digest only (never by tag) and does not run any further reverification tests.
- Extracts kernel cache and performs GPU/driver compatibility checks.
- Updates the local
GKMCacheNodeorClusterGKMCacheNodeCR with per-node status, including:- Compatible GPU IDs
- Incompatibility reasons (if any)
- Last updated timestamp
Note: The agent does not reverify the signature.
Note: On CR update, the agent checks if the Cache is already extracted on the node, and whether or not it's in use. If a cache is already in use, the Agent might need to track the usage of the old cache and do some form of garbage collection when it's no longer in use.
-
User applies:
apiVersion: gkm.io/v1alpha1 kind: GKMCache # Or ClusterGKMCache metadata: name: llama2 namespace: ml-apps spec: image: quay.io/org/llama2:latest
-
Webhook Verifies and adds:
metadata: annotations: gkm.io/resolvedDigest: sha256:abc123...
-
Operator Promotes:
status: resolvedDigest: sha256:abc123... lastUpdated: "2025-07-24T15:00:00Z"
-
Agent:
- Pulls and validates the image by digest from the annotation in the CR.
- Updates node-level cache status in
GKMCacheNodeorClusterGKMCacheNode.
| Field | Who Can Write It | Enforced By |
|---|---|---|
metadata.annotations["gkm.io/resolvedDigest"] |
Only the Admission Webhook | Validating logic inside webhook |
status.resolvedDigest |
Only the GKM Operator | RBAC and controller logic |
spec.image |
User-provided input | Mutating webhook handles digest resolution |
Note: Even though agents read the
gkm.io/resolvedDigestannotation for fast reaction, it is still treated as a trusted field and must be immutable by users. Webhook enforcement is required to maintain this trust boundary.
- Digest is verified and resolved at admission time
- Spec remains user-controlled, but cannot override digest once resolved
- Runtime components pull by digest only
- Digest is promoted to
.statusby a trusted controller - Nodes act on immutable, validated digests
This security model assumes:
- The GKM webhook is trusted to verify image signatures and resolve image tags to digests.
- The image registry is trusted to serve content correctly by digest (i.e., content-addressable integrity).
- The digest is immutable and is treated as an attestation of verified content.
- The GKM Operator is trusted to promote verified state from annotation
to
.status, and does not reverify signatures. - The Validating Admission Webhook is critical — if missing or misconfigured, users could bypass digest verification by injecting annotations.
Note: The webhook must explicitly check
userInfoin admission requests to prevent unauthorized entities from modifying trusted annotations. RBAC alone is insufficient — webhook logic is needed to enforce write restrictions on annotations likegkm.io/resolvedDigest.Note: GKM Agents do not reverify image signatures post-pull. This is consistent with common Kubernetes security patterns (e.g., Kyverno, GKE Flux, Tekton Chains), where verification happens once at admission and runtime behaviour is locked to verified digests.