Description
Describe the bug
The following ccManager section of the ClusterPolicy for NVIDIA GPU Operator 25.3.4 does not enable confidential computing for H200 GPUs (VBIOS 96.00.d9.00.02, device ID 0x2335) on OpenShift 4.19. It works fine with H100 GPUs in a similar environment on another node.
apiVersion: nvidia.com/v1
kind: ClusterPolicy
metadata:
  name: "gpu-cluster-policy"
spec:
  ccManager:
    defaultMode: "on"
    enabled: true
    env:
      - name: CC_CAPABLE_DEVICE_IDS
        value: 0x2335,0x2330,0x2331,0x2322
    image: k8s-cc-manager
    imagePullPolicy: IfNotPresent
    imagePullSecrets: []
    repository: nvcr.io/nvidia/cloud-native
    resources: {}
    version: v0.1.1
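As a quick sanity check on the env value above, the following Python sketch parses the comma-separated CC_CAPABLE_DEVICE_IDS string and confirms that the H200 ID 0x2335 is in it. The helper name parse_device_ids is hypothetical, and this only inspects the string from the policy; it makes no claim about how k8s-cc-manager itself reads or matches the variable.

```python
def parse_device_ids(value: str) -> set[int]:
    """Parse a comma-separated list of hex IDs such as '0x2335,0x2330'."""
    # int(..., 16) accepts an optional '0x' prefix; empty tokens are skipped.
    return {int(tok.strip(), 16) for tok in value.split(",") if tok.strip()}


# Value copied from the ccManager env section of the ClusterPolicy above.
CC_CAPABLE_DEVICE_IDS = "0x2335,0x2330,0x2331,0x2322"

ids = parse_device_ids(CC_CAPABLE_DEVICE_IDS)
h200_listed = 0x2335 in ids  # ID reported for the H200 GPUs in this issue
print("H200 ID listed:", h200_listed)
```

If the ID is listed, as it is here, the configured list itself is not what excludes the H200 GPUs.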
To Reproduce
After installing the GPU Operator, apply the cluster policy with the above ccManager section. The output of nvidia_gpu_tools.py --devices gpus --query-cc-mode says "CC mode is off" for all the GPUs.
Expected behavior
The output of nvidia_gpu_tools.py --devices gpus --query-cc-mode should say "CC mode is on" for all the GPUs.
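The actual-versus-expected comparison can be scripted. The Python sketch below counts the "CC mode is ..." phrases in captured output and reports whether every GPU is on. Only the phrases "CC mode is on" and "CC mode is off" come from this report; the function names and the sample text are hypothetical, and the real tool output may contain additional lines.

```python
import re
from collections import Counter


def summarize_cc_modes(output: str) -> Counter:
    """Count occurrences of each reported CC mode in captured tool output."""
    return Counter(m.group(1) for m in re.finditer(r"CC mode is (\w+)", output))


def all_gpus_cc_on(output: str) -> bool:
    """True only if at least one mode is reported and every one is 'on'."""
    modes = summarize_cc_modes(output)
    return bool(modes) and set(modes) == {"on"}


# Hypothetical sample; the surrounding text is illustrative only.
sample = "GPU 0: CC mode is off\nGPU 1: CC mode is off\n"
print(summarize_cc_modes(sample), all_gpus_cc_on(sample))
```

To use it, capture the output of nvidia_gpu_tools.py --devices gpus --query-cc-mode on the node and pass the text to all_gpus_cc_on.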