Description
kubelet is started with a non-default root directory: kubelet --root-dir=/data/kubelet
Volume section of the rdma shared device plugin DaemonSet YAML file:
volumeMounts:
  - name: device-plugin
    mountPath: /data/kubelet/device-plugins
    readOnly: false
  - name: plugins-registry
    mountPath: /data/kubelet/plugins_registry
    readOnly: false
  - name: config
    mountPath: /k8s-rdma-shared-dev-plugin
  - name: devs
    mountPath: /dev/
volumes:
  - name: device-plugin
    hostPath:
      path: /data/kubelet/device-plugins
  - name: plugins-registry
    hostPath:
      path: /data/kubelet/plugins_registry
  - name: config
    configMap:
      name: rdma-devices
      items:
        - key: config.json
          path: config.json
  - name: devs
    hostPath:
      path: /dev/
The rdma shared device plugin prints the following logs and fails to start:
2024/01/18 08:29:00 Initializing resource servers
2024/01/18 08:29:00 Resource: &{ResourceName:hca_shared_devices_a ResourcePrefix:rdma RdmaHcaMax:1000 Devices:[enp88s0] Selectors:{Vendors:[] DeviceIDs:[] Drivers:[] IfNames:[enp88s0] LinkTypes:[]}}
......
2024/01/18 08:29:00 Starting all servers...
2024/01/18 08:29:00 starting rdma/hca_shared_devices_a device plugin endpoint at: hca_shared_devices_a.sock
2024/01/18 08:29:00 Error: starting resource servers listen unix /var/lib/kubelet/device-plugins/hca_shared_devices_a.sock: bind: no such file or directory
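The bind error suggests that the plugin creates its socket under /var/lib/kubelet/device-plugins inside the container, regardless of kubelet's --root-dir. A possible workaround, sketched here under that assumption, keeps the container-side mountPath values at the default location and points only the hostPath entries at the custom root directory. This has not been verified against the plugin; in particular, the plugins_registry mountPath is an assumed default and is not confirmed by the logs.

```yaml
# Sketch only: assumes the plugin uses fixed container-side paths under /var/lib/kubelet.
volumeMounts:
  - name: device-plugin
    # Container-side path, matching the directory named in the bind error.
    mountPath: /var/lib/kubelet/device-plugins
    readOnly: false
  - name: plugins-registry
    # Assumed default path; the logs do not confirm it.
    mountPath: /var/lib/kubelet/plugins_registry
    readOnly: false
volumes:
  - name: device-plugin
    hostPath:
      # Node-side path under kubelet --root-dir=/data/kubelet.
      path: /data/kubelet/device-plugins
  - name: plugins-registry
    hostPath:
      path: /data/kubelet/plugins_registry
```

With this layout the socket that the plugin binds at /var/lib/kubelet/device-plugins/hca_shared_devices_a.sock inside the container would appear on the node under /data/kubelet/device-plugins, which is where a kubelet running with this --root-dir looks for device plugin sockets.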