Description
Hello.
I successfully provisioned the Ubuntu 22.04 base image to the GPU node, but CUDA and OFED were not installed.
software_config.json detail
{
    "cluster_os_type": "ubuntu",
    "cluster_os_version": "22.04",
    "repo_config": "always",
    "softwares": [
        {"name": "openldap"},
        {"name": "nfs"},
        {"name": "k8s", "version": "1.29.5"},
        {"name": "cuda", "version": "12.2.2"},
        {"name": "ofed", "version": "5.8-6.0.4.2"},
        {"name": "jupyter"},
        {"name": "pytorch"},
        {"name": "tensorflow"}
    ],
    "pytorch": [
        {"name": "pytorch_cpu"},
        {"name": "pytorch_nvidia"}
    ],
    "tensorflow": [
        {"name": "tensorflow_cpu"},
        {"name": "tensorflow_nvidia"}
    ]
}
cuda.json detail
{
    "cuda": {
        "cluster": [
            {
                "package": "cuda",
                "type": "iso",
                "url": "https://developer.download.nvidia.com/compute/cuda/12.2.2/local_installers/cuda-repo-ubuntu2204-12-2-local_12.2.2-535.104.05-1_amd64.deb",
                "path": ""
            }
        ]
    }
}
ofed.json detail
{
    "ofed": {
        "cluster": [
            {
                "package": "ofed",
                "type": "iso",
                "url": "https://content.mellanox.com/ofed/MLNX_OFED-5.8-6.0.4.2/MLNX_OFED_LINUX-5.8-6.0.4.2-ubuntu22.04-x86_64.iso",
                "path": ""
            }
        ]
    }
}
The cuda-repo .deb file and the OFED ISO file were downloaded correctly to the /opt/omnia_repo/cluster/ubuntu/22.04/ directory, but they are not installed automatically during provisioning. Is there anything else I need to configure?
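For anyone who wants to reproduce the download check, here is a minimal sketch in Python. It is not part of Omnia: `expected_filenames` and `locate` are hypothetical helper names, and the script assumes only that each `url` entry in cuda.json or ofed.json ends in the file name that lands somewhere under the repo directory. It confirms that the files were downloaded, not that anything was installed on the node.

```python
import json
from pathlib import Path
from urllib.parse import urlparse


def expected_filenames(package_json_text):
    """Return the file name at the end of each "url" in a cuda.json / ofed.json style document."""
    data = json.loads(package_json_text)
    names = []
    for section in data.values():  # e.g. the "cuda" or "ofed" object
        for entry in section.get("cluster", []):
            url = entry.get("url", "")
            if url:
                names.append(Path(urlparse(url).path).name)
    return names


def locate(repo_dir, filenames):
    """Map each file name to the paths with that exact name found anywhere under repo_dir."""
    root = Path(repo_dir)
    if not root.is_dir():
        # Assumption: a missing repo directory is reported as "nothing found".
        return {name: [] for name in filenames}
    found = {name: [] for name in filenames}
    for path in root.rglob("*"):
        if path.is_file() and path.name in found:
            found[path.name].append(path)
    return found
```

A possible way to use it: read the contents of cuda.json and ofed.json, pass each to `expected_filenames`, then call `locate("/opt/omnia_repo/cluster/ubuntu/22.04", names)` and look for any name that maps to an empty list.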