vgpu-manager: enable kernel module configuration via KernelModuleConfig #1946
@shivakunv Could you rebase this PR?
done
Signed-off-by: Shiva Kumar (SW-CLOUD) <shivaku@nvidia.com>
if config.VGPUManager.KernelModuleConfig != nil && config.VGPUManager.KernelModuleConfig.Name != "" {
	// note: transformVGPUManagerContainer() will have already created a Volume backed by the ConfigMap.
	// Only add a VolumeMount for nvidia-vgpu-manager-ctr.
	volumeMounts, _, err := createConfigMapVolumeMounts(n, config.VGPUManager.KernelModuleConfig.Name, driversDir)
	if err != nil {
		return fmt.Errorf("failed to create ConfigMap VolumeMounts for vGPU manager kernel module configuration: %w", err)
	}
	obj.Spec.Template.Spec.Containers[i].VolumeMounts = append(obj.Spec.Template.Spec.Containers[i].VolumeMounts, volumeMounts...)
}
This should be removed. The transformPeerMemoryContainer() function is only relevant for the main driver daemonset, not the vGPU manager daemonset. The vGPU manager daemonset does not install the nvidia-peermem module.
Can we add some unit tests to transforms_test.go for TransformVGPUManager()? One of the unit tests can verify that the kernel module config map is getting rendered correctly.
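A minimal sketch of the kind of table-driven check such a test could make, assuming the behaviour to verify is "a VolumeMount is added only when a ConfigMap name is set". Everything here is a stand-in: `VolumeMount`, `Container`, `addKernelModuleConfigMount`, the `/drivers` path, and the `kernel-module-params` ConfigMap name are hypothetical simplifications so the example builds with the standard library alone. A real test in transforms_test.go would call TransformVGPUManager() against the actual Kubernetes types instead.

```go
package main

import "fmt"

// VolumeMount and Container are simplified, hypothetical stand-ins for the
// Kubernetes core/v1 types of the same name.
type VolumeMount struct {
	Name      string
	MountPath string
}

type Container struct {
	Name         string
	VolumeMounts []VolumeMount
}

// KernelModuleConfig mirrors the single field the diff reads: the ConfigMap name.
type KernelModuleConfig struct {
	Name string
}

// addKernelModuleConfigMount is a hypothetical stand-in for the logic under
// test: it appends a mount only when a ConfigMap name is configured.
func addKernelModuleConfigMount(c *Container, cfg *KernelModuleConfig, mountPath string) {
	if cfg == nil || cfg.Name == "" {
		return // nothing configured, leave the container unchanged
	}
	c.VolumeMounts = append(c.VolumeMounts, VolumeMount{Name: cfg.Name, MountPath: mountPath})
}

func main() {
	cases := []struct {
		desc       string
		cfg        *KernelModuleConfig
		wantMounts int
	}{
		{"no config", nil, 0},
		{"empty ConfigMap name", &KernelModuleConfig{}, 0},
		{"ConfigMap set", &KernelModuleConfig{Name: "kernel-module-params"}, 1},
	}
	for _, tc := range cases {
		c := Container{Name: "nvidia-vgpu-manager-ctr"}
		addKernelModuleConfigMount(&c, tc.cfg, "/drivers") // "/drivers" is an assumed path
		status := "ok"
		if len(c.VolumeMounts) != tc.wantMounts {
			status = "FAIL"
		}
		fmt.Printf("%s: %s (mounts=%d)\n", status, tc.desc, len(c.VolumeMounts))
	}
}
```

The nil and empty-name cases matter as much as the positive one, since the guard in the diff checks both conditions.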
VF-8 Ability to set vGPU host driver options
vGPU host driver options are set via nvidia.ko kernel module parameters in /etc/modprobe.d/nvidia.conf. For example:
options nvidia NVreg_RegistryDwords="RmPVMRL=value"
GPU Operator currently supports setting kernel module parameters for NVIDIA guest drivers via a module.conf file that is passed to GPU Operator in a ConfigMap.
A similar mechanism will be used to support module parameters for the vGPU host kernel driver.
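To make the modprobe.d format concrete, here is a small sketch that renders an `options` line of the shape shown in the example above. The `ModuleParam` type and the `renderModprobeOptions` helper are hypothetical and not part of GPU Operator. The sketch also makes no claim about the exact file format the ConfigMap is expected to carry; it only illustrates the `options <module> <name>="<value>"` syntax.

```go
package main

import (
	"fmt"
	"strings"
)

// ModuleParam is a hypothetical name/value pair for one kernel module parameter.
type ModuleParam struct {
	Name  string
	Value string
}

// renderModprobeOptions (hypothetical helper) builds one modprobe.d "options"
// line. Values are wrapped in double quotes, matching the example in the text;
// no escaping is attempted, so values must not themselves contain quotes.
func renderModprobeOptions(module string, params []ModuleParam) string {
	if len(params) == 0 {
		return "" // nothing to configure
	}
	parts := make([]string, 0, len(params))
	for _, p := range params {
		parts = append(parts, fmt.Sprintf(`%s="%s"`, p.Name, p.Value))
	}
	return "options " + module + " " + strings.Join(parts, " ")
}

func main() {
	line := renderModprobeOptions("nvidia", []ModuleParam{
		{Name: "NVreg_RegistryDwords", Value: "RmPVMRL=value"},
	})
	fmt.Println(line)
}
```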
testing