Description
Hi @tariq1890,
We are seeing a MIG issue with driver 580.65.06: MIG enablement is stuck in the pending state and MIG Manager fails to apply the configuration. We observe this only on A100 GPUs; everything works fine on H100 and H200.
MIG Manager log:
"2025-10-30T07:33:20Z" level=fatal msg="Error applying MIG configuration with hooks: unable to apply MIG config with MIG mode disabled"
nvidia-mig-manager time="2025-10-30T07:33:20Z" level=info msg="Restarting any GPU clients previously shutdown in Kubernetes by reenabling their component-specific nodeSelector labels"
nvidia-mig-manager time="2025-10-30T07:33:20Z" level=info msg="Changing the 'nvidia.com/mig.config.state' node label to 'failed'\n"
nvidia-mig-manager time="2025-10-30T07:33:20Z" level=error msg="Error: failed to apply MIG configuration: exit status 1"
nvidia-mig-manager time="2025-10-30T07:33:20Z" level=info msg="Waiting for change to 'nvidia.com/mig.config' label"
nvidia-smi output from the driver daemonset pod:
[root@nvidia-driver-daemonset-m4wl8 drivers]# nvidia-smi -mig 1
Warning: MIG mode is in pending enable state for GPU 00000001:00:00.0:In use by another client 00000001:00:00.0 is currently being used by one or more other processes (e.g. CUDA application or a monitoring application such as another instance of nvidia-smi). Please first kill all processes using the device and retry the command or reboot the system to make MIG mode effective.
Warning: MIG mode is in pending enable state for GPU 00000002:00:00.0:In use by another client 00000002:00:00.0 is currently being used by one or more other processes (e.g. CUDA application or a monitoring application such as another instance of nvidia-smi). Please first kill all processes using the device and retry the command or reboot the system to make MIG mode effective.
Warning: MIG mode is in pending enable state for GPU 00000003:00:00.0:In use by another client 00000003:00:00.0 is currently being used by one or more other processes (e.g. CUDA application or a monitoring application such as another instance of nvidia-smi). Please first kill all processes using the device and retry the command or reboot the system to make MIG mode effective.
Warning: MIG mode is in pending enable state for GPU 00000004:00:00.0:In use by another client 00000004:00:00.0 is currently being used by one or more other processes (e.g. CUDA application or a monitoring application such as another instance of nvidia-smi). Please first kill all processes using the device and retry the command or reboot the system to make MIG mode effective.
All done.
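For anyone trying to reproduce this, here is a minimal Python sketch that lists the GPUs whose MIG mode change has not taken effect. It assumes the `mig.mode.current` and `mig.mode.pending` query fields are available in your nvidia-smi build (check `nvidia-smi --help-query-gpu`). The helper name `pending_mig_gpus` is hypothetical and not part of any NVIDIA tooling.

```python
import subprocess

# Assumed query: one CSV row per GPU with bus ID, current MIG mode, pending MIG mode.
QUERY = [
    "nvidia-smi",
    "--query-gpu=pci.bus_id,mig.mode.current,mig.mode.pending",
    "--format=csv,noheader",
]


def pending_mig_gpus(csv_text):
    """Return (bus_id, current, pending) for GPUs whose current and pending MIG modes differ."""
    stuck = []
    for line in csv_text.splitlines():
        fields = [field.strip() for field in line.split(",")]
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        bus_id, current, pending = fields
        if current != pending:
            stuck.append((bus_id, current, pending))
    return stuck


if __name__ == "__main__":
    # Run inside the driver container, where nvidia-smi is on the PATH.
    output = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    for bus_id, current, pending in pending_mig_gpus(output):
        print(f"{bus_id}: current={current} pending={pending}")
```

Any GPU it prints is still waiting for the mode switch, which matches the "pending enable state" warnings above. It does not identify which client is holding the device.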
