+# DCGM is monitoring/telemetry and does not gate GPU workload scheduling, so start it without
+# blocking node provisioning and treat a slow/failed start as non-fatal.
+logs_to_events "AKS.CSE.start.nvidia-dcgm" "systemctlEnableAndStartNoBlock nvidia-dcgm 30" || echo "warning: nvidia-dcgm could not be enqueued; GPU monitoring will start asynchronously"
# 3. Start the nvidia-dcgm-exporter service.
# Create systemd drop-in directory for nvidia-dcgm-exporter service
+# The exporter is telemetry only and does not gate scheduling, so start it off the critical
+# path and treat a slow/failed start as non-fatal.
+logs_to_events "AKS.CSE.start.nvidia-dcgm-exporter" "systemctlEnableAndStartNoBlock nvidia-dcgm-exporter 30" || echo "warning: nvidia-dcgm-exporter could not be enqueued; GPU metrics will start asynchronously"
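
For readers unfamiliar with the pattern, the following is a minimal sketch of what a non-blocking enable-and-start helper could look like. It is not the implementation of `systemctlEnableAndStartNoBlock` from this repository, which is not shown in the diff. The function name `start_service_no_block` is hypothetical, and treating the second argument as a per-command timeout in seconds is an assumption. The sketch relies only on `systemctl enable`, `systemctl start --no-block`, and the coreutils `timeout` command.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, NOT the actual AKS helper.
#
# start_service_no_block SERVICE TIMEOUT_SECONDS
#   Enables SERVICE, then asks systemd to enqueue its start job and return
#   immediately instead of waiting for the unit to become active.
#   TIMEOUT_SECONDS (assumed meaning) bounds each systemctl invocation so a
#   hung systemctl cannot stall provisioning.
#   Returns non-zero only if enabling or enqueueing fails or times out.
start_service_no_block() {
    local service="$1"
    local timeout_seconds="$2"

    # Enabling only creates the unit symlinks; it does not start the unit.
    timeout "${timeout_seconds}" systemctl enable "${service}" || return 1

    # --no-block enqueues the start job without waiting for it to finish.
    timeout "${timeout_seconds}" systemctl start --no-block "${service}" || return 1

    return 0
}

# Example call site (commented out so that sourcing this file has no side effects).
# The "|| echo" keeps a failed enqueue from failing the calling script:
#
#   start_service_no_block nvidia-dcgm 30 \
#       || echo "warning: nvidia-dcgm could not be enqueued"
```

Because the start job is only enqueued, a zero exit status here means systemd accepted the request, not that the service is running. A caller that needs the service to be active would have to check separately, for example with `systemctl is-active`.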