Problem
smartmon.sh incorrectly reports smartmon_device_smart_healthy=0 (FAILED) for healthy NVMe devices. This causes false alerts in monitoring systems (e.g., Grafana alerting on smartmon_device_smart_healthy < 1).
Additionally, smartmon_device_smart_available and smartmon_device_smart_enabled are always 0 for NVMe devices, even though SMART is mandatory per the NVMe specification.
Root Cause
Two issues in the parse_smartctl_info() function:
1. local -i smart_healthy= initializes to 0, not empty string
https://github.com/prometheus-community/node-exporter-textfile-collector-scripts/blob/master/smartmon.sh#L112
local -i smart_available=0 smart_enabled=0 smart_healthy=
The -i (integer) flag causes smart_healthy= to initialize to 0 instead of an empty string. The guard at the end of the function is intended to skip output when the health status is unknown:
[[ "${smart_healthy}" != "" ]] && echo "device_smart_healthy{...} ${smart_healthy}"
But since "0" != "" is always true, the guard never takes effect: the metric is always emitted, and if smartctl fails to return the health line for any reason (including a transient one), the metric reports 0 (unhealthy) instead of being omitted.
Verified in bash:
$ f() { local -i x=; echo "[$x]"; [[ "$x" != "" ]] && echo "not empty"; }; f
[0]
not empty
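One possible fix is to declare smart_healthy as a plain string so that "unknown" stays distinguishable from 0. The sketch below is not the upstream code: health_metric is a hypothetical stand-in for the relevant part of parse_smartctl_info(), the line matching is simplified, and the metric label set is omitted.

```shell
#!/usr/bin/env bash
# Sketch only: health_metric is a hypothetical reduction of parse_smartctl_info().
# It reads "smartctl -i -H"-style lines from stdin.
health_metric() {
  # No -i here: the variable stays an empty string until a health line is seen.
  # (smart_available and smart_enabled could keep their integer declaration.)
  local smart_healthy=''
  local line
  while IFS= read -r line; do
    case "${line}" in
      'SMART overall-health self-assessment test result: PASSED'*) smart_healthy=1 ;;
      'SMART overall-health self-assessment test result:'*) smart_healthy=0 ;;
    esac
  done
  # Emit the metric only when the health status is actually known.
  if [ -n "${smart_healthy}" ]; then
    echo "device_smart_healthy ${smart_healthy}"
  fi
}
```

With this declaration, input that contains no health line produces no device_smart_healthy sample at all, which is what the original guard appears to intend.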
2. NVMe output lacks the "SMART support is:" lines
For SATA devices, smartctl -i -H outputs:
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
For NVMe devices, these lines do not exist — NVMe has a different output format. As a result, smart_available and smart_enabled remain at their default value of 0.
However, SMART / Health Information (Log Identifier 02h) is mandatory for all NVMe I/O controllers per NVM Express Base Specification (revision 2.0a, Section 3.1.2.1.2, Figure 24):
| Log Page Name | Support Requirements |
| --- | --- |
| SMART / Health Information (Controller scope) | M (Mandatory) |
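Given that requirement, one option is to default both flags to 1 for NVMe devices and still let any "SMART support is:" lines override them. This is a sketch under assumptions, not the upstream implementation: smart_support_metrics is a hypothetical helper, the device type is assumed to be the smartctl -d value ("nvme" for NVMe devices), and metric labels are omitted.

```shell
#!/usr/bin/env bash
# Sketch only: smart_support_metrics is a hypothetical helper, not code from smartmon.sh.
# $1: device type as given to "smartctl -d" (assumed to be "nvme" for NVMe devices).
# stdin: "smartctl -i -H"-style output.
smart_support_metrics() {
  local disk_type="$1"
  local line
  local smart_available=0
  local smart_enabled=0
  # Assumption: the SMART / Health log page is mandatory for NVMe,
  # so treat SMART as available and enabled unless the output says otherwise.
  case "${disk_type}" in
    nvme*) smart_available=1; smart_enabled=1 ;;
  esac
  while IFS= read -r line; do
    case "${line}" in
      'SMART support is:'*Unavailable*) smart_available=0 ;;
      'SMART support is:'*Available*) smart_available=1 ;;
      'SMART support is:'*Enabled*) smart_enabled=1 ;;
    esac
  done
  echo "device_smart_available ${smart_available}"
  echo "device_smart_enabled ${smart_enabled}"
}
```

For SATA devices the behavior is unchanged, since the flags still come from the "SMART support is:" lines; only the NVMe defaults differ.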
Environment
- Proxmox VE 8.2.4
- smartctl 7.3 (2022-02-28)
- Samsung SSD 980 1TB NVMe (smartctl -H reports PASSED, 0 errors, 100% spare)
- smartmon.sh from prometheus-node-exporter-collectors Debian package (same code as upstream master)