smartmon.sh: NVMe devices report false smart_healthy=0 due to bash integer initialization bug #262

@moonD4rk

Problem

smartmon.sh incorrectly reports smartmon_device_smart_healthy=0 (FAILED) for healthy NVMe devices. This causes false alerts in monitoring systems (e.g., Grafana alerting on smartmon_device_smart_healthy < 1).

Additionally, smartmon_device_smart_available and smartmon_device_smart_enabled are always 0 for NVMe devices, even though SMART is mandatory per the NVMe specification.

Root Cause

Two issues in the parse_smartctl_info() function:

1. local -i smart_healthy= initializes to 0, not an empty string

https://github.com/prometheus-community/node-exporter-textfile-collector-scripts/blob/master/smartmon.sh#L112

local -i smart_available=0 smart_enabled=0 smart_healthy=

With the -i (integer) attribute, the empty assignment is evaluated as an arithmetic expression, so smart_healthy is initialized to 0 instead of an empty string. The guard at the end of the function is intended to skip output when the health status is unknown:

[[ "${smart_healthy}" != "" ]] && echo "device_smart_healthy{...} ${smart_healthy}"

But since "0" != "" is always true, the guard never takes effect: the metric is always emitted. If the smartctl output lacks the health line for any reason (for example, a transient failure), the metric reports 0 (unhealthy) instead of being omitted.

Verified in bash:

$ f() { local -i x=; echo "[$x]"; [[ "$x" != "" ]] && echo "not empty"; }; f
[0]
not empty
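One possible fix, as a minimal sketch: declare the variable without the integer attribute so it stays empty until a health line is actually parsed. The function name emit_health and the shortened metric line are hypothetical and only illustrate the guard; this is not the upstream parse_smartctl_info() code.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not the upstream function): shows the guard working
# once smart_healthy is declared without the integer attribute.
emit_health() {
  local smart_healthy=   # plain string variable: stays empty until assigned
  local line
  while read -r line; do
    case "${line}" in
      'SMART overall-health self-assessment test result: PASSED'*) smart_healthy=1 ;;
      'SMART overall-health self-assessment test result:'*) smart_healthy=0 ;;
    esac
  done
  # Emit the metric only when a health line was actually seen.
  if [[ -n "${smart_healthy}" ]]; then
    echo "device_smart_healthy ${smart_healthy}"
  fi
}

# A health line is present: the metric is emitted with value 1.
printf 'SMART overall-health self-assessment test result: PASSED\n' | emit_health
# No health line: nothing is printed, so no false 0 is reported.
printf 'no health line here\n' | emit_health
```

With a plain string variable, an unknown health status leaves the metric absent, which an alert rule can treat differently from an explicit 0.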

2. smartctl output for NVMe devices has no "SMART support is:" lines

For SATA devices, smartctl -i -H outputs:

SMART support is: Available - device has SMART capability.
SMART support is: Enabled

For NVMe devices, these lines do not exist because smartctl uses a different output format for NVMe. As a result, smart_available and smart_enabled remain at their default value of 0.

However, SMART / Health Information (Log Identifier 02h) is mandatory for all NVMe I/O controllers per the NVM Express Base Specification (revision 2.0a, Section 3.1.2.1.2, Figure 24):

Log Page Name | Support Requirements
SMART / Health Information (Controller scope) | M (Mandatory)
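Given that requirement, one way to handle the missing lines is to default both flags to 1 when the device is NVMe. The sketch below is hypothetical: the function name smart_flag_defaults is invented for illustration, and it assumes the device type string (such as "nvme" or "sat") is available to the caller.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: choose defaults for the two flags from the device type.
# Assumption: disk_type holds the smartctl device type string, e.g. "nvme" or "sat".
smart_flag_defaults() {
  local disk_type="$1"
  local -i smart_available=0 smart_enabled=0
  if [[ "${disk_type}" == nvme* ]]; then
    # NVMe output has no "SMART support is:" lines, and the SMART / Health
    # Information log page is mandatory, so treat both flags as set.
    smart_available=1
    smart_enabled=1
  fi
  echo "${smart_available} ${smart_enabled}"
}

smart_flag_defaults nvme   # prints "1 1"
smart_flag_defaults sat    # prints "0 0"
```

For non-NVMe devices the defaults stay at 0, so the existing "SMART support is:" parsing would still decide the final values.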

Environment

  • Proxmox VE 8.2.4
  • smartctl 7.3 (2022-02-28)
  • Samsung SSD 980 1TB NVMe (smartctl -H reports PASSED, 0 errors, 100% spare)
  • smartmon.sh from prometheus-node-exporter-collectors Debian package (same code as upstream master)
