Description
I have an SSD (full JSON output attached below) that has errors recorded in its log:
SMART Extended Comprehensive Error Log Version: 1 (1 sectors)
Device Error Count: 18 (device log contains only the most recent 4 errors)
...
Error 18 occurred at disk power-on lifetime: 1185 hours (49 days + 9 hours)
Error 17 occurred at disk power-on lifetime: 1185 hours (49 days + 9 hours)
Error 16 occurred at disk power-on lifetime: 1185 hours (49 days + 9 hours)
Error 15 occurred at disk power-on lifetime: 1185 hours (49 days + 9 hours)
Error 14 occurred at disk power-on lifetime: 1185 hours (49 days + 9 hours)
which causes smartctl_exporter to log a warning message every time it polls the drive:
time=2025-03-23T19:04:38.630Z level=WARN source=readjson.go:71 msg="S.M.A.R.T. output reading" err="exit status 64" device="/dev/sda;auto (sda)"
time=2025-03-23T19:04:38.630Z level=WARN source=readjson.go:151 msg="The device error log contains records of errors" device="/dev/sda;auto (sda)"
however, my drive's Power_On_Hours is currently over 31,000 - the errors that were recorded in the device log happened over 3 years ago. the drive has passed numerous scheduled self-tests since then, so whatever the error was seems to have been transient and not an indication of a drive that's about to die.
these warnings are harmless, but they're also unnecessary log noise that I'd like the option to suppress.
some options I can think of:
- something along the lines of --ignore-device-log-errors-older-than <duration> or --ignore-device-log-errors-from <device serial number>
- log this message only once per device, and then suppress the message for that device (at least until smartctl_exporter is restarted)
- log this message only when the smartctl_device_error_log_count metric increases for a given device (this matches the actual monitoring rule I have in place, alerting on increase(smartctl_device_error_log_count))
sat-Marvell_based_SanDisk_SSDs-SanDisk_SD5SG2128G1052E-sda.json