Commit 0768a40

Merge pull request #180 from NiceGuyIT/docs
Fix: Issue 157: Document how to troubleshoot data inconsistencies
2 parents bcd3a7d + 089e8f2 commit 0768a40

1 file changed

+136
-3
lines changed

Diff for: README.md

@@ -8,15 +8,18 @@ Example output you can show in [EXAMPLE.md](EXAMPLE.md)
88

99
## Need more?
1010
**If you need additional metrics - contact me :)**
11-
**Create a feature request, describe the metric that you would like to have and attach exported from smartctl json file**
11+
12+
**Create a feature request, describe the metric that you would like to have and
13+
attach the JSON file exported from smartctl**
1214

1315
# Requirements
14-
smartmontools >= 7.0, because export to json [released in 7.0](https://www.smartmontools.org/browser/tags/RELEASE_7_0/smartmontools/NEWS#L11)
16+
`smartmontools` >= 7.0, because JSON export was [released in 7.0](https://www.smartmontools.org/browser/tags/RELEASE_7_0/smartmontools/NEWS#L11)
1517

1618
# Configuration
1719
## Command line options
1820

19-
The exporter will scan the system for available devices if no `--smartctl.device` flags are used.
21+
The exporter will scan the system for available devices if no `--smartctl.device`
22+
flags are used.
2023

2124
```
2225
usage: smartctl_exporter [<flags>]
@@ -72,3 +75,133 @@ services:
7275
ports:
7376
- "9633:9633"
7477
```
78+
79+
# Troubleshooting
80+
## Troubleshooting data inconsistencies
81+
`smartctl_exporter` uses the JSON output from `smartctl` to provide the data to
82+
Prometheus. If the data is incorrect, look at the data from `smartctl` to
83+
determine if the issue should be reported upstream to smartmontools or to this
84+
repo. In general, the `smartctl_exporter` should not modify the data in flight.
85+
If the data is missing from `smartctl`, it should not be in `smartctl_exporter`.
86+
If the data from `smartctl` is incorrect, it should be reported upstream.
87+
Requests for `smartctl_exporter` to "fix" incorrect data where `smartctl` is
88+
reporting incorrect data will be closed. The grey area is when invalid or
89+
missing data from `smartctl` causes multiple invalid or incorrect values
90+
in `smartctl_exporter`. This could happen if the data is used in a calculation
91+
for other data. Such cases will need to be researched on a case-by-case basis.
92+
93+
| - | smartctl valid | smartctl missing | smartctl invalid/incorrect |
94+
|---------------------------|-----------------------------|-------------------------------------------------|----------------------------------|
95+
| smartctl_exporter valid | all good | N/A | N/A |
96+
| smartctl_exporter missing | issue for smartctl_exporter | report upstream to smartmontools | report upstream to smartmontools |
97+
| smartctl_exporter invalid | issue for smartctl_exporter | issue for smartctl_exporter and report upstream | report upstream to smartmontools |
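
The table above can be read as a lookup from two observations to an action. The following is a minimal shell sketch of that lookup. The function name `triage` and the state words `valid`, `missing` and `invalid` are assumptions made for illustration and are not part of `smartctl_exporter`.

```bash
# triage SMARTCTL_STATE EXPORTER_STATE
# Hypothetical helper: each state is one of valid, missing or invalid.
# Prints the action from the table above for that combination.
triage() {
  smartctl_state="$1"
  exporter_state="$2"
  case "${exporter_state}/${smartctl_state}" in
    valid/valid)
      echo "all good" ;;
    valid/missing|valid/invalid)
      echo "N/A" ;;
    missing/valid|invalid/valid)
      echo "issue for smartctl_exporter" ;;
    invalid/missing)
      echo "issue for smartctl_exporter and report upstream" ;;
    missing/missing|missing/invalid|invalid/invalid)
      echo "report upstream to smartmontools" ;;
    *)
      echo "unknown state" >&2
      return 1 ;;
  esac
}
```

For example, `triage valid missing` covers the case where `smartctl` reports the value but the exporter does not, which the table classifies as an issue for `smartctl_exporter`.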
98+
99+
### smartctl output vs smartctl_exporter output
100+
101+
The S.M.A.R.T. attributes are mapped in
102+
[smartctl.go](https://github.com/prometheus-community/smartctl_exporter/blob/master/smartctl.go).
103+
Each function calls `prometheus.MustNewConstMetric` or a similar function with the
104+
first parameter being the metric name. Find the metric name in
105+
[metrics.go](https://github.com/prometheus-community/smartctl_exporter/blob/master/metrics.go)
106+
to see how the exporter displays the information. This may sound technical, but
107+
it's crucial for understanding how data flows from `smartctl` to
108+
`smartctl_exporter` to Prometheus.
109+
110+
If the data looks incorrect, check the
111+
[Smartmontools Frequently Asked Questions (FAQ)](https://www.smartmontools.org/wiki/FAQ).
112+
Your question may already have an answer there. If you still have
113+
questions, open an [issue]().
114+
115+
## Gathering smartctl data
116+
Follow these steps to gather smartctl data for troubleshooting purposes. If you
117+
have unique drives/data/edge cases and would like to "donate" the data, open a
118+
PR with the redacted JSON files.
119+
120+
1. Run `scripts/collect-smartctl-json.sh` to export all drives to a
121+
`smartctl-data` directory (created in the current directory).
122+
2. Run `scripts/redact_fake_json.py` to redact sensitive data.
123+
3. Provide the JSON file for the drive in question.
124+
125+
```bash
126+
cd scripts
127+
./collect-smartctl-json.sh
128+
./redact_fake_json.py smartctl-data/*.json
129+
```
130+
131+
## Run smartctl_exporter using JSON data
132+
The `smartctl_exporter` can be run using local JSON data. The device names are
133+
pulled from actual devices in the machine while the data is redirected to the
134+
`debug` directory. Save the JSON data in the `debug` directory, one file per
135+
actual device, named after the device. If you have 3 devices, `sda`, `sdb` and `sdc`,
136+
the `smartctl_exporter` will expect 3 files: `debug/sda.json`, `debug/sdb.json`
137+
and `debug/sdc.json`.
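
Before starting the exporter, it can help to confirm that every device has a matching file. The following is a minimal sketch of such a check. The function name `check_fake_files` is hypothetical and not part of the repository, and the `<name>.json` layout is taken from the description above.

```bash
# check_fake_files DIR DEVICE...
# Hypothetical helper: for each device (e.g. /dev/sda or sda), expect
# DIR/<name>.json. Prints one line per missing file and returns non-zero
# if any file is missing.
check_fake_files() {
  dir="$1"
  shift
  missing=0
  for dev in "$@"; do
    name="${dev##*/}"   # strip any leading path: /dev/sda -> sda
    if [ ! -f "$dir/$name.json" ]; then
      echo "missing: $dir/$name.json"
      missing=1
    fi
  done
  return "$missing"
}
```

With the three devices from the example, the call would be `check_fake_files debug /dev/sda /dev/sdb /dev/sdc`.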
138+
139+
Once the "fake devices" (JSON files) are in place, run the exporter passing the
140+
hidden `--smartctl.fake-data` switch on the command line. The port is specified
141+
to prevent conflicts with an existing `smartctl_exporter` on the default port.
142+
143+
```bash
144+
smartctl_exporter --web.listen-address 127.0.0.1:19633 --smartctl.fake-data
145+
```
146+
147+
# FAQ
148+
## How do I run `smartctl_exporter` against a JSON file?
149+
150+
If you're helping someone else, request the output of the `smartctl` command
151+
above. Feed this into the `smartctl_exporter` using the
152+
hidden `--smartctl.fake-data` flag. If a `smartctl_exporter` is already running,
153+
use a different port; in this case, it's `19633`. Run `collect_fake_json.sh`
154+
first to collect the JSON files for **your** devices. Copy the JSON file you
155+
received over one of these files. After starting the exporter, you can query it
156+
to see the data generated.
157+
158+
```bash
159+
# Dump the JSON files for your devices into debug/
160+
./collect_fake_json.sh
161+
162+
# copy the test JSON into one of the files in debug/
163+
cp extracted-from-above-sda.json debug/sda.json
164+
165+
# Make sure you have the latest version
166+
go build
167+
# Use a different port in case smartctl_exporter is already running
168+
sudo ./smartctl_exporter --web.listen-address=127.0.0.1:19633 --log.level=debug --smartctl.fake-data
169+
170+
# Use curl with grep
171+
curl --silent 127.0.0.1:19633/metrics | grep -i nvme
172+
# Or xh with ripgrep
173+
xh --body :19633/metrics | rg nvme
174+
```
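
A plain `grep` for a substring also returns the `# HELP` and `# TYPE` comment lines and any metric whose name merely contains the pattern. The following is a small sketch that selects only the sample lines of one metric from the Prometheus text format. The function name `metric_lines` is hypothetical, and the metric name used in the usage note is only an example; check the `/metrics` output for the names your build exposes.

```bash
# metric_lines NAME
# Hypothetical helper: reads Prometheus text format on stdin and prints only
# the sample lines for metric NAME. A sample line starts with the metric name
# followed by either "{" (labels) or a space (no labels), so comment lines and
# longer metric names that share the same prefix are skipped.
metric_lines() {
  grep -E "^$1(\{| )"
}
```

Usage, assuming the exporter is listening on the port from the example above: `curl --silent 127.0.0.1:19633/metrics | metric_lines smartctl_device_temperature`.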
175+
176+
## Why is root required? Can't I add a user to the "disk" group?
177+
178+
`smartctl` needs to be run as root. A blogger had the same question and opened a
179+
ticket with smartmontools. Their response is quoted below.
180+
181+
[RFE: add O_RDRW mode for sat/scsi/ata devices](https://www.smartmontools.org/ticket/1064)
182+
183+
> According to function `blk_verify_command()` from current kernel sources
184+
> (see [block/scsi_ioctl.c](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/block/scsi_ioctl.c)),
185+
> O_RDONLY or O_RDWR make no difference if device was opened as root (or with
186+
> CAP_SYS_RAWIO).
187+
>
188+
> The SCSI commands listed in function `blk_set_cmd_filter_defaults()` show
189+
> that some of the `smartctl -d scsi` functionality might work with O_RDONLY
190+
> for non-root users. Some more might work with O_RDWR.
191+
>
192+
> But `smartctl -d sat` (to access SATA devices) won't work at all because the
193+
> SCSI commands ATA_12 and ATA_16
194+
> (see [scsi_proto.h](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/scsi/scsi_proto.h))
195+
> are **always blocked for non-root users**.
196+
197+
## What about my NVMe drive?
198+
From the smartmontools FAQ: [My NVMe drive is not in the smartctl/smartd database](https://www.smartmontools.org/wiki/FAQ#MyNVMedriveisnotinthesmartctlsmartddatabase)
199+
> SCSI/SAS and NVMe drives do not provide ATA/SATA-like SMART Attributes.
200+
> Therefore the drive database does not contain any entries for these drives.
201+
> This may change in the future as some drives provide similar info via vendor
202+
> specific commands (see ticket #870).
203+
204+
smartmontools also has a [wiki page for NVMe](https://www.smartmontools.org/wiki/NVMe_Support) devices.
205+
206+
## How do I report upstream to smartmontools?
207+
Check their FAQ: [How to create a bug report](https://www.smartmontools.org/wiki/FAQ#Howtocreateabugreport).
