Description
System information
| Type | Version/Name |
|---|---|
| Distribution Name | Proxmox |
| Distribution Version | 9.1.5 |
| Kernel Version | Linux 6.17.9-1-pve |
| Architecture | x86_64 |
| OpenZFS Version | zfs-2.4.0-pve1 |
Summary
Since upgrading from OpenZFS 2.3.x to 2.4.0, metadata/status queries such as:

```
zpool status
zfs get -Hp available,used <pool>
```

can cause an otherwise completely idle HDD-backed pool to spin up.
This did not occur on OpenZFS 2.3.x under the same workload and configuration.
Previously, the HDD pool would only spin up when explicit data reads or writes were directed to it.
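One way to confirm that a given command is what wakes the drives is to compare the drive power state immediately before and after running it. A minimal sketch (the parsing assumes typical `hdparm -C` output; `/dev/sda` and `hdd-pool` below are placeholders for the real devices/pool):

```shell
# drive_state: reduce "hdparm -C" output on stdin to a single word
# such as "standby" or "active/idle" (assumes the usual hdparm -C
# output line " drive state is:  <state>").
drive_state() {
  awk -F': *' '/drive state is/ {print $2}'
}

# On the affected host one would run, for example:
#   hdparm -C /dev/sda | drive_state   # expect "standby"
#   zpool status hdd-pool > /dev/null
#   hdparm -C /dev/sda | drive_state   # did the query wake it?
```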
Environment Description
- Pool name: `hdd-pool`
- Backed solely by spinning hard drives
- Used as cold storage
- Typically accessed once per month to copy data from SSD-backed pools
- Otherwise fully idle and allowed to spin down
Drives are spun down automatically using hd-idle.
On OpenZFS 2.3.x, they would remain spun down indefinitely while idle.
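For context, hd-idle is typically configured along these lines on Debian-based systems (illustrative values only, not my exact configuration):

```
# /etc/default/hd-idle (illustrative)
# -i 0: never spin down by default
# -a sda -i 600: spin down sda after 10 minutes of inactivity
HD_IDLE_OPTS="-i 0 -a sda -i 600"
```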
Additional Information
For approximately two years, I have been running a Prometheus exporter (https://github.com/pdf/zfs_exporter) with the following collectors enabled:

```
--collector.dataset-filesystem
--properties.dataset-filesystem="available,logicalused,quota,referenced,used,usedbydataset,usedsnap,written"
--collector.dataset-volume
--properties.dataset-volume="available,logicalused,referenced,used,usedbydataset,usedsnap,volsize,written"
--collector.pool
```
Under OpenZFS 2.3.x, the exporter scraped all pools (including hdd-pool) without triggering HDD spin-ups.
I also frequently ran:

```
zpool status
zfs get -Hp available,used hdd-pool
```

These commands did not previously cause spin-ups.
Behaviour Since OpenZFS 2.4
After upgrading to 2.4.0:
- The Prometheus ZFS exporter occasionally spins up the HDD pool.
- `pvestatd` queries (e.g. `zfs get`) occasionally spin up the pool.
- Manual `zpool status` and `zfs get -Hp available,used` sometimes spin up the pool.
Spin-up does not always happen immediately, but may occur after sufficient idle time. On average I've been seeing intervals of 6-12 hours between spin-ups, i.e. about 2-3 spin-ups per day.
This suggests the metadata required to answer these queries is no longer satisfied purely from in-memory state (SPA/ARC), and now requires vdev I/O under some conditions.
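One way to check this hypothesis without tracing is to watch the per-disk read counters in `/proc/diskstats` around a query; if the counter moves, the query issued real vdev reads. A sketch (field 4 of the diskstats format is "reads completed"; `sda` is a placeholder):

```shell
# reads_for: extract the completed-read count (field 4 of the
# /proc/diskstats format) for the named device from stdin.
reads_for() {
  awk -v dev="$1" '$3 == dev {print $4}'
}

# On the affected host, compare counters around a query, e.g.:
#   before=$(reads_for sda < /proc/diskstats)
#   zfs get -Hp available,used hdd-pool > /dev/null
#   after=$(reads_for sda < /proc/diskstats)
# If "after" is larger, the query issued real reads to the vdev.
```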
My current workaround
I have added hdd-pool to the exclusion list of the Prometheus ZFS exporter, and disabled the Proxmox storage on the pool to avoid the pvestatd queries. This is inconvenient, however, as it removes the pool from my Grafana views.
I've also stopped running zpool status and zfs get without specifying a pool explicitly, to avoid querying hdd-pool state. However, due to muscle memory I often forget, causing an unnecessary spin-up.
The only way to avoid this entirely seems to be to export the pool when not in use, so that it does not show up in zpool status and similar commands. However, that means my existing cron jobs for automatic monthly backups/scrubs would stop working, and would need updating to import the pool at the start and export it after completion, which is inconvenient.
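The import/export dance could be folded into the existing cron jobs with a small wrapper; a sketch (`with_pool` and `monthly-backup.sh` are hypothetical names, and the `ZPOOL` variable is overridable purely so the sketch can be exercised without real hardware):

```shell
# with_pool: run a command with the pool imported, exporting it again
# afterwards so the pool stays invisible between runs.
ZPOOL=${ZPOOL:-zpool}

with_pool() {
  pool=$1; shift
  $ZPOOL import "$pool" || return 1
  "$@"
  rc=$?
  $ZPOOL export "$pool"
  return $rc
}

# A monthly cron entry could then wrap the existing backup job, e.g.:
#   with_pool hdd-pool /usr/local/bin/monthly-backup.sh
```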
Related issue
This is distinct from #18082 but likely related.
I encountered that issue as well (HDDs not spinning down due to TXG time flushing) and mitigated it by increasing /sys/module/zfs/parameters/spa_note_txg_time to 31557600, i.e. one year.
After working around that issue, metadata/status queries still occasionally cause spin-ups. This report concerns that separate behavior change.
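If the sysfs write needs to survive reboots, ZFS module parameters can also be set persistently via modprobe.d (a sketch, assuming the parameter name above; an initramfs refresh may be required for it to take effect at boot):

```
# /etc/modprobe.d/zfs.conf
options zfs spa_note_txg_time=31557600
```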
Steps to Reproduce
1. Create a ZFS pool backed only by HDDs.
2. Ensure no datasets are actively accessed.
3. Work around #18082 ([2.4] TXG timestamp DB sync if idle causes unnecessary disk access/prevent spin down) by doing e.g.:

   ```
   echo 31557600 > /sys/module/zfs/parameters/spa_note_txg_time
   ```

4. Spin down the drives:

   ```
   hdparm -y /dev/sdX
   ```

5. Wait until the pool is idle.
6. After some time has passed, run:

   ```
   zpool status
   ```

   or

   ```
   zfs get -Hp available,used <pool>
   ```

Repeat the last step until your drives spin up. In my experience, it can take multiple hours.
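Since the spin-up can take hours to appear, it is easier to log power-state transitions than to watch manually. A sketch (`log_transitions` is a hypothetical helper; timestamps could be added with `date` in the polling loop):

```shell
# log_transitions: read one power-state word per line on stdin and
# print only the lines where the state changed from the previous one,
# so each spin-up/spin-down produces exactly one log line.
log_transitions() {
  awk 'NR == 1 || $0 != prev {print} {prev = $0}'
}

# On the real host, feed it a periodic hdparm poll, e.g.:
#   while sleep 60; do
#     hdparm -C /dev/sda | awk -F': *' '/drive state is/ {print $2}'
#   done | log_transitions
```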
Questions
- Was there an intentional change in 2.4 affecting this functionality?
- Is this related to internal TXG time/statistics changes discussed in #18082 ([2.4] TXG timestamp DB sync if idle causes unnecessary disk access/prevent spin down)?
- Is there a tunable to restore previous behavior?
- Is this considered a regression?