[2.4] Behaviour change: Using zpool status or zfs get on a pool with spun down HDDs causes them to spin up #18247

@ruipin

Description

System information

Type Version/Name
Distribution Name Proxmox
Distribution Version 9.1.5
Kernel Version Linux 6.17.9-1-pve
Architecture x86_64
OpenZFS Version zfs-2.4.0-pve1

Summary

Since upgrading from OpenZFS 2.3.x to 2.4.0, metadata/status queries such as:

zpool status
zfs get -Hp available,used <pool>

can cause an otherwise completely idle HDD-backed pool to spin up.

This did not occur on OpenZFS 2.3.x under the same workload and configuration.

Previously, the HDD pool would only spin up when explicit data reads or writes were directed to it.

Environment Description

  • Pool name: hdd-pool
  • Backed solely by spinning hard drives
  • Used as cold storage
  • Typically accessed once per month to copy data from SSD-backed pools
  • Otherwise fully idle and allowed to spin down

Drives are spun down automatically using hd-idle.

On OpenZFS 2.3.x, they would remain spun down indefinitely while idle.
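For reference, an hd-idle configuration along these lines is what I mean (the device names and the 10-minute timeout here are illustrative, not my exact settings):

```shell
# /etc/default/hd-idle (illustrative)
# -i 0 disables the default timeout; each -a/-i pair then spins down
# one disk after 600 s of inactivity.
HD_IDLE_OPTS="-i 0 -a sda -i 600 -a sdb -i 600"
```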

Additional Information

For approximately two years, I have been running a prometheus exporter https://github.com/pdf/zfs_exporter with the following collectors enabled:

--collector.dataset-filesystem
--properties.dataset-filesystem="available,logicalused,quota,referenced,used,usedbydataset,usedsnap,written"
--collector.dataset-volume
--properties.dataset-volume="available,logicalused,referenced,used,usedbydataset,usedsnap,volsize,written"
--collector.pool

Under OpenZFS 2.3.x, the exporter scraped all pools (including hdd-pool) without triggering HDD spin-ups.

I also frequently ran:

zpool status
zfs get -Hp available,used hdd-pool

These commands did not previously cause spin-ups.
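Whether a given query woke the disks can be checked by comparing the drive power state before and after the query. A small helper along these lines works (a sketch; it parses the standard `hdparm -C` output):

```shell
#!/bin/sh
# Sketch: print a disk's power state ("standby" or "active/idle")
# by parsing the "drive state is:" line from `hdparm -C`.
drive_state() {
    hdparm -C "$1" | awk '/drive state is/ { print $NF }'
}
```

Running e.g. `drive_state /dev/sdX` before and after `zpool status` shows whether the query itself triggered the spin-up.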

Behaviour Since OpenZFS 2.4

After upgrading to 2.4.0:

  • The Prometheus ZFS exporter occasionally spins up the HDD pool.
  • pvestatd queries (e.g. zfs get) occasionally spin up the pool.
  • Manual zpool status and zfs get -Hp available,used sometimes spin up the pool.

Spin-up does not always happen immediately, but may occur after sufficient idle time. I've been seeing intervals of 6-12 hours between spin-ups on average, i.e. about 2-3 spin-ups per day.

This suggests metadata required to answer these queries is no longer being satisfied purely from in-memory state (SPA/ARC), and is requiring vdev I/O under some conditions.

My current workaround

I have added hdd-pool to the exclusion list of the Prometheus ZFS exporter, and disabled the Proxmox storage on the pool to avoid the pvestatd queries. This is inconvenient, however, as it removes the pool from my Grafana views.

I've also stopped running zpool status and zfs get without specifying a pool explicitly, to avoid querying hdd-pool state. However, due to muscle memory I often forget, causing an unnecessary spin-up.

The only way to avoid this entirely seems to be to export the pool when not in use, so that it does not show up when doing zpool status and similar commands. However that means that my existing cronjobs to handle automatic monthly backups/scrubs do not work, and will require updating to import the pool at the start and export the pool after completion, which is inconvenient.
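If I do go the export route, the cronjob update could look like the following sketch (the pool name is from this report; `monthly_backup` is a placeholder for my actual backup script):

```shell
#!/bin/sh
# Sketch of an import/run/export wrapper for the monthly cronjob.
run_backup() {
    zpool import hdd-pool || return 1
    # make sure the pool is exported again even if the backup fails
    trap 'zpool export hdd-pool' EXIT
    monthly_backup   # placeholder for the existing backup script
}
```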

Related issue

This is distinct from #18082 but likely related.

I encountered that issue as well (HDDs not spinning down due to TXG time flushing) and mitigated it by increasing /sys/module/zfs/parameters/spa_note_txg_time to 31557600, i.e. one year.

After working around that issue, metadata/status queries still occasionally cause spin-ups. This report concerns that separate behavior change.

Steps to Reproduce

  1. Create a ZFS pool backed only by HDDs.

  2. Ensure no datasets are actively accessed.

  3. Work around #18082 ("[2.4] TXG timestamp DB sync if idle causes unnecessary disk access/prevent spin down") by doing e.g.:

    echo 31557600 > /sys/module/zfs/parameters/spa_note_txg_time
    
  4. Spin down drives:

    hdparm -y /dev/sdX
  5. Wait until the pool is idle.

  6. After some time has passed, run:

    zpool status

    or

    zfs get -Hp available,used <pool>

    Repeat this last step until your drives spin up. In my experience, it can take multiple hours.
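Since the repetition can take hours, step 6 can be automated with a loop along these lines (a sketch; the pool/device arguments and the 30-minute interval are placeholders):

```shell
#!/bin/sh
# Sketch: re-run the status query until hdparm reports the drive
# has spun up. Usage: wait_for_spinup <pool> <device>
wait_for_spinup() {
    pool=$1
    dev=$2
    while :; do
        zpool status "$pool" > /dev/null
        state=$(hdparm -C "$dev" | awk '/drive state is/ { print $NF }')
        [ "$state" = "active/idle" ] && break
        sleep 1800
    done
}
```

For example, `wait_for_spinup hdd-pool /dev/sdX` returns once the drive reports active/idle.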

Questions

  1. Was there an intentional change in 2.4 affecting this functionality?
  2. Is this related to the internal TXG time/statistics changes discussed in #18082 ("[2.4] TXG timestamp DB sync if idle causes unnecessary disk access/prevent spin down")?
  3. Is there a tunable to restore previous behavior?
  4. Is this considered a regression?

Labels

Type: Defect (incorrect behavior, e.g. crash, hang)