vdev queue stats #16200

Open

robn wants to merge 1 commit into openzfs:master from robn:vdev-queue-stats

Conversation

@robn
Member

@robn robn commented May 16, 2024

[Sponsors: Klara, Inc., Syneto]

Motivation and Context

Part of my ongoing quest to understand what's happening inside the box (previously).

This time, it's counters showing what vdev_queue is up to.

Description

Adds a bunch of wmsum_t counters to every vdev_queue instance for a real device. These show the current count of IOs queued and in-flight (total and broken down by class), total IOs in and out over the lifetime of the queue, and basic aggregation counters.

The counters are exposed under /proc/spl/kstat/zfs/<pool>/vdev/<guid>/queue on Linux, or kstat.zfs.<pool>.vdev.<guid>.misc.queue on FreeBSD.

# zpool status -g
  pool: tank
 state: ONLINE
config:

	NAME                      STATE     READ WRITE CKSUM
	tank                      ONLINE       0     0     0
	 11293794978541385724    ONLINE       0     0     0
	   13809318117615536196  ONLINE       0     0     0
	   1868205675291292825   ONLINE       0     0     0
	   815484099661475330    ONLINE       0     0     0
	   14246512426141088651  ONLINE       0     0     0

errors: No known data errors

# ls -l /proc/spl/kstat/zfs/tank/vdev/*/queue
-rw-r--r-- 1 root root 0 May 16 06:27 /proc/spl/kstat/zfs/tank/vdev/13809318117615536196/queue
-rw-r--r-- 1 root root 0 May 16 06:27 /proc/spl/kstat/zfs/tank/vdev/14246512426141088651/queue
-rw-r--r-- 1 root root 0 May 16 06:27 /proc/spl/kstat/zfs/tank/vdev/1868205675291292825/queue
-rw-r--r-- 1 root root 0 May 16 06:27 /proc/spl/kstat/zfs/tank/vdev/815484099661475330/queue

# cat /proc/spl/kstat/zfs/tank/vdev/13809318117615536196/queue
20 1 0x01 45 12240 3024876135 13088804505
name                            type data
io_queued                       4    0
io_syncread_queued              4    0
io_syncwrite_queued             4    0
io_asyncread_queued             4    0
io_asyncwrite_queued            4    0
io_scrub_queued                 4    0
io_removal_queued               4    0
io_initializing_queued          4    0
io_trim_queued                  4    0
io_rebuild_queued               4    0
io_active                       4    0
io_syncread_active              4    0
io_syncwrite_active             4    0
io_asyncread_active             4    0
io_asyncwrite_active            4    0
io_scrub_active                 4    0
io_removal_active               4    0
io_initializing_active          4    0
io_trim_active                  4    0
io_rebuild_active               4    0
io_enqueued_total               4    236036
io_syncread_enqueued_total      4    11
io_syncwrite_enqueued_total     4    13054
io_asyncread_enqueued_total     4    0
io_asyncwrite_enqueued_total    4    222971
io_scrub_enqueued_total         4    0
io_removal_enqueued_total       4    0
io_initializing_enqueued_total  4    0
io_trim_enqueued_total          4    0
io_rebuild_enqueued_total       4    0
io_dequeued_total               4    236036
io_syncread_dequeued_total      4    11
io_syncwrite_dequeued_total     4    13054
io_asyncread_dequeued_total     4    0
io_asyncwrite_dequeued_total    4    222971
io_scrub_dequeued_total         4    0
io_removal_dequeued_total       4    0
io_initializing_dequeued_total  4    0
io_trim_dequeued_total          4    0
io_rebuild_dequeued_total       4    0
io_aggregated_total             4    37902
io_aggregated_data_total        4    107667
io_aggregated_read_gap_total    4    0
io_aggregated_write_gap_total   4    0
io_aggregated_shrunk_total      4    0
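The Linux kstat text above is a simple format: a kstat header line, a column-header line, then one `name type data` row per counter. As a rough sketch of how a consumer might read it (not part of this PR; `parse_queue_kstat` is a hypothetical helper name):

```python
def parse_queue_kstat(text):
    """Parse the kstat text shown above into a {name: value} dict.

    The first line is the kstat header and the second the column
    header; every remaining row is `name  type  data`, where data
    is an integer counter value.
    """
    stats = {}
    for line in text.splitlines()[2:]:  # skip the two header lines
        parts = line.split()
        if len(parts) == 3:
            name, _ktype, data = parts
            stats[name] = int(data)
    return stats
```

With the output above, `parse_queue_kstat(text)["io_enqueued_total"]` would give 236036.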

FreeBSD:

$ sysctl kstat.zfs.tank.vdev.3686087381038636139.misc.queue
kstat.zfs.tank.vdev.3686087381038636139.misc.queue.io_aggregated_shrunk_total: 41
kstat.zfs.tank.vdev.3686087381038636139.misc.queue.io_aggregated_write_gap_total: 0
kstat.zfs.tank.vdev.3686087381038636139.misc.queue.io_aggregated_read_gap_total: 10
kstat.zfs.tank.vdev.3686087381038636139.misc.queue.io_aggregated_data_total: 109
kstat.zfs.tank.vdev.3686087381038636139.misc.queue.io_aggregated_total: 20
kstat.zfs.tank.vdev.3686087381038636139.misc.queue.io_rebuild_dequeued_total: 0
kstat.zfs.tank.vdev.3686087381038636139.misc.queue.io_trim_dequeued_total: 0
kstat.zfs.tank.vdev.3686087381038636139.misc.queue.io_initializing_dequeued_total: 0
kstat.zfs.tank.vdev.3686087381038636139.misc.queue.io_removal_dequeued_total: 0
kstat.zfs.tank.vdev.3686087381038636139.misc.queue.io_scrub_dequeued_total: 69
kstat.zfs.tank.vdev.3686087381038636139.misc.queue.io_asyncwrite_dequeued_total: 192
kstat.zfs.tank.vdev.3686087381038636139.misc.queue.io_asyncread_dequeued_total: 1
kstat.zfs.tank.vdev.3686087381038636139.misc.queue.io_syncwrite_dequeued_total: 23
kstat.zfs.tank.vdev.3686087381038636139.misc.queue.io_syncread_dequeued_total: 42
kstat.zfs.tank.vdev.3686087381038636139.misc.queue.io_dequeued_total: 327
kstat.zfs.tank.vdev.3686087381038636139.misc.queue.io_rebuild_enqueued_total: 0
kstat.zfs.tank.vdev.3686087381038636139.misc.queue.io_trim_enqueued_total: 0
kstat.zfs.tank.vdev.3686087381038636139.misc.queue.io_initializing_enqueued_total: 0
kstat.zfs.tank.vdev.3686087381038636139.misc.queue.io_removal_enqueued_total: 0
kstat.zfs.tank.vdev.3686087381038636139.misc.queue.io_scrub_enqueued_total: 69
kstat.zfs.tank.vdev.3686087381038636139.misc.queue.io_asyncwrite_enqueued_total: 192
kstat.zfs.tank.vdev.3686087381038636139.misc.queue.io_asyncread_enqueued_total: 1
kstat.zfs.tank.vdev.3686087381038636139.misc.queue.io_syncwrite_enqueued_total: 23
kstat.zfs.tank.vdev.3686087381038636139.misc.queue.io_syncread_enqueued_total: 42
kstat.zfs.tank.vdev.3686087381038636139.misc.queue.io_enqueued_total: 327
kstat.zfs.tank.vdev.3686087381038636139.misc.queue.io_rebuild_active: 0
kstat.zfs.tank.vdev.3686087381038636139.misc.queue.io_trim_active: 0
kstat.zfs.tank.vdev.3686087381038636139.misc.queue.io_initializing_active: 0
kstat.zfs.tank.vdev.3686087381038636139.misc.queue.io_removal_active: 0
kstat.zfs.tank.vdev.3686087381038636139.misc.queue.io_scrub_active: 0
kstat.zfs.tank.vdev.3686087381038636139.misc.queue.io_asyncwrite_active: 0
kstat.zfs.tank.vdev.3686087381038636139.misc.queue.io_asyncread_active: 0
kstat.zfs.tank.vdev.3686087381038636139.misc.queue.io_syncwrite_active: 0
kstat.zfs.tank.vdev.3686087381038636139.misc.queue.io_syncread_active: 0
kstat.zfs.tank.vdev.3686087381038636139.misc.queue.io_active: 0
kstat.zfs.tank.vdev.3686087381038636139.misc.queue.io_rebuild_queued: 0
kstat.zfs.tank.vdev.3686087381038636139.misc.queue.io_trim_queued: 0
kstat.zfs.tank.vdev.3686087381038636139.misc.queue.io_initializing_queued: 0
kstat.zfs.tank.vdev.3686087381038636139.misc.queue.io_removal_queued: 0
kstat.zfs.tank.vdev.3686087381038636139.misc.queue.io_scrub_queued: 0
kstat.zfs.tank.vdev.3686087381038636139.misc.queue.io_asyncwrite_queued: 0
kstat.zfs.tank.vdev.3686087381038636139.misc.queue.io_asyncread_queued: 0
kstat.zfs.tank.vdev.3686087381038636139.misc.queue.io_syncwrite_queued: 0
kstat.zfs.tank.vdev.3686087381038636139.misc.queue.io_syncread_queued: 0
kstat.zfs.tank.vdev.3686087381038636139.misc.queue.io_queued: 0

Notes

The actual stats part is pretty unremarkable, being little more than the normal "sums & stats" boilerplate. They perhaps don't technically need to be wmsum_t, since all the changes are made under vq_lock anyway, but it's following a common pattern, and part of why I want this is to assist with removing or greatly reducing the scope of vq_lock, so this is where they'll need to be anyway.

Update 2025-03-21: there was a bunch of stuff here about changes to kstats to allow "multi-level" kstats. That became its own PR, #17142, and was recently merged. So I've removed that discussion.

How Has This Been Tested?

Mostly through repeated pool create -> IO -> scrub -> export -> import -> IO -> export -> unload cycles, on both Linux and FreeBSD. Once the numbers looked good and things stopped complaining about replacement names and/or panicking, I declared it good.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)


@robn robn force-pushed the vdev-queue-stats branch from 9bcb026 to f9ae1e2 on May 16, 2024 11:13
@tonyhutter
Contributor

zpool iostat -q will show you instantaneous queue levels. Would it make sense to add a zpool iostat -q --totals to display the totals (rather than a separate kstat)?

@robn robn force-pushed the vdev-queue-stats branch from f9ae1e2 to 0a9a614 on May 21, 2024 23:45
@behlendorf behlendorf added the "Status: Code Review Needed" label (Ready for review and testing) on May 29, 2024
@robn robn force-pushed the vdev-queue-stats branch from 0a9a614 to 1db22ec on January 7, 2025 11:13
@robn
Member Author

robn commented Jan 7, 2025

At the time I had reservations about running it through iostat: kstats can have less of an impact on performance, which I was worried about, and I also didn't want to think much about the ABI changes required. These days I'm more interested in having uniform interfaces for all platforms, though that might be a uniform kstat-like interface. I will play :)

@tonyhutter
Contributor

@robn If you decide to go the kstat route, try reading the new kstats in a tight loop, while exporting the pool. It's a good smoke test for panics.
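That smoke test is easy to script. A hedged sketch of the read-in-a-tight-loop half (`hammer_kstat` is a made-up name; the real test would run `zpool export` concurrently in another shell):

```python
import time

def hammer_kstat(path, duration=1.0):
    """Read a kstat node in a tight loop for `duration` seconds.

    Run this while exporting the pool in another shell. The kstat
    file vanishing mid-export is expected and ignored; a kernel
    panic or hang is exactly the failure being smoked out.
    """
    deadline = time.monotonic() + duration
    reads = 0
    while time.monotonic() < deadline:
        try:
            with open(path) as f:
                f.read()
            reads += 1
        except FileNotFoundError:
            pass  # pool is mid-export; the kstat node was torn down
    return reads
```

Something like `hammer_kstat("/proc/spl/kstat/zfs/tank/vdev/13809318117615536196/queue", 30)` in one terminal, `zpool export tank` in another.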

@robn robn mentioned this pull request Mar 13, 2025
@robn robn force-pushed the vdev-queue-stats branch from 527199a to 26f2a93 on March 21, 2025 00:26
@robn
Member Author

robn commented Mar 21, 2025

If you decide to go the kstat route, try reading the new kstats in a tight loop, while exporting the pool. It's a good smoke test for panics.

@tonyhutter I'm not sure if this was general advice, or something you'd specifically seen here. I have tried it a bunch now and couldn't make it blow up. If it is something you saw here, I wouldn't be surprised if it was caused by me not zeroing kstat module state properly; that was caught and fixed in review on #17142. But if you can make it blow up, let me know how!

@robn
Member Author

robn commented Mar 21, 2025

I've decided that for now at least, kstats are the way to go. It's a relatively uniform API and not much extra code; it shows some amount of implementation detail that we might not want to expose to userspace, and, frankly, I need this sooner rather than later.

I do want to sort out some sort of more uniform kstat API and interface; make them more of a first-class, cross-platform thing. I have prototypes but nothing much to show yet. I think we'll need to work out what that means up against something like iostat; maybe they are the same thing, maybe not. But I really don't want to get into that for this one, unless you insist :)

@tonyhutter
Contributor

@tonyhutter I'm not sure if this was general advice, or something you'd specifically seen here. I have tried it a bunch now and couldn't make it blow up.

I've seen it blow up in the past when the kstats aren't taking the proper locks (like spa_namespace). Just looking at what you have here, it's probably ok, since you're properly adding/removing the kstat during queue add/removal.

I've decided that for now at least, kstats are the way to go. It's a relatively uniform API and not much extra code; it shows some amount of implementation detail that we might not want to expose to userspace, and, frankly, I need this sooner rather than later.

We're on the hook to support whatever queue stats interface we come up with for years to come. I see more benefits in adding them to zpool iostat vs a kstat:

  1. We already have instantaneous queue stats with zpool iostat -q. You could add additional instantaneous stats there, and add a new -Q flag for the totals.
  2. zpool iostat can natively sample over a specific time period. So if you wanted to get queue totals over the last hour: zpool iostat -Q -y 3600 1.
  3. zpool iostat has native support for totaling up stats per-pool.
  4. It's possible someone adds in JSON support to zpool iostat in the future, which could be helpful here.
  5. zpool iostat can natively format stats to nicenum values, like "1.9M" and do ANSI colored highlighting. This will be helpful when the totals get really huge over time.
  6. zpool iostat documents the various queues in the man pages.
  7. zpool iostat will let you see queue stats in the same row as bandwidth/latency status, which could be helpful for finding outliers.
  8. With zpool iostat you have a single, familiar interface for getting queue stats. With this PR, it becomes: "some queue stats are in zpool iostat, others are in a kstat".

Thoughts?
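On point 2, interval sampling falls out of the cumulative *_total counters regardless of which interface exposes them: take two snapshots and difference them. A hypothetical sketch (names assumed, not from this PR):

```python
def interval_delta(before, after):
    """Given two {name: value} snapshots of the queue kstat, return
    the per-interval change in each cumulative *_total counter.
    Gauges like io_queued/io_active are instantaneous and skipped."""
    return {name: after[name] - before[name]
            for name in after if name.endswith("_total")}
```

This is essentially what an iostat-style consumer would do under the hood when given an interval.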

Adding a bunch of gauges and counters to show in-flight and total IOs,
with per-class breakdowns, and some aggregation counters.

Sponsored-by: Klara, Inc.
Sponsored-by: Syneto
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
@robn robn force-pushed the vdev-queue-stats branch from 26f2a93 to cd0590a on January 5, 2026 00:11
@robn
Member Author

robn commented Jan 5, 2026

Rebased to master, no changes.

(Incidentally, I have come around to the view that these should be exposed through iostat, not kstat, though mostly for other reasons. Still, this PR is useful to show where the touch points are when I, or someone, gets back to it.)

@behlendorf behlendorf self-requested a review January 5, 2026 22:30