Skip to content

Flush RRD only when TXGs contain data#18138

Merged
behlendorf merged 1 commit intoopenzfs:masterfrom
oshogbo:oshogbo/flush_bad
Feb 11, 2026
Merged

Flush RRD only when TXGs contain data#18138
behlendorf merged 1 commit intoopenzfs:masterfrom
oshogbo:oshogbo/flush_bad

Conversation

@oshogbo
Copy link
Contributor

@oshogbo oshogbo commented Jan 16, 2026

Description

This change modifies the behavior of spa_sync_time_logger when flushing the RRD database.

Previously, once the sync interval elapsed, a flush would always be generated. On solid-state devices, especially when the pool was otherwise idle, this caused disks to wake up solely to write RRD data. Since RRD is best-effort telemetry, this behavior is unnecessary and wasteful.

With this change, spa_sync_time_logger delays flushing until a TXG that already contains data is being synced. The RRD update is appended to that TXG instead of forcing the creation of a new write-only TXG.

During pool export, flushing is forced regardless of whether the TXG contains user data. At that stage, data durability takes precedence and a write must be issued.

This fixes #18082
This change was inspired from @amotin in comments #18120.

Sponsored by: [Wasabi Technology, Inc.; Klara, Inc.]

How Has This Been Tested?

I have added logs to check when the database is flushed and what is the size of database.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Quality assurance (non-breaking change which makes the code more robust against bugs)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

Copy link
Member

@amotin amotin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not 100% sure dp_dirty_pertxg at this point reliably means there is nothing to be written in this TXG. It may need a deeper look. But yea, this might be the direction.

Copy link
Member

@amotin amotin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have other objections, but I still worry about already mentioned dp_dirty_pertxg. For example, will snapshot creation or something similar, working in sync context, trigger the history update?

This change modifies the behavior of spa_sync_time_logger when
flushing the RRD database.

Previously, once the sync interval elapsed, a flush would always
be generated. On solid-state devices, especially when the pool was
otherwise idle, this caused disks to wake up solely to write RRD
data. Since RRD is best-effort telemetry, this behavior is
unnecessary and wasteful.

With this change, spa_sync_time_logger delays flushing until a TXG
that already contains data is being synced. The RRD update is
appended to that TXG instead of forcing the creation of
a new write-only TXG.

During pool export, flushing is forced regardless of whether
the TXG contains user data. At that stage, data durability takes
precedence and a write must be issued.

Sponsored by: [Wasabi Technology, Inc.; Klara, Inc.]
Signed-off-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com>
@behlendorf behlendorf self-requested a review February 5, 2026 02:05
Copy link
Contributor

@behlendorf behlendorf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dp_dirty_pertxg is set by dsl_pool_dirty_space() so this should be a reasonable way to quickly check for any dirty data associated with the txg. I believe you're right, it won't account for anything dirtied in syncing context but since this best-effort I don't think that need to hold this up.

@behlendorf behlendorf added the Status: Accepted Ready to integrate (reviewed, tested) label Feb 6, 2026
@behlendorf behlendorf requested a review from amotin February 6, 2026 18:13
@amotin
Copy link
Member

amotin commented Feb 6, 2026

I think a better way to do it would be to call spa_sync_time_logger() in spa_sync_iterate_to_convergence() between syncing datasets and syncing MOS (now both are inside dsl_pool_sync()). I.e. we need to update the database each time we need to sync dirty MOS after doing everything else. I actually envision slightly bigger refactoring there, since I think it would be beneficial to sync BRT and DDT (and may be doing something else) before syncing MOS to reduce the number of sync iterations. But I need to look on it on a fresh head.

@behlendorf
Copy link
Contributor

Agreed, that would be better. @oshogbo can you look at reworking this.

@behlendorf behlendorf added Status: Revision Needed Changes are required for the PR to be accepted and removed Status: Accepted Ready to integrate (reviewed, tested) labels Feb 9, 2026
@behlendorf
Copy link
Contributor

Actually, let me go ahead and merged this fix as is. It's been tested and resolves the core issue for now. We can refactor is as suggested by @amotin in a future PR to further improve things.

@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Revision Needed Changes are required for the PR to be accepted labels Feb 11, 2026
@behlendorf behlendorf merged commit cdf89f4 into openzfs:master Feb 11, 2026
38 of 41 checks passed
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Feb 11, 2026
This change modifies the behavior of spa_sync_time_logger when
flushing the RRD database.

Previously, once the sync interval elapsed, a flush would always
be generated. On solid-state devices, especially when the pool was
otherwise idle, this caused disks to wake up solely to write RRD
data. Since RRD is best-effort telemetry, this behavior is
unnecessary and wasteful.

With this change, spa_sync_time_logger delays flushing until a TXG
that already contains data is being synced. The RRD update is
appended to that TXG instead of forcing the creation of
a new write-only TXG.

During pool export, flushing is forced regardless of whether
the TXG contains user data. At that stage, data durability takes
precedence and a write must be issued.

Sponsored by: [Wasabi Technology, Inc.; Klara, Inc.]
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com>
Closes openzfs#18082
Closes openzfs#18138
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Status: Accepted Ready to integrate (reviewed, tested)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[2.4] TXG timestamp DB sync if idle causes unnecessary disk access/prevent spin down

3 participants