-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Fix unnecessary writes of transaction database #18120
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
This doesn't seem right. If we have a dirty database and the time came to flush we want to flush database, not to wait for the next TXG to arrive no? |
|
I guess we just need to check if there is something to flush and not flush database if no modification to it was made. |
Yes, that's done by the newly added However with this change alone, I am not certain that updating If |
In your case, the database is flushed not only when the timeout expires, but only when a new TXG is issued and timeout expired. As a result, if the database is dirty and no new TXG is created by the time the flush deadline is reached, the function returns early instead of flushing the current TXG. |
excellent point. I think both conditions need to be merged into single |
I don't know how to fix that. From your comment above, the current behaviour is correct since it will flush the database on timeout whether or not there's new TXG. But that's exactly what's waking the disks in 10 minutes (default config) intervals. I guess this needs a new check "is the database dirty" to supplement the new TXG check, but I don't know how to implement it. Here's comparison of different behaviours, as truths tables: A = new TXG ? current behaviour:
proposed behaviour:
additional check: C = is database dirty ? improved fix:
In particular, this row |
|
Hym, well don't we need to add simply check like: Basically saying that if we have not noted any TXG since last flush lets not flush database? |
you mean like in a2781dd ? https://github.com/Bronek/zfs/blob/a2781ddc3f619c852ddbc5b8532f7a0c91e944c8/module/zfs/spa.c#L2167-L2170 |
|
Yep, I think this should work. Now we basically have a way to notice that the database is actually dirty or not. Beascilly if the time of last flush is lower then noted TXG it means we have something to write. If the time of last flush is higher it means that we already flushed everything so there is no need to update ZAP. |
|
Thinking about this in a bit more detail, I believe there may still be some corner cases. In particular, there is a scenario where we could perform one unnecessary extra write. Because To avoid that we can add 1s to flush time for example, just to make sure that they are not the same. |
a2781dd to
89ba5c7
Compare
How about this 89ba5c7 ? |
|
IMHO this is wrong :) Let me explain (I have rewritten the previous example): |
Please note this commit also changes the comparison (for the early |
89ba5c7 to
b38932e
Compare
I see, I have missed that. |
|
After looking at it once more, I don’t think we need this special variable |
Could this function be called more than once before one second has elapsed ? |
|
Can be, but I don’t see a problem even if spa_flush_txg_time is set to 0 (which probably doesn’t make sense to set); we will still flush every second. |
With the early return condition like this ... and assuming neither of EDIT: I see how this would work. We bump |
|
I think you can squas this commits, fix checkstyles and they are ready to go. |
798e23d to
a5fe3d8
Compare
|
@oshogbo technically all of the fix comes from your comments. Ok to add this line ? |
|
Sure, I appreciate it. Thanks! |
This makes the periodical flushes of transaction timestamps database conditional to any updates in the database since the last flush. Co-authored-by: Mariusz Zaborski <[email protected]> Signed-off-by: Bronek Kozicki <[email protected]>
a5fe3d8 to
6704e19
Compare
amotin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry if I missed something from the earlier discussion, TLDR, but I am not getting how this change should fix anything. spa_sync_time_logger() is normally called only once per TXG, so the condition txg > spa->spa_last_noted_txg should always be true. After that, if curtime >= spa->spa_last_noted_txg_time + spa_note_txg_time so that we could get to the later condition, it means we update spa_last_noted_txg_time. Which means the added spa->spa_last_noted_txg_time <= spa->spa_last_flush_txg_time will be false.
I haven't looked deeper and may be wrong, but I suspect that the actual problem here might be that ZFS kicks TXGs and calls this function each TXG timeout, even if nothing has changed there. Those empty TXGs in turn are getting logged into the DB, and respectively produce pool writes that would otherwise not existed.
This makes the periodical flushes of TXG timestamps database conditional to any updates in the database since the last flush.
No change to documentation, since arguably this is also the most intuitive behaviour.
Motivation and Context
This fixes #18082
Description
How Has This Been Tested?
Types of changes
Checklist:
Signed-off-by.