release range lock after freed in dmu.#18647
Conversation
|
Hi, @ryao , did you notice I had uploaded a test program and trace log. The trace log clearly showed an ABBA deadlock. And as I stated in the PR, I had tested in a lot environment settings with the hang, and the fix. Also I have doubts if we should re-acquire range lock before zfs_zero_partial_page after we release range lock before truncate_inode_pages_range. ------------- below is original comment ------------- The test program and its dmesg. |
Signed-off-by: tiehexue <tiehexue@hotmail.com>
|
This PR, release range block before zero partial page, should bring concern about data corruption or correctness. I was trying to verify this by some small programs, however, those programs did not make distinct difference. A such program or test case that can prove releasing early is bad are more than welcome! |
5332a4e to
5195704
Compare
OK, I guess this is the whole picture: 1) for data path go to ARC, it is fine; 2) only mmap bypass ARC. For mmap read, it will read dirty data, which is fine, and same as other filesystem, which should be handled in application level. For mmap write, this PR is a must, because there is chance of dead-lock. And if no dead-lock, there is very very little chance after our thread releasing range lock and before starting page truncating, a mmap write may be done in another thread. I have to add msleep(100) between to see a dirty page (by another thread) is truncated. And this should be handled in application level too, at least a file (range) lock is enough. We could see this not a bug, but a reasonable concurrent visit result. Any way, kernel module shall not be in dead-lock while administrator has to come to the machine to do a reset. |
There was a problem hiding this comment.
Could you provide a better description of the lock inversion that causes the deadlock? I would like to see stacks for the two threads showing where lock X is grabbed by thread A, where lock Y would be grabbed by thread A, where lock Y is grabbed by thread B and where lock X would be grabbed by thread B.
I know you are saying this is between PG_writeback and zfs_rangelock_enter()/zfs_rangelock_enter(), but I am not seeing an inverted relationship in a second thread when I look at the code. Seeing stack traces of the events where each thread takes locks in an inverse order would make this much easier to review. You don't need to capture the full true stack traces with a tool. That usually only works for the second lock in each thread. Making a stack trace for the first lock in each thread by hand would work for enabling others to see the problem.
Motivation and Context
While working on issue #18454 , a test c program is written to reproduce the issue, however, failed. But a dead lock problem is found. And this is the fix. This deadlock is very similar to the one reported in #18454 , relates to PG_writeback, and triggered by unlink/fsync/ftruncate, and my testing env is aligned with it. It will be good if the issuer can verify this PR.
Description
After free a range in dmu, we should immediately release the range lock. In the original code, the comments stress that following:
"Zero partial page cache entries. This must be done under a range lock in order to keep the ARC and page cache in sync."
The test progam indeed shows deadlock here. The program and deadlock trace will be appended in this PR. The root cause is that we have double pass in fsync/msync, the second pass try to acquire range lock while still holds PG_writeback, then it is blocked by range lock in another thread. The thread has range lock is block by PG_writeback in folio_wait_writeback.
We may need more tests here to ensure ARC and page cache has no problem.
How Has This Been Tested?
run the test program, and watch dmesg.
ubuntu 26.04 amd64/arm64 ext4: fine
debian 13.4 amd64/arm64 ext4: fine
amd64:
ubuntu 26.04 7.0.0-22-generic + 2.3.7 + writeback_iter: deadlock
--- this configuration is to verify if writeback_iter is the root cause, refer to PR #18625
ubuntu 26.04 7.0.0-22-generic + 2.3.7: deadlock
ubuntu 26.04 7.0.0-22-generic + 2.4.99-1: deadlock
ubuntu 26.04 7.0.0-22-generic + 2.4.99-1 + this PR: fixed
6.12.74+deb13-amd64 + zfs-2.3.2-1 : deadlock
6.12.74+deb13-amd64 + zfs-2.3.2-1 + this PR: fixed
arm64:
6.12.74+deb13-arm64 + zfs-2.3.2-1: hang
6.12.74+deb13-arm64 + zfs-2.3.2-1 + this PR: fixed
ubuntu 26.04 7.0.0-22-generic + 2.4.99-1: deadlock
ubuntu 26.04 7.0.0-22-generic + 2.4.99-1 + this PR: fixed
Types of changes
Checklist:
Signed-off-by.