release range lock after freed in dmu. by tiehexue · Pull Request #18647 · openzfs/zfs

tiehexue · 2026-06-08T08:50:04Z

Motivation and Context

While working on issue #18454 , a test c program is written to reproduce the issue, however, failed. But a dead lock problem is found. And this is the fix. This deadlock is very similar to the one reported in #18454 , relates to PG_writeback, and triggered by unlink/fsync/ftruncate, and my testing env is aligned with it. It will be good if the issuer can verify this PR.

Description

After free a range in dmu, we should immediately release the range lock. In the original code, the comments stress that following:
"Zero partial page cache entries. This must be done under a range lock in order to keep the ARC and page cache in sync."

The test progam indeed shows deadlock here. The program and deadlock trace will be appended in this PR. The root cause is that we have double pass in fsync/msync, the second pass try to acquire range lock while still holds PG_writeback, then it is blocked by range lock in another thread. The thread has range lock is block by PG_writeback in folio_wait_writeback.

We may need more tests here to ensure ARC and page cache has no problem.

How Has This Been Tested?

run the test program, and watch dmesg.

ubuntu 26.04 amd64/arm64 ext4: fine
debian 13.4 amd64/arm64 ext4: fine

amd64:

ubuntu 26.04 7.0.0-22-generic + 2.3.7 + writeback_iter: deadlock
--- this configuration is to verify if writeback_iter is the root cause, refer to PR #18625

ubuntu 26.04 7.0.0-22-generic + 2.3.7: deadlock
ubuntu 26.04 7.0.0-22-generic + 2.4.99-1: deadlock
ubuntu 26.04 7.0.0-22-generic + 2.4.99-1 + this PR: fixed

6.12.74+deb13-amd64 + zfs-2.3.2-1 : deadlock
6.12.74+deb13-amd64 + zfs-2.3.2-1 + this PR: fixed

arm64:

6.12.74+deb13-arm64 + zfs-2.3.2-1: hang
6.12.74+deb13-arm64 + zfs-2.3.2-1 + this PR: fixed
ubuntu 26.04 7.0.0-22-generic + 2.4.99-1: deadlock
ubuntu 26.04 7.0.0-22-generic + 2.4.99-1 + this PR: fixed

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Performance enhancement (non-breaking change which improves efficiency)
Code cleanup (non-breaking change which makes code smaller or more readable)
Quality assurance (non-breaking change which makes the code more robust against bugs)
Breaking change (fix or feature that would cause existing functionality to change)
Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
Documentation (a change to man pages or other documentation)

Checklist:

My code follows the OpenZFS code style requirements.
I have updated the documentation accordingly.
I have read the contributing document.
I have added tests to cover my changes.
I have run the ZFS Test Suite with this change applied.
All commit messages are properly formatted and contain Signed-off-by.

tiehexue · 2026-06-08T08:50:56Z

Hi, @ryao , did you notice I had uploaded a test program and trace log. The trace log clearly showed an ABBA deadlock. And as I stated in the PR, I had tested in a lot environment settings with the hang, and the fix.

Also I have doubts if we should re-acquire range lock before zfs_zero_partial_page after we release range lock before truncate_inode_pages_range.

------------- below is original comment -------------

The test program and its dmesg.

dead_lock.c
trace.log

Signed-off-by: tiehexue <tiehexue@hotmail.com>

tiehexue · 2026-06-08T16:08:58Z

This PR, release range block before zero partial page, should bring concern about data corruption or correctness. I was trying to verify this by some small programs, however, those programs did not make distinct difference.

A such program or test case that can prove releasing early is bad are more than welcome!

tiehexue · 2026-06-09T06:14:12Z

This PR, release range block before zero partial page, should bring concern about data corruption or correctness. I was trying to verify this by some small programs, however, those programs did not make distinct difference.

A such program or test case that can prove releasing early is bad are more than welcome!

OK, I guess this is the whole picture: 1) for data path go to ARC, it is fine; 2) only mmap bypass ARC.

For mmap read, it will read dirty data, which is fine, and same as other filesystem, which should be handled in application level.

For mmap write, this PR is a must, because there is chance of dead-lock. And if no dead-lock, there is very very little chance after our thread releasing range lock and before starting page truncating, a mmap write may be done in another thread. I have to add msleep(100) between to see a dirty page (by another thread) is truncated. And this should be handled in application level too, at least a file (range) lock is enough. We could see this not a bug, but a reasonable concurrent visit result.

Any way, kernel module shall not be in dead-lock while administrator has to come to the machine to do a reset.

ryao

Could you provide a better description of the lock inversion that causes the deadlock? I would like to see stacks for the two threads showing where lock X is grabbed by thread A, where lock Y would be grabbed by thread A, where lock Y is grabbed by thread B and where lock X would be grabbed by thread B.

I know you are saying this is between PG_writeback and zfs_rangelock_enter()/zfs_rangelock_enter(), but I am not seeing an inverted relationship in a second thread when I look at the code. Seeing stack traces of the events where each thread takes locks in an inverse order would make this much easier to review. You don't need to capture the full true stack traces with a tool. That usually only works for the second lock in each thread. Making a stack trace for the first lock in each thread by hand would work for enabling others to see the problem.

This was referenced Jun 8, 2026

PG_writeback may stuck with writeback_iter #18625

Closed

ZFS issues with new folio API on Debian 13 #18454

Open

release range lock after freed in dmu.

5195704

Signed-off-by: tiehexue <tiehexue@hotmail.com>

tiehexue force-pushed the dead-lock-in-dmu-free branch from 5332a4e to 5195704 Compare June 8, 2026 16:13

behlendorf added the Status: Code Review Needed Ready for review and testing label Jun 8, 2026

tiehexue mentioned this pull request Jun 18, 2026

true async io in linux #18684

Open

14 tasks

ryao suggested changes Jun 18, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

release range lock after freed in dmu.#18647

release range lock after freed in dmu.#18647
tiehexue wants to merge 1 commit into
openzfs:masterfrom
tiehexue:dead-lock-in-dmu-free

tiehexue commented Jun 8, 2026 •

edited

Loading

Uh oh!

tiehexue commented Jun 8, 2026 •

edited

Loading

Uh oh!

tiehexue commented Jun 8, 2026 •

edited

Loading

Uh oh!

tiehexue commented Jun 9, 2026 •

edited

Loading

Uh oh!

ryao left a comment •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

tiehexue commented Jun 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Motivation and Context

Description

How Has This Been Tested?

amd64:

arm64:

Types of changes

Checklist:

Uh oh!

tiehexue commented Jun 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tiehexue commented Jun 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tiehexue commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ryao left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

tiehexue commented Jun 8, 2026 •

edited

Loading

tiehexue commented Jun 8, 2026 •

edited

Loading

tiehexue commented Jun 8, 2026 •

edited

Loading

tiehexue commented Jun 9, 2026 •

edited

Loading

ryao left a comment •

edited

Loading