Skip to content

Add anon_size to the overflow tolerance to avoid deadlock.#18659

Open
tiehexue wants to merge 1 commit into
openzfs:masterfrom
tiehexue:anon-size-to-overflow
Open

Add anon_size to the overflow tolerance to avoid deadlock.#18659
tiehexue wants to merge 1 commit into
openzfs:masterfrom
tiehexue:anon-size-to-overflow

Conversation

@tiehexue

@tiehexue tiehexue commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Motivation and Context

This is for another deadlock in #18426 . In some edge case, anon_size could be GBs, if return ARC_OVF_SEVERE which make txg_sync_thread blocked, there will be deadlock risk. In most of cases, ARC_OVF_SOME should be good for txg_sync_thread.

Description

When doing overflow check which may block txg_sync_thread, always adding anon_size.

How Has This Been Tested?

Local test and coming CI.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Quality assurance (non-breaking change which makes the code more robust against bugs)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

Signed-off-by: tiehexue <tiehexue@hotmail.com>

@amotin amotin left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The motivation is right. The sync process indeed need to proceed to clean the anonymous data. But anonymous data may be not all. Some metadata may also be held and so not be evictable. I wonder if we could somehow check that we actually have something to evict before we wait for it indefinitely?

@tiehexue

Copy link
Copy Markdown
Contributor Author

The motivation is right. The sync process indeed need to proceed to clean the anonymous data. But anonymous data may be not all. Some metadata may also be held and so not be evictable. I wonder if we could somehow check that we actually have something to evict before we wait for it indefinitely?

This PR is a minor update to "memory pressure management", not touching the real eviction. Maybe another PR if necessary later.

@amotin

amotin commented Jun 10, 2026

Copy link
Copy Markdown
Member

I understand that it is about pressure. You may see that code was actually written by me, even though not yesterday. But if eviction is still ongoing, it might have sense to wait for it a bit. One of the problems in this area is that sync thread might also produce dirty data, and letting it run without throttling may cause more problems.

@tiehexue

Copy link
Copy Markdown
Contributor Author

I understand that it is about pressure. You may see that code was actually written by me, even though not yesterday. But if eviction is still ongoing, it might have sense to wait for it a bit. One of the problems in this area is that sync thread might also produce dirty data, and letting it run without throttling may cause more problems.

Agreed, however, it is just a strategy balance. For real production zfs system, the memory is dozens GBs, and current overflow and "overflow *= 3", is just several hundred MBs if I understand correctly, it is small. So in most of cases, if txg committed in time, anon_size will not be so bigger, and will not be problem source. And if very rare edge case, high pressure, adding anon_size to overflow, is giving "breathe" chance to the system.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants