fsnotify: Fix checkpoint failure for inotify watches on overlayfs #3048

Closed
ankimaha-sys wants to merge 1 commit into checkpoint-restore:criu-dev from ankimaha-sys:fix/fsnotify-overlayfs-checkpoint

@ankimaha-sys

Summary

Fix CRIU checkpoint failure when dumping inotify watches on files backed by overlayfs, as seen on OpenShift 4.18 with .NET 8 container workloads.

Error before fix:

fsnotify: Handle 0x34c:0x1961af5a cannot be opened
Error (criu/fsnotify.c:284): fsnotify: Can't dump that handle

Root cause: In alloc_openable(), suitable_mount_found is set only after open_by_handle_at() succeeds. On overlayfs, the mount with the matching s_dev is found, but open_by_handle_at() fails because overlayfs does not reliably support file handle decoding (this depends on the kernel configuration and the nfs_export mount option). Because suitable_mount_found stays 0, the function returns ERR_NO_MOUNT, which sends check_open_handle() down the fault path, where irmap_lookup() also fails because its hint directories do not cover container filesystem paths.
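
The ordering problem can be illustrated with a small self-contained model. This is not CRIU's code: `fake_mount`, `try_open()` and `lookup()` are hypothetical stand-ins, the numeric error values are placeholders, and only the names suitable_mount_found, ERR_NO_MOUNT and ERR_NO_PATH_IN_MOUNT come from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder error codes; the real values in CRIU may differ. */
enum lookup_result { LOOKUP_OK = 0, ERR_NO_MOUNT = -1, ERR_NO_PATH_IN_MOUNT = -2 };

/* Hypothetical stand-in for a mount entry. */
struct fake_mount {
	unsigned int s_dev;    /* device ID of the mounted filesystem */
	bool handle_decodable; /* models whether open_by_handle_at() would succeed */
};

/* Models open_by_handle_at(): fails on mounts that cannot decode handles. */
static bool try_open(const struct fake_mount *m)
{
	return m->handle_decodable;
}

/*
 * Simplified model of the lookup loop. With set_flag_early == false the
 * flag is set only after a successful open (the behaviour described as the
 * bug); with true it is set as soon as s_dev matches (the proposed change).
 */
static enum lookup_result lookup(const struct fake_mount *mounts, size_t n,
				 unsigned int s_dev, bool set_flag_early)
{
	bool suitable_mount_found = false;

	assert(mounts != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (mounts[i].s_dev != s_dev)
			continue;
		if (set_flag_early)
			suitable_mount_found = true;
		if (!try_open(&mounts[i]))
			continue;
		return LOOKUP_OK;
	}

	return suitable_mount_found ? ERR_NO_PATH_IN_MOUNT : ERR_NO_MOUNT;
}
```

In this model, a single mount with a matching s_dev that cannot decode handles yields ERR_NO_MOUNT when the flag is set late and ERR_NO_PATH_IN_MOUNT when it is set early, which is the distinction the caller needs.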

Changes

  1. Move suitable_mount_found = 1 in alloc_openable() to before the open_by_handle_at() attempt, so the mount is correctly reported as found even when handle decoding fails.

  2. Add overlayfs-specific path resolution in check_open_handle(): When ERR_NO_PATH_IN_MOUNT is returned and the mount is overlayfs, scan the overlay mount tree for the matching inode using a new scan_dir_for_inode() helper function.

  3. Add scan_dir_for_inode() helper: Recursive directory scanner that finds a file by (s_dev, i_ino) within a mount tree. Respects mount boundaries (won't cross devices), limits recursion depth to 32, and doesn't follow symlinks.
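
A scanner with the properties listed in item 3 might look roughly like the sketch below. The patch itself is not shown on this page, so the signature, the return convention and the `MAX_SCAN_DEPTH` name are assumptions; only the behaviour (match on (s_dev, i_ino), stay on one device, depth limit of 32, no symlink following) is taken from the description.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_SCAN_DEPTH 32 /* depth limit quoted in the description */

/*
 * Hypothetical sketch: search the tree rooted at dir_path for an entry whose
 * (st_dev, st_ino) equals (s_dev, i_ino). On success the path is written to
 * out and 0 is returned; -1 means no match within the depth limit.
 */
static int scan_dir_for_inode(const char *dir_path, dev_t s_dev, ino_t i_ino,
			      int depth, char *out, size_t out_len)
{
	struct dirent *de;
	int ret = -1;
	DIR *dir;

	assert(out != NULL && out_len > 0);

	if (depth > MAX_SCAN_DEPTH)
		return -1;

	dir = opendir(dir_path);
	if (!dir)
		return -1;

	while (ret != 0 && (de = readdir(dir)) != NULL) {
		char path[PATH_MAX];
		struct stat st;
		int n;

		if (!strcmp(de->d_name, ".") || !strcmp(de->d_name, ".."))
			continue;

		n = snprintf(path, sizeof(path), "%s/%s", dir_path, de->d_name);
		if (n < 0 || (size_t)n >= sizeof(path))
			continue; /* path too long for the buffer: skip it */

		/* AT_SYMLINK_NOFOLLOW: examine a link itself, never its target. */
		if (fstatat(dirfd(dir), de->d_name, &st, AT_SYMLINK_NOFOLLOW))
			continue;

		if (st.st_dev != s_dev)
			continue; /* different device: do not cross mount boundaries */

		if (st.st_ino == i_ino) {
			n = snprintf(out, out_len, "%s", path);
			if (n >= 0 && (size_t)n < out_len)
				ret = 0;
		} else if (S_ISDIR(st.st_mode)) {
			ret = scan_dir_for_inode(path, s_dev, i_ino, depth + 1,
						 out, out_len);
		}
	}

	closedir(dir);
	return ret;
}
```

A caller would pass the mount point of the overlay as dir_path and a depth of 0. A full tree walk is linear in the number of entries, which is presumably why the description bounds the recursion.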

Test plan

  • Build CRIU with the patch on a Fedora/RHEL system
  • Create a container (podman/CRI-O) with overlayfs storage driver
  • Run a workload that creates inotify watches (e.g. .NET FileSystemWatcher)
  • Checkpoint the container with criu dump or the kubelet checkpoint API
  • Verify that the checkpoint succeeds without the "Can't dump that handle" error
  • Verify existing fsnotify tests still pass (make test / zdtm fsnotify tests)
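
For the workload step, a small C stand-in for the .NET FileSystemWatcher case can be sketched with the Linux inotify API. The helper names `watch_file()` and `saw_modify()` are made up for this example; the process would need to run inside the overlayfs-backed container and keep the inotify descriptor open while the checkpoint is taken.

```c
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <sys/inotify.h>
#include <unistd.h>

/*
 * Create an inotify instance that watches path for modifications.
 * Returns the inotify fd (or -1) and stores the watch descriptor in *wd.
 */
static int watch_file(const char *path, int *wd)
{
	int ifd = inotify_init1(IN_CLOEXEC | IN_NONBLOCK);

	assert(wd != NULL);
	if (ifd < 0)
		return -1;

	*wd = inotify_add_watch(ifd, path, IN_MODIFY);
	if (*wd < 0) {
		close(ifd);
		return -1;
	}
	return ifd;
}

/*
 * Wait up to timeout_ms for an IN_MODIFY event on wd.
 * Returns 1 if one was seen, 0 if not, -1 on error.
 */
static int saw_modify(int ifd, int wd, int timeout_ms)
{
	char buf[4096] __attribute__((aligned(__alignof__(struct inotify_event))));
	struct pollfd pfd = { .fd = ifd, .events = POLLIN };
	ssize_t len;
	int r;

	r = poll(&pfd, 1, timeout_ms);
	if (r <= 0)
		return r; /* 0: timeout, -1: poll error */

	len = read(ifd, buf, sizeof(buf));
	if (len < 0)
		return -1;

	/* Events are variable-length records packed back to back. */
	for (char *p = buf; p < buf + len;) {
		struct inotify_event *ev = (struct inotify_event *)p;

		if (ev->wd == wd && (ev->mask & IN_MODIFY))
			return 1;
		p += sizeof(*ev) + ev->len;
	}
	return 0;
}
```

Calling saw_modify() after restore, following a write to the watched file, gives a simple check that the watch survived the checkpoint.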

Fixes: https://issues.redhat.com/browse/OCPBUGS-87023

Made with Cursor

@adrianreber

Member

Looks like what I did a couple of days ago in #3043 😉

@ankimaha-sys

Author

Thanks for the heads up @adrianreber! I arrived at the same fix independently — good to know we're aligned on the approach.

I've updated this PR to be more complete:

  • Bumped scan depth to 64 (matching your approach)
  • Added entry count tracking with a warning after 10,000 entries
  • Added a ZDTM test (inotify_overlayfs) that mounts an overlay, creates an inotify watch, checkpoints/restores, and verifies events still work

Happy to close this in favour of #3043 if you prefer, or if there's anything useful here you'd like to cherry-pick, feel free. Either way — glad this is getting fixed!
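
The two limits mentioned in the update (depth 64, warning after 10,000 entries) could be tracked with bookkeeping along these lines. This is an illustrative sketch only: the struct, the function names and the exact boundary behaviour (whether depth 64 itself is still allowed, and that the warning is printed once) are assumptions, not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define SCAN_MAX_DEPTH    64    /* depth limit from the updated PR */
#define SCAN_WARN_ENTRIES 10000 /* warning threshold from the updated PR */

/* Hypothetical bookkeeping carried through one scan. */
struct scan_stats {
	unsigned long entries; /* directory entries visited so far */
	bool warned;           /* true once the warning has been printed */
};

/* Returns true if descending to this depth is still allowed. */
static bool scan_depth_ok(int depth)
{
	return depth <= SCAN_MAX_DEPTH;
}

/*
 * Count one visited entry. Prints a single warning once more than
 * SCAN_WARN_ENTRIES entries have been seen and returns true only on the
 * call that printed it; the scan itself is not aborted.
 */
static bool scan_count_entry(struct scan_stats *st)
{
	assert(st != NULL);

	st->entries++;
	if (st->warned || st->entries <= SCAN_WARN_ENTRIES)
		return false;

	st->warned = true;
	fprintf(stderr, "warning: scanned more than %d entries\n",
		SCAN_WARN_ENTRIES);
	return true;
}
```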

@ankimaha-sys force-pushed the fix/fsnotify-overlayfs-checkpoint branch from 51c6bb7 to 47bbe0a on June 14, 2026 15:27

When checkpointing a container with inotify watches on files backed by
overlayfs, CRIU fails with:

  fsnotify: Handle 0x34c:0x1961af5a cannot be opened
  Error (criu/fsnotify.c:284): fsnotify: Can't dump that handle

Root cause: In alloc_openable(), suitable_mount_found is only set after
open_by_handle_at() succeeds. On overlayfs, the mount with matching
s_dev is found, but open_by_handle_at() fails because overlayfs does
not reliably support file handle decoding. This causes check_open_handle()
to take the fault path where irmap_lookup() also fails since its hint
directories don't cover container filesystem paths.

Fix this by:

1. Setting suitable_mount_found as soon as a mount with matching s_dev
   is found, before attempting open_by_handle_at(). This correctly
   reports ERR_NO_PATH_IN_MOUNT to the caller.

2. Adding overlayfs-specific path resolution in check_open_handle()
   that scans the overlay mount tree for the matching inode using a
   new scan_dir_for_inode() helper (depth-limited to 64, warns after
   10,000 entries scanned).

3. Adding a ZDTM test (inotify_overlayfs) that mounts an overlay,
   creates an inotify watch, checkpoints/restores, and verifies
   events still work.

Fixes: https://issues.redhat.com/browse/OCPBUGS-87023
Signed-off-by: Ankit Mahajan <ankimaha@redhat.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

@ankimaha-sys force-pushed the fix/fsnotify-overlayfs-checkpoint branch from 47bbe0a to 59d5b4d on June 14, 2026 16:53

@rst0git

rst0git commented Jun 16, 2026

Member

@ankimaha-sys Thanks for working on this! I went ahead and merged #3043.

@rst0git closed this on Jun 16, 2026