main: fix incorrect directory entries due to unstable iteration order by stanhu · Pull Request #448 · containers/fuse-overlayfs

stanhu · 2025-11-04T18:06:40Z

When a directory is opened, closed, and reopened, which in turn causes nodes to be freed and recreated, the iteration order of entries can change. This causes the kernel FUSE layer's offset-based caching to skip or duplicate entries.

In https://gitlab.com/gitlab-org/omnibus-gitlab/-/issues/9408, we have been tracking a problem with GitLab failing to load a file on a Fedora host with podman and fuse-overlayfs. When the problem occurs, we observed:

- Files exist but don't appear in `ls` or directory listings.
- Some files appear duplicated in directory listings.
- Running `touch` on the directory temporarily fixes the issue.
- This affects directories with many entries (>100 files).
- It only happens when the directory is accessed multiple times.

The environment:

- fuse-overlayfs 1.15
- Nested containers (Docker → Podman → fuse-overlayfs)
- Directories with 200+ entries

FUSE readdir uses offsets as opaque cookies to track position in directory listings:

1. Kernel calls `readdir(offset=0)` → gets entries 0-100
2. Kernel caches "offset 50 = file_x.rb"
3. Kernel calls `readdir(offset=100)` → continues from position 100

The kernel assumes offsets are stable identifiers across the lifetime of the directory handle.
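To make the offset-as-position model concrete, here is a small, hypothetical C model (not libfuse or fuse-overlayfs code; `toy_readdir` and its parameters are illustrative). The directory is a snapshot array built at `opendir` time, each emitted entry's cookie is its index + 1, and the next call resumes at `next_offset`. In libfuse's low-level API the `off` argument of `fuse_add_direntry()` plays this "offset of the next entry" role; that fuse-overlayfs uses exactly index + 1 is an assumption here.

```c
#include <stddef.h>

/* Hypothetical toy model of offset-cookie readdir (not libfuse or
 * fuse-overlayfs code). `names` is the snapshot built at opendir time;
 * the cookie attached to entry i is i + 1, i.e. where the next call resumes. */
size_t toy_readdir(const char *const *names, size_t n_names, size_t offset,
                   size_t max, const char **out, size_t *next_offset)
{
    size_t emitted = 0;
    size_t i = offset;

    /* The offset is only an index into the current snapshot. */
    for (; i < n_names && emitted < max; i++)
        out[emitted++] = names[i];

    /* Resume point for the next call: one past the last emitted index. */
    *next_offset = i;
    return emitted;
}
```

Because the cookie is just an index, resuming at offset 2 means "whatever sits at index 2 of the current snapshot". If the snapshot is rebuilt in a different order, the same cookie names a different file.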

fuse-overlayfs stores directory entries in a hash table. When entries are freed (on `closedir`) and recreated (on the next `opendir`), the hash iteration order can change due to:

- **Free entry list recycling**: hash entries are freed in one order and reallocated in LIFO order.
- **Bucket chain positions**: entries may land in different positions within collision chains.
- **Hash table resize**: if a resize happens between loads, all positions change.
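To illustrate the bucket-chain effect, here is a deliberately tiny, hypothetical chained hash table in C. It is not the table fuse-overlayfs uses; `toy_table`, `toy_insert`, and `toy_iterate` are illustrative names. New entries go to the head of their bucket's chain, so inserting the same names in a different sequence changes the order in which iteration visits colliding names.

```c
#include <stddef.h>
#include <string.h>

#define TOY_BUCKETS 4 /* deliberately tiny so names must collide */
#define TOY_MAX 32

struct toy_node {
    const char *name;
    struct toy_node *next;
};

struct toy_table {
    struct toy_node *bucket[TOY_BUCKETS];
    struct toy_node pool[TOY_MAX];
    size_t used;
};

static unsigned toy_hash(const char *s)
{
    unsigned h = 0;
    while (*s)
        h = h * 31u + (unsigned char)*s++;
    return h % TOY_BUCKETS;
}

void toy_init(struct toy_table *t)
{
    memset(t, 0, sizeof *t);
}

/* Insert at the head of the bucket's chain; returns 0, or -1 if full. */
int toy_insert(struct toy_table *t, const char *name)
{
    if (t->used == TOY_MAX)
        return -1;
    struct toy_node *n = &t->pool[t->used++];
    unsigned b = toy_hash(name);
    n->name = name;
    n->next = t->bucket[b];
    t->bucket[b] = n;
    return 0;
}

/* Visit buckets in index order and each chain from head to tail,
 * writing names to `out` (must have room for every entry). */
size_t toy_iterate(const struct toy_table *t, const char **out)
{
    size_t k = 0;
    for (unsigned b = 0; b < TOY_BUCKETS; b++)
        for (const struct toy_node *n = t->bucket[b]; n != NULL; n = n->next)
            out[k++] = n->name;
    return k;
}
```

With 6 names and 4 buckets, at least two names must share a bucket, so inserting the same names in reverse order is guaranteed to change the iteration order even though the set of names is identical.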

The bug sequence:

```
Session 1:
  opendir("entities/")
  readdir(offset=0...) → package_version.rb at position 23
  Kernel caches: "I've read through offset 50, including package_version.rb at 23"
  closedir() → all nodes freed, hash cleared

Session 2 (same directory, moments later):
  opendir("entities/") → nodes recreated
  readdir(offset=0...) → package_version.rb now at position 22 (ORDER CHANGED!)
  Kernel: "I already have offsets 0-50 cached"
  Kernel: "Offset 22 < 50, so I already returned this, skip it"

  Result: package_version.rb skipped, missing from final ls output!
```
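The sequence above can be modeled in a few lines. This hypothetical C sketch (not kernel code) assumes the kernel serves offsets below `cached` from what session 1 returned and fetches the rest from session 2; `merge_listing` and `occurrences` are illustrative names.

```c
#include <stddef.h>
#include <string.h>

/* Toy model: the first `cached` positions come from session 1's order,
 * the remaining positions from session 2's order. `out` must hold n names. */
size_t merge_listing(const char *const *s1, const char *const *s2, size_t n,
                     size_t cached, const char **out)
{
    size_t k = 0;
    for (size_t i = 0; i < cached && i < n; i++)
        out[k++] = s1[i]; /* served from the cached prefix */
    for (size_t i = cached; i < n; i++)
        out[k++] = s2[i]; /* fetched fresh from session 2 */
    return k;
}

/* Count how many times `name` appears in `list`. */
size_t occurrences(const char *const *list, size_t n, const char *name)
{
    size_t c = 0;
    for (size_t i = 0; i < n; i++)
        if (strcmp(list[i], name) == 0)
            c++;
    return c;
}
```

For example, with session 1 = `a b c package_version.rb d e`, session 2 = `a b package_version.rb c d e`, and `cached = 3`, the merged listing is `a b c c d e`: `package_version.rb` is skipped and `c` is duplicated, matching both symptoms in the report.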

For GitLab, newer versions of Zeitwerk (Ruby autoloader) access directories more frequently during initialization, increasing the chance of hitting this scenario. The bug exists regardless, but timing affects visibility.

This is difficult to reproduce with simple shell commands because it requires:

1. Multiple `opendir`/`closedir` cycles on the same directory
2. Nodes being freed and recreated between cycles
3. An application reading directory results that span these cycles
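Given the conditions listed above, the following is only a hypothetical consistency probe, not a reliable reproducer. It runs several `opendir`/`readdir`/`closedir` cycles on one directory and compares the entry count and an order-independent checksum of the names between cycles. The helper names (`listing_is_stable`, `list_once`, `name_hash`) are made up for illustration; a skipped or duplicated name would almost certainly change the result, but a clean run does not prove the bug is absent.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* FNV-1a hash of one file name. */
static uint64_t name_hash(const char *s)
{
    uint64_t h = 14695981039346656037ULL;
    for (; *s; s++) {
        h ^= (unsigned char)*s;
        h *= 1099511628211ULL;
    }
    return h;
}

/* One opendir/readdir/closedir cycle: entry count plus the sum of name
 * hashes (order-independent). readdir errors are not distinguished from
 * end-of-directory in this sketch. Returns 0, or -1 if opendir fails. */
static int list_once(const char *path, size_t *count, uint64_t *sum)
{
    DIR *d = opendir(path);
    if (d == NULL)
        return -1;
    struct dirent *de;
    *count = 0;
    *sum = 0;
    while ((de = readdir(d)) != NULL) {
        *count += 1;
        *sum += name_hash(de->d_name);
    }
    closedir(d);
    return 0;
}

/* Returns 1 if all `rounds` cycles agree, 0 if any differ, -1 on error. */
int listing_is_stable(const char *path, int rounds)
{
    size_t first_count, count;
    uint64_t first_sum, sum;
    if (list_once(path, &first_count, &first_sum) != 0)
        return -1;
    for (int i = 1; i < rounds; i++) {
        if (list_once(path, &count, &sum) != 0)
            return -1;
        if (count != first_count || sum != first_sum) {
            fprintf(stderr, "round %d: %zu entries (first round: %zu)\n",
                    i, count, first_count);
            return 0;
        }
    }
    return 1;
}
```

One might call `listing_is_stable("entities/", 100)` inside the affected container while the application is running; matching results only show that this particular access pattern did not trigger the bug.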

The fix:

**Ensure stable directory entry ordering by sorting entries alphabetically in `reload_tbl`.**

This guarantees:

- Same filename always at same offset
- Kernel cache assumptions remain valid
- No entries skipped or duplicated
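A minimal sketch of the sorting step, assuming `reload_tbl` has already collected pointers to the directory's nodes into an array. The `struct entry` type, `compare_by_name`, and `sort_snapshot` are hypothetical stand-ins; the actual patch may differ in names and details. It uses `qsort` with `strcmp`, i.e. byte order rather than locale-aware collation.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a directory node; a real node has more fields. */
struct entry {
    const char *name;
};

/* qsort passes pointers to array elements, which are `struct entry *`. */
static int compare_by_name(const void *a, const void *b)
{
    const struct entry *const *ea = a;
    const struct entry *const *eb = b;
    return strcmp((*ea)->name, (*eb)->name);
}

/* Whatever order the hash table yielded, sort by name so that index i,
 * and therefore the offset cookie i + 1, always maps to the same name. */
void sort_snapshot(struct entry **tbl, size_t n)
{
    qsort(tbl, n, sizeof *tbl, compare_by_name);
}
```

Because names are unique within a directory, the comparator never reports ties, so the result does not depend on `qsort` being stable (the C standard does not guarantee that it is).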

Increasing the initial hash table size from 128 to 512 makes the problem less likely by making iteration order more stable (fewer collisions, less churn in bucket chains), but doesn't eliminate it. Sorting is the proper fix.
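As a back-of-the-envelope check on why a bigger table helps but cannot fix the bug, assume an idealized uniform hash (an assumption; the real hash function is not analyzed here). With $n$ entries and $m$ buckets, the expected number of colliding pairs is $\binom{n}{2}/m$; for a 200-entry directory:

```math
\mathbb{E}[\text{colliding pairs}] = \frac{\binom{n}{2}}{m}, \qquad
\frac{\binom{200}{2}}{128} = \frac{19900}{128} \approx 155.5, \qquad
\frac{\binom{200}{2}}{512} = \frac{19900}{512} \approx 38.9
```

Quadrupling the bucket count cuts expected collisions roughly fourfold, but with hundreds of entries some chains still hold several names, so their relative order can still vary between loads.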

With this patch:

- Directory listings are stable across multiple `opendir`/`closedir` cycles
- No missing or duplicate entries
- Performance impact: O(n log n) sort on ~200-500 entries ≈ microseconds
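For a rough sense of the sort cost, a comparison sort needs on the order of $n \log_2 n$ comparisons (a textbook estimate, not a measurement of this patch):

```math
200 \log_2 200 \approx 200 \times 7.64 \approx 1.5 \times 10^{3}, \qquad
500 \log_2 500 \approx 500 \times 8.97 \approx 4.5 \times 10^{3}
```

That is a few thousand short string comparisons per table reload, which is likely small next to the work of reading the directory from the underlying layers.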

Closes #447

Signed-off-by: Stan Hu <stanhu@gmail.com>

stanhu · 2025-11-04T18:18:47Z

@giuseppe Would you mind reviewing this?

giuseppe

Thanks!

LGTM

mathstuf

I have pulled a scratch build of fuse-overlayfs-1.13 with this patch into our CI which has been very consistently running into this issue. It works!

https://gitlab.kitware.com/utils/ghostflow-director/-/jobs/11881759#L46

Fedora backport PR (I grabbed the build from the Koji scratch build for this): https://src.fedoraproject.org/rpms/fuse-overlayfs/pull-request/6

stanhu force-pushed the sh-fix-stable-sort branch from 80b6c17 to 4266f68 on November 4, 2025 18:08

giuseppe approved these changes Nov 4, 2025

mathstuf approved these changes Nov 4, 2025

giuseppe merged 1 commit (d766201) from stanhu:sh-fix-stable-sort into containers:main on Nov 5, 2025
8 checks passed
