
gfapi: prevent use-after-free from concurrent open during fini#4667

Open
ThalesBarretto wants to merge 1 commit into gluster:devel from ThalesBarretto:fix-gfapi-fini-open-race


@ThalesBarretto
Contributor

Summary

Add a shutting_down flag to struct glfs that prevents new gfapi operations from entering the xlator stack after glfs_fini() begins teardown. Without this, glfs_open() and 30+ other entry points can create file descriptors on a graph being torn down, causing use-after-free.

Fixes: #4666

Problem

__GLFS_ENTRY_VALIDATE_FS checks only if (!fs) — there is no shutdown gate. glfs_fini() does not set any flag at entry; ctx->cleanup_started is set inside the drain loop only after call_pool->cnt reaches 0. A concurrent glfs_open() passes validation, gets a subvol reference, and creates an fd on inodes that inode_table_destroy_all is about to destroy.

This race has been known informally for 10+ years (see #404, filed 2018, closed stale). The drain loop comment at glfs.c:1303 — "leaked frames may exist, we ignore" — acknowledges the symptom. A TLA+ formal model proves the race exists at protocol level (invariant violation at depth 5 with 19 distinct states). The real C code is worse than the model — no flag is set at fini entry at all.

Changes

api/src/glfs-internal.h

  • Add gf_boolean_t shutting_down to struct glfs
  • Check fs->shutting_down in __GLFS_ENTRY_VALIDATE_FS (fast path, no mutex — safe because the flag is monotonic: only transitions false to true)

api/src/glfs-resolve.c

  • Check fs->shutting_down under mutex in priv_glfs_active_subvol (authoritative gate — every gfapi entry point flows through here)

api/src/glfs.c

  • Set fs->shutting_down = _gf_true at the top of pub_glfs_fini, before the drain loop, under mutex with cond broadcast to wake any waiting threads
  • Replace glfs_fini's call to glfs_active_subvol (which would now be rejected) with a direct fs->active_subvol read under mutex, mirroring the old_subvol handling from priv_glfs_active_subvol

Cleanup paths (glfs_fd_destroy, glfs_subvol_done) use glfs_lock with _gf_false and are unaffected.
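The two-level gate described above can be illustrated with a small standalone model. This is a sketch under stated assumptions, not the patch itself: `struct toy_fs` and every `toy_*` function are hypothetical stand-ins for `struct glfs`, `__GLFS_ENTRY_VALIDATE_FS`, `priv_glfs_active_subvol`, and the top of `pub_glfs_fini`. The patch describes a plain `gf_boolean_t`; the sketch substitutes a C11 `atomic_bool` so that the unlocked fast-path read is well-defined in portable C. The `EINVAL` on a NULL handle is also an assumption of the model, and `ESHUTDOWN` is taken from the test plan (it is available on Linux and the BSDs).

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct glfs: only the fields the gate needs. */
struct toy_fs {
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    atomic_bool shutting_down; /* monotonic: false -> true, never reset */
    int active_subvol;         /* placeholder for the graph pointer */
};

static void toy_fs_init(struct toy_fs *fs)
{
    int rc = pthread_mutex_init(&fs->mutex, NULL);
    assert(rc == 0);
    rc = pthread_cond_init(&fs->cond, NULL);
    assert(rc == 0);
    (void)rc;
    atomic_init(&fs->shutting_down, false);
    fs->active_subvol = 1;
}

/* Fast path: cheap unlocked check, modelled on the entry-validation macro. */
static bool toy_entry_validate(struct toy_fs *fs)
{
    if (fs == NULL) {
        errno = EINVAL; /* assumption of this model */
        return false;
    }
    if (atomic_load(&fs->shutting_down)) {
        errno = ESHUTDOWN;
        return false;
    }
    return true;
}

/* Authoritative gate: the flag is re-checked under the mutex. */
static int *toy_active_subvol(struct toy_fs *fs)
{
    int *subvol = NULL;

    pthread_mutex_lock(&fs->mutex);
    if (atomic_load(&fs->shutting_down))
        errno = ESHUTDOWN;
    else
        subvol = &fs->active_subvol;
    pthread_mutex_unlock(&fs->mutex);
    return subvol;
}

/* An entry point passes through both checks before touching the graph. */
static int *toy_open(struct toy_fs *fs)
{
    if (!toy_entry_validate(fs))
        return NULL;
    return toy_active_subvol(fs);
}

/* First step of teardown: raise the flag and wake any waiters. */
static void toy_fini_begin(struct toy_fs *fs)
{
    pthread_mutex_lock(&fs->mutex);
    atomic_store(&fs->shutting_down, true);
    pthread_cond_broadcast(&fs->cond);
    pthread_mutex_unlock(&fs->mutex);
}
```

In this model, `toy_open()` returns a non-NULL pointer before `toy_fini_begin()` and NULL with `errno == ESHUTDOWN` afterwards. The locked re-check is what closes the window in which a thread passes the fast path just before the flag is raised.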

Scope

This patch addresses Bug A from #4666 (no new operations after fini starts). Bug B (in-flight operations between drain-loop exit and graph destruction) is a separate, more complex fix requiring a barrier or re-check mechanism.

Test plan

  • Verify make check passes (cmocka unit tests)
  • Verify that the tests/basic/gfapi/ regression suite passes
  • Multi-threaded gfapi test: concurrent glfs_open/glfs_fini from two threads no longer crashes
  • Verify that glfs_open called once glfs_fini has begun returns NULL with errno == ESHUTDOWN
  • Verify normal single-threaded lifecycle (glfs_new -> glfs_init -> glfs_open -> glfs_close -> glfs_fini) is unaffected
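The two-thread item in the plan above could be prototyped against a model before wiring it to a real volume. The following is a minimal sketch, assuming a mutex-protected flag like the one this patch adds; `struct race_state`, `opener`, `finisher`, and `run_race` are hypothetical names, and no gfapi calls are made. It counts how many "opens" the gate admits after shutdown has begun, which should be zero.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct race_state {
    pthread_mutex_t mutex;
    bool shutting_down;
    long opens_ok;          /* opens admitted by the gate */
    long opens_at_shutdown; /* snapshot taken when the flag is raised */
};

/* Models a consumer thread calling glfs_open in a loop. */
static void *opener(void *arg)
{
    struct race_state *st = arg;

    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&st->mutex);
        if (!st->shutting_down)
            st->opens_ok++; /* gate open: the "open" is admitted */
        pthread_mutex_unlock(&st->mutex);
    }
    return NULL;
}

/* Models the first step of glfs_fini: raise the flag under the mutex. */
static void *finisher(void *arg)
{
    struct race_state *st = arg;

    pthread_mutex_lock(&st->mutex);
    st->shutting_down = true;
    st->opens_at_shutdown = st->opens_ok;
    pthread_mutex_unlock(&st->mutex);
    return NULL;
}

/* Returns the number of opens admitted after shutdown began. */
static long run_race(void)
{
    struct race_state st = {.shutting_down = false};
    pthread_t a, b;
    int rc;

    rc = pthread_mutex_init(&st.mutex, NULL);
    assert(rc == 0);
    rc = pthread_create(&a, NULL, opener, &st);
    assert(rc == 0);
    rc = pthread_create(&b, NULL, finisher, &st);
    assert(rc == 0);
    (void)rc;

    pthread_join(a, NULL);
    pthread_join(b, NULL);
    pthread_mutex_destroy(&st.mutex);
    return st.opens_ok - st.opens_at_shutdown;
}
```

Because both the check and the flag update happen under the same mutex, `run_race()` returns 0 regardless of how the threads interleave. A test against the real library would replace the counter increment with `glfs_open` and the finisher body with `glfs_fini`, and would need a running volume.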

Related

Formal verification

The TLA+ model and verification instructions are in the issue: #4666

glfs_open() and 30+ other gfapi entry points can succeed while
glfs_fini() is tearing down the xlator graph, because
__GLFS_ENTRY_VALIDATE_FS only checks if the fs pointer is NULL.
There is no shutdown gate.  A thread calling glfs_open() between
fini's drain-loop exit and inode_table_destroy_all() will create
a file descriptor referencing freed inodes and xlator private
data — a use-after-free.

This race has been discussed informally in the GlusterFS community
for over 10 years (see issue gluster#404, "Implement proper cleanup
sequence", filed 2018 and closed without resolution).  The drain
loop comment at line 1303 ("leaked frames may exist, we ignore")
acknowledges the symptom.

A TLA+ formal model of the gfapi session lifecycle proves the
race exists at protocol level: splitting glfs_fini into two
non-atomic steps (set flag, transition state) and adding a ghost
variable to track "stale fds" (fds opened after shutdown begins)
shows an invariant violation at depth 5.  The real C code is
worse than the model — no flag is set at fini entry at all.

Fix: add a shutting_down flag to struct glfs, set it at the top
of pub_glfs_fini (before the drain loop), and check it at two
choke points:

 1. __GLFS_ENTRY_VALIDATE_FS (fast path, no mutex — safe because
    the flag is monotonic: only transitions false -> true)

 2. priv_glfs_active_subvol (authoritative gate, under fs->mutex)

All 30+ gfapi entry points flow through both checks.  Cleanup
paths (glfs_fd_destroy, glfs_subvol_done) use glfs_lock with
_gf_false and are unaffected.

Since glfs_fini itself would be rejected by the new gate in
priv_glfs_active_subvol, replace its call to glfs_active_subvol
with a direct read of fs->active_subvol under the mutex.  The
old_subvol handling from priv_glfs_active_subvol is mirrored so
a pending old graph still gets its PARENT_DOWN.

This patch addresses Bug A (no new operations after fini starts).
Bug B (in-flight operations between drain-loop exit and graph
destruction) is a separate, more complex fix that requires a
barrier or re-check mechanism.

Signed-off-by: Thales Antunes de Oliveira Barretto <thales.barretto.git@gmail.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Thales Antunes de Oliveira Barretto <thales.barretto.git@gmail.com>
#  api/src/glfs-internal.h |  5 +++++
#  api/src/glfs-resolve.c  | 11 +++++++++++
#  api/src/glfs.c          | 43 ++++++++++++++++++++++++++++++++++++++++++-
#  3 files changed, 58 insertions(+), 1 deletion(-)
ThalesBarretto added a commit to ThalesBarretto/glusterfs that referenced this pull request Apr 14, 2026
glfs_fini() never iterates fs->openfds.  If a consumer calls fini
with open file descriptors, inode_table_destroy_all force-frees
inodes ("approach 2", inode.c:1835) while fd_t objects still hold
references.  The result is dangling pointers, leaked glfs_fd_t
objects, and unbounded memory growth.

Add glfs_drain_openfds() which pops each fd from the openfds list,
marks it GLFD_CLOSE, releases the fd_t's inode ref via fd_unref
while the inode table is still alive, NULLs glfd->fd, and calls
GF_REF_PUT to trigger destruction.

The fd_unref+NULL step is critical: it makes the drain safe even
when a leaked frame holds an extra GF_REF_GET (refcount > 1).
Without it, the drain would convert a silent leak into a
use-after-free.  With the NULL, any later glfs_fd_destroy from a
leaked frame's PUT skips fd_unref and does a clean GF_FREE.

TLA+ model (gfapi_fd_drain_actual.tla) verified by TLC:
ALL 6 invariants HOLD (129 states, 56 distinct, depth 9).
Three-iteration verification loop discovered and fixed the
fd_unref+NULL regression before it shipped.

This is Fix B (companion to Fix A in PR gluster#4667).  Fix A blocks
new opens during fini (Bug A).  This fix drains pre-existing
opens that were never closed (Bug B).
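The drain order described in this commit message, and the reason the fd_unref+NULL step matters, can be shown with a toy reference-counting model. This is a sketch under assumptions, not the code from the commit: `struct toy_fd`, `struct toy_glfd`, and all `toy_*` functions are hypothetical stand-ins for `fd_t`, `glfs_fd_t`, `fd_unref`, `glfs_fd_destroy`, `GF_REF_PUT`, and `glfs_drain_openfds`, and the two counters exist only so the behaviour can be observed.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_fd {
    int refs; /* stands in for the fd_t reference count */
};

struct toy_glfd {
    int refcount;          /* stands in for the GF_REF count */
    struct toy_fd *fd;     /* NULL once the drain has released it */
    int closed;            /* stands in for the GLFD_CLOSE state */
    struct toy_glfd *next; /* link in the openfds list */
};

static int toy_fd_unrefs;  /* number of toy_fd_unref calls */
static int toy_glfd_frees; /* number of toy_glfd objects freed */

static void toy_fd_unref(struct toy_fd *fd)
{
    toy_fd_unrefs++;
    if (--fd->refs == 0)
        free(fd);
}

/* Destructor: skips the unref when the drain already released the fd. */
static void toy_glfd_destroy(struct toy_glfd *g)
{
    if (g->fd != NULL)
        toy_fd_unref(g->fd);
    toy_glfd_frees++;
    free(g);
}

static void toy_glfd_put(struct toy_glfd *g)
{
    if (--g->refcount == 0)
        toy_glfd_destroy(g);
}

/* Allocate an open fd with one reference and push it onto the list. */
static struct toy_glfd *toy_glfd_new(struct toy_glfd **head)
{
    struct toy_glfd *g = calloc(1, sizeof(*g));
    struct toy_fd *fd = calloc(1, sizeof(*fd));

    assert(g != NULL && fd != NULL);
    fd->refs = 1;
    g->refcount = 1;
    g->fd = fd;
    g->next = *head;
    *head = g;
    return g;
}

/* Drain: pop, mark closed, release the fd while it is valid, NULL it, put. */
static void toy_drain_openfds(struct toy_glfd **head)
{
    while (*head != NULL) {
        struct toy_glfd *g = *head;

        *head = g->next;
        g->next = NULL;
        g->closed = 1;
        if (g->fd != NULL) {
            toy_fd_unref(g->fd);
            g->fd = NULL;
        }
        toy_glfd_put(g);
    }
}
```

If a leaked frame holds an extra reference (`g->refcount == 2` before the drain), the drain releases the inner fd exactly once and leaves the outer object alive. The later `toy_glfd_put()` from the leaked frame then finds `g->fd == NULL`, skips the unref, and only frees the outer object.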

Fixes: gluster#4668

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Thales Antunes de Oliveira Barretto <thales.barretto.git@gmail.com>
@ThalesBarretto ThalesBarretto marked this pull request as ready for review April 14, 2026 05:26
Development

Successfully merging this pull request may close these issues.

gfapi: use-after-free race between glfs_open and glfs_fini (no shutdown gate)
