gfapi: prevent use-after-free from concurrent open during fini#4667
Open
ThalesBarretto wants to merge 1 commit intogluster:develfrom
Open
gfapi: prevent use-after-free from concurrent open during fini#4667ThalesBarretto wants to merge 1 commit intogluster:develfrom
ThalesBarretto wants to merge 1 commit intogluster:develfrom
Conversation
glfs_open() and 30+ other gfapi entry points can succeed while glfs_fini() is tearing down the xlator graph, because __GLFS_ENTRY_VALIDATE_FS only checks if the fs pointer is NULL. There is no shutdown gate. A thread calling glfs_open() between fini's drain-loop exit and inode_table_destroy_all() will create a file descriptor referencing freed inodes and xlator private data — a use-after-free. This race has been discussed informally in the GlusterFS community for over 10 years (see issue gluster#404, "Implement proper cleanup sequence", filed 2018 and closed without resolution). The drain loop comment at line 1303 ("leaked frames may exist, we ignore") acknowledges the symptom. A TLA+ formal model of the gfapi session lifecycle proves the race exists at protocol level: splitting glfs_fini into two non-atomic steps (set flag, transition state) and adding a ghost variable to track "stale fds" (fds opened after shutdown begins) shows an invariant violation at depth 5. The real C code is worse than the model — no flag is set at fini entry at all. Fix: add a shutting_down flag to struct glfs, set it at the top of pub_glfs_fini (before the drain loop), and check it at two choke points: 1. __GLFS_ENTRY_VALIDATE_FS (fast path, no mutex — safe because the flag is monotonic: only transitions false -> true) 2. priv_glfs_active_subvol (authoritative gate, under fs->mutex) All 30+ gfapi entry points flow through both checks. Cleanup paths (glfs_fd_destroy, glfs_subvol_done) use glfs_lock with _gf_false and are unaffected. Since glfs_fini itself would be rejected by the new gate in priv_glfs_active_subvol, replace its call to glfs_active_subvol with a direct read of fs->active_subvol under the mutex. The old_subvol handling from priv_glfs_active_subvol is mirrored so a pending old graph still gets its PARENT_DOWN. This patch addresses Bug A (no new operations after fini starts). Bug B (in-flight operations between drain-loop exit and graph destruction) is a separate, more complex fix that requires a barrier or re-check mechanism. Signed-off-by: Thales Antunes de Oliveira Barretto <thales.barretto.git@gmail.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Thales Antunes de Oliveira Barretto <thales.barretto.git@gmail.com> # api/src/glfs-internal.h | 5 +++++ # api/src/glfs-resolve.c | 11 +++++++++++ # api/src/glfs.c | 43 ++++++++++++++++++++++++++++++++++++++++++- # 3 files changed, 58 insertions(+), 1 deletion(-)
ThalesBarretto
added a commit
to ThalesBarretto/glusterfs
that referenced
this pull request
Apr 14, 2026
glfs_fini() never iterates fs->openfds. If a consumer calls fini
with open file descriptors, inode_table_destroy_all force-frees
inodes ("approach 2", inode.c:1835) while fd_t objects still hold
references. The result is dangling pointers, leaked glfs_fd_t
objects, and unbounded memory growth.
Add glfs_drain_openfds() which pops each fd from the openfds list,
marks it GLFD_CLOSE, releases the fd_t's inode ref via fd_unref
while the inode table is still alive, NULLs glfd->fd, and calls
GF_REF_PUT to trigger destruction.
The fd_unref+NULL step is critical: it makes the drain safe even
when a leaked frame holds an extra GF_REF_GET (refcount > 1).
Without it, the drain would convert a silent leak into a
use-after-free. With the NULL, any later glfs_fd_destroy from a
leaked frame's PUT skips fd_unref and does a clean GF_FREE.
TLA+ model (gfapi_fd_drain_actual.tla) verified by TLC:
ALL 6 invariants HOLD (129 states, 56 distinct, depth 9).
Three-iteration verification loop discovered and fixed the
fd_unref+NULL regression before it shipped.
This is Fix B (companion to Fix A in PR gluster#4667). Fix A blocks
new opens during fini (Bug A). This fix drains pre-existing
opens that were never closed (Bug B).
Fixes: gluster#4668
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Thales Antunes de Oliveira Barretto <thales.barretto.git@gmail.com>
5 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Add a
shutting_downflag tostruct glfsthat prevents new gfapi operations from entering the xlator stack afterglfs_fini()begins teardown. Without this,glfs_open()and 30+ other entry points can create file descriptors on a graph being torn down, causing use-after-free.Fixes: #4666
Problem
__GLFS_ENTRY_VALIDATE_FSchecks onlyif (!fs)— there is no shutdown gate.glfs_fini()does not set any flag at entry;ctx->cleanup_startedis set inside the drain loop only aftercall_pool->cntreaches 0. A concurrentglfs_open()passes validation, gets a subvol reference, and creates an fd on inodes thatinode_table_destroy_allis about to destroy.This race has been known informally for 10+ years (see #404, filed 2018, closed stale). The drain loop comment at
glfs.c:1303— "leaked frames may exist, we ignore" — acknowledges the symptom. A TLA+ formal model proves the race exists at protocol level (invariant violation at depth 5 with 19 distinct states). The real C code is worse than the model — no flag is set at fini entry at all.Changes
api/src/glfs-internal.hgf_boolean_t shutting_downtostruct glfsfs->shutting_downin__GLFS_ENTRY_VALIDATE_FS(fast path, no mutex — safe because the flag is monotonic: only transitions false to true)api/src/glfs-resolve.cfs->shutting_downunder mutex inpriv_glfs_active_subvol(authoritative gate — every gfapi entry point flows through here)api/src/glfs.cfs->shutting_down = _gf_trueat the top ofpub_glfs_fini, before the drain loop, under mutex with cond broadcast to wake any waiting threadsglfs_fini's call toglfs_active_subvol(which would now be rejected) with a directfs->active_subvolread under mutex, mirroring theold_subvolhandling frompriv_glfs_active_subvolCleanup paths (
glfs_fd_destroy,glfs_subvol_done) useglfs_lockwith_gf_falseand are unaffected.Scope
This patch addresses Bug A from #4666 (no new operations after fini starts). Bug B (in-flight operations between drain-loop exit and graph destruction) is a separate, more complex fix requiring a barrier or re-check mechanism.
Test plan
make checkpasses (cmocka unit tests)tests/basic/gfapi/suite passesglfs_open/glfs_finifrom two threads no longer crashesglfs_openafterglfs_finireturnsNULLwitherrno == ESHUTDOWNglfs_new->glfs_init->glfs_open->glfs_close->glfs_fini) is unaffectedRelated
afr_notify()due to access after free inafr_has_quorum#4644 —afr_notifynull deref during fini (open, same family)Formal verification
The TLA+ model and verification instructions are in the issue: #4666