gfapi: prevent use-after-free from concurrent open during fini by ThalesBarretto · Pull Request #4667 · gluster/glusterfs

ThalesBarretto · 2026-04-12T22:54:31Z

Summary

Add a shutting_down flag to struct glfs that prevents new gfapi operations from entering the xlator stack after glfs_fini() begins teardown. Without this, glfs_open() and 30+ other entry points can create file descriptors on a graph being torn down, causing use-after-free.

Fixes: #4666

Problem

__GLFS_ENTRY_VALIDATE_FS checks only if (!fs); there is no shutdown gate. glfs_fini() sets no flag on entry: ctx->cleanup_started is set inside the drain loop, and only after call_pool->cnt reaches 0. A concurrent glfs_open() therefore passes validation, obtains a subvol reference, and creates an fd on inodes that inode_table_destroy_all is about to destroy.

This race has been known informally for 10+ years (see #404, filed in 2018 and closed as stale). The drain-loop comment at glfs.c:1303, "leaked frames may exist, we ignore", acknowledges the symptom. A TLA+ formal model proves the race exists at the protocol level (an invariant violation at depth 5, with 19 distinct states). The real C code is worse than the model: no flag is set at fini entry at all.

Changes

api/src/glfs-internal.h

Add gf_boolean_t shutting_down to struct glfs
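Check fs->shutting_down in __GLFS_ENTRY_VALIDATE_FS (fast path, no mutex; a stale read is tolerable because the flag is monotonic, only ever transitioning false to true, and the authoritative check in priv_glfs_active_subvol runs under the mutex)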
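A minimal sketch of the fast-path check, modelled outside the GlusterFS tree. struct fs_model, entry_validate_fs(), fs_model_begin_shutdown() and the EINVAL for a NULL handle are hypothetical stand-ins, not the real struct glfs or macro. Unlike the patch, which uses a plain gf_boolean_t, the sketch uses a C11 atomic_bool so the unlocked read is well-defined under the C11 memory model.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct glfs; only the new field is modelled. */
struct fs_model {
    atomic_bool shutting_down; /* monotonic: false -> true, never reset */
};

/* Hypothetical analogue of the fast path in __GLFS_ENTRY_VALIDATE_FS.
 * Returns 0 if the call may proceed, or -1 with errno set. */
int entry_validate_fs(struct fs_model *fs)
{
    if (fs == NULL) {
        errno = EINVAL; /* the pre-existing NULL check (errno is this sketch's choice) */
        return -1;
    }
    /* Unlocked, relaxed read. A stale false is tolerable: the caller must
     * still pass the mutex-protected gate before touching the graph. */
    if (atomic_load_explicit(&fs->shutting_down, memory_order_relaxed)) {
        errno = ESHUTDOWN;
        return -1;
    }
    return 0;
}

/* Hypothetical: the first thing fini does. The patch sets the flag under
 * fs->mutex; this sketch shows only the store. */
void fs_model_begin_shutdown(struct fs_model *fs)
{
    assert(fs != NULL);
    atomic_store_explicit(&fs->shutting_down, true, memory_order_relaxed);
}
```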

api/src/glfs-resolve.c

Check fs->shutting_down under mutex in priv_glfs_active_subvol (authoritative gate — every gfapi entry point flows through here)
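The authoritative gate can be sketched the same way. struct fs_model, struct subvol_model, active_subvol_get(), fs_model_begin_shutdown() and the ENOTCONN for a missing graph are hypothetical; the point is only that the flag is tested under the same mutex fini takes to set it, so once shutdown has begun no caller can obtain a new graph reference.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for an xlator graph and struct glfs. */
struct subvol_model {
    int refs; /* references handed out to callers */
};

struct fs_model {
    pthread_mutex_t mutex;
    bool shutting_down;          /* read and written only under mutex */
    struct subvol_model *active; /* current graph, may be NULL */
};

/* Hypothetical analogue of the gate in priv_glfs_active_subvol(). */
struct subvol_model *active_subvol_get(struct fs_model *fs)
{
    struct subvol_model *subvol = NULL;

    assert(fs != NULL);
    pthread_mutex_lock(&fs->mutex);
    if (fs->shutting_down) {
        errno = ESHUTDOWN;  /* fini has started: refuse new work */
    } else if (fs->active != NULL) {
        subvol = fs->active;
        subvol->refs++;     /* caller must drop this reference later */
    } else {
        errno = ENOTCONN;   /* no graph yet (this sketch's choice) */
    }
    pthread_mutex_unlock(&fs->mutex);
    return subvol;
}

/* Hypothetical: fini closes the gate under the same mutex. */
void fs_model_begin_shutdown(struct fs_model *fs)
{
    pthread_mutex_lock(&fs->mutex);
    fs->shutting_down = true;
    pthread_mutex_unlock(&fs->mutex);
}
```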

api/src/glfs.c

Set fs->shutting_down = _gf_true at the top of pub_glfs_fini, before the drain loop, under fs->mutex, and broadcast the condition variable to wake any waiting threads
Replace glfs_fini's call to glfs_active_subvol (which the new gate would now reject) with a direct read of fs->active_subvol under fs->mutex, mirroring the old_subvol handling from priv_glfs_active_subvol so a pending old graph still gets its PARENT_DOWN
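A sketch of the two glfs.c changes, again with hypothetical stand-ins: fs_model, subvol_model, fini_begin(), fini_take_active_subvol() and the parent_down_sent field are illustrative, and the real code sends a PARENT_DOWN event rather than setting a field. The flag is set and the condition variable broadcast under one mutex, and fini then reads active_subvol directly instead of going through the gated lookup.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct subvol_model {
    int refs;
    bool parent_down_sent; /* stand-in for "PARENT_DOWN was delivered" */
};

struct fs_model {
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    bool shutting_down;
    struct subvol_model *active_subvol;
    struct subvol_model *old_subvol; /* graph pending teardown, may be NULL */
};

/* Step 1 of the hypothetical fini: close the gate before the drain loop. */
void fini_begin(struct fs_model *fs)
{
    assert(fs != NULL);
    pthread_mutex_lock(&fs->mutex);
    fs->shutting_down = true;
    /* Wake threads blocked on fs->cond so they re-check state and see the flag. */
    pthread_cond_broadcast(&fs->cond);
    pthread_mutex_unlock(&fs->mutex);
}

/* Step 2: the gated lookup would refuse fini itself, so read active_subvol
 * directly under the mutex. A pending old graph is detached and marked so
 * it still receives its PARENT_DOWN. */
struct subvol_model *fini_take_active_subvol(struct fs_model *fs)
{
    struct subvol_model *subvol;
    struct subvol_model *old;

    pthread_mutex_lock(&fs->mutex);
    subvol = fs->active_subvol;
    if (subvol != NULL)
        subvol->refs++;
    old = fs->old_subvol;
    fs->old_subvol = NULL;
    pthread_mutex_unlock(&fs->mutex);

    if (old != NULL)
        old->parent_down_sent = true; /* stand-in for the real notification */
    return subvol;
}
```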

Cleanup paths (glfs_fd_destroy, glfs_subvol_done) use glfs_lock with _gf_false and are unaffected.

Scope

This patch addresses Bug A from #4666 (no new operations after fini starts). Bug B (in-flight operations between drain-loop exit and graph destruction) is a separate, more complex fix requiring a barrier or re-check mechanism.
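To see why the gate closes Bug A but not Bug B, the hypothetical single-threaded model below replays interleavings one step at a time, in the spirit of the TLA+ model's ghost variable (it is not a translation of that model). Each function is one atomic step; the gate collapses the fast path and the locked lookup into one step, and the drain loop is left out on the assumption that an admitted open which has not yet wound a frame is invisible to it. All names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state. */
struct world {
    bool shutting_down; /* set by fini_set_flag */
    bool table_alive;   /* cleared by fini_destroy (inode_table_destroy_all) */
    bool open_admitted; /* the open passed the gate */
    int fds;            /* fds created */
    int stale_fds;      /* ghost: fds created on a destroyed table */
};

static void fini_set_flag(struct world *w) { w->shutting_down = true; }
static void fini_destroy(struct world *w) { w->table_alive = false; }

/* Open, step 1: the gate. */
static void open_gate(struct world *w) { w->open_admitted = !w->shutting_down; }

/* Open, step 2: create the fd if the gate admitted us. */
static void open_create(struct world *w)
{
    if (!w->open_admitted)
        return;
    w->fds++;
    if (!w->table_alive)
        w->stale_fds++;
}

/* Replay a schedule: 'F' = fini_set_flag, 'D' = fini_destroy,
 * 'g' = open_gate, 'c' = open_create. Returns the number of stale fds. */
int replay(const char *schedule)
{
    struct world w = { .table_alive = true };

    for (const char *p = schedule; *p != '\0'; p++) {
        switch (*p) {
        case 'F': fini_set_flag(&w); break;
        case 'D': fini_destroy(&w); break;
        case 'g': open_gate(&w); break;
        case 'c': open_create(&w); break;
        default: assert(!"unknown step");
        }
    }
    return w.stale_fds;
}
```

Replaying "FgcD" (flag set first) yields no stale fd, while "Dgc" (no flag, as before the patch) yields one. "gFDc" still yields one, because the open was admitted before the flag was set; that schedule is the Bug B window a barrier or re-check would have to close.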

Test plan

Verify make check passes (cmocka unit tests)
Verify regression test tests/basic/gfapi/ suite passes
Multi-threaded gfapi test: concurrent glfs_open/glfs_fini from two threads no longer crashes
Verify that glfs_open issued while glfs_fini is in progress returns NULL with errno == ESHUTDOWN (fs is freed once glfs_fini returns, so it cannot be probed afterwards)
Verify normal single-threaded lifecycle (glfs_new -> glfs_init -> glfs_open -> glfs_close -> glfs_fini) is unaffected

Related

#404 "Implement proper cleanup sequence": the 2018 design issue, closed as stale
#4644 "Null pointer dereference in afr_notify() due to access after free in afr_has_quorum": afr_notify null deref during fini (open, same family)
#4527 "Don't fire the callback when cleanup is in progress": don't fire timer callback during cleanup (open, different layer: timer infrastructure vs gfapi entry points)
Samba MR !4474: Samba-side fix that enforces the fd-close-before-fini contract; this patch provides the libgfapi-side enforcement

Formal verification

The TLA+ model and verification instructions are in the issue: #4666

glfs_open() and 30+ other gfapi entry points can succeed while glfs_fini() is tearing down the xlator graph, because __GLFS_ENTRY_VALIDATE_FS only checks whether the fs pointer is NULL. There is no shutdown gate. A thread calling glfs_open() between fini's drain-loop exit and inode_table_destroy_all() will create a file descriptor referencing freed inodes and xlator private data: a use-after-free.

This race has been discussed informally in the GlusterFS community for over 10 years (see issue gluster#404, "Implement proper cleanup sequence", filed 2018 and closed without resolution). The drain loop comment at line 1303 ("leaked frames may exist, we ignore") acknowledges the symptom.

A TLA+ formal model of the gfapi session lifecycle proves the race exists at protocol level: splitting glfs_fini into two non-atomic steps (set flag, transition state) and adding a ghost variable to track "stale fds" (fds opened after shutdown begins) shows an invariant violation at depth 5. The real C code is worse than the model: no flag is set at fini entry at all.

Fix: add a shutting_down flag to struct glfs, set it at the top of pub_glfs_fini (before the drain loop), and check it at two choke points:

1. __GLFS_ENTRY_VALIDATE_FS (fast path, no mutex; safe because the flag is monotonic: it only transitions false -> true)
2. priv_glfs_active_subvol (authoritative gate, under fs->mutex)

All 30+ gfapi entry points flow through both checks. Cleanup paths (glfs_fd_destroy, glfs_subvol_done) use glfs_lock with _gf_false and are unaffected.

Since glfs_fini itself would be rejected by the new gate in priv_glfs_active_subvol, replace its call to glfs_active_subvol with a direct read of fs->active_subvol under the mutex. The old_subvol handling from priv_glfs_active_subvol is mirrored so a pending old graph still gets its PARENT_DOWN.

This patch addresses Bug A (no new operations after fini starts). Bug B (in-flight operations between drain-loop exit and graph destruction) is a separate, more complex fix that requires a barrier or re-check mechanism.

Signed-off-by: Thales Antunes de Oliveira Barretto <thales.barretto.git@gmail.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

 api/src/glfs-internal.h |  5 +++++
 api/src/glfs-resolve.c  | 11 +++++++++++
 api/src/glfs.c          | 43 ++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 58 insertions(+), 1 deletion(-)

glfs_fini() never iterates fs->openfds. If a consumer calls fini with open file descriptors, inode_table_destroy_all force-frees inodes ("approach 2", inode.c:1835) while fd_t objects still hold references. The result is dangling pointers, leaked glfs_fd_t objects, and unbounded memory growth.

Add glfs_drain_openfds(), which pops each fd from the openfds list, marks it GLFD_CLOSE, releases the fd_t's inode ref via fd_unref while the inode table is still alive, NULLs glfd->fd, and calls GF_REF_PUT to trigger destruction.

The fd_unref+NULL step is critical: it makes the drain safe even when a leaked frame holds an extra GF_REF_GET (refcount > 1). Without it, the drain would convert a silent leak into a use-after-free. With the NULL, any later glfs_fd_destroy from a leaked frame's PUT skips fd_unref and does a clean GF_FREE.

TLA+ model (gfapi_fd_drain_actual.tla) verified by TLC: all 6 invariants hold (129 states, 56 distinct, depth 9). A three-iteration verification loop discovered and fixed the fd_unref+NULL regression before it shipped.

This is Fix B (companion to Fix A in PR gluster#4667). Fix A blocks new opens during fini (Bug A). This fix drains pre-existing opens that were never closed (Bug B).

Fixes: gluster#4668
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Thales Antunes de Oliveira Barretto <thales.barretto.git@gmail.com>
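The drain's ordering can be illustrated with a hypothetical refcount model. inode_model, fd_model, glfd_model, glfd_put(), drain_openfds() and open_model() are stand-ins for inode_t, fd_t, glfs_fd_t and the real glfs_drain_openfds(), not their actual definitions; the code is single-threaded and omits whatever locking around fs->openfds the real code would presumably need.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins: inode_t, fd_t and glfs_fd_t reduced to refcounts. */
struct inode_model { int refs; };

struct fd_model {
    struct inode_model *inode;
    int refs;
};

struct glfd_model {
    struct fd_model *fd;     /* NULLed by the drain */
    int refs;                /* GF_REF_GET / GF_REF_PUT analogue */
    bool closed;             /* GLFD_CLOSE analogue */
    struct glfd_model *next; /* openfds list linkage */
};

static int glfd_freed; /* counts destroyed glfds, for the usage check */

static void fd_unref_model(struct fd_model *fd)
{
    if (--fd->refs == 0) {
        fd->inode->refs--; /* the fd releases its inode reference */
        free(fd);
    }
}

/* glfs_fd_destroy analogue: touches fd only if it is still attached. */
static void glfd_put(struct glfd_model *glfd)
{
    if (--glfd->refs > 0)
        return;
    if (glfd->fd != NULL)
        fd_unref_model(glfd->fd);
    free(glfd);
    glfd_freed++;
}

/* Hypothetical analogue of glfs_drain_openfds(); must run while the
 * inode table is still alive. */
void drain_openfds(struct glfd_model **openfds)
{
    while (*openfds != NULL) {
        struct glfd_model *glfd = *openfds;
        *openfds = glfd->next;        /* pop from the list */
        glfd->next = NULL;
        glfd->closed = true;
        if (glfd->fd != NULL) {
            fd_unref_model(glfd->fd); /* drop the inode ref now */
            glfd->fd = NULL;          /* later PUTs must not touch it */
        }
        glfd_put(glfd);               /* drop the list's reference */
    }
}

/* Helper for the usage check: open a glfd on inode and push it on the list. */
struct glfd_model *open_model(struct inode_model *inode, struct glfd_model **openfds)
{
    struct fd_model *fd = malloc(sizeof *fd);
    struct glfd_model *glfd = malloc(sizeof *glfd);

    assert(fd != NULL && glfd != NULL);
    fd->inode = inode;
    fd->refs = 1;
    inode->refs++;
    glfd->fd = fd;
    glfd->refs = 1;
    glfd->closed = false;
    glfd->next = *openfds;
    *openfds = glfd;
    return glfd;
}
```

With two open glfds, one holding an extra reference from a leaked frame, the inode's reference count drops to zero during the drain, so the table can then be destroyed, and the leaked frame's later glfd_put() frees its glfd without touching the already-released fd. Deleting the glfd->fd = NULL line would make that late put call fd_unref_model() on freed memory, which is the regression the commit message describes.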

ThalesBarretto mentioned this pull request on Apr 14, 2026 in:

gfapi: glfs_fini does not close open file descriptors (no openfds drain) #4668 (Open)
gfapi: drain open fds in glfs_fini before graph teardown #4669 (Open)

ThalesBarretto marked this pull request as ready for review on April 14, 2026 at 05:26.

ThalesBarretto wants to merge 1 commit into gluster:devel from ThalesBarretto:fix-gfapi-fini-open-race
