posix: fsyncer pthread_cancel causes UB (locked mutex destroy) and stub leak

## Summary

The posix fsyncer thread is shut down via `pthread_cancel`, which causes
undefined behavior (destroying a locked mutex) and resource leaks (orphaned
`call_stub_t` entries) on every shutdown path.

## Affected code

| Operation | File | Line |
|-----------|------|------|
| `posix_fsyncer` main loop | `xlators/storage/posix/src/posix-helpers.c` | 2463-2511 |
| `posix_fsyncer_pick` (cond_wait = cancel point) | `xlators/storage/posix/src/posix-helpers.c` | 2379-2396 |
| `gf_thread_cleanup_xint` (cancel + join) | `xlators/storage/posix/src/posix-common.c` | 1315 |
| `pthread_mutex_destroy` on potentially locked mutex | `xlators/storage/posix/src/posix-common.c` | 1327 |

## Root cause

The fsyncer thread runs a `for(;;)` loop with no exit condition. The only
shutdown mechanism is `pthread_cancel` via `gf_thread_cleanup_xint()` in
`posix_fini`. This has two problems:

### 1. Mutex destroyed while locked (UB)

`posix_fsyncer_pick` holds `priv->fsync_mutex` while calling
`pthread_cond_wait`, which is a POSIX-defined cancellation point. When
cancellation is delivered during `pthread_cond_wait`, POSIX.1-2008
Section 2.9.5 specifies the mutex is re-acquired before cleanup handlers
execute. However, `posix_fsyncer` registers **zero** `pthread_cleanup_push`
handlers (confirmed: no instances in the posix source tree).

After `pthread_cancel` + `pthread_join`, the mutex remains locked.
`posix_fini` then calls `pthread_mutex_destroy(&priv->fsync_mutex)` on a
locked mutex — undefined behavior per POSIX.1-2008.

### 2. Stub leak

When cancellation kills the fsyncer, stubs in two locations become
unreachable:

- **Local list:** After `posix_fsyncer_pick` splices stubs from
  `priv->fsyncs` to the stack-local `list` variable, cancel during
  `gf_nanosleep` or mid-processing leaves those stubs on a destroyed stack
  frame.

- **`priv->fsyncs`:** Stubs queued after the last pick are never processed.
  `posix_fini` never drains `priv->fsyncs` — it goes directly to destroying
  the mutex and freeing priv.

Each leaked `call_stub_t` holds a `call_frame_t *` → leaked frames → leaked
`fd_t` and `inode_t` references → inode table cannot clean up.

## Reproduction

All three shutdown paths are affected:

- **Path A (normal shutdown):** `glusterfsd` exit → `posix_fini`
- **Path B (init failure):** `posix_init` fails after fsyncer is started →
  `posix_fini`
- **Path C (brick mux detach):** brick detach → `xlator_fini_rec` →
  `posix_fini` (process stays alive)

Path C is the most severe because the process continues running with the
leaked resources and undefined mutex state.

The bug is deterministic when cancellation is delivered during
`pthread_cond_wait` (the fsyncer spends most of its time here). The impact
of the UB is implementation-dependent — glibc may silently succeed,
deadlock, or abort depending on the mutex type and internal state.

## Impact

- **Undefined behavior** in `pthread_mutex_destroy` (implementation-dependent:
  glibc may silently succeed, deadlock, or abort)
- **Resource leaks** proportional to the number of fsync stubs in flight at
  shutdown time (frames, fds, inodes)
- In the brick mux detach path, leaked inode references may prevent inode
  table cleanup for the detached brick

## Suggested fix

Replace `pthread_cancel` with a cooperative exit flag, following the same
pattern already used by the io_uring thread in the same xlator
(`priv->uring_thread_exit` in `posix-io-uring.c`):

1. Add `gf_boolean_t fsyncer_exit` to `struct posix_private`
2. Add `!priv->fsyncer_exit` to `posix_fsyncer_pick`'s cond_wait predicate
3. Have `posix_fsyncer_pick` drain remaining stubs and return -1 on exit
4. Have `posix_fsyncer` process final stubs and break on count < 0
5. Have `posix_fini` set the flag under mutex, signal, and `pthread_join`
   (no `pthread_cancel`)

This ensures:
- The mutex is always unlocked before the fsyncer exits
- All queued stubs are processed (STACK_UNWINDed) before shutdown
- `pthread_mutex_destroy` operates on an unlocked mutex

## Verification

The fix was verified using SPIN model checking (exhaustive state-space
exploration of all thread interleavings):

**Buggy code (pthread_cancel):**
- P1 `[] !(mutex_destroyed && fsyncer_holds_mutex)` — **VIOLATED** (depth 24, 13 states)
- P2 `[] (priv_freed -> no_stubs_remaining)` — **VIOLATED** (depth 35, 23 states)

**Fixed code (exit flag):**
- P1 mutex safety — **VERIFIED** (0 errors, 1358 states, depth 139)
- P2 no stub leak — **VERIFIED** (0 errors, 1358 states, depth 139)
- P3 fini completes (liveness) — **VERIFIED** (0 errors, 1358 states, depth 139)

The exit-flag approach was also cross-checked at the shutdown composition
level using TLA+ model checking (20 states, invariant holds).

### Counterexample: H1 — mutex destroyed while locked

SPIN's shortest counterexample for property P1, showing the exact
interleaving that leads to undefined behavior. Each row is one step;
`mutex` shows whether `fsync_mutex` is free, held by the fsyncer, or
destroyed; `cancel` shows `pthread_cancel` state.

```
Step  Thread    Action                               mutex       fsyncs  local  cancel
────  ────────  ───────────────────────────────────   ──────────  ──────  ─────  ────────
 1    fsyncer   enter posix_fsyncer_pick              free        0       0      —
 2    fini      non-deterministic start               free        0       0      —
 3    fini      pthread_cancel → cancel_pending        free        0       0      PENDING
 4    fsyncer   pthread_mutex_lock(fsync_mutex)       HELD        0       0      pending
 5    fsyncer   fsyncs empty → enter cond_wait        HELD        0       0      pending
 6    fsyncer   *** CANCEL DELIVERED at cond_wait     HELD        0       0      DELIVERED
                (POSIX: mutex re-acquired,
                 no pthread_cleanup_push registered
                 → mutex stays locked)
 7    fsyncer   exit (fsyncer_exited = true)          HELD        0       0      —
 8    fini      pthread_join returns                  HELD        0       0      —
 9    fini      pthread_mutex_destroy(fsync_mutex)    DESTROYED   0       0      —
                *** UB: mutex destroyed while locked
                *** fsyncer_holds_mutex=1 && mutex_destroyed=1
```

Terminal state: `mutex_locked=1, mutex_destroyed=1, fsyncer_holds_mutex=1`.
Property `[] !(mutex_destroyed && fsyncer_holds_mutex)` **violated**.

### Counterexample: H2 — stub leak after cancel

SPIN's shortest counterexample for property P2, showing how stubs become
orphaned. This trace goes through the same cond_wait cancel point as H1,
then a late-arriving stub is queued after the fsyncer is already dead.

```
Step  Thread    Action                               mutex       fsyncs  local  cancel
────  ────────  ───────────────────────────────────   ──────────  ──────  ─────  ────────
 1    fsyncer   enter posix_fsyncer_pick              free        0       0      —
 2    fini      non-deterministic start               free        0       0      —
 3    fini      pthread_cancel → cancel_pending        free        0       0      PENDING
 4    fsyncer   pthread_mutex_lock(fsync_mutex)       HELD        0       0      pending
 5    fsyncer   fsyncs empty → enter cond_wait        HELD        0       0      pending
 6    fsyncer   *** CANCEL DELIVERED at cond_wait     HELD        0       0      DELIVERED
 7    fsyncer   exit (fsyncer_exited = true)          HELD        0       0      —
 8    fini      pthread_join returns                  HELD        0       0      —
 9    fini      pthread_mutex_destroy(fsync_mutex)    DESTROYED   0       0      —
10    producer  enqueue stub → fsyncs_list++          DESTROYED   1       0      —
                (posix_fsync called during shutdown —
                 no guard prevents enqueueing)
11    fini      GF_FREE(priv)                         —           1       0      —
                *** priv_freed=1 with fsyncs_list=1
                *** stub orphaned: no thread will ever
                    process it → frame/fd/inode leak
```

Terminal state: `priv_freed=1, fsyncs_list=1, local_list=0`.
Property `[] (priv_freed -> (local_list == 0 && fsyncs_list == 0))`
**violated**.

## Provenance

The fsyncer was born without a shutdown mechanism. The `pthread_cancel`
was bolted on 4 years later for brick mux, without modifying the
`for(;;)` loop. The correct pattern (cooperative exit flag) was
implemented 3 years after that in the same file for io_uring, but
nobody went back to retrofit the fsyncer.

| Date | Author | Commit | What happened |
|------|--------|--------|---------------|
| 2013-07 | Anand Avati | [`132fefeffb`](https://github.com/gluster/glusterfs/commit/132fefeffb) | Created the fsyncer — `for(;;)` loop, **no shutdown mechanism**. Performance optimization to batch AFR's fsync storms. Gerrit [#4746](https://review.gluster.org/4746). |
| 2017-05 | Mohit Agrawal | [`d3c0f96633`](https://github.com/gluster/glusterfs/commit/d3c0f96633) | **Introduced the bug.** Added `gf_thread_cleanup_xint(priv->fsyncer)` to `posix_fini` for brick mux (BUG 1453977). Correct intent (stop threads on detach), unsafe mechanism (`pthread_cancel` with no cleanup handler). Gerrit [#17356](https://review.gluster.org/17356). |
| 2017-11 | ShyamsundarR | [`a8325a23c7`](https://github.com/gluster/glusterfs/commit/a8325a23c7) | Reorganized posix for rio reuse. Fsyncer cleanup moved from `posix.c` to `posix-common.c`. Bug preserved unchanged. |
| 2019-04 | Raghavendra Bhat | [`b922793588`](https://github.com/gluster/glusterfs/commit/b922793588) | **Fixed the exact same bug class in bit-rot-stub** (BZ 1700078). Signer thread cancelled via `gf_thread_cleanup_xint` while holding mutex → mutex never released → deadlock on re-enable. Fix: `pthread_cleanup_push`. The posix fsyncer has the same pattern but was not addressed. |
| 2020-10 | Ravishankar N | [`4b9f95b699`](https://github.com/gluster/glusterfs/commit/4b9f95b699) | Added io_uring support with `uring_thread_exit` — **the correct cooperative exit flag pattern**, in the same xlator. The fsyncer was not updated. |
| 2021-10 | Dmitry Antipov | [`0af464c492`](https://github.com/gluster/glusterfs/commit/0af464c492) | Fixed thread leak in posix health_check (PR #2851). Touched the same file but only the health_check thread, not the fsyncer. |

The bug has been latent for 9 years (2017–2026).

## Prior and related issues

[**#1338**](https://github.com/gluster/glusterfs/issues/1338) — *"Add synchronization for POSIX xlator's internal threads"* (Dmitry Antipov, 2020). Proposed the same fix: replace `pthread_cancel` with normal termination for all three posix internal threads (health check, disk
space check, **and the fsyncer**). A Gerrit patch ([#24640](https://review.gluster.org/24640)) was posted but never merged. The issue was closed by the stale bot after 6 months of inactivity.
Dmitry later got a smaller fix merged as PR [#2851](https://github.com/gluster/glusterfs/pull/2851) (2021), but it only addressed the health_check thread — the fsyncer was left unchanged.
**This issue completes the work #1338 started for the fsyncer.**

Other related issues in the same teardown-race family:

| Issue | Relationship |
|-------|-------------|
| [#4684](https://github.com/gluster/glusterfs/issues/4684) | Sibling — `posix_fini` disk_space_check UAF and double-decrement (same investigation) |
| [#4666](https://github.com/gluster/glusterfs/issues/4666) | Same theme — glfs_fini/open race, shutdown path missing synchronization gate |
| [#4159](https://github.com/gluster/glusterfs/issues/4159) | Related symptom — io_uring SIGABRT; io_uring has the correct exit flag but its fini is never called |
| [#3727](https://github.com/gluster/glusterfs/issues/3727) | Brick mux restart crash — locks xlator `priv=NULL` during teardown (closed wontfix) |
| [#1982](https://github.com/gluster/glusterfs/issues/1982) | Brick mux test crash — multiplex-limit-issue-151.t intermittent corefile (closed wontfix) |

## Related: other `gf_thread_cleanup_xint` call sites

This fix addresses only the posix fsyncer. However, there are 12 other
`gf_thread_cleanup_xint` call sites across 6 xlators that use the same
`pthread_cancel` mechanism. Only two have `pthread_cleanup_push` handlers
to release resources held at cancellation points:

- **bit-rot-stub** (`b922793588` — fixed in 2019)
- **AFR** self-heal thread

The remaining call sites — posix `health_check`, upcall `reaper_thr`,
index `thread`, and leases `recall_thr` — have no cleanup handlers and
may be exposed to the same pattern if they hold mutexes or resources at
cancellation points. This is out of scope for this fix but may warrant a
broader audit.

## Found by

Manual investigation of posix xlator shutdown protocol, verified by SPIN
concurrent model checking.

Operation	File	Line
`posix_fsyncer` main loop	`xlators/storage/posix/src/posix-helpers.c`	2463-2511
`posix_fsyncer_pick` (cond_wait = cancel point)	`xlators/storage/posix/src/posix-helpers.c`	2379-2396
`gf_thread_cleanup_xint` (cancel + join)	`xlators/storage/posix/src/posix-common.c`	1315
`pthread_mutex_destroy` on potentially locked mutex	`xlators/storage/posix/src/posix-common.c`	1327

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

posix: fsyncer pthread_cancel causes UB (locked mutex destroy) and stub leak #4685

Summary

Affected code

Root cause

1. Mutex destroyed while locked (UB)

2. Stub leak

Reproduction

Impact

Suggested fix

Verification

Counterexample: H1 — mutex destroyed while locked

Counterexample: H2 — stub leak after cancel

Provenance

Prior and related issues

Related: other `gf_thread_cleanup_xint` call sites

Found by

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Date	Author	Commit	What happened
2013-07	Anand Avati	`132fefeffb`	Created the fsyncer — `for(;;)` loop, no shutdown mechanism. Performance optimization to batch AFR's fsync storms. Gerrit #4746.
2017-05	Mohit Agrawal	`d3c0f96633`	Introduced the bug. Added `gf_thread_cleanup_xint(priv->fsyncer)` to `posix_fini` for brick mux (BUG 1453977). Correct intent (stop threads on detach), unsafe mechanism (`pthread_cancel` with no cleanup handler). Gerrit #17356.
2017-11	ShyamsundarR	`a8325a23c7`	Reorganized posix for rio reuse. Fsyncer cleanup moved from `posix.c` to `posix-common.c`. Bug preserved unchanged.
2019-04	Raghavendra Bhat	`b922793588`	Fixed the exact same bug class in bit-rot-stub (BZ 1700078). Signer thread cancelled via `gf_thread_cleanup_xint` while holding mutex → mutex never released → deadlock on re-enable. Fix: `pthread_cleanup_push`. The posix fsyncer has the same pattern but was not addressed.
2020-10	Ravishankar N	`4b9f95b699`	Added io_uring support with `uring_thread_exit` — the correct cooperative exit flag pattern, in the same xlator. The fsyncer was not updated.
2021-10	Dmitry Antipov	`0af464c492`	Fixed thread leak in posix health_check (PR #2851). Touched the same file but only the health_check thread, not the fsyncer.

Issue	Relationship
#4684	Sibling — `posix_fini` disk_space_check UAF and double-decrement (same investigation)
#4666	Same theme — glfs_fini/open race, shutdown path missing synchronization gate
#4159	Related symptom — io_uring SIGABRT; io_uring has the correct exit flag but its fini is never called
#3727	Brick mux restart crash — locks xlator `priv=NULL` during teardown (closed wontfix)
#1982	Brick mux test crash — multiplex-limit-issue-151.t intermittent corefile (closed wontfix)

Uh oh!

posix: fsyncer pthread_cancel causes UB (locked mutex destroy) and stub leak #4685

Description

Summary

Affected code

Root cause

1. Mutex destroyed while locked (UB)

2. Stub leak

Reproduction

Impact

Suggested fix

Verification

Counterexample: H1 — mutex destroyed while locked

Counterexample: H2 — stub leak after cancel

Provenance

Prior and related issues

Related: other gf_thread_cleanup_xint call sites

Found by

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

Related: other `gf_thread_cleanup_xint` call sites