Skip to content

rpc: refcount over-decrement when timer cancel races with callback dispatch #4679

Description

@ThalesBarretto

Summary

rpc_clnt_disable(), rpc_clnt_connection_cleanup(), and rpc_clnt_reconnect_cleanup() unconditionally unref the rpc_clnt after calling gf_timer_call_cancel() on the call_bail and reconnect timers, regardless of whether the cancel succeeded. When the timer callback has already been dispatched (cancel returns -1), both the caller and the callback unref the same timer ref -- producing a double-unref that can drive the refcount to zero prematurely.

This is a regression introduced in PR #4329 (commit ba5bfeb119a, 2024-10-21).

Affected code

Five call sites across three functions ignore the return value of gf_timer_call_cancel():

File Line Timer Function
rpc/rpc-lib/src/rpc-clnt.c 1813 conn->timer (call_bail) rpc_clnt_disable
rpc/rpc-lib/src/rpc-clnt.c 1823 conn->reconnect rpc_clnt_disable
rpc/rpc-lib/src/rpc-clnt.c 485 conn->timer (call_bail) rpc_clnt_connection_cleanup
rpc/rpc-lib/src/rpc-clnt.c 490 conn->reconnect rpc_clnt_connection_cleanup
rpc/rpc-lib/src/rpc-clnt.c 443 conn->reconnect rpc_clnt_reconnect_cleanup

A sixth call site handles the return value correctly and serves as the reference pattern:

/* rpc-clnt.c:364-366 -- rpc_clnt_reconnect (correct) */
if (conn->reconnect) {
    if (!gf_timer_call_cancel(clnt->ctx, conn->reconnect))
        canceled_unref = _gf_true;
}

Root cause

gf_timer_call_cancel() (timer.c:72) returns:

  • 0 -- timer removed before dispatch. Callback will NOT run. Caller owns the ref.
  • -1 -- timer already dispatched. Callback is running or about to run. Callback owns the ref.

The call_bail and reconnect callbacks (call_bail at line 190, rpc_clnt_reconnect at line 391) always unref independently at the end of their execution. The cancel return value determines whether the canceller should also unref. Ignoring it produces a double-unref.

The comment at line 1814 documents the correct behavior:

"If the event is not fired and it actually cancelled the timer, do the unref else registered call back function will take care of it."

But the code on the next line sets the flag unconditionally.

Race sequence

Step  Thread              Action                            Source
----  ------              ------                            ------
 1    disable_thread      acquire conn->lock                rpc-clnt.c:1808
 2    disable_thread      disabled = 1                      rpc-clnt.c:1810

      --- timer wheel fires between steps 2 and 3 ---
 3    timer_wheel         event->fired = true               timer.c:142

 4    disable_thread      gf_timer_call_cancel() -> -1      rpc-clnt.c:1823
 5    disable_thread      reconnect_unref = TRUE            rpc-clnt.c:1824
                          BUG: ignores return value
 6    disable_thread      release conn->lock                rpc-clnt.c:1832
 7    disable_thread      rpc_clnt_unref(rpc)               rpc-clnt.c:1858
                          refcount: 2 -> 1  (UNREF #1)

      --- callback acquires the released lock ---
 8    reconnect_cbk       acquire conn->lock                rpc-clnt.c:358
 9    reconnect_cbk       sees disabled==1, breaks chain    rpc-clnt.c:370
10    reconnect_cbk       release conn->lock                rpc-clnt.c:389
11    reconnect_cbk       rpc_clnt_unref(clnt)              rpc-clnt.c:391
                          refcount: 1 -> 0  (UNREF #2)

12    OVER-UNREF: 2 unrefs for 1 ref

This interleaving is formally proven reachable by SPIN model checker (depth 14, 21 states). See the verification gist for the Promela models.

Two timer ownership models in rpc-clnt

PR #4329's rationale was:

"Unref the rpc object based on the conn->timer/conn->reconnect pointer value as we are doing the same for ping_timer."

This reasoning conflates two different ownership models:

call_bail / reconnect: cancel-gated ownership

The callback always unrefs independently (line 190 / line 391). The cancel return value determines whether the canceller should also unref. Ignoring it = double-unref.

ping: NULL-gate ownership

Both sides compete for conn->ping_timer != NULL via the same function (rpc_clnt_remove_ping_timer_locked). Whoever NULLs it first owns the unref. The cancel return value is irrelevant.

Timer Ownership Model Cancel Return Matters? Fix Needed?
call_bail Cancel-gated Yes Yes
reconnect Cancel-gated Yes Yes
ping NULL-gated No No

The detailed explanation with ASCII diagrams is in the verification gist (timer-ownership-models.md).

Impact

  • Refcount underflow: the timer's ref is released twice. With unsigned atomics, the second unref wraps to UINT_MAX rather than going negative -- the refcount is silently corrupted.
  • Use-after-free (narrow window): if rpc_clnt_trigger_destroy fires from disable's unref and progresses to rpc_clnt_destroy before the callback finishes, the callback accesses freed memory.
  • All 12 consumers of rpc_clnt are affected: every xlator that uses rpc_clnt_disable or rpc_clnt_connection_cleanup (protocol/client, glusterd, gfapi, CLI, quota, changelog, snapview-server, NLM) is exposed.

Suggested fix

Check the return value at each affected site. Only set the unref flag when cancel returns 0. This restores the behavior from before PR #4329 (originally introduced in cd6fc929a4, 2016):

/* Before (buggy): */
gf_timer_call_cancel(rpc->ctx, conn->reconnect);
reconnect_unref = _gf_true;   /* unconditional */

/* After (fixed): */
if (!gf_timer_call_cancel(rpc->ctx, conn->reconnect))
    reconnect_unref = _gf_true;  /* only if cancel succeeded */

rpc_clnt_remove_ping_timer_locked() is deliberately not changed -- its NULL-gate ownership model is correct as-is.

Formal verification

Found and verified using SPIN model checker v6.5.2:

Model States Errors What it proves
Bug model (single timer) 21 1 (depth 14) Double-unref is reachable
Fix model (single timer) 69 0 Fix eliminates the race
Fix model (all 3 timers, 8 interleavings) 11,836 0 Fix is correct for all timer combinations

Promela models and annotated counterexample: https://gist.github.com/ThalesBarretto/21d1975f252eafd9038f2659ed75485f

History

Date Commit What
2016-03-07 cd6fc929a4 Kotresh HR adds ref-per-timer with cancel return check (correct)
2018-01-24 b585e5021f6 Zhang Huan adds cancel check in rpc_clnt_reconnect (correct, unaffected)
2024-10-21 ba5bfeb119a PR #4329 removes return value checks (regression)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Fields

    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions