Summary
rpc_clnt_disable(), rpc_clnt_connection_cleanup(), and rpc_clnt_reconnect_cleanup() unconditionally unref the rpc_clnt after calling gf_timer_call_cancel() on the call_bail and reconnect timers, regardless of whether the cancel succeeded. When the timer callback has already been dispatched (cancel returns -1), both the caller and the callback unref the same timer ref -- producing a double-unref that can drive the refcount to zero prematurely.
This is a regression introduced in PR #4329 (commit ba5bfeb119a, 2024-10-21).
Affected code
Five call sites across three functions ignore the return value of gf_timer_call_cancel():
| File |
Line |
Timer |
Function |
rpc/rpc-lib/src/rpc-clnt.c |
1813 |
conn->timer (call_bail) |
rpc_clnt_disable |
rpc/rpc-lib/src/rpc-clnt.c |
1823 |
conn->reconnect |
rpc_clnt_disable |
rpc/rpc-lib/src/rpc-clnt.c |
485 |
conn->timer (call_bail) |
rpc_clnt_connection_cleanup |
rpc/rpc-lib/src/rpc-clnt.c |
490 |
conn->reconnect |
rpc_clnt_connection_cleanup |
rpc/rpc-lib/src/rpc-clnt.c |
443 |
conn->reconnect |
rpc_clnt_reconnect_cleanup |
A sixth call site handles the return value correctly and serves as the reference pattern:
/* rpc-clnt.c:364-366 -- rpc_clnt_reconnect (correct) */
if (conn->reconnect) {
if (!gf_timer_call_cancel(clnt->ctx, conn->reconnect))
canceled_unref = _gf_true;
}
Root cause
gf_timer_call_cancel() (timer.c:72) returns:
- 0 -- timer removed before dispatch. Callback will NOT run. Caller owns the ref.
- -1 -- timer already dispatched. Callback is running or about to run. Callback owns the ref.
The call_bail and reconnect callbacks (call_bail at line 190, rpc_clnt_reconnect at line 391) always unref independently at the end of their execution. The cancel return value determines whether the canceller should also unref. Ignoring it produces a double-unref.
The comment at line 1814 documents the correct behavior:
"If the event is not fired and it actually cancelled the timer, do the unref else registered call back function will take care of it."
But the code on the next line sets the flag unconditionally.
Race sequence
Step Thread Action Source
---- ------ ------ ------
1 disable_thread acquire conn->lock rpc-clnt.c:1808
2 disable_thread disabled = 1 rpc-clnt.c:1810
--- timer wheel fires between steps 2 and 3 ---
3 timer_wheel event->fired = true timer.c:142
4 disable_thread gf_timer_call_cancel() -> -1 rpc-clnt.c:1823
5 disable_thread reconnect_unref = TRUE rpc-clnt.c:1824
BUG: ignores return value
6 disable_thread release conn->lock rpc-clnt.c:1832
7 disable_thread rpc_clnt_unref(rpc) rpc-clnt.c:1858
refcount: 2 -> 1 (UNREF #1)
--- callback acquires the released lock ---
8 reconnect_cbk acquire conn->lock rpc-clnt.c:358
9 reconnect_cbk sees disabled==1, breaks chain rpc-clnt.c:370
10 reconnect_cbk release conn->lock rpc-clnt.c:389
11 reconnect_cbk rpc_clnt_unref(clnt) rpc-clnt.c:391
refcount: 1 -> 0 (UNREF #2)
12 OVER-UNREF: 2 unrefs for 1 ref
This interleaving is formally proven reachable by SPIN model checker (depth 14, 21 states). See the verification gist for the Promela models.
Two timer ownership models in rpc-clnt
PR #4329's rationale was:
"Unref the rpc object based on the conn->timer/conn->reconnect pointer value as we are doing the same for ping_timer."
This reasoning conflates two different ownership models:
call_bail / reconnect: cancel-gated ownership
The callback always unrefs independently (line 190 / line 391). The cancel return value determines whether the canceller should also unref. Ignoring it = double-unref.
ping: NULL-gate ownership
Both sides compete for conn->ping_timer != NULL via the same function (rpc_clnt_remove_ping_timer_locked). Whoever NULLs it first owns the unref. The cancel return value is irrelevant.
| Timer |
Ownership Model |
Cancel Return Matters? |
Fix Needed? |
| call_bail |
Cancel-gated |
Yes |
Yes |
| reconnect |
Cancel-gated |
Yes |
Yes |
| ping |
NULL-gated |
No |
No |
The detailed explanation with ASCII diagrams is in the verification gist (timer-ownership-models.md).
Impact
- Refcount underflow: the timer's ref is released twice. With unsigned atomics, the second unref wraps to
UINT_MAX rather than going negative -- the refcount is silently corrupted.
- Use-after-free (narrow window): if
rpc_clnt_trigger_destroy fires from disable's unref and progresses to rpc_clnt_destroy before the callback finishes, the callback accesses freed memory.
- All 12 consumers of
rpc_clnt are affected: every xlator that uses rpc_clnt_disable or rpc_clnt_connection_cleanup (protocol/client, glusterd, gfapi, CLI, quota, changelog, snapview-server, NLM) is exposed.
Suggested fix
Check the return value at each affected site. Only set the unref flag when cancel returns 0. This restores the behavior from before PR #4329 (originally introduced in cd6fc929a4, 2016):
/* Before (buggy): */
gf_timer_call_cancel(rpc->ctx, conn->reconnect);
reconnect_unref = _gf_true; /* unconditional */
/* After (fixed): */
if (!gf_timer_call_cancel(rpc->ctx, conn->reconnect))
reconnect_unref = _gf_true; /* only if cancel succeeded */
rpc_clnt_remove_ping_timer_locked() is deliberately not changed -- its NULL-gate ownership model is correct as-is.
Formal verification
Found and verified using SPIN model checker v6.5.2:
| Model |
States |
Errors |
What it proves |
| Bug model (single timer) |
21 |
1 (depth 14) |
Double-unref is reachable |
| Fix model (single timer) |
69 |
0 |
Fix eliminates the race |
| Fix model (all 3 timers, 8 interleavings) |
11,836 |
0 |
Fix is correct for all timer combinations |
Promela models and annotated counterexample: https://gist.github.com/ThalesBarretto/21d1975f252eafd9038f2659ed75485f
History
| Date |
Commit |
What |
| 2016-03-07 |
cd6fc929a4 |
Kotresh HR adds ref-per-timer with cancel return check (correct) |
| 2018-01-24 |
b585e5021f6 |
Zhang Huan adds cancel check in rpc_clnt_reconnect (correct, unaffected) |
| 2024-10-21 |
ba5bfeb119a |
PR #4329 removes return value checks (regression) |
Summary
rpc_clnt_disable(),rpc_clnt_connection_cleanup(), andrpc_clnt_reconnect_cleanup()unconditionally unref therpc_clntafter callinggf_timer_call_cancel()on the call_bail and reconnect timers, regardless of whether the cancel succeeded. When the timer callback has already been dispatched (cancel returns -1), both the caller and the callback unref the same timer ref -- producing a double-unref that can drive the refcount to zero prematurely.This is a regression introduced in PR #4329 (commit
ba5bfeb119a, 2024-10-21).Affected code
Five call sites across three functions ignore the return value of
gf_timer_call_cancel():rpc/rpc-lib/src/rpc-clnt.cconn->timer(call_bail)rpc_clnt_disablerpc/rpc-lib/src/rpc-clnt.cconn->reconnectrpc_clnt_disablerpc/rpc-lib/src/rpc-clnt.cconn->timer(call_bail)rpc_clnt_connection_cleanuprpc/rpc-lib/src/rpc-clnt.cconn->reconnectrpc_clnt_connection_cleanuprpc/rpc-lib/src/rpc-clnt.cconn->reconnectrpc_clnt_reconnect_cleanupA sixth call site handles the return value correctly and serves as the reference pattern:
Root cause
gf_timer_call_cancel()(timer.c:72) returns:The call_bail and reconnect callbacks (
call_bailat line 190,rpc_clnt_reconnectat line 391) always unref independently at the end of their execution. The cancel return value determines whether the canceller should also unref. Ignoring it produces a double-unref.The comment at line 1814 documents the correct behavior:
But the code on the next line sets the flag unconditionally.
Race sequence
This interleaving is formally proven reachable by SPIN model checker (depth 14, 21 states). See the verification gist for the Promela models.
Two timer ownership models in rpc-clnt
PR #4329's rationale was:
This reasoning conflates two different ownership models:
call_bail / reconnect: cancel-gated ownership
The callback always unrefs independently (line 190 / line 391). The cancel return value determines whether the canceller should also unref. Ignoring it = double-unref.
ping: NULL-gate ownership
Both sides compete for
conn->ping_timer != NULLvia the same function (rpc_clnt_remove_ping_timer_locked). Whoever NULLs it first owns the unref. The cancel return value is irrelevant.The detailed explanation with ASCII diagrams is in the verification gist (timer-ownership-models.md).
Impact
UINT_MAXrather than going negative -- the refcount is silently corrupted.rpc_clnt_trigger_destroyfires from disable's unref and progresses torpc_clnt_destroybefore the callback finishes, the callback accesses freed memory.rpc_clntare affected: every xlator that usesrpc_clnt_disableorrpc_clnt_connection_cleanup(protocol/client, glusterd, gfapi, CLI, quota, changelog, snapview-server, NLM) is exposed.Suggested fix
Check the return value at each affected site. Only set the unref flag when cancel returns 0. This restores the behavior from before PR #4329 (originally introduced in
cd6fc929a4, 2016):rpc_clnt_remove_ping_timer_locked()is deliberately not changed -- its NULL-gate ownership model is correct as-is.Formal verification
Found and verified using SPIN model checker v6.5.2:
Promela models and annotated counterexample: https://gist.github.com/ThalesBarretto/21d1975f252eafd9038f2659ed75485f
History
cd6fc929a4b585e5021f6rpc_clnt_reconnect(correct, unaffected)ba5bfeb119a