Fix HINCRBYFLOAT removes field expiration on replica#8
Conversation
| robj *argv[5]; | ||
| argv[0] = shared.hpexpireat; | ||
| argv[1] = c->argv[1]; | ||
| argv[2] = createStringObjectFromLongLong(expireat); | ||
| argv[3] = shared.fields; | ||
| argv[4] = shared.integers[1]; | ||
| argv[5] = c->argv[2]; | ||
| alsoPropagate(c->db->id, argv, 6, PROPAGATE_AOF|PROPAGATE_REPL); |
There was a problem hiding this comment.
🔴 Stack buffer overflow and memory leak in HPEXPIREAT propagation block (lines 2306-2313). robj *argv[5] allocates indices 0-4, but argv[5] = c->argv[2] writes out of bounds — should be robj *argv[6]. Additionally, the object created by createStringObjectFromLongLong(expireat) is never freed with decrRefCount(argv[2]) after alsoPropagate, leaking memory on every HINCRBYFLOAT with a field TTL.
Extended reasoning...
Buffer overflow
At line 2306, the array is declared as robj *argv[5], which allocates 5 elements with valid indices 0 through 4. However, line 2312 writes argv[5] = c->argv[2], which is a classic off-by-one stack buffer overflow — index 5 is out of bounds. The alsoPropagate call on line 2313 passes argc=6, confirming that 6 elements are intended. The fix is to change the declaration to robj *argv[6].
This corrupts the stack on every HINCRBYFLOAT command applied to a hash field that has an expiration time set. Depending on the compiler and platform, this can cause crashes, silent memory corruption, or unpredictable behavior.
Memory leak
At line 2309, argv[2] = createStringObjectFromLongLong(expireat) creates a new robj with refcount=1. When alsoPropagate() is called (server.c:3330-3332), it copies the argv array and calls incrRefCount on each element, bumping argv[2]'s refcount to 2. When the propagated copy is eventually freed by redisOpArrayFree, the refcount drops back to 1 — but nobody ever calls decrRefCount(argv[2]) to bring it to 0 and free the object.
The correct pattern is demonstrated in propagateHashFieldDeletion() at lines 2963-2964 of the same file, which explicitly calls decrRefCount(argv[1]) and decrRefCount(argv[2]) after alsoPropagate. The fix here is to add decrRefCount(argv[2]) after the alsoPropagate call.
Note that even if shouldPropagate() returns false (causing alsoPropagate to return early without calling incrRefCount), the object created by createStringObjectFromLongLong still has refcount=1 and is never freed, so the leak occurs regardless.
Step-by-step proof
- A user runs
HSET h1 f1 10followed byHEXPIRE h1 100 FIELDS 1 f1to set a field with TTL. - The user then runs
HINCRBYFLOAT h1 f1 1.5. hincrbyfloatCommandcallshashTypeGetValuewhich returnsexpireatwith a valid timestamp, sohas_expirationis set to 1.- Execution enters the
if (has_expiration)block at line 2297. robj *argv[5]is allocated on the stack (5 elements, indices 0-4).argv[0]throughargv[4]are assigned normally.argv[5] = c->argv[2]writes past the end of the array, corrupting the stack (undefined behavior).alsoPropagate(c->db->id, argv, 6, ...)reads 6 elements from an array of size 5, reading corrupted/garbage data for the 6th element.- Meanwhile,
argv[2](thecreateStringObjectFromLongLongresult) is never freed, leaking memory. - This happens on every
HINCRBYFLOATtargeting a field with a TTL.
Impact
The buffer overflow is critical — it causes undefined behavior on every HINCRBYFLOAT on a field with TTL, which can lead to crashes or silent data corruption. The memory leak compounds over time, causing increasing memory usage that is never reclaimed. Both bugs are introduced by this PR.
| } | ||
|
|
||
| if (expiredAt >= (uint64_t) commandTimeSnapshot()) | ||
| if (*expiredAt > (uint64_t) commandTimeSnapshot()) |
There was a problem hiding this comment.
🟡 Off-by-one in expiration comparison: the operator was changed from >= to > at line 742, which means expiredAt == commandTimeSnapshot() now falls through as expired instead of returning GETF_OK. Every other expiration check in the codebase (hashTypeIsExpired, keyIsExpired, hash iterator, HTTL, HPERSIST) uses strict less-than for the expired condition, treating the exact-millisecond boundary as NOT expired. The >= should be restored alongside the pointer dereference fix.
Extended reasoning...
What the bug is
In hashTypeGetValue at src/t_hash.c:742, the PR changes the expiration check from:
if (expiredAt >= (uint64_t) commandTimeSnapshot())
return GETF_OK;to:
if (*expiredAt > (uint64_t) commandTimeSnapshot())
return GETF_OK;The pointer dereference change (expiredAt → *expiredAt) is correct and necessary because the parameter was changed from a local uint64_t to a uint64_t * output parameter. However, the comparison operator was also changed from >= to >, which alters the semantics at the exact-millisecond boundary.
How it manifests
With the original >=: when expiredAt == commandTimeSnapshot(), the condition is true, GETF_OK is returned, and the field is considered not expired.
With the new >: when *expiredAt == commandTimeSnapshot(), the condition is false, execution falls through to the lazy-deletion path, and the field is considered expired.
Inconsistency with the rest of the codebase
Every other expiration check uses strict less-than (<) for the "is expired" condition, meaning expireAt == now is treated as NOT expired:
hashTypeIsExpired(line 574):return (mstime_t) expireAt < commandTimeSnapshot();keyIsExpired(db.c:1941):return now > when;- Hash iterator (line 1424):
(mstime_t)hi->expire_time < commandTimeSnapshot() - HTTL HT-path (line 3104):
(long long) expire < commandTimeSnapshot() - HPERSIST (line 3426):
(long long) expire < commandTimeSnapshot()
All of these consistently say: at the exact millisecond boundary, the field/key is still valid.
Step-by-step proof
- A hash field
f1hasexpiredAt = 1000(milliseconds). - At time T=1000,
commandTimeSnapshot()returns1000. - Via
hashTypeGetValue(new code):*expiredAt > 1000→1000 > 1000→ false → field falls through to lazy deletion → field is expired. - Via
hashTypeIsExpired:(mstime_t)1000 < 1000→ false → field is not expired. - The same field at the same instant is expired or not depending on the code path —
HGETwould delete it, butHSCAN(which uses the iterator callinghashTypeIsExpired) would still return it.
Impact
The practical impact is limited to a 1ms race window, making it very unlikely to trigger in production. However, it is a genuine semantic inconsistency introduced by this PR that could cause subtly different behavior depending on which code path handles the field.
Fix
Restore the >= operator while keeping the pointer dereference:
if (*expiredAt >= (uint64_t) commandTimeSnapshot())
return GETF_OK;| unsigned int vlen; | ||
| int has_expiration = 0; | ||
| uint64_t expireat = EB_EXPIRE_TIME_INVALID; | ||
| int unused_flag = 0; |
There was a problem hiding this comment.
🟡 The variable int unused_flag = 0 at line 2244 is declared but never used anywhere in hincrbyfloatCommand. This is dead code introduced by this PR that should be removed to keep the code clean and avoid potential compiler warnings.
Extended reasoning...
What the bug is
In hincrbyfloatCommand at src/t_hash.c:2244, the variable int unused_flag = 0 is declared but never referenced anywhere in the function. A grep of the entire file confirms the identifier unused_flag appears only on that single declaration line.
How it was introduced
This PR adds expiration-preservation logic to hincrbyfloatCommand. Two other new variables in the same block (has_expiration and expireat) are actively used, but unused_flag appears to be leftover scaffolding from development that was never cleaned up before the PR was submitted.
Impact
This is a minor code quality issue. The variable has no effect on runtime behavior or correctness. However, compiling with -Wall (which the Redis Makefile enables) will produce an -Wunused-variable warning. Depending on the CI configuration, if -Werror is also active (some Redis CI workflows do enable it), this could cause a build failure.
Step-by-step confirmation
- Look at the diff for
hincrbyfloatCommand— three new local variables are added at lines 2242-2244:has_expiration,expireat, andunused_flag. has_expirationis set on line 2263 (if (expireat != EB_EXPIRE_TIME_INVALID) has_expiration = 1;) and checked on line 2297 (if (has_expiration)).expireatis passed tohashTypeGetValueon line 2253 and used in the propagation block starting at line 2297.unused_flagis initialized to 0 on line 2244 and never read, written to, or passed to any function afterward.- Grepping the entire file for
unused_flagreturns exactly one match — the declaration itself.
Fix
Simply delete line 2244 (int unused_flag = 0;).
Benchmark PR from agentic-review-benchmarks#8