Skip to content

Fix HINCRBYFLOAT removes field expiration on replica#8

Open
tomerqodo wants to merge 5 commits intoclaude_claude_vs_qodo_base_fix_hincrbyfloat_removes_field_expiration_on_replica_pr8from
claude_claude_vs_qodo_head_fix_hincrbyfloat_removes_field_expiration_on_replica_pr8
Open

Fix HINCRBYFLOAT removes field expiration on replica#8
tomerqodo wants to merge 5 commits intoclaude_claude_vs_qodo_base_fix_hincrbyfloat_removes_field_expiration_on_replica_pr8from
claude_claude_vs_qodo_head_fix_hincrbyfloat_removes_field_expiration_on_replica_pr8

Conversation

@tomerqodo
Copy link
Copy Markdown

Benchmark PR from agentic-review-benchmarks#8

Comment thread src/t_hash.c
Comment on lines +2306 to +2313
robj *argv[5];
argv[0] = shared.hpexpireat;
argv[1] = c->argv[1];
argv[2] = createStringObjectFromLongLong(expireat);
argv[3] = shared.fields;
argv[4] = shared.integers[1];
argv[5] = c->argv[2];
alsoPropagate(c->db->id, argv, 6, PROPAGATE_AOF|PROPAGATE_REPL);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Stack buffer overflow and memory leak in HPEXPIREAT propagation block (lines 2306-2313). robj *argv[5] allocates indices 0-4, but argv[5] = c->argv[2] writes out of bounds — should be robj *argv[6]. Additionally, the object created by createStringObjectFromLongLong(expireat) is never freed with decrRefCount(argv[2]) after alsoPropagate, leaking memory on every HINCRBYFLOAT with a field TTL.

Extended reasoning...

Buffer overflow

At line 2306, the array is declared as robj *argv[5], which allocates 5 elements with valid indices 0 through 4. However, line 2312 writes argv[5] = c->argv[2], which is a classic off-by-one stack buffer overflow — index 5 is out of bounds. The alsoPropagate call on line 2313 passes argc=6, confirming that 6 elements are intended. The fix is to change the declaration to robj *argv[6].

This corrupts the stack on every HINCRBYFLOAT command applied to a hash field that has an expiration time set. Depending on the compiler and platform, this can cause crashes, silent memory corruption, or unpredictable behavior.

Memory leak

At line 2309, argv[2] = createStringObjectFromLongLong(expireat) creates a new robj with refcount=1. When alsoPropagate() is called (server.c:3330-3332), it copies the argv array and calls incrRefCount on each element, bumping argv[2]'s refcount to 2. When the propagated copy is eventually freed by redisOpArrayFree, the refcount drops back to 1 — but nobody ever calls decrRefCount(argv[2]) to bring it to 0 and free the object.

The correct pattern is demonstrated in propagateHashFieldDeletion() at lines 2963-2964 of the same file, which explicitly calls decrRefCount(argv[1]) and decrRefCount(argv[2]) after alsoPropagate. The fix here is to add decrRefCount(argv[2]) after the alsoPropagate call.

Note that even if shouldPropagate() returns false (causing alsoPropagate to return early without calling incrRefCount), the object created by createStringObjectFromLongLong still has refcount=1 and is never freed, so the leak occurs regardless.

Step-by-step proof

  1. A user runs HSET h1 f1 10 followed by HEXPIRE h1 100 FIELDS 1 f1 to set a field with TTL.
  2. The user then runs HINCRBYFLOAT h1 f1 1.5.
  3. hincrbyfloatCommand calls hashTypeGetValue which returns expireat with a valid timestamp, so has_expiration is set to 1.
  4. Execution enters the if (has_expiration) block at line 2297.
  5. robj *argv[5] is allocated on the stack (5 elements, indices 0-4).
  6. argv[0] through argv[4] are assigned normally.
  7. argv[5] = c->argv[2] writes past the end of the array, corrupting the stack (undefined behavior).
  8. alsoPropagate(c->db->id, argv, 6, ...) reads 6 elements from an array of size 5, reading corrupted/garbage data for the 6th element.
  9. Meanwhile, argv[2] (the createStringObjectFromLongLong result) is never freed, leaking memory.
  10. This happens on every HINCRBYFLOAT targeting a field with a TTL.

Impact

The buffer overflow is critical — it causes undefined behavior on every HINCRBYFLOAT on a field with TTL, which can lead to crashes or silent data corruption. The memory leak compounds over time, causing increasing memory usage that is never reclaimed. Both bugs are introduced by this PR.

Comment thread src/t_hash.c
}

if (expiredAt >= (uint64_t) commandTimeSnapshot())
if (*expiredAt > (uint64_t) commandTimeSnapshot())
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Off-by-one in expiration comparison: the operator was changed from >= to > at line 742, which means expiredAt == commandTimeSnapshot() now falls through as expired instead of returning GETF_OK. Every other expiration check in the codebase (hashTypeIsExpired, keyIsExpired, hash iterator, HTTL, HPERSIST) uses strict less-than for the expired condition, treating the exact-millisecond boundary as NOT expired. The >= should be restored alongside the pointer dereference fix.

Extended reasoning...

What the bug is

In hashTypeGetValue at src/t_hash.c:742, the PR changes the expiration check from:

if (expiredAt >= (uint64_t) commandTimeSnapshot())
    return GETF_OK;

to:

if (*expiredAt > (uint64_t) commandTimeSnapshot())
    return GETF_OK;

The pointer dereference change (expiredAt*expiredAt) is correct and necessary because the parameter was changed from a local uint64_t to a uint64_t * output parameter. However, the comparison operator was also changed from >= to >, which alters the semantics at the exact-millisecond boundary.

How it manifests

With the original >=: when expiredAt == commandTimeSnapshot(), the condition is true, GETF_OK is returned, and the field is considered not expired.

With the new >: when *expiredAt == commandTimeSnapshot(), the condition is false, execution falls through to the lazy-deletion path, and the field is considered expired.

Inconsistency with the rest of the codebase

Every other expiration check uses strict less-than (<) for the "is expired" condition, meaning expireAt == now is treated as NOT expired:

  • hashTypeIsExpired (line 574): return (mstime_t) expireAt < commandTimeSnapshot();
  • keyIsExpired (db.c:1941): return now > when;
  • Hash iterator (line 1424): (mstime_t)hi->expire_time < commandTimeSnapshot()
  • HTTL HT-path (line 3104): (long long) expire < commandTimeSnapshot()
  • HPERSIST (line 3426): (long long) expire < commandTimeSnapshot()

All of these consistently say: at the exact millisecond boundary, the field/key is still valid.

Step-by-step proof

  1. A hash field f1 has expiredAt = 1000 (milliseconds).
  2. At time T=1000, commandTimeSnapshot() returns 1000.
  3. Via hashTypeGetValue (new code): *expiredAt > 10001000 > 1000 → false → field falls through to lazy deletion → field is expired.
  4. Via hashTypeIsExpired: (mstime_t)1000 < 1000 → false → field is not expired.
  5. The same field at the same instant is expired or not depending on the code path — HGET would delete it, but HSCAN (which uses the iterator calling hashTypeIsExpired) would still return it.

Impact

The practical impact is limited to a 1ms race window, making it very unlikely to trigger in production. However, it is a genuine semantic inconsistency introduced by this PR that could cause subtly different behavior depending on which code path handles the field.

Fix

Restore the >= operator while keeping the pointer dereference:

if (*expiredAt >= (uint64_t) commandTimeSnapshot())
    return GETF_OK;

Comment thread src/t_hash.c
unsigned int vlen;
int has_expiration = 0;
uint64_t expireat = EB_EXPIRE_TIME_INVALID;
int unused_flag = 0;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 The variable int unused_flag = 0 at line 2244 is declared but never used anywhere in hincrbyfloatCommand. This is dead code introduced by this PR that should be removed to keep the code clean and avoid potential compiler warnings.

Extended reasoning...

What the bug is

In hincrbyfloatCommand at src/t_hash.c:2244, the variable int unused_flag = 0 is declared but never referenced anywhere in the function. A grep of the entire file confirms the identifier unused_flag appears only on that single declaration line.

How it was introduced

This PR adds expiration-preservation logic to hincrbyfloatCommand. Two other new variables in the same block (has_expiration and expireat) are actively used, but unused_flag appears to be leftover scaffolding from development that was never cleaned up before the PR was submitted.

Impact

This is a minor code quality issue. The variable has no effect on runtime behavior or correctness. However, compiling with -Wall (which the Redis Makefile enables) will produce an -Wunused-variable warning. Depending on the CI configuration, if -Werror is also active (some Redis CI workflows do enable it), this could cause a build failure.

Step-by-step confirmation

  1. Look at the diff for hincrbyfloatCommand — three new local variables are added at lines 2242-2244: has_expiration, expireat, and unused_flag.
  2. has_expiration is set on line 2263 (if (expireat != EB_EXPIRE_TIME_INVALID) has_expiration = 1;) and checked on line 2297 (if (has_expiration)).
  3. expireat is passed to hashTypeGetValue on line 2253 and used in the propagation block starting at line 2297.
  4. unused_flag is initialized to 0 on line 2244 and never read, written to, or passed to any function afterward.
  5. Grepping the entire file for unused_flag returns exactly one match — the declaration itself.

Fix

Simply delete line 2244 (int unused_flag = 0;).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants