Fix MEMORY USAGE command#5
Fix MEMORY USAGE command#5tomerqodo wants to merge 2 commits intoclaude_claude_vs_qodo_base_fix_memory_usage_command_pr5from
Conversation
| } while ((node = node->next) && samples < sample_size); | ||
| asize += (double)elesize/elecount*ql->count; | ||
| asize += (double)elesize/samples*ql->count; |
There was a problem hiding this comment.
🔴 Bug: The quicklist memory estimation at line 1239 has a unit mismatch. elesize/samples computes the average memory per node, but it is then multiplied by ql->count (total entries across all nodes), not ql->len (total nodes). This causes MEMORY USAGE to overestimate quicklist memory by a factor of N, where N is the average number of entries per node. The multiplier should be ql->len instead of ql->count, or the divisor should be changed back to elecount.
Extended reasoning...
What the bug is
In kvobjComputeSize for OBJ_ENCODING_QUICKLIST (line 1239), the formula (double)elesize/samples*ql->count has a dimensional/unit mismatch. The sampling loop accumulates elesize (total bytes of sampled nodes) and samples (number of nodes sampled). Dividing elesize by samples yields the average memory per quicklist node. However, this is then multiplied by ql->count, which per quicklist.h:110 is the "total count of all entries in all listpacks" — i.e., the total number of entries, not nodes.
The specific code path
The sampling loop at lines 1234-1238 iterates over quicklist nodes:
do {
elesize += sizeof(quicklistNode)+zmalloc_size(node->entry);
elecount += node->count; // accumulates entries (unused in new formula)
samples++; // counts nodes sampled
} while ((node = node->next) && samples < sample_size);
asize += (double)elesize/samples*ql->count;The old code used elesize/elecount*ql->count, which correctly computed (avg bytes per entry) * (total entries). The PR changed the divisor from elecount to samples but forgot to change the multiplier from ql->count (total entries) to ql->len (total nodes).
Why existing code doesn't prevent it
The variable elecount is still accumulated in the loop but is now unused in the final calculation. There is no assertion or type system that catches the unit mismatch between per-node average and total entries.
Concrete proof with a step-by-step example
Consider a quicklist with 100 nodes (ql->len = 100), each containing 10 entries (ql->count = 1000). With sample_size = 5, we sample 5 nodes. Suppose each sampled node is 1000 bytes total (sizeof(quicklistNode) + zmalloc_size(node->entry)).
elesize = 5000(5 nodes x 1000 bytes)samples = 5(5 nodes sampled)elecount = 50(5 nodes x 10 entries each)ql->count = 1000(total entries)ql->len = 100(total nodes)
Old formula: 5000/50 * 1000 = 100 * 1000 = 100,000 bytes (correct: avg 100 bytes/entry x 1000 entries)
New formula: 5000/5 * 1000 = 1000 * 1000 = 1,000,000 bytes (wrong: 10x overestimate)
Correct fix: 5000/5 * 100 = 1000 * 100 = 100,000 bytes (using ql->len instead of ql->count)
The overestimation factor equals the average number of entries per node.
Impact
MEMORY USAGE will massively overreport memory for any list using quicklist encoding. The overestimate grows with the number of entries per quicklist node, which is controlled by list-max-listpack-size (default allows up to ~8KB per node, typically holding many entries). This affects monitoring, alerting, and memory management decisions.
How to fix
Change line 1239 from:
asize += (double)elesize/samples*ql->count;to either:
asize += (double)elesize/samples*ql->len;or revert to the original:
asize += (double)elesize/elecount*ql->count;Note on duplicate report
One verifier flagged this as a duplicate of bug_001. While both bugs relate to the quicklist section of kvobjComputeSize, this report specifically identifies the unit mismatch between the divisor (samples = number of nodes) and the multiplier (ql->count = total entries). If bug_001 covers the same issue, this confirms the finding with independent analysis.
| asize += lpsize; | ||
| } else { | ||
| if (samples) lpsize /= samples; /* Compute the average. */ | ||
| asize += lpsize * (s->rax->numele-1); | ||
| asize += lpsize * s->rax->numele; | ||
| /* No need to check if seek succeeded, we enter this branch only | ||
| * if there are a few elements in the radix tree. */ | ||
| raxSeek(&ri,"$",NULL,0); |
There was a problem hiding this comment.
🔴 Bug: Stream listpack memory estimation double-counts the last node. The PR changed lpsize * (s->rax->numele-1) to lpsize * s->rax->numele on line 1330, but line 1336 still adds the last node's actual size via zmalloc_size(ri.data). The old code used numele-1 specifically to exclude the last node from the average-based estimate before adding its real size separately. Fix: revert to lpsize * (s->rax->numele - 1).
Extended reasoning...
What the bug is
In kvobjComputeSize for OBJ_STREAM, when the number of radix tree elements exceeds the sample size, the code estimates the total listpack memory by computing an average listpack size from the first N samples, then multiplying by the number of nodes. The PR changed line 1330 from asize += lpsize * (s->rax->numele-1) to asize += lpsize * s->rax->numele.
However, lines 1333-1336 still seek to the last node ("$") and add its actual size via zmalloc_size(ri.data). This means the last node is counted twice: once as part of the lpsize * s->rax->numele multiplication (which includes ALL nodes), and once more via the explicit zmalloc_size(ri.data) addition.
The algorithm and why the old code was correct
The comment at lines 1311-1314 explicitly describes the intended algorithm: "we estimate the size of the first N listpacks, and use the average to compute the size of the first N-1 listpacks, and finally add the real size of the last node." The last listpack is often not full (it is the one being actively appended to), so its size may differ significantly from the average. The original code correctly:
- Computed the average listpack size from N samples
- Multiplied by
(numele - 1)to estimate all nodes except the last - Added the actual size of the last node via
zmalloc_size
The PR broke step 2 by using numele instead of (numele - 1), causing the last node to be counted in both steps 2 and 3.
Step-by-step proof
Consider a stream with 100 listpack nodes where each node is approximately 4KB. Suppose 5 samples are taken (the default), yielding an average lpsize of 4096 bytes.
Old code: asize += 4096 * (100 - 1) = 405,504 bytes, then asize += zmalloc_size(last_node) (say 4096) = 409,600 total.
New code: asize += 4096 * 100 = 409,600 bytes, then asize += zmalloc_size(last_node) (4096) = 413,696 total.
The new code overestimates by 4096 bytes (one average listpack size). For streams with larger listpacks, this overestimation grows proportionally.
Impact
The MEMORY USAGE command will report inflated memory for stream keys when the stream has more listpack nodes than the sample size (default 5). The overestimation equals approximately one average listpack size. While not a crash or data corruption bug, it produces incorrect results from a command specifically designed to report accurate memory usage.
Fix
Revert line 1330 to use the original expression: asize += lpsize * (s->rax->numele - 1);. The comment already documents the correct algorithm and was not updated by the PR, confirming this was an unintentional change.
| size_t elesize = 0, elecount = 0, samples = 0; | ||
|
|
||
| /* All kv-objects has at least kvobj header and embedded key */ | ||
| size_t asize = malloc_usable_size((void *)o); |
There was a problem hiding this comment.
🟡 Line 1217 uses malloc_usable_size((void *)o) directly instead of zmalloc_size((void *)o). This breaks portability: on macOS, malloc_usable_size does not exist (Apple provides malloc_size), and on non-jemalloc builds with PREFIX_SIZE > 0, the raw function receives an offset pointer causing undefined behavior. Every other .c file in the codebase uses zmalloc_size() or zmalloc_usable_size() for this purpose. Should be zmalloc_size((void *)o).
Extended reasoning...
What the bug is
In the newly introduced kvobjComputeSize() function at line 1217, the code initializes asize using malloc_usable_size((void *)o) — the raw libc/allocator function. However, Redis provides portable wrappers (zmalloc_size and zmalloc_usable_size) defined in zmalloc.h that abstract away platform differences. This is the only direct use of malloc_usable_size in any .c file in the entire codebase.
How it manifests
The zmalloc_size macro in zmalloc.h maps to different platform-specific functions depending on the build configuration:
- On jemalloc builds: maps to
je_malloc_usable_size - On macOS (Apple): maps to
malloc_sizefrom<malloc/malloc.h> - On glibc: maps to
malloc_usable_size
Using malloc_usable_size directly bypasses this abstraction. On macOS, malloc_usable_size simply does not exist — the Apple SDK provides malloc_size instead. This would cause a compile error on macOS builds without jemalloc.
Why existing code doesn't prevent it
The old objectComputeSize() function that this PR replaces used zmalloc_size((void *)o) for the EMBSTR case and sizeof(*o) for other cases. The refactored code consolidated all cases into a single asize initialization using the raw function, likely as an oversight during the refactor. There is no compile-time or lint check that would catch use of the raw allocator function vs. the zmalloc wrapper.
Concrete example
Consider a macOS build with MALLOC=libc:
HAVE_MALLOC_SIZEis defined, mapped tomalloc_size(Apple API)zmalloc_size()correctly callsmalloc_size()- But
malloc_usable_size()is not declared anywhere — the code at line 1217 fails to compile with an "implicit declaration" error
Or consider a Linux build with MALLOC=libc (no jemalloc):
HAVE_MALLOC_SIZEis NOT defined, sozmalloc_size()usesPREFIX_SIZElogiczmallocprepends a header before the returned pointerzmalloc_size(ptr)accounts for this by subtractingPREFIX_SIZEbefore calling the underlying function- Calling
malloc_usable_size(ptr)directly on the zmalloc-returned pointer passes an offset pointer to the allocator, producing incorrect results or undefined behavior
Impact
Since Redis bundles jemalloc and most production builds use it (where zmalloc_size == malloc_usable_size and PREFIX_SIZE == 0), this bug is unlikely to affect typical deployments. However, it breaks compilation on macOS with libc allocator and produces incorrect memory usage reporting on non-jemalloc Linux builds.
Fix
Change line 1217 from:
size_t asize = malloc_usable_size((void *)o);to:
size_t asize = zmalloc_size((void *)o);This matches how the old code worked and is consistent with every other allocation size query in this function and the codebase.
| stream *s = o->ptr; | ||
| asize = sizeof(*o)+sizeof(*s); | ||
| asize += sizeof(*s); | ||
| asize += streamRadixTreeMemoryUsage(s->rax); |
There was a problem hiding this comment.
🟣 Pre-existing bug: In streamRadixTreeMemoryUsage (line 1198), size = rax->numele * sizeof(streamID) uses = instead of +=, overwriting the sizeof(*rax) computed on line 1197. This loses ~48 bytes (on 64-bit) from the memory estimate per call. Since this function is called multiple times per stream (main rax, consumer group PELs, consumer PELs) by the PR-modified kvobjComputeSize, fixing it here would improve the accuracy of the MEMORY USAGE command this PR aims to correct.
Extended reasoning...
Bug Description
In streamRadixTreeMemoryUsage at src/object.c:1196-1203, line 1197 initializes size with sizeof(*rax) (~48 bytes on 64-bit systems), but line 1198 immediately overwrites it with size = rax->numele * sizeof(streamID) using assignment (=) instead of addition (+=). This makes line 1197 dead code and loses the rax struct size from the memory estimate.
Code Path
size_t size = sizeof(*rax); // line 1197: ~48 bytes
size = rax->numele * sizeof(streamID); // line 1198: OVERWRITES, should be +=
size += rax->numnodes * sizeof(raxNode); // line 1199: correctly uses +=
size += rax->numnodes * sizeof(long)*30; // line 1201: correctly uses +=The programmer clearly intended to accumulate sizes (as evidenced by the += on lines 1199 and 1201), but line 1198 uses = instead of +=.
Why Existing Code Doesn't Prevent It
There is no test that validates the exact memory accounting of streamRadixTreeMemoryUsage. The MEMORY USAGE command returns an approximation, so the missing ~48 bytes per call is unlikely to be caught by existing tests, especially since the other terms (node sizes, overhead) typically dominate.
Impact
The impact is relatively minor per call (~48 bytes on 64-bit), but streamRadixTreeMemoryUsage is called multiple times for each stream key by kvobjComputeSize: once for the main stream rax (line 1309), once for each consumer group PEL (line 1350), and once for each consumer PEL (line 1362). For streams with many consumer groups and consumers, the cumulative error grows.
Step-by-Step Proof
- A stream key exists with a rax tree where
rax->numele = 100andrax->numnodes = 50. streamRadixTreeMemoryUsageis called.- Line 1197:
size = sizeof(*rax)= 48 bytes. - Line 1198:
size = 100 * sizeof(streamID)= 100 * 16 = 1600. The 48 bytes from step 3 are lost. - With the fix (
size += ...), the result would be 1648 instead of 1600. - If this stream has 5 consumer groups each with their own PEL rax, the function is called 6+ times, losing 48 * 6 = 288 bytes from the estimate.
Fix
Change line 1198 from size = rax->numele * sizeof(streamID); to size += rax->numele * sizeof(streamID);.
Benchmark PR from agentic-review-benchmarks#5