Replace fragile dict stored-key API with getKeyId callback#1
Conversation
| const void *key = dictStoredKey2Key(d, ptr); | ||
| link = dictFindLink(d, ptr, NULL); |
There was a problem hiding this comment.
🔴 Bug: dictStoredKey2Key is declared static in dict.c:97, so calling it from defrag.c:280 will cause a linker error (undefined reference). Additionally, the extracted key variable is never used -- ptr (the stored key) is passed to dictFindLink on line 281 instead of key. The old code used the dictUseStoredKeyApi macro which worked cross-TU. Fix: export dictStoredKey2Key (or add a public wrapper in dict.h), and change line 281 to link = dictFindLink(d, key, NULL);.
Extended reasoning...
What the bug is
There are two related issues in activeDefragHfieldAndUpdateRef in defrag.c at lines 280-281:
const void *key = dictStoredKey2Key(d, ptr);
link = dictFindLink(d, ptr, NULL);Issue 1: Static function called across translation units
dictStoredKey2Key is declared static in dict.c at line 97:
static const void *dictStoredKey2Key(dict *d, const void *key __stored_key) {
return (d->type->keyFromStoredKey) ? d->type->keyFromStoredKey(key) : key;
}Since static functions in C have internal linkage, they are only visible within the translation unit (dict.c) where they are defined. There is no declaration of dictStoredKey2Key in dict.h, server.h, or any other header file. When defrag.c calls this function, the compiler will either produce an implicit function declaration error (with -Werror-implicit-function-declaration, which Redis uses) or the linker will fail with an undefined reference.
The old code used dictUseStoredKeyApi(d, 1) which was a macro operating on a public struct field (d->useStoredKeyApi), accessible from any file. The PR replaced this macro-based approach with a static helper function but forgot to export it.
Issue 2: Extracted key is unused
Even if the linkage issue were resolved, the extracted key (the lookup key derived from the stored key) is assigned on line 280 but never actually used. Line 281 passes ptr (the original stored-key pointer) to dictFindLink instead of key. This means:
- The variable
keywill trigger a-Wunused-variablewarning, which under-Werror(as specified in AGENTS.md rule 4) will cause a build failure. - The code is semantically wrong. The purpose of
dictStoredKey2Keyis to extract the lookup key from the stored key, anddictFindLinkexpects a lookup key. Passing the stored key (ptr) instead of the extracted key (key) is a logic error.
Step-by-step proof
- The compiler processes defrag.c as a separate translation unit from dict.c.
- defrag.c:280 calls
dictStoredKey2Key(d, ptr)-- but this function isstaticin dict.c and has no declaration in any header. - With
-Werror-implicit-function-declaration(standard Redis build flags), this is a compilation error. - Even without that flag, the linker would fail because
dictStoredKey2Keyhas internal linkage and produces no external symbol. - If we somehow bypassed the linkage issue:
keyis assigned butptris passed todictFindLinkon line 281, makingkeyunused (triggering-Wunused-variable/-Werror). - Currently this doesn't cause a runtime logic bug because the hash dict types (
entryHashDictTypeandentryHashDictTypeWithHFE) don't setkeyFromStoredKey, sodictStoredKey2Keywould returnptrunchanged. But it is semantically incorrect and would break ifkeyFromStoredKeywere ever set.
How to fix
- Either remove the
staticqualifier fromdictStoredKey2Keyin dict.c and add a declaration in dict.h, or create a public wrapper function. - Change line 281 from
link = dictFindLink(d, ptr, NULL);tolink = dictFindLink(d, key, NULL);.
Compare with the correct pattern used elsewhere in this PR, e.g., rehashEntriesInBucketAtIndex in dict.c:342 which correctly extracts the key before using it for hashing.
| static int validateStoredKeyConversion(dict *d, const void *key __stored_key) { | ||
| const void *extracted = dictStoredKey2Key(d, key); | ||
| if (d->type->keyFromStoredKey) { | ||
| return extracted != NULL; | ||
| } | ||
| return extracted == key; | ||
| } |
There was a problem hiding this comment.
🟡 validateStoredKeyConversion (lines 102-108) is a static function with zero callers anywhere in the codebase. With -Werror and -Wunused-function, this will cause a build failure. Either remove the function or wire it into actual validation call sites.
Extended reasoning...
What the bug is
The function validateStoredKeyConversion was introduced in this PR at src/dict.c:102-108 as a static helper to validate that dictStoredKey2Key correctly extracts a key from a stored-key object. However, it is never called from anywhere in the codebase.
How it manifests
Grepping the entire codebase for validateStoredKeyConversion returns exactly one match: the function definition itself at line 102. There are no callers in dict.c, nor in any other file. Since the function is declared static, it is scoped to the translation unit and invisible to other files, making it definitively dead code.
Why existing code does not prevent this
The C compiler does not error on unused static functions by default. However, Redis builds with -Wall (which includes -Wunused-function). When combined with -Werror, which promotes all warnings to errors, the unused static function will cause a hard compilation failure. The AGENTS.md rule 4 explicitly requires that the build compiles without warnings when -Werror is enabled.
Step-by-step proof
validateStoredKeyConversionis defined atdict.c:102with thestaticstorage class.- Running
grep -rn validateStoredKeyConversion src/returns only the definition at line 102. - No other function in
dict.cor any other file referencesvalidateStoredKeyConversion. - Compiling with
gcc -Wall -Werrorwill trigger-Wunused-functionfor this static function, causing a build error.
Impact
This is low-impact dead code. It does not affect runtime behavior since it is never executed. The primary concern is the build failure under -Werror, which would block CI pipelines or strict build configurations.
Fix
The simplest fix is to remove the function entirely (lines 102-108 and the preceding comment on line 101). Alternatively, if the validation is intended to be used, it should be wired into call sites such as dictAddRaw, dictSetKeyAtLink, or debug/assertion paths where stored-key conversion occurs.
| if (*link == NULL) { | ||
| *link = dictFindLink(d, key, NULL); | ||
| *link = dictFindLink(d, addedKey, NULL); | ||
| assert(*link != NULL); |
There was a problem hiding this comment.
🟣 In dictSetKeyAtLink, the newItem=0 fallback (line 914) passes addedKey (a stored key) to dictFindLink(), which expects a lookup key. The newItem=1 path at line 902 correctly calls dictStoredKey2Key(d, key) before searching. This is a pre-existing inconsistency (the old code had the same issue without dictUseStoredKeyApi wrapping), and currently unreachable since all callers provide a non-NULL *link. Fix: *link = dictFindLink(d, dictStoredKey2Key(d, addedKey), NULL);
Extended reasoning...
Bug Analysis
In dictSetKeyAtLink() at line 914, when newItem=0 and *link == NULL, the code calls dictFindLink(d, addedKey, NULL). Here, addedKey is a stored key (e.g., a kvobj pointer for dbDictType). However, dictFindLink() expects a lookup key (e.g., an sds string), because internally it calls dictGetHash(d, key) which invokes d->type->hashFunction(key) — for dbDictType, this is dictSdsHash, which calls sdslen() on the pointer.
The Asymmetry
The newItem=1 path at line 902 correctly converts the stored key before searching:
*link = dictFindLinkInternal(d, dictStoredKey2Key(d, key), &bucket);But the newItem=0 path at line 914 does not:
*link = dictFindLink(d, addedKey, NULL); // addedKey is a stored key!The fix should be:
*link = dictFindLink(d, dictStoredKey2Key(d, addedKey), NULL);Reachability
All current callers of dictSetKeyAtLink with newItem=0 provide a valid non-NULL *link obtained from a prior dictFindLink or dictTwoPhaseUnlinkFind call. The callers include activeDefragHfieldDictCallback in defrag.c, activeDefragHfieldAndUpdateRef in defrag.c, and defragKey in defrag.c (via kvstoreDictSetAtLink). Each of these asserts the link is non-NULL. Therefore, the if (*link == NULL) branch at line 913 is never executed in practice.
Pre-existing Nature
The old code had the same asymmetry: the newItem=1 path used dictUseStoredKeyApi(d, 1) before calling dictFindLinkInternal, while the newItem=0 path called dictFindLink(d, key, NULL) without setting useStoredKeyApi. This PR changed key to addedKey (which is identical when keyDup is NULL, as is the case for dbDictType) but preserved the pre-existing inconsistency.
Impact
Since this code path is currently unreachable, there is no runtime impact. However, the function API documents that *link can be NULL and the function will locate the key itself. If any future caller relies on this behavior with a dict that has keyFromStoredKey set (like dbDictType), it would cause memory corruption by calling sdslen() on a non-sds pointer. The fix is trivial and would make the code consistent.
Benchmark PR from agentic-review-benchmarks#1