Refactor some of ASM and slot-stats functions by tomerqodo · Pull Request #26 · qodo-benchmark/redis-combined-coderabbit

tomerqodo · 2026-01-29T09:42:06Z

Benchmark PR from agentic-review-benchmarks#4

Summary by CodeRabbit

New Features
- Added cluster migration and slot synchronization commands for improved cluster management.
- Enhanced cluster initialization with improved data consistency verification and slot ownership handling.
Refactor
- Optimized cluster startup and maintenance sequences.
- Improved slot assignment and recovery mechanisms.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

coderabbitai · 2026-01-29T09:42:29Z

Walkthrough

This PR refactors cluster initialization and lifecycle management by introducing clusterCommonInit() to centralize slot-state setup, relocating ASM lifecycle hooks from cluster_legacy.c to server.c's cron/before-sleep routines, and replacing verification logic with explicit unassigned slot claiming. Command routing for "migration" and "syncslots" is consolidated in cluster.c.

Changes

Cohort / File(s)	Summary
Cluster Common Initialization `src/cluster.c`, `src/cluster.h`, `src/cluster_legacy.c`, `src/server.c`	Introduced `clusterCommonInit()` to centralize shared cluster data and slot-state initialization. Removed legacy `resetClusterStats()` calls and relocated initialization flow from cluster_legacy.c to server.c, replacing direct `cluster_slot_stats` allocation with the new common init routine.
Unassigned Slot Management `src/cluster.c`, `src/cluster_legacy.c`	Replaced `verifyClusterConfigWithData()` with `clusterClaimUnassignedSlots()` for scanning and claiming slots with keys but no owner. Added `clusterDeleteKeysInUnownedSlots()` as an internal helper to purge keys in non-owned slots during initialization.
ASM Lifecycle Reorganization `src/cluster_legacy.c`, `src/server.c`	Removed `asmInit()`, `asmCron()`, and `asmBeforeSleep()` calls from cluster lifecycle functions. Reintegrated `asmCron()` and `asmBeforeSleep()` into server.c's main cron loop and before-sleep hook, changing invocation order to prioritize ASM tasks.
Cluster Command Handlers `src/cluster.c`, `src/cluster_legacy.c`	Added "migration" and "syncslots" subcommand routing in clusterCommand(). Removed legacy command-handling blocks for these subcommands from clusterCommandSpecial() in cluster_legacy.c, consolidating command dispatch.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Slots now claim themselves with pride,
Init flows reorganized far and wide,
ASM hops to server's eager hands,
Cluster harmony across the lands! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 58.82% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the main refactoring work: moving ASM and slot-stats initialization logic from legacy cluster code to centralized functions, and restructuring cron/sleep hooks across multiple files.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing touches

📝 Generate docstrings

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 3

🤖 Fix all issues with AI agents

In `@src/cluster_legacy.c`:
- Around line 5259-5284: The clusterClaimUnassignedSlots function currently
calls clusterAddSlot(myself, i) and clusterSaveConfigOrDie but does not request
the usual state/broadcast updates; modify it so that after taking a slot you
mark the need to update state and notify peers by invoking
clusterDoBeforeSleep() with CLUSTER_TODO_UPDATE_STATE and
CLUSTER_TODO_BROADCAST_PONG (or accumulate those flags into update_config and
call clusterDoBeforeSleep(update_flags) after the loop), ensuring
clusterAddSlot, clusterDoBeforeSleep, CLUSTER_TODO_UPDATE_STATE, and
CLUSTER_TODO_BROADCAST_PONG are used exactly as in the analogous code paths so
peers are informed of the new slot ownership before saving config.

In `@src/cluster.c`:
- Around line 2137-2141: Replace the deprecated malloc call in clusterCommonInit
by allocating zeroed memory with zcalloc: allocate server.cluster_slot_stats
using zcalloc(CLUSTER_SLOTS, sizeof(clusterSlotStat)) instead of malloc(...),
keeping the subsequent resetClusterStats() and asmInit() calls unchanged; update
any necessary includes/headers if zcalloc isn’t already available in this
compilation unit.
- Around line 2171-2214: The function comment for verifyClusterConfigWithData is
inaccurate about handling keys for slots owned by other nodes: instead of
marking such slots IMPORTING, the implementation calls
clusterDeleteKeysInUnownedSlots() which removes those keys (safe with atomic
slot migration), then calls clusterClaimUnassignedSlots() to take truly
unassigned slots; update the comment to reflect that behavior change by
describing deletion via clusterDeleteKeysInUnownedSlots() and claiming via
clusterClaimUnassignedSlots() rather than setting IMPORTING.

🧹 Nitpick comments (1)

src/server.c (1)

1841-1844: Consider documenting the ASM vs. cluster before-sleep order.
Line 1841 now calls clusterBeforeSleep() then asmBeforeSleep(), while cron runs ASM first. If this asymmetry is intentional, a short comment would help prevent regressions; if not, align the order.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c52fabf and 865827b.

📒 Files selected for processing (4)

src/cluster.c
src/cluster.h
src/cluster_legacy.c
src/server.c

🧰 Additional context used

🧬 Code graph analysis (3)

src/cluster.h (1)

src/cluster.c (1)

clusterCommonInit (2137-2141)

src/cluster_legacy.c (1)

src/cluster.c (1)

countKeysInSlot (820-822)

src/server.c (2)

src/cluster_asm.c (2)

asmCron (2507-2576)

asmBeforeSleep (2463-2505)

src/cluster.c (1)

clusterCommonInit (2137-2141)

🪛 GitHub Actions: Codecov

src/cluster.c

[error] 2138-2138: malloc is deprecated. Function call triggers -Werror=deprecated-declarations; build failure during lcov step due to deprecated malloc usage in clusterCommonInit.

🪛 GitHub Actions: External Server Tests

src/cluster.h

[error] 1-1: Command failed: make -C src all (due to compilation error in cluster.o).

src/cluster_legacy.c

[error] 1-1: Command failed: make -C src all (due to compilation error in cluster.o).

src/cluster.c

[error] 2138-2138: ClusterCommonInit: malloc is deprecated (-Werror=deprecated-declarations). Use calloc or ensure proper usage to avoid deprecation warning treated as an error.

src/server.c

[error] 1-1: Command failed: make -C src all (due to compilation error in cluster.o).

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)

GitHub Check: build-old-chain-jemalloc
GitHub Check: Analyze (cpp)
GitHub Check: build-old-chain-jemalloc
GitHub Check: build-centos-jemalloc

🔇 Additional comments (5)

src/cluster.h (1)

89-92: LGTM!

The new function declarations for clusterCommonInit() and clusterClaimUnassignedSlots() follow the existing API pattern in the header and align with the implementations in src/cluster.c.

src/cluster.c (2)

2146-2169: LGTM!

The logic correctly handles deletion of keys in slots that are no longer owned by this node, with appropriate early return for slave nodes and proper skip conditions for owned/importing slots.

1092-1095: Both clusterMigrationCommand and clusterSyncSlotsCommand are properly declared in src/cluster_asm.h (lines 44-45) and implemented in src/cluster_asm.c, making them correctly visible to the routing code in this file.

src/server.c (2)

1657-1660: Verify ASM cron ordering against cluster state updates.
Line 1657 runs asmCron() before clusterCron(). If ASM cron logic depends on cluster state refreshed by clusterCron() (e.g., node health/slot ownership), this could act on stale data. Please confirm the ordering is intentional; otherwise swap or document the dependency.

7739-7741: Nice centralization of cluster init.
Line 7739 calls clusterCommonInit() before clusterInit(), which cleanly centralizes slot-stats/ASM setup prior to cluster startup. Looks good.

_{✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.}

coderabbitai · 2026-01-29T09:49:05Z

+/* This function is called after the node startup in order to check if there
+ * are any slots that we have keys for, but are assigned to no one. If so,
+ * we take ownership of them. */
+void clusterClaimUnassignedSlots(void) {
+    if (nodeIsSlave(myself)) return;
+
+    int update_config = 0;
+    for (int i = 0; i < CLUSTER_SLOTS; i++) {
+        /* Skip if: no keys, already has an owner, or we are importing it. */
+        if (!countKeysInSlot(i) ||
+            server.cluster->slots[i] != NULL ||
+            server.cluster->importing_slots_from[i] != NULL)
+        {
+            continue;
+        }
+
+        /* If we are here data and cluster config don't agree, and we have
+         * slot 'i' populated even if we are not importing it, nor anyone else
+         * is assigned to it. Fix this condition by taking ownership. */
+        update_config++;
+        serverLog(LL_NOTICE, "I have keys for unassigned slot %d. "
+                             "Taking responsibility for it.", i);
+        clusterAddSlot(myself, i);
+    }
+    if (update_config) clusterSaveConfigOrDie(1);
+}


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

rg -n "clusterClaimUnassignedSlots" -C 5

Repository: qodo-benchmark/redis-combined-coderabbit

Length of output: 1730

🏁 Script executed:

# Get more context around the call site to see which function it's in sed -n '2190,2220p' src/cluster.c | cat -n

Repository: qodo-benchmark/redis-combined-coderabbit

Length of output: 1266

🏁 Script executed:

# Find where verifyClusterConfigWithData is called rg -n "verifyClusterConfigWithData" -C 3

Repository: qodo-benchmark/redis-combined-coderabbit

Length of output: 1356

🏁 Script executed:

# Get context around line 7768 to identify the function sed -n '7700,7780p' src/server.c | cat -n

Repository: qodo-benchmark/redis-combined-coderabbit

Length of output: 3748

🏁 Script executed:

# Search for when state is broadcast after config changes in cluster mode rg -n "clusterBeforeSleep|replicateCommandPropagate" -A 2 -B 2 | head -50

Repository: qodo-benchmark/redis-combined-coderabbit

Length of output: 1344

🏁 Script executed:

# Find clusterSaveConfigOrDie implementation rg -n "clusterSaveConfigOrDie" -A 10 | head -40

Repository: qodo-benchmark/redis-combined-coderabbit

Length of output: 2648

🏁 Script executed:

# Check if clusterSaveConfigOrDie sets any broadcast flags rg -n "todo_before_sleep" -B 2 -A 2 | grep -E "(todo_before_sleep|clusterSaveConfig)" | head -20

Repository: qodo-benchmark/redis-combined-coderabbit

Length of output: 919

🏁 Script executed:

# Check clusterAddSlot and surrounding broadcast logic rg -n "clusterAddSlot" -B 3 -A 8 | head -60

Repository: qodo-benchmark/redis-combined-coderabbit

Length of output: 3994

🏁 Script executed:

# Check what happens in the caller after verifyClusterConfigWithData sed -n '7760,7790p' src/server.c | cat -n

Repository: qodo-benchmark/redis-combined-coderabbit

Length of output: 1584

🏁 Script executed:

# Check clusterBeforeSleep implementation for todo_before_sleep handling sed -n '4942,5000p' src/cluster_legacy.c | cat -n

Repository: qodo-benchmark/redis-combined-coderabbit

Length of output: 2538

🏁 Script executed:

# Search for what sets CLUSTER_TODO_SAVE_CONFIG or related flags rg -n "CLUSTER_TODO" -B 1 | grep -E "(CLUSTER_TODO_|clusterDoBeforeSleep|clusterSaveConfig)" | head -30

Repository: qodo-benchmark/redis-combined-coderabbit

Length of output: 2641

🏁 Script executed:

# Check if clusterClaimUnassignedSlots sets any todo flags sed -n '5259,5284p' src/cluster_legacy.c | cat -n

Repository: qodo-benchmark/redis-combined-coderabbit

Length of output: 1346

🏁 Script executed:

# Compare with similar slot claim/add patterns - look at line 2419-2426 and 4268-4279 sed -n '2415,2430p' src/cluster_legacy.c | cat -n

Repository: qodo-benchmark/redis-combined-coderabbit

Length of output: 985

Add missing state and broadcast updates when claiming slots.

The function claims slots and saves config but lacks the state/broadcast updates that similar code performs. Compare with lines 2423–2425 and 4268–4279: after calling clusterAddSlot(), the code invokes clusterDoBeforeSleep() with CLUSTER_TODO_UPDATE_STATE and/or CLUSTER_TODO_BROADCAST_PONG flags. This ensures the cluster state is updated and peers are notified of the claim. Without these, peers remain unaware of the new slot assignments until the next periodic update.

🤖 Prompt for AI Agents

In `@src/cluster_legacy.c` around lines 5259 - 5284, The clusterClaimUnassignedSlots function currently calls clusterAddSlot(myself, i) and clusterSaveConfigOrDie but does not request the usual state/broadcast updates; modify it so that after taking a slot you mark the need to update state and notify peers by invoking clusterDoBeforeSleep() with CLUSTER_TODO_UPDATE_STATE and CLUSTER_TODO_BROADCAST_PONG (or accumulate those flags into update_config and call clusterDoBeforeSleep(update_flags) after the loop), ensuring clusterAddSlot, clusterDoBeforeSleep, CLUSTER_TODO_UPDATE_STATE, and CLUSTER_TODO_BROADCAST_PONG are used exactly as in the analogous code paths so peers are informed of the new slot ownership before saving config.

coderabbitai · 2026-01-29T09:49:05Z

+void clusterCommonInit(void) {
+    server.cluster_slot_stats = malloc(CLUSTER_SLOTS*sizeof(clusterSlotStat));
+    resetClusterStats();
+    asmInit();
+}


⚠️ Potential issue | 🔴 Critical

Use zcalloc instead of deprecated malloc — this is causing the build failure.

The pipeline fails with -Werror=deprecated-declarations because malloc is deprecated in this codebase. Redis uses zmalloc/zcalloc for memory allocation. Using zcalloc also zero-initializes the memory, which is appropriate before resetClusterStats().

🔧 Proposed fix

void clusterCommonInit(void) { - server.cluster_slot_stats = malloc(CLUSTER_SLOTS*sizeof(clusterSlotStat)); + server.cluster_slot_stats = zcalloc(CLUSTER_SLOTS*sizeof(clusterSlotStat)); resetClusterStats(); asmInit(); }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

void clusterCommonInit(void) {

server.cluster_slot_stats = malloc(CLUSTER_SLOTS*sizeof(clusterSlotStat));

resetClusterStats();

asmInit();

}

void clusterCommonInit(void) {

server.cluster_slot_stats = zcalloc(CLUSTER_SLOTS*sizeof(clusterSlotStat));

resetClusterStats();

asmInit();

}

🧰 Tools

🪛 GitHub Actions: Codecov

[error] 2138-2138: malloc is deprecated. Function call triggers -Werror=deprecated-declarations; build failure during lcov step due to deprecated malloc usage in clusterCommonInit.

🪛 GitHub Actions: External Server Tests

[error] 2138-2138: ClusterCommonInit: malloc is deprecated (-Werror=deprecated-declarations). Use calloc or ensure proper usage to avoid deprecation warning treated as an error.

🤖 Prompt for AI Agents

In `@src/cluster.c` around lines 2137 - 2141, Replace the deprecated malloc call in clusterCommonInit by allocating zeroed memory with zcalloc: allocate server.cluster_slot_stats using zcalloc(CLUSTER_SLOTS, sizeof(clusterSlotStat)) instead of malloc(...), keeping the subsequent resetClusterStats() and asmInit() calls unchanged; update any necessary includes/headers if zcalloc isn’t already available in this compilation unit.

coderabbitai · 2026-01-29T09:49:06Z

+
+/* This function is called after the node startup in order to verify that data
+ * loaded from disk is in agreement with the cluster configuration:
+ *
+ * 1) If we find keys about hash slots we have no responsibility for, the
+ *    following happens:
+ *    A) If no other node is in charge according to the current cluster
+ *       configuration, we add these slots to our node.
+ *    B) If according to our config other nodes are already in charge for
+ *       this slots, we set the slots as IMPORTING from our point of view
+ *       in order to justify we have those slots, and in order to make
+ *       redis-cli aware of the issue, so that it can try to fix it.
+ * 2) If we find data in a DB different than DB0 we return C_ERR to
+ *    signal the caller it should quit the server with an error message
+ *    or take other actions.
+ *
+ * The function always returns C_OK even if it will try to correct
+ * the error described in "1". However if data is found in DB different
+ * from DB0, C_ERR is returned.
+ *
+ * The function also uses the logging facility in order to warn the user
+ * about desynchronizations between the data we have in memory and the
+ * cluster configuration. */
+int verifyClusterConfigWithData(void) {
+    /* Return ASAP if a module disabled cluster redirections. In that case
+     * every master can store keys about every possible hash slot. */
+    if (server.cluster_module_flags & CLUSTER_MODULE_FLAG_NO_REDIRECTION)
+        return C_OK;
+
+    /* If this node is a slave, don't perform the check at all as we
+     * completely depend on the replication stream. */
+    if (clusterNodeIsSlave(getMyClusterNode())) return C_OK;
+
+    /* Make sure we only have keys in DB0. */
+    for (int i = 1; i < server.dbnum; i++) {
+        if (kvstoreSize(server.db[i].keys)) return C_ERR;
+    }
+
+    /* Delete keys in unowned slots */
+    clusterDeleteKeysInUnownedSlots();
+    /* Take over slots that we have keys for, but are assigned to no one. */
+    clusterClaimUnassignedSlots();
+    return C_OK;
+}


⚠️ Potential issue | 🟡 Minor

Function comment at lines 2179-2182 is inconsistent with implementation.

The comment states that if other nodes are in charge of slots, "we set the slots as IMPORTING from our point of view." However, the implementation at line 2210 calls clusterDeleteKeysInUnownedSlots(), which deletes keys in those slots rather than setting IMPORTING state.

The comment in clusterDeleteKeysInUnownedSlots() explains this is safe with atomic slot migration, but the parent function's documentation should be updated to reflect this behavior change.

📝 Suggested comment update

/* This function is called after the node startup in order to verify that data * loaded from disk is in agreement with the cluster configuration: * * 1) If we find keys about hash slots we have no responsibility for, the * following happens: * A) If no other node is in charge according to the current cluster * configuration, we add these slots to our node. - * B) If according to our config other nodes are already in charge for - * this slots, we set the slots as IMPORTING from our point of view - * in order to justify we have those slots, and in order to make - * redis-cli aware of the issue, so that it can try to fix it. + * B) If according to our config other nodes are already in charge for + * these slots, we delete the keys in those slots. With atomic slot + * migration, it is safe to drop keys from slots that are not owned. * 2) If we find data in a DB different than DB0 we return C_ERR to * signal the caller it should quit the server with an error message * or take other actions.

🤖 Prompt for AI Agents

In `@src/cluster.c` around lines 2171 - 2214, The function comment for verifyClusterConfigWithData is inaccurate about handling keys for slots owned by other nodes: instead of marking such slots IMPORTING, the implementation calls clusterDeleteKeysInUnownedSlots() which removes those keys (safe with atomic slot migration), then calls clusterClaimUnassignedSlots() to take truly unassigned slots; update the comment to reflect that behavior change by describing deletion via clusterDeleteKeysInUnownedSlots() and claiming via clusterClaimUnassignedSlots() rather than setting IMPORTING.

tezc and others added 3 commits January 25, 2026 12:12

Refactor some of ASM and slot-stats functions

074e913

better diff

56470f2

update pr

865827b

coderabbitai Bot reviewed Jan 29, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor some of ASM and slot-stats functions#26

Refactor some of ASM and slot-stats functions#26
tomerqodo wants to merge 3 commits intocoderabbit_full_base_refactor_some_of_asm_and_slot-stats_functions_pr4from
coderabbit_full_head_refactor_some_of_asm_and_slot-stats_functions_pr4

tomerqodo commented Jan 29, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jan 29, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Jan 29, 2026

Uh oh!

coderabbitai Bot Jan 29, 2026

Uh oh!

coderabbitai Bot Jan 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

tomerqodo commented Jan 29, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jan 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Poem

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jan 29, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jan 29, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jan 29, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

tomerqodo commented Jan 29, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jan 29, 2026 •

edited

Loading