Skip to content

Refactor some of ASM and slot-stats functions#4

Open
tomerqodo wants to merge 3 commits intocoderabbit_full_base_refactor_some_of_asm_and_slot-stats_functions_pr4from
coderabbit_full_head_refactor_some_of_asm_and_slot-stats_functions_pr4
Open

Refactor some of ASM and slot-stats functions#4
tomerqodo wants to merge 3 commits intocoderabbit_full_base_refactor_some_of_asm_and_slot-stats_functions_pr4from
coderabbit_full_head_refactor_some_of_asm_and_slot-stats_functions_pr4

Conversation

@tomerqodo
Copy link
Copy Markdown

@tomerqodo tomerqodo commented Jan 30, 2026

Benchmark PR from agentic-review-benchmarks#4

Summary by CodeRabbit

Release Notes

  • New Features

    • Added cluster commands for "migration" and "syncslots" operations
    • Automatic claiming of unassigned slots during cluster initialization
    • Enhanced cluster slot management and reconciliation
  • Bug Fixes

    • Cluster now validates configuration against actual data at startup
    • Automatic cleanup of keys in slots not owned by the node
  • Refactor

    • Reorganized cluster initialization and maintenance routines for improved modularity

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jan 30, 2026

📝 Walkthrough

Walkthrough

The changes refactor cluster initialization and command handling by introducing new cluster management functions (clusterCommonInit, clusterDeleteKeysInUnownedSlots, verifyClusterConfigWithData, clusterClaimUnassignedSlots), expanding subcommand routing for migration/syncslots, and reorganizing ASM lifecycle hook integration into server startup, cron, and sleep mechanisms.

Changes

Cohort / File(s) Summary
Cluster initialization and validation
src/cluster.c, src/cluster.h
Introduces clusterCommonInit() to allocate and initialize cluster slot stats and ASM support. Adds verifyClusterConfigWithData() to validate in-memory data against cluster configuration, including DB0-only enforcement and cleanup of keys in unowned slots. Adds clusterDeleteKeysInUnownedSlots() to delete keys in slots not owned by the current node.
Cluster slot ownership
src/cluster_legacy.c
Adds clusterClaimUnassignedSlots() to scan and assign unowned slots containing keys to the current node. Removes legacy verification logic and reorganizes ASM integration points from legacy cron/sleep paths.
Server lifecycle integration
src/server.c
Refactors cluster initialization to call clusterCommonInit() before clusterInit() instead of direct zmalloc. Integrates ASM cron/sleep hooks (asmCron, asmBeforeSleep) alongside existing cluster lifecycle calls in serverCron and beforeSleep.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Hop along, dear slots so bright,
New claims and cleanups set things right,
ASM hooks dance in cron's sweet flow,
Cluster config's verified, don't you know!
Init refactored, tidy and lean,
The finest cluster dance you've seen!

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 58.82% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Refactor some of ASM and slot-stats functions' directly and accurately summarizes the main changes: moving ASM initialization and slot-stats setup from cluster_legacy.c to a new clusterCommonInit() function, and reorganizing these initialization routines across server startup code.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch coderabbit_full_head_refactor_some_of_asm_and_slot-stats_functions_pr4

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@tomerqodo
Copy link
Copy Markdown
Author

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jan 30, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@tomerqodo
Copy link
Copy Markdown
Author

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jan 30, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@tomerqodo
Copy link
Copy Markdown
Author

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jan 30, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@OmriGM
Copy link
Copy Markdown

OmriGM commented Feb 2, 2026

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Feb 2, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@OmriGM
Copy link
Copy Markdown

OmriGM commented Feb 2, 2026

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Feb 2, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/cluster.c`:
- Around line 2146-2213: clusterDeleteKeysInUnownedSlots currently deletes keys
for unassigned slots before clusterClaimUnassignedSlots runs, causing data loss;
change clusterDeleteKeysInUnownedSlots so it skips slots that are unassigned
(i.e., where getNodeBySlot(i) returns NULL) and only deletes keys for slots
assigned to another node (or still importing), then update the comment in
verifyClusterConfigWithData to state that unassigned slots are preserved and
will be claimed by clusterClaimUnassignedSlots; reference symbols to change:
clusterDeleteKeysInUnownedSlots, getNodeBySlot, clusterDelKeysInSlot,
clusterClaimUnassignedSlots, and verifyClusterConfigWithData.
🧹 Nitpick comments (1)
src/cluster_legacy.c (1)

5259-5284: Trigger cluster state update/broadcast after slot claim.

The slot ownership changes are persisted but not immediately propagated; peers and local state can remain stale until cron. Consider scheduling an update/broadcast when update_config is non-zero.

♻️ Suggested change
-    if (update_config) clusterSaveConfigOrDie(1);
+    if (update_config) {
+        clusterSaveConfigOrDie(1);
+        clusterDoBeforeSleep(CLUSTER_TODO_UPDATE_STATE|
+                             CLUSTER_TODO_BROADCAST_PONG);
+    }

Comment thread src/cluster.c
Comment on lines +2146 to +2213
void clusterDeleteKeysInUnownedSlots(void) {
if (clusterNodeIsSlave(getMyClusterNode())) return;

/* Check that all the slots we have keys for are assigned to us. Otherwise,
* delete the keys. */
for (int i = 0; i < CLUSTER_SLOTS; i++) {
/* Skip if: no keys in the slot, it's our slot, or we are importing it. */
if (!countKeysInSlot(i) ||
clusterIsMySlot(i) ||
getImportingSlotSource(i))
{
continue;
}

serverLog(LL_NOTICE, "I have keys for slot %d, but the slot is "
"assigned to another node. "
"Deleting keys in the slot.", i);
/* With atomic slot migration, it is safe to drop keys from slots
* that are not owned. This will not result in data loss under the
* legacy slot migration approach either, since the importing state
* has already been persisted in node.conf. */
clusterDelKeysInSlot(i, 0);
}
}


/* This function is called after the node startup in order to verify that data
* loaded from disk is in agreement with the cluster configuration:
*
* 1) If we find keys about hash slots we have no responsibility for, the
* following happens:
* A) If no other node is in charge according to the current cluster
* configuration, we add these slots to our node.
* B) If according to our config other nodes are already in charge for
* this slots, we set the slots as IMPORTING from our point of view
* in order to justify we have those slots, and in order to make
* redis-cli aware of the issue, so that it can try to fix it.
* 2) If we find data in a DB different than DB0 we return C_ERR to
* signal the caller it should quit the server with an error message
* or take other actions.
*
* The function always returns C_OK even if it will try to correct
* the error described in "1". However if data is found in DB different
* from DB0, C_ERR is returned.
*
* The function also uses the logging facility in order to warn the user
* about desynchronizations between the data we have in memory and the
* cluster configuration. */
int verifyClusterConfigWithData(void) {
/* Return ASAP if a module disabled cluster redirections. In that case
* every master can store keys about every possible hash slot. */
if (server.cluster_module_flags & CLUSTER_MODULE_FLAG_NO_REDIRECTION)
return C_OK;

/* If this node is a slave, don't perform the check at all as we
* completely depend on the replication stream. */
if (clusterNodeIsSlave(getMyClusterNode())) return C_OK;

/* Make sure we only have keys in DB0. */
for (int i = 1; i < server.dbnum; i++) {
if (kvstoreSize(server.db[i].keys)) return C_ERR;
}

/* Delete keys in unowned slots */
clusterDeleteKeysInUnownedSlots();
/* Take over slots that we have keys for, but are assigned to no one. */
clusterClaimUnassignedSlots();
return C_OK;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Critical: unassigned-slot data can be deleted before it’s claimed.
clusterDeleteKeysInUnownedSlots() deletes keys for any slot that is not “mine” and not importing. For unassigned slots, getNodeBySlot(i) is NULL, so keys are deleted before clusterClaimUnassignedSlots() runs, causing data loss and contradicting the function’s docstring in verifyClusterConfigWithData. This is a regression in the recovery path.

Fix: skip unassigned slots (or claim before deleting) and update the comment to match behavior.

🐛 Proposed fix (skip unassigned slots + update doc)
-/* This function is called after the node startup in order to verify that data
+/* This function is called after the node startup in order to verify that data
  * loaded from disk is in agreement with the cluster configuration:
  *
  * 1) If we find keys about hash slots we have no responsibility for, the
  *    following happens:
  *    A) If no other node is in charge according to the current cluster
  *       configuration, we add these slots to our node.
- *    B) If according to our config other nodes are already in charge for
- *       this slots, we set the slots as IMPORTING from our point of view
- *       in order to justify we have those slots, and in order to make
- *       redis-cli aware of the issue, so that it can try to fix it.
+ *    B) If according to our config other nodes are already in charge for
+ *       these slots, we drop the local keys for those slots.
  * 2) If we find data in a DB different than DB0 we return C_ERR to
  *    signal the caller it should quit the server with an error message
  *    or take other actions.
@@
 void clusterDeleteKeysInUnownedSlots(void) {
-    if (clusterNodeIsSlave(getMyClusterNode())) return;
+    clusterNode *myself = getMyClusterNode();
+    if (clusterNodeIsSlave(myself)) return;
 
     /* Check that all the slots we have keys for are assigned to us. Otherwise,
      * delete the keys. */
     for (int i = 0; i < CLUSTER_SLOTS; i++) {
+        clusterNode *owner = getNodeBySlot(i);
         /* Skip if: no keys in the slot, it's our slot, or we are importing it. */
         if (!countKeysInSlot(i) ||
-            clusterIsMySlot(i) ||
+            owner == NULL || owner == myself ||
             getImportingSlotSource(i))
         {
             continue;
         }
🤖 Prompt for AI Agents
In `@src/cluster.c` around lines 2146 - 2213, clusterDeleteKeysInUnownedSlots
currently deletes keys for unassigned slots before clusterClaimUnassignedSlots
runs, causing data loss; change clusterDeleteKeysInUnownedSlots so it skips
slots that are unassigned (i.e., where getNodeBySlot(i) returns NULL) and only
deletes keys for slots assigned to another node (or still importing), then
update the comment in verifyClusterConfigWithData to state that unassigned slots
are preserved and will be claimed by clusterClaimUnassignedSlots; reference
symbols to change: clusterDeleteKeysInUnownedSlots, getNodeBySlot,
clusterDelKeysInSlot, clusterClaimUnassignedSlots, and
verifyClusterConfigWithData.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants