Refactor some of ASM and slot-stats functions #4
Conversation
📝 Walkthrough

The changes refactor cluster initialization and command handling by introducing new cluster management functions (clusterCommonInit, clusterDeleteKeysInUnownedSlots, verifyClusterConfigWithData, clusterClaimUnassignedSlots), expanding subcommand routing for migration/syncslots, and reorganizing ASM lifecycle hook integration into server startup, cron, and sleep mechanisms.

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
@coderabbitai review

✅ Actions performed: Review triggered.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/cluster.c`:
- Around line 2146-2213: clusterDeleteKeysInUnownedSlots currently deletes keys
for unassigned slots before clusterClaimUnassignedSlots runs, causing data loss;
change clusterDeleteKeysInUnownedSlots so it skips slots that are unassigned
(i.e., where getNodeBySlot(i) returns NULL) and only deletes keys for slots
assigned to another node (or still importing), then update the comment in
verifyClusterConfigWithData to state that unassigned slots are preserved and
will be claimed by clusterClaimUnassignedSlots; reference symbols to change:
clusterDeleteKeysInUnownedSlots, getNodeBySlot, clusterDelKeysInSlot,
clusterClaimUnassignedSlots, and verifyClusterConfigWithData.
🧹 Nitpick comments (1)
src/cluster_legacy.c (1)
5259-5284: Trigger cluster state update/broadcast after slot claim.

The slot ownership changes are persisted but not immediately propagated; peers and local state can remain stale until cron. Consider scheduling an update/broadcast when `update_config` is non-zero.

♻️ Suggested change

```diff
-    if (update_config) clusterSaveConfigOrDie(1);
+    if (update_config) {
+        clusterSaveConfigOrDie(1);
+        clusterDoBeforeSleep(CLUSTER_TODO_UPDATE_STATE|
+                             CLUSTER_TODO_BROADCAST_PONG);
+    }
```
```c
void clusterDeleteKeysInUnownedSlots(void) {
    if (clusterNodeIsSlave(getMyClusterNode())) return;

    /* Check that all the slots we have keys for are assigned to us. Otherwise,
     * delete the keys. */
    for (int i = 0; i < CLUSTER_SLOTS; i++) {
        /* Skip if: no keys in the slot, it's our slot, or we are importing it. */
        if (!countKeysInSlot(i) ||
            clusterIsMySlot(i) ||
            getImportingSlotSource(i))
        {
            continue;
        }

        serverLog(LL_NOTICE, "I have keys for slot %d, but the slot is "
                             "assigned to another node. "
                             "Deleting keys in the slot.", i);
        /* With atomic slot migration, it is safe to drop keys from slots
         * that are not owned. This will not result in data loss under the
         * legacy slot migration approach either, since the importing state
         * has already been persisted in node.conf. */
        clusterDelKeysInSlot(i, 0);
    }
}

/* This function is called after the node startup in order to verify that data
 * loaded from disk is in agreement with the cluster configuration:
 *
 * 1) If we find keys about hash slots we have no responsibility for, the
 *    following happens:
 *    A) If no other node is in charge according to the current cluster
 *       configuration, we add these slots to our node.
 *    B) If according to our config other nodes are already in charge for
 *       this slots, we set the slots as IMPORTING from our point of view
 *       in order to justify we have those slots, and in order to make
 *       redis-cli aware of the issue, so that it can try to fix it.
 * 2) If we find data in a DB different than DB0 we return C_ERR to
 *    signal the caller it should quit the server with an error message
 *    or take other actions.
 *
 * The function always returns C_OK even if it will try to correct
 * the error described in "1". However if data is found in DB different
 * from DB0, C_ERR is returned.
 *
 * The function also uses the logging facility in order to warn the user
 * about desynchronizations between the data we have in memory and the
 * cluster configuration. */
int verifyClusterConfigWithData(void) {
    /* Return ASAP if a module disabled cluster redirections. In that case
     * every master can store keys about every possible hash slot. */
    if (server.cluster_module_flags & CLUSTER_MODULE_FLAG_NO_REDIRECTION)
        return C_OK;

    /* If this node is a slave, don't perform the check at all as we
     * completely depend on the replication stream. */
    if (clusterNodeIsSlave(getMyClusterNode())) return C_OK;

    /* Make sure we only have keys in DB0. */
    for (int i = 1; i < server.dbnum; i++) {
        if (kvstoreSize(server.db[i].keys)) return C_ERR;
    }

    /* Delete keys in unowned slots */
    clusterDeleteKeysInUnownedSlots();
    /* Take over slots that we have keys for, but are assigned to no one. */
    clusterClaimUnassignedSlots();
    return C_OK;
```
Critical: unassigned-slot data can be deleted before it’s claimed.
clusterDeleteKeysInUnownedSlots() deletes keys for any slot that is not “mine” and not importing. For unassigned slots, getNodeBySlot(i) is NULL, so keys are deleted before clusterClaimUnassignedSlots() runs, causing data loss and contradicting the function’s docstring in verifyClusterConfigWithData. This is a regression in the recovery path.
Fix: skip unassigned slots (or claim before deleting) and update the comment to match behavior.
🐛 Proposed fix (skip unassigned slots + update doc)

```diff
-/* This function is called after the node startup in order to verify that data
+/* This function is called after the node startup in order to verify that data
  * loaded from disk is in agreement with the cluster configuration:
  *
  * 1) If we find keys about hash slots we have no responsibility for, the
  *    following happens:
  *    A) If no other node is in charge according to the current cluster
  *       configuration, we add these slots to our node.
- *    B) If according to our config other nodes are already in charge for
- *       this slots, we set the slots as IMPORTING from our point of view
- *       in order to justify we have those slots, and in order to make
- *       redis-cli aware of the issue, so that it can try to fix it.
+ *    B) If according to our config other nodes are already in charge for
+ *       these slots, we drop the local keys for those slots.
  * 2) If we find data in a DB different than DB0 we return C_ERR to
  *    signal the caller it should quit the server with an error message
  *    or take other actions.
@@
 void clusterDeleteKeysInUnownedSlots(void) {
-    if (clusterNodeIsSlave(getMyClusterNode())) return;
+    clusterNode *myself = getMyClusterNode();
+    if (clusterNodeIsSlave(myself)) return;
     /* Check that all the slots we have keys for are assigned to us. Otherwise,
      * delete the keys. */
     for (int i = 0; i < CLUSTER_SLOTS; i++) {
+        clusterNode *owner = getNodeBySlot(i);
         /* Skip if: no keys in the slot, it's our slot, or we are importing it. */
         if (!countKeysInSlot(i) ||
-            clusterIsMySlot(i) ||
+            owner == NULL || owner == myself ||
             getImportingSlotSource(i))
         {
             continue;
         }
```

🤖 Prompt for AI Agents
In `@src/cluster.c` around lines 2146 - 2213, clusterDeleteKeysInUnownedSlots
currently deletes keys for unassigned slots before clusterClaimUnassignedSlots
runs, causing data loss; change clusterDeleteKeysInUnownedSlots so it skips
slots that are unassigned (i.e., where getNodeBySlot(i) returns NULL) and only
deletes keys for slots assigned to another node (or still importing), then
update the comment in verifyClusterConfigWithData to state that unassigned slots
are preserved and will be claimed by clusterClaimUnassignedSlots; reference
symbols to change: clusterDeleteKeysInUnownedSlots, getNodeBySlot,
clusterDelKeysInSlot, clusterClaimUnassignedSlots, and
verifyClusterConfigWithData.
Benchmark PR from agentic-review-benchmarks#4