Redis 8.2.1#6
Conversation
…s#14274) Fix redis#14267 This bug was introduced by redis#13495 ### Summary When a replica clears a large database, it periodically calls processEventsWhileBlocked() in the replicationEmptyDbCallback() callback during the key deletion process. If defragmentation is enabled, this means that active defrag can be triggered while the database is being deleted. The defragmentation process may also modify the database at this time, which could lead to crashes when the database is accessed after defragmentation. Code Path: ``` replicationEmptyDbCallback() -> processEventsWhileBlocked() -> whileBlockedCron() -> defragWhileBlocked() ``` ### Solution This PR temporarily disables active defrag before emptying the database, then restores the active defrag setting after the empty is complete. --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…fter reload (redis#14276) This bug was introduced by redis#14130 found by @oranagra ### Summary Because `s->cgroup_ref` is created at runtime the first time a consumer group is linked with a message, but it is not released when all references are removed. However, after `debug reload` or restart, if the PEL is empty (meaning no consumer group is referencing any message), `s->cgroup_ref` will not be recreated. As a result, when executing XADD or XTRIM with `ACKED` option and checking whether a message that is being read but has not been ACKed can be deleted, the cgroup_ref being NULL will cause a crash. ### Code Path ``` xaddCommand -> streamTrim -> streamEntryIsReferenced ``` ### Solution Check if `s->cgroup_ref` is NULL in streamEntryIsReferenced().
📝 WalkthroughWalkthroughVersion 8.2.1 patch release that addresses a defragmentation bug during replication, adds a safety check in stream module operations, includes corresponding test coverage for both fixes, updates version identifiers, and documents changes in release notes. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@coderabbitai review |
1 similar comment
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@src/replication.c`:
- Around line 1952-1962: The saved original defrag state orig_active_defrag is
not used and server.active_defrag_enabled is unconditionally set to 1 after
emptyData; change the restore to reassign the saved value
(server.active_defrag_enabled = orig_active_defrag) so the prior state is
preserved after the call to emptyData (location: the block around emptyData and
replicationEmptyDbCallback), ensuring orig_active_defrag is used to restore the
original setting.
In `@tests/unit/memefficiency.tcl`:
- Around line 70-76: The condition in proc discard_replies_every is inverted
causing reads to occur on all non-multiples and potentially block; change the
modulo check to trigger discarding only when count % frequency == 0 (i.e.,
replace "if {$count % $frequency != 0}" with "if {$count % $frequency == 0}") so
the for-loop that calls "$rd read" runs only on the intended periodic iterations
and avoids exhausting replies/blocking.
🧹 Nitpick comments (1)
src/t_stream.c (1)
2708-2708: Safety check prevents NULL pointer dereference, but the return value warrants discussion.The NULL check correctly prevents a crash when
cgroups_refis NULL (e.g., after RDB reload with empty PEL). However, returning1(referenced) whencgroups_refis NULL may be overly conservative.If
cgroups_refis NULL, it means no entries exist in any consumer group's PEL. Combined with the prior check that the entry ID is not less thanmin_cgroup_last_id, this suggests the entry is fully acknowledged and should return0.That said, returning
1is the safer choice because:
- It prevents accidental deletion of potentially valid entries
- After reload, the state might not be fully reconstructed yet
If the intent is to be conservative after reload scenarios, this is acceptable. If precision is needed, consider tracking whether
cgroups_refis NULL due to "truly empty" vs "not yet reconstructed".💡 Alternative: Return 0 if truly no PEL entries exist
If you want to be more precise rather than conservative:
/* Check if the message is in any consumer group's PEL */ - if (!s->cgroups_ref) return 1; + if (!s->cgroups_ref) return 0; /* No PEL entries means not referenced */However, the current conservative approach (
return 1) is safer and aligns with the crash-prevention intent.
|
|
||
| /* Temporarily disable active defragmentation during database flush. | ||
| * This prevents defrag from being triggered in replicationEmptyDbCallback() | ||
| * which could modify the database while it's being emptied. */ | ||
| int orig_active_defrag = server.active_defrag_enabled; | ||
| server.active_defrag_enabled = 0; | ||
|
|
||
| emptyData(-1, empty_db_flags, replicationEmptyDbCallback); | ||
|
|
||
| /* Restore the original active defragmentation setting. */ | ||
| server.active_defrag_enabled = 1; |
There was a problem hiding this comment.
Restore active defrag to its prior state (currently always enables).
You store orig_active_defrag but restore to 1, so a replica that had defrag disabled ends up with it enabled after the flush. This also leaves orig_active_defrag unused. Please restore the saved value.
🛠️ Proposed fix
int orig_active_defrag = server.active_defrag_enabled;
server.active_defrag_enabled = 0;
emptyData(-1, empty_db_flags, replicationEmptyDbCallback);
/* Restore the original active defragmentation setting. */
- server.active_defrag_enabled = 1;
+ server.active_defrag_enabled = orig_active_defrag;📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| /* Temporarily disable active defragmentation during database flush. | |
| * This prevents defrag from being triggered in replicationEmptyDbCallback() | |
| * which could modify the database while it's being emptied. */ | |
| int orig_active_defrag = server.active_defrag_enabled; | |
| server.active_defrag_enabled = 0; | |
| emptyData(-1, empty_db_flags, replicationEmptyDbCallback); | |
| /* Restore the original active defragmentation setting. */ | |
| server.active_defrag_enabled = 1; | |
| /* Temporarily disable active defragmentation during database flush. | |
| * This prevents defrag from being triggered in replicationEmptyDbCallback() | |
| * which could modify the database while it's being emptied. */ | |
| int orig_active_defrag = server.active_defrag_enabled; | |
| server.active_defrag_enabled = 0; | |
| emptyData(-1, empty_db_flags, replicationEmptyDbCallback); | |
| /* Restore the original active defragmentation setting. */ | |
| server.active_defrag_enabled = orig_active_defrag; |
🤖 Prompt for AI Agents
In `@src/replication.c` around lines 1952 - 1962, The saved original defrag state
orig_active_defrag is not used and server.active_defrag_enabled is
unconditionally set to 1 after emptyData; change the restore to reassign the
saved value (server.active_defrag_enabled = orig_active_defrag) so the prior
state is preserved after the call to emptyData (location: the block around
emptyData and replicationEmptyDbCallback), ensuring orig_active_defrag is used
to restore the original setting.
| proc discard_replies_every {rd count frequency discard_num} { | ||
| if {$count % $frequency != 0} { | ||
| for {set k 0} {$k < $discard_num} {incr k} { | ||
| $rd read ; # Discard replies | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
discard_replies_every condition is inverted and will block on reads.
With != 0, you discard on every non‑multiple, quickly exhausting replies and blocking on $rd read. The intent is to discard only every frequency iterations.
🛠️ Proposed fix
proc discard_replies_every {rd count frequency discard_num} {
- if {$count % $frequency != 0} {
+ if {$count % $frequency == 0} {
for {set k 0} {$k < $discard_num} {incr k} {
$rd read ; # Discard replies
}
}
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| proc discard_replies_every {rd count frequency discard_num} { | |
| if {$count % $frequency != 0} { | |
| for {set k 0} {$k < $discard_num} {incr k} { | |
| $rd read ; # Discard replies | |
| } | |
| } | |
| } | |
| proc discard_replies_every {rd count frequency discard_num} { | |
| if {$count % $frequency == 0} { | |
| for {set k 0} {$k < $discard_num} {incr k} { | |
| $rd read ; # Discard replies | |
| } | |
| } | |
| } |
🤖 Prompt for AI Agents
In `@tests/unit/memefficiency.tcl` around lines 70 - 76, The condition in proc
discard_replies_every is inverted causing reads to occur on all non-multiples
and potentially block; change the modulo check to trigger discarding only when
count % frequency == 0 (i.e., replace "if {$count % $frequency != 0}" with "if
{$count % $frequency == 0}") so the for-loop that calls "$rd read" runs only on
the intended periodic iterations and avoids exhausting replies/blocking.
Benchmark PR from agentic-review-benchmarks#6
Summary by CodeRabbit
Release Notes for v8.2.1
Bug Fixes
Performance