Skip to content

Redis 8.2.1#6

Open
tomerqodo wants to merge 4 commits intocoderabbit_full_base_redis_821_pr6from
coderabbit_full_head_redis_821_pr6
Open

Redis 8.2.1#6
tomerqodo wants to merge 4 commits intocoderabbit_full_base_redis_821_pr6from
coderabbit_full_head_redis_821_pr6

Conversation

@tomerqodo
Copy link
Copy Markdown

@tomerqodo tomerqodo commented Jan 30, 2026

Benchmark PR from agentic-review-benchmarks#6

Summary by CodeRabbit

Release Notes for v8.2.1

  • Bug Fixes

    • Fixed defragmentation behavior during replication database flush operations
    • Resolved crash issue in stream operations under specific conditions
    • Additional stability improvements
  • Performance

    • Optimized resource utilization and memory efficiency

sundb and others added 4 commits January 25, 2026 12:12
…s#14274)

Fix redis#14267
This bug was introduced by redis#13495

### Summary

When a replica clears a large database, it periodically calls
processEventsWhileBlocked() in the replicationEmptyDbCallback() callback
during the key deletion process.
If defragmentation is enabled, this means that active defrag can be
triggered while the database is being deleted.
The defragmentation process may also modify the database at this time,
which could lead to crashes when the database is accessed after
defragmentation.

Code Path:
```
replicationEmptyDbCallback() -> processEventsWhileBlocked() -> whileBlockedCron() -> defragWhileBlocked()
```

### Solution

This PR temporarily disables active defrag before emptying the database,
then restores the active defrag setting after the empty is complete.

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…fter reload (redis#14276)

This bug was introduced by redis#14130
found by @oranagra

### Summary

Because `s->cgroup_ref` is created at runtime the first time a consumer
group is linked with a message, but it is not released when all
references are removed.

However, after `debug reload` or restart, if the PEL is empty (meaning
no consumer group is referencing any message), `s->cgroup_ref` will not
be recreated.

As a result, when executing XADD or XTRIM with `ACKED` option and
checking whether a message that is being read but has not been ACKed can
be deleted, the cgroup_ref being NULL will cause a crash.

### Code Path
```
xaddCommand -> streamTrim -> streamEntryIsReferenced
```

### Solution

Check if `s->cgroup_ref` is NULL in streamEntryIsReferenced().
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jan 30, 2026

📝 Walkthrough

Walkthrough

Version 8.2.1 patch release that addresses a defragmentation bug during replication, adds a safety check in stream module operations, includes corresponding test coverage for both fixes, updates version identifiers, and documents changes in release notes.

Changes

Cohort / File(s) Summary
Release & Versioning
00-RELEASENOTES, src/version.h
Adds 8.2.1 release notes documenting four bug fixes and two performance improvements. Bumps version macros from 8.2.0 to 8.2.1 with updated version number constants.
Replication Fix
src/replication.c
Disables and restores Redis defragmentation feature around the replication empty-database flush operation to prevent defrag interference during critical replication operations.
Stream Module Safety
src/t_stream.c
Adds NULL pointer check for cgroups_ref in streamEntryIsReferenced() to safely handle cases where consumer group reference metadata is absent.
Bug Fix Tests
tests/unit/memefficiency.tcl
Introduces discard_replies_every() helper procedure to reduce code duplication. Adds new test verifying defragmentation doesn't trigger during replication database flush (issue #14267).
Stream Module Tests
tests/unit/type/stream.tcl
Adds regression test for XADD with ACKED option following DEBUG RELOAD to prevent crashes related to stream consumer group state restoration.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 Version hops to 8.2.1,
Defrag stays calm when replication runs,
Streams check their refs, no crashes now,
Tests verify the fixes—oh how!
A patch release, bug-free and sound!

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 inconclusive)
Check name Status Explanation Resolution
Title check ❓ Inconclusive The title 'Redis 8.2.1' is overly generic and does not clearly describe the main changes in the pull request, which include bug fixes for defrag handling, NULL pointer crashes, and test coverage. Consider a more descriptive title that captures the primary changes, such as 'Fix active defrag during replication and stream crashes' or 'Redis 8.2.1: Bug fixes for defrag and stream handling'.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch coderabbit_full_head_redis_821_pr6

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@tomerqodo
Copy link
Copy Markdown
Author

@coderabbitai review

1 similar comment
@tomerqodo
Copy link
Copy Markdown
Author

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jan 30, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@tomerqodo
Copy link
Copy Markdown
Author

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jan 30, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@OmriGM
Copy link
Copy Markdown

OmriGM commented Feb 2, 2026

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Feb 2, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@OmriGM
Copy link
Copy Markdown

OmriGM commented Feb 2, 2026

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Feb 2, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@src/replication.c`:
- Around line 1952-1962: The saved original defrag state orig_active_defrag is
not used and server.active_defrag_enabled is unconditionally set to 1 after
emptyData; change the restore to reassign the saved value
(server.active_defrag_enabled = orig_active_defrag) so the prior state is
preserved after the call to emptyData (location: the block around emptyData and
replicationEmptyDbCallback), ensuring orig_active_defrag is used to restore the
original setting.

In `@tests/unit/memefficiency.tcl`:
- Around line 70-76: The condition in proc discard_replies_every is inverted
causing reads to occur on all non-multiples and potentially block; change the
modulo check to trigger discarding only when count % frequency == 0 (i.e.,
replace "if {$count % $frequency != 0}" with "if {$count % $frequency == 0}") so
the for-loop that calls "$rd read" runs only on the intended periodic iterations
and avoids exhausting replies/blocking.
🧹 Nitpick comments (1)
src/t_stream.c (1)

2708-2708: Safety check prevents NULL pointer dereference, but the return value warrants discussion.

The NULL check correctly prevents a crash when cgroups_ref is NULL (e.g., after RDB reload with empty PEL). However, returning 1 (referenced) when cgroups_ref is NULL may be overly conservative.

If cgroups_ref is NULL, it means no entries exist in any consumer group's PEL. Combined with the prior check that the entry ID is not less than min_cgroup_last_id, this suggests the entry is fully acknowledged and should return 0.

That said, returning 1 is the safer choice because:

  1. It prevents accidental deletion of potentially valid entries
  2. After reload, the state might not be fully reconstructed yet

If the intent is to be conservative after reload scenarios, this is acceptable. If precision is needed, consider tracking whether cgroups_ref is NULL due to "truly empty" vs "not yet reconstructed".

💡 Alternative: Return 0 if truly no PEL entries exist

If you want to be more precise rather than conservative:

     /* Check if the message is in any consumer group's PEL */
-    if (!s->cgroups_ref) return 1;
+    if (!s->cgroups_ref) return 0;  /* No PEL entries means not referenced */

However, the current conservative approach (return 1) is safer and aligns with the crash-prevention intent.

Comment thread src/replication.c
Comment on lines +1952 to +1962

/* Temporarily disable active defragmentation during database flush.
* This prevents defrag from being triggered in replicationEmptyDbCallback()
* which could modify the database while it's being emptied. */
int orig_active_defrag = server.active_defrag_enabled;
server.active_defrag_enabled = 0;

emptyData(-1, empty_db_flags, replicationEmptyDbCallback);

/* Restore the original active defragmentation setting. */
server.active_defrag_enabled = 1;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Restore active defrag to its prior state (currently always enables).

You store orig_active_defrag but restore to 1, so a replica that had defrag disabled ends up with it enabled after the flush. This also leaves orig_active_defrag unused. Please restore the saved value.

🛠️ Proposed fix
     int orig_active_defrag = server.active_defrag_enabled;
     server.active_defrag_enabled = 0;

     emptyData(-1, empty_db_flags, replicationEmptyDbCallback);

     /* Restore the original active defragmentation setting. */
-    server.active_defrag_enabled = 1;
+    server.active_defrag_enabled = orig_active_defrag;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
/* Temporarily disable active defragmentation during database flush.
* This prevents defrag from being triggered in replicationEmptyDbCallback()
* which could modify the database while it's being emptied. */
int orig_active_defrag = server.active_defrag_enabled;
server.active_defrag_enabled = 0;
emptyData(-1, empty_db_flags, replicationEmptyDbCallback);
/* Restore the original active defragmentation setting. */
server.active_defrag_enabled = 1;
/* Temporarily disable active defragmentation during database flush.
* This prevents defrag from being triggered in replicationEmptyDbCallback()
* which could modify the database while it's being emptied. */
int orig_active_defrag = server.active_defrag_enabled;
server.active_defrag_enabled = 0;
emptyData(-1, empty_db_flags, replicationEmptyDbCallback);
/* Restore the original active defragmentation setting. */
server.active_defrag_enabled = orig_active_defrag;
🤖 Prompt for AI Agents
In `@src/replication.c` around lines 1952 - 1962, The saved original defrag state
orig_active_defrag is not used and server.active_defrag_enabled is
unconditionally set to 1 after emptyData; change the restore to reassign the
saved value (server.active_defrag_enabled = orig_active_defrag) so the prior
state is preserved after the call to emptyData (location: the block around
emptyData and replicationEmptyDbCallback), ensuring orig_active_defrag is used
to restore the original setting.

Comment on lines +70 to +76
proc discard_replies_every {rd count frequency discard_num} {
if {$count % $frequency != 0} {
for {set k 0} {$k < $discard_num} {incr k} {
$rd read ; # Discard replies
}
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

discard_replies_every condition is inverted and will block on reads.

With != 0, you discard on every non‑multiple, quickly exhausting replies and blocking on $rd read. The intent is to discard only every frequency iterations.

🛠️ Proposed fix
 proc discard_replies_every {rd count frequency discard_num} {
-    if {$count % $frequency != 0} {
+    if {$count % $frequency == 0} {
         for {set k 0} {$k < $discard_num} {incr k} {
             $rd read ; # Discard replies
         }
     }
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
proc discard_replies_every {rd count frequency discard_num} {
if {$count % $frequency != 0} {
for {set k 0} {$k < $discard_num} {incr k} {
$rd read ; # Discard replies
}
}
}
proc discard_replies_every {rd count frequency discard_num} {
if {$count % $frequency == 0} {
for {set k 0} {$k < $discard_num} {incr k} {
$rd read ; # Discard replies
}
}
}
🤖 Prompt for AI Agents
In `@tests/unit/memefficiency.tcl` around lines 70 - 76, The condition in proc
discard_replies_every is inverted causing reads to occur on all non-multiples
and potentially block; change the modulo check to trigger discarding only when
count % frequency == 0 (i.e., replace "if {$count % $frequency != 0}" with "if
{$count % $frequency == 0}") so the for-loop that calls "$rd read" runs only on
the intended periodic iterations and avoids exhausting replies/blocking.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants