Redis 8.2.1#28
…s#14274)

Fix redis#14267
This bug was introduced by redis#13495

### Summary
When a replica clears a large database, it periodically calls processEventsWhileBlocked() in the replicationEmptyDbCallback() callback during the key deletion process. If defragmentation is enabled, this means that active defrag can be triggered while the database is being deleted. The defragmentation process may also modify the database at this time, which could lead to crashes when the database is accessed after defragmentation.

Code Path:
```
replicationEmptyDbCallback() -> processEventsWhileBlocked() -> whileBlockedCron() -> defragWhileBlocked()
```

### Solution
This PR temporarily disables active defrag before emptying the database, then restores the active defrag setting after the empty is complete.

---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…fter reload (redis#14276)

This bug was introduced by redis#14130
found by @oranagra

### Summary
`s->cgroups_ref` is created at runtime the first time a consumer group is linked with a message, and it is not released when all references are removed. However, after `debug reload` or a restart, if the PEL is empty (meaning no consumer group is referencing any message), `s->cgroups_ref` is not recreated. As a result, when XADD or XTRIM is executed with the `ACKED` option and checks whether a message that has been read but not yet ACKed can be deleted, the NULL `cgroups_ref` causes a crash.

### Code Path
```
xaddCommand -> streamTrim -> streamEntryIsReferenced
```

### Solution
Check if `s->cgroups_ref` is NULL in streamEntryIsReferenced().
Code Review by Qodo
1. orig_active_defrag unused variable
    int orig_active_defrag = server.active_defrag_enabled;
    server.active_defrag_enabled = 0;

    emptyData(-1, empty_db_flags, replicationEmptyDbCallback);

    /* Restore the original active defragmentation setting. */
    server.active_defrag_enabled = 1;
1. orig_active_defrag unused variable 📘 Rule violation ⛯ Reliability
• orig_active_defrag is introduced but never referenced, which will typically trigger an -Wunused-variable warning.
• With -Werror enabled, this warning becomes a build failure, violating the requirement to compile cleanly with warnings-as-errors.
• This also indicates the intended “restore original setting” behavior is not implemented as written in the comment.
Agent prompt
## Issue description
A new local variable `orig_active_defrag` is assigned but never used, which will likely trigger `-Wunused-variable` and fail builds when compiled with `-Werror`.
## Issue Context
The comment says the original active defrag setting should be restored after `emptyData(...)`, but the code restores a hard-coded `1` instead. This both causes the unused variable warning and makes the restore behavior inconsistent with the comment.
## Fix Focus Areas
- src/replication.c[1953-1962]
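
The save-and-restore pattern described in the issue context can be sketched as follows. This is a minimal model, not the actual `src/replication.c` code: `mock_server`, `mock_empty_data()`, `defrag_seen_during_empty`, and `empty_with_defrag_paused()` are hypothetical stand-ins introduced only for illustration. The point it shows is that restoring from the saved variable both uses `orig_active_defrag` and preserves a setting that was already off.

```c
/* Hypothetical stand-in for the real server struct; only the one field
 * discussed in the review is modeled. */
struct mock_server {
    int active_defrag_enabled;
};

static struct mock_server server;

/* Set to 1 if the mocked empty operation ever runs with defrag enabled. */
static int defrag_seen_during_empty = 0;

/* Hypothetical stand-in for emptyData(): it only records whether active
 * defrag was enabled while the database was being emptied. */
static void mock_empty_data(void) {
    if (server.active_defrag_enabled) defrag_seen_during_empty = 1;
}

/* Pause active defrag for the duration of the empty operation, then put
 * the setting back to whatever it was before, instead of a hard-coded 1. */
void empty_with_defrag_paused(void) {
    int orig_active_defrag = server.active_defrag_enabled;
    server.active_defrag_enabled = 0;

    mock_empty_data();

    /* Restore the original active defragmentation setting. */
    server.active_defrag_enabled = orig_active_defrag;
}
```

With a hard-coded `1`, a server that started with active defrag disabled would come out of the call with it enabled; restoring from the saved value avoids that side effect.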
    /* Version information */
    #define REDIS_VERSION "8.2.1"
    #define REDIS_VERSION_NUM 0x00080201
2. version.h missing include guards 📘 Rule violation ✓ Correctness
• src/version.h is a header file but contains no include guard at all, so multiple inclusion can lead to redefinition errors depending on how it’s used.
• The compliance requirement mandates double-underscore include guards with the __FILENAME_H pattern, which this file does not implement.
Agent prompt
## Issue description
`src/version.h` lacks the required double-underscore include guards for header files.
## Issue Context
The compliance checklist requires all `.h` files to use include guards of the form `#ifndef __FILENAME_H` / `#define __FILENAME_H` / `#endif`.
## Fix Focus Areas
- src/version.h[1-3]
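
One way the guarded header might look is sketched below. The guard name `__VERSION_H` is an assumption derived from the `__FILENAME_H` pattern quoted above; the checklist may expect a different spelling. Note that identifiers beginning with a double underscore are reserved for the implementation in standard C, so this follows the stated project convention rather than strict ISO C.

```c
#ifndef __VERSION_H /* assumed guard name, following the __FILENAME_H pattern */
#define __VERSION_H

/* Version information */
#define REDIS_VERSION "8.2.1"
#define REDIS_VERSION_NUM 0x00080201

#endif /* __VERSION_H */
```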
    /* Check if the message is in any consumer group's PEL */
    if (!s->cgroups_ref) return 1;
    unsigned char buf[sizeof(streamID)];
    streamEncodeID(buf, id);
    return raxFind(s->cgroups_ref, buf, sizeof(streamID), NULL);
3. Acked trim treated referenced 🐞 Bug ✓ Correctness
• streamEntryIsReferenced() returns 1 (referenced) when s->cgroups_ref is NULL, which is the opposite of what the PEL-reference check should imply.
• When consumer groups exist but the PEL is empty (common after ACK + reload), cgroups_ref is intentionally NULL, so this change makes fully-ACKed entries look referenced forever.
• This breaks ACKED delete/trim semantics (XADD/XTRIM/XDELEX/XACKDEL), causing trims to fail and streams to grow beyond MAXLEN/MINID expectations.
Agent prompt
### Issue description
`streamEntryIsReferenced()` incorrectly returns `1` (referenced) when `s->cgroups_ref` is NULL. Since `cgroups_ref` is only built when PEL entries exist, NULL should mean “no PEL references exist”, so the function should not treat entries as referenced.
### Issue Context
This function is used by stream trimming/deletion paths under `DELETE_STRATEGY_ACKED` (e.g. XADD MAXLEN/MINID with ACKED), to decide which entries are eligible for removal.
### Fix Focus Areas
- src/t_stream.c[2707-2711]
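
The intended NULL handling can be illustrated with a small self-contained model. Everything here is a simplified stand-in, not the real `src/t_stream.c` code: `model_stream_id`, `model_ref_index` (a linear array in place of the rax tree), `model_stream`, and `model_entry_is_referenced()` are hypothetical names. The sketch only demonstrates the rule that a missing reference index means "not referenced".

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a stream entry ID. */
typedef struct {
    uint64_t ms;
    uint64_t seq;
} model_stream_id;

/* Hypothetical stand-in for the index of IDs referenced by any PEL. */
typedef struct {
    const model_stream_id *ids;
    size_t len;
} model_ref_index;

/* Only the field relevant to the review comment is modeled. */
typedef struct {
    model_ref_index *cgroups_ref; /* NULL when no PEL entry exists */
} model_stream;

/* Linear search standing in for the real index lookup. */
static int model_ref_index_find(const model_ref_index *idx,
                                const model_stream_id *id) {
    for (size_t i = 0; i < idx->len; i++) {
        if (idx->ids[i].ms == id->ms && idx->ids[i].seq == id->seq) return 1;
    }
    return 0;
}

/* Returns 1 if the entry is referenced by some PEL, 0 otherwise.
 * A NULL index means no PEL references exist, so the answer is 0. */
int model_entry_is_referenced(const model_stream *s,
                              const model_stream_id *id) {
    if (!s->cgroups_ref) return 0;
    return model_ref_index_find(s->cgroups_ref, id);
}
```

Returning 0 in the NULL case keeps fully ACKed entries eligible for removal under the ACKED strategy, while still avoiding the NULL dereference that the original fix targeted.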
    proc discard_replies_every {rd count frequency discard_num} {
        if {$count % $frequency != 0} {
            for {set k 0} {$k < $discard_num} {incr k} {
                $rd read ; # Discard replies
            }
        }
4. Reply discard logic inverted 🐞 Bug ✓ Correctness
• The new discard_replies_every helper discards replies when $count % $frequency != 0, which is inverted from the intended “every N commands” behavior.
• In loops that call it every iteration, it will attempt to $rd read thousands of replies after only a handful of commands were pipelined, likely blocking the client waiting for replies that were never enqueued.
• This can cause test timeouts/hangs across multiple defrag tests (including the newly added replication/defrag test).
Agent prompt
### Issue description
The helper `discard_replies_every` uses an inverted condition (`!= 0`), causing it to read/discard replies on almost every loop iteration. With `redis_deferring_client`, this can block waiting for replies that are not pending.
### Issue Context
Several tests pipeline commands in tight loops and periodically drain replies to avoid excessive buffering. The previous pattern was “every 10000 ops, read 10000 replies”.
### Fix Focus Areas
- tests/unit/memefficiency.tcl[70-76]
- tests/unit/memefficiency.tcl[345-373]
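
The arithmetic behind the "every N commands" condition can be checked with a small model. The helper itself is Tcl; the C below is only a sketch of its counting logic, and the function names are hypothetical. It assumes one command is pipelined per loop iteration and that `count` is 1-based at the time the helper is called, which may differ from how the tests actually pass it.

```c
#include <stddef.h>

/* Number of replies the helper would try to read at this iteration,
 * using the intended condition (count % frequency == 0). */
size_t model_replies_to_discard(size_t count, size_t frequency,
                                size_t discard_num) {
    return (count % frequency == 0) ? discard_num : 0;
}

/* Simulates a loop that pipelines one command per iteration and calls
 * the helper each time. Returns 1 if the helper never tries to read more
 * replies than are pending (so the client would not block), 0 otherwise. */
int model_drain_never_overreads(size_t iterations, size_t frequency,
                                size_t discard_num) {
    size_t pending = 0;
    for (size_t count = 1; count <= iterations; count++) {
        pending++; /* one reply becomes pending per pipelined command */
        size_t want = model_replies_to_discard(count, frequency, discard_num);
        if (want > pending) return 0; /* would wait on a missing reply */
        pending -= want;
    }
    return 1;
}
```

Under these assumptions, draining 10000 replies every 10000 commands never reads more than is pending. With the inverted `!= 0` condition, the very first iteration would ask for 10000 replies when only one is pending.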
Benchmark PR from agentic-review-benchmarks#6