Skip to content

fix(core): Prevent Redis connection recovery from being missed#28256

Open
ivov wants to merge 2 commits intomasterfrom
cat-2721-fix-redis-reconnection-flag
Open

fix(core): Prevent Redis connection recovery from being missed#28256
ivov wants to merge 2 commits intomasterfrom
cat-2721-fix-redis-reconnection-flag

Conversation

@ivov
Copy link
Copy Markdown
Member

@ivov ivov commented Apr 9, 2026

Summary

RedisClientService debounces all event emissions by 1s, and the reconnection retry interval is also 1s. When a connection drops, retryStrategy queues a debounced connection-lost event. Meanwhile, ioRedis waits 1s and reconnects. So the debounce timer and the ready event race each other. Typically the debounce wins and sets lostConnection = true before ready checks it, but under load ready can fire first, so it finds lostConnection still as false and skips recovery, then the late connection-lost flips the flag to true, where it stays forever.

This PR sets lostConnection = true synchronously in retryStrategy to bypass the debounce. The debounced listener now only logs.

Related Linear tickets, Github issues, and Community forum posts

https://linear.app/n8n/issue/CAT-2721
Closes #27753

Review / Merge checklist

  • PR title and summary are descriptive. (conventions)
  • Docs updated or follow-up ticket created.
  • Tests included.
  • PR Labeled with Backport to Beta, Backport to Stable, or Backport to v1 (if the PR is an urgent fix that needs to be backported)

@codecov
Copy link
Copy Markdown

codecov bot commented Apr 9, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found across 2 files

Architecture diagram
sequenceDiagram
    participant Redis as ioRedis Library
    participant RCS as RedisClientService
    participant Emitter as Internal Emitter (Debounced)

    Note over Redis, RCS: Connection Failure
    Redis->>RCS: retryStrategy() callback invoked

    rect rgb(240, 240, 240)
        Note right of RCS: NEW: Atomic State Update
        RCS->>RCS: Set lostConnection = true
    end

    RCS->>Emitter: emit('connection-lost', timeout)
    
    Note over Emitter: Debounce Timer Starts (1s)

    Note over Redis, RCS: Connection Re-established
    Redis->>RCS: Event: 'ready'

    alt is lostConnection == true
        Note right of RCS: Recovery Path Triggered
        RCS->>RCS: internalRecover()
        RCS->>Emitter: emit('connection-recovered')
        RCS->>RCS: Set lostConnection = false
    else CHANGED: Previous Race Condition (if ready fired before debounce)
        Note right of RCS: Recovery skipped (Bug fixed)
    end

    Note over Emitter: Debounce Timer Expires
    Emitter-->>RCS: Listener: 'connection-lost' fires
    
    rect rgb(240, 240, 240)
        Note right of RCS: CHANGED: Only logs warning now
        RCS->>RCS: logger.warn(...)
    end
Loading

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 9, 2026

Performance Comparison

Comparing currentlatest master14-day baseline

docker-stats

Metric Current Latest Master Baseline (avg) vs Master vs Baseline Status
docker-image-size-n8n 1269.76 MB 1269.76 MB 1269.76 MB (σ 0.00) +0.0% +0.0%
docker-image-size-runners 386.00 MB 386.00 MB 388.00 MB (σ 3.46) +0.0% -0.5%

Memory consumption baseline with starter plan resources

Metric Current Latest Master Baseline (avg) vs Master vs Baseline Status
memory-rss-baseline 287.07 MB 287.07 MB 281.78 MB (σ 34.50) +0.0% +1.9%
memory-heap-used-baseline 114.53 MB 114.53 MB 113.09 MB (σ 1.15) +0.0% +1.3% ⚠️

Idle baseline with Instance AI module loaded

Metric Current Latest Master Baseline (avg) vs Master vs Baseline Status
instance-ai-heap-used-baseline 186.41 MB 186.41 MB 186.41 MB (< 3 samples) +0.0% +0.0%
instance-ai-rss-baseline 343.76 MB 343.76 MB 343.76 MB (< 3 samples) +0.0% -0.0%
How to read this table
  • Current: This PR's value (or latest master if PR perf tests haven't run)
  • Latest Master: Most recent nightly master measurement
  • Baseline: Rolling 14-day average from master
  • vs Master: PR impact (current vs latest master)
  • vs Baseline: Drift from baseline (current vs rolling avg)
  • Status: ✅ within 1σ | ⚠️ 1-2σ | 🔴 >2σ regression

@ivov ivov requested a review from despairblue April 9, 2026 14:21
@n8n-assistant n8n-assistant bot added core Enhancement outside /nodes-base and /editor-ui n8n team Authored by the n8n team labels Apr 9, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Enhancement outside /nodes-base and /editor-ui n8n team Authored by the n8n team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Redis connection sometimes not recovered

1 participant