Skip to content

[automatic failover] Adding failover tests for all unhealthy#4449

Open
atakavci wants to merge 2 commits into
redis:masterfrom
atakavci:failover/recoveryTests
Open

[automatic failover] Adding failover tests for all unhealthy#4449
atakavci wants to merge 2 commits into
redis:masterfrom
atakavci:failover/recoveryTests

Conversation

@atakavci

@atakavci atakavci commented Feb 26, 2026

Copy link
Copy Markdown
Contributor

Adding tests for recovery cases that all endpoints gets unhealthy


Note

Low Risk
Test-only changes; main risk is increased integration test flakiness due to timing/awaitility-based assertions.

Overview
Adds two new integration tests in MultiDbConnectionProviderTest covering recovery after the provider enters JedisPermanentlyNotAvailableException when all databases become disabled/unhealthy.

The new cases assert (1) with failbackSupported(true) the client automatically fails back to the higher-weight database once it is re-enabled, and (2) with failbackSupported(false) recovery requires manual intervention (setActiveDatabase) even after a database is made healthy again.

Written by Cursor Bugbot for commit d2c02d4. This will update automatically on new commits. Configure here.

}

@Test
public void recoveryAfterPermanentException_withFailbackDisabled() {

@ggivo ggivo Feb 26, 2026

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH, don't like such tests, and will go without it
.failbackSupported(false), but then we wait 1s to ensure no failback, which is non-deterministic

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ggivo how about with removing sleep(1)

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

// The system knows there's a healthy database available (db1), but the active database
// is still the disabled one (db0), so commands will fail with JedisConnectionException
// (not JedisPermanentlyNotAvailableException, because canIterateFrom returns true now)
assertThrows(JedisConnectionException.class, () -> jedis.get("test-key"));

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weak assertion doesn't verify exact exception type

Medium Severity

The assertThrows(JedisConnectionException.class, ...) at this line doesn't verify the exception is specifically JedisConnectionException and not JedisPermanentlyNotAvailableException, since the latter is a subclass of the former. The comment on lines 429-432 explicitly states the exception is expected to be JedisConnectionException (not JedisPermanentlyNotAvailableException), but the assertion would pass for either. Existing tests in this file (lines 293-294, 296-297) follow the pattern of using assertEquals(JedisConnectionException.class, e.getClass()) to verify the exact type. Without that check, this test cannot detect a regression where the permanent failure state isn't properly reset after re-enabling a database.

Fix in Cursor Fix in Web

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants