Skip to content

fix(server): preserve existing connections after requirepass changes#3443

Open
aviralgarg05 wants to merge 7 commits intoapache:unstablefrom
aviralgarg05:fix-3267-sentinel-existing-connections
Open

fix(server): preserve existing connections after requirepass changes#3443
aviralgarg05 wants to merge 7 commits intoapache:unstablefrom
aviralgarg05:fix-3267-sentinel-existing-connections

Conversation

@aviralgarg05
Copy link
Copy Markdown

Description

This fixes a Redis Sentinel compatibility issue reported in #3267.

When Kvrocks accepted a connection while requirepass was still empty, it did not immediately mark that connection as authenticated. Instead, it deferred assigning the default namespace/admin context until the connection sent its first command.

That differs from Redis behavior.

In practice, this created a race with older Sentinel versions. Sentinel could establish a connection to the master before auth-pass was configured, but only send its first probe afterward. Redis continues to allow that already-open connection to operate, while Kvrocks responded with NOAUTH. As a result, Sentinel could temporarily treat the master as down and fail to discover replicas until a reset or restart forced a fresh connection cycle.

This change fixes the behavior at the source:

  • Passwordless connections are now initialized as authenticated when they are accepted.
  • Existing connections opened before requirepass is later set continue to behave like Redis.
  • New connections created after requirepass is set still require authentication as expected.

Files changed

Why this is safe

The change is limited to the connection initialization path and only affects connections created while the server is running without a password configured.

It does not relax authentication for:

  • servers that already start with requirepass set
  • connections created after requirepass is enabled

So the fix restores Redis-compatible semantics without changing the normal protected path.

Test coverage

Added a regression test that verifies:

  1. a connection opened before CONFIG SET requirepass ... remains usable
  2. a new connection opened after requirepass is set receives NOAUTH

Validation

Ran:

  • ./x.py build -j 8
  • ./x.py format
  • ./x.py check format
  • go test -timeout=1800s -run 'TestNoAuth|TestAuth' ./unit/auth -binPath=/Users/aviralgarg/Everything/kvrocks/build/kvrocks -cliPath=redis-cli -workspace=/Users/aviralgarg/Everything/kvrocks/tests/gocase/workspace

The auth test suite was run 3 times and passed each time.

I also manually verified the repro condition:

  • before the fix, an idle connection created before requirepass was enabled received -NOAUTH
  • after the fix, the same connection correctly receives +PONG

Notes

Two repo-wide checks are currently blocked by the local environment rather than this change:

  • ./x.py check tidy
    • after wiring run-clang-tidy from local LLVM, it reports existing unrelated tidy failures outside this patch
  • ./x.py check golangci-lint
    • local golangci-lint is 2.8.0, while the repo requires 2.9.0+

Comment thread src/server/redis_connection.cc Outdated
@jihuayu
Copy link
Copy Markdown
Member

jihuayu commented Apr 17, 2026

It looks like this crash is 100% reproducible. Could you take a look? @aviralgarg05

@sonarqubecloud
Copy link
Copy Markdown

@jihuayu jihuayu requested a review from git-hulk April 21, 2026 01:18
Copy link
Copy Markdown
Member

@jihuayu jihuayu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I do have one small question, though.

return slave.LogFileMatches(t, ".*skip count: 1.*")
}, 50*time.Second, 1000*time.Millisecond)
util.WaitForSync(t, slaveClient)
util.WaitForOffsetSync(t, masterClient, slaveClient, 50*time.Second)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we need to change it?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WaitForSync only checks master_link_status == up, which is weaker than what this test needs.

In this case we restart the master after a partial SST transfer, so the replica link can come back before the resumed transfer has fully caught up. That was making the test flaky on slower matrix variants, especially -DENABLE_LUAJIT=OFF.

WaitForOffsetSync waits for the master and replica offsets to match, which is the actual condition we care about before checking the replicated key.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants