Skip to content

Move refresh slot to the background in the Data Path. #3851

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
May 27, 2025

Conversation

GilboaAWS
Copy link
Collaborator

@GilboaAWS GilboaAWS commented May 13, 2025

We move the RefreshSlots logic to a background task to avoid blocking the data path during poll_recover.

Currently, when the data path detects the need to refresh slots, it blocks further processing until the refresh operation completes.

By offloading this logic to the background, the system can continue handling commands concurrently, improving overall responsiveness and throughput.

Issue link

This Pull Request is linked to issue (URL): #3946

@GilboaAWS GilboaAWS self-assigned this May 13, 2025
@GilboaAWS GilboaAWS requested a review from a team as a code owner May 13, 2025 12:19
@GilboaAWS GilboaAWS added the Core changes Used to label a PR as PR with significant changes that should trigger a full matrix tests. label May 13, 2025
@ikolomi ikolomi self-requested a review May 14, 2025 09:12
@ikolomi
Copy link
Collaborator

ikolomi commented May 14, 2025

We need a test that shows that this does indeed work

Copy link
Collaborator

@ikolomi ikolomi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a test that shows that this does indeed work

@GilboaAWS GilboaAWS force-pushed the move_refresh_slot_to_gb branch 3 times, most recently from c512cba to 859f0b2 Compare May 19, 2025 15:40
@GilboaAWS GilboaAWS force-pushed the move_refresh_slot_to_gb branch 3 times, most recently from ebee972 to 51c316f Compare May 22, 2025 09:57
@ikolomi ikolomi moved this to Review in Valkey-GLIDE - internal May 22, 2025
@ikolomi ikolomi modified the milestones: 2.0, 1.2 May 22, 2025
Copy link
Collaborator

@ikolomi ikolomi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approve pending validating full matrix tests do not introduce additional failures on top

@ikolomi ikolomi self-requested a review May 22, 2025 11:09
Copy link
Collaborator

@barshaul barshaul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments inline

@GilboaAWS GilboaAWS force-pushed the move_refresh_slot_to_gb branch from 3cc9074 to 99c63a9 Compare May 25, 2025 09:26
@GilboaAWS
Copy link
Collaborator Author

Task Status Handling in poll_recover

@barshaul @ikolomi

  • Directly checks task completion status using handle.now_or_never() instead of relying on channels
  • Eliminates the complexity of retry counting while maintaining reliable recovery
  • Implements distinct handling paths for different task outcomes

Panic Flow

When a refresh task panics (indicating a potential bug), the implementation:

  • Logs the error with detailed debugging information
  • Transparently reports the panic to clients via the error channel
  • Initiates reconnection to initial nodes for a clean topology rebuild
  • TODO: Consider graceful client shutdown in future iterations as panics may indicate corrupted state

Abort Flow

  • For task abortions (intentional cancellation):
  • Recognizes aborts via join_err.is_cancelled() check
  • Treats aborts as clean shutdowns rather than errors
  • Transitions directly to PollComplete state without error propagation
  • Avoids unnecessary recovery attempts for intentionally terminated tasks

@GilboaAWS GilboaAWS force-pushed the move_refresh_slot_to_gb branch from 8998a48 to 7d99c73 Compare May 26, 2025 15:02
Copy link
Collaborator

@barshaul barshaul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

last comments

@GilboaAWS GilboaAWS force-pushed the move_refresh_slot_to_gb branch from 7d99c73 to 3ada8c9 Compare May 27, 2025 08:51
@ikolomi ikolomi merged commit 10a2a41 into valkey-io:main May 27, 2025
63 checks passed
@github-project-automation github-project-automation bot moved this from Review to Done in Valkey-GLIDE - internal May 27, 2025
@GilboaAWS GilboaAWS deleted the move_refresh_slot_to_gb branch May 27, 2025 09:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Core changes Used to label a PR as PR with significant changes that should trigger a full matrix tests.
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

3 participants