Concurrently write ConsumerWriters on error #4322

Open. Wants to merge 1 commit into base: master

shaan420
Contributor

shaan420 commented on Feb 4, 2025

What this PR does / why we need it:
When writing to a ConsumerService, the m3msg producer randomly picks a ConsumerWriter that writes to one replica, then blocks until the write succeeds or returns an error. In certain cases, such as a deployment of the ConsumerService, the error path incurs very high latency (25s+). This creates a large backlog in the m3aggregator message queue and drastically increases the consume latency of the messages.

To minimize the impact of these errors, this PR waits a configurable amount of time for a write to return. If the write has not returned by then, the producer opportunistically starts writing the message to another random replica concurrently. If that write succeeds and the message is acked, subsequent writes detect that the stalled ConsumerWriter is still active, skip it, and go straight to another ConsumerWriter. As soon as connection stability returns, the m3msg producer goes back to writing to one replica.

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing and/or backwards incompatible change?:

NONE

Does this PR require updating code package or user-facing documentation?:

NONE
