[KAFKA-15580] First attempt at UncleanRecoveryManager #19468
Basic architecture for a system that manages unclean recovery by fetching
log information from brokers.
The idea is to improve durability by electing the replica with the longest log as leader during unclean recovery, instead of just picking a random unfenced broker.
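As a rough illustration of the election rule described above, here is a minimal sketch. `LongestLogElector`, the use of a log end offset as the measure of log length, and the tie-break on lowest broker id are all assumptions made for this example, not code from this PR.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical helper for illustration only; not a class in this PR.
final class LongestLogElector {
    private LongestLogElector() {}

    /**
     * Picks the broker that reported the longest log.
     *
     * @param logEndOffsets broker id -> reported log end offset
     *                      (assumed here to be the measure of log length)
     * @return the broker with the largest offset, ties broken by the lowest
     *         broker id (an assumption); empty if there are no candidates
     */
    static Optional<Integer> elect(Map<Integer, Long> logEndOffsets) {
        Integer best = null;
        long bestOffset = -1L;
        for (Map.Entry<Integer, Long> entry : logEndOffsets.entrySet()) {
            int broker = entry.getKey();
            long offset = entry.getValue();
            boolean longer = best == null || offset > bestOffset;
            boolean tieWithLowerId = best != null && offset == bestOffset && broker < best;
            if (longer || tieWithLowerId) {
                best = broker;
                bestOffset = offset;
            }
        }
        return Optional.ofNullable(best);
    }
}
```

A real election would probably need to weigh more than raw length (for example leader epochs), so treat this only as a picture of "longest log wins" versus "random unfenced broker".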
There will be one additional thread, RecoverySendThread, which handles NIO between the controller and the brokers by sending requests. When a response arrives, a LogInfoResponseReceived event is written to the controller's event queue. These events don't write anything until the last response is received, at which point all the information needed to accurately determine log lengths has been collected.
Responses to requests are processed in the controller's event queue by a RecoveryManager class.
It keeps track of the number of outstanding requests and decides whether and how to retry failed requests. RecoveryManager builds up a "LogInfoStore" object with information about the log length of the various replicas.
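The bookkeeping described above could look roughly like the sketch below. `RecoveryTracker`, its method names, and the fixed attempt limit are hypothetical; the PR's actual retry policy and the shape of its LogInfoStore may differ.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical tracker for illustration only; not a class in this PR.
final class RecoveryTracker {
    private final Set<Integer> outstanding;                          // brokers we still wait on
    private final Map<Integer, Integer> attempts = new HashMap<>();  // requests sent per broker
    private final Map<Integer, Long> logInfoStore = new HashMap<>(); // broker id -> log end offset
    private final int maxAttempts;                                   // assumed retry limit

    RecoveryTracker(Set<Integer> brokers, int maxAttempts) {
        this.outstanding = new HashSet<>(brokers);
        this.maxAttempts = maxAttempts;
        for (int broker : brokers) {
            attempts.put(broker, 1); // the initial request counts as the first attempt
        }
    }

    /** Records a successful response; duplicate or unexpected responses are ignored. */
    void onResponse(int brokerId, long logEndOffset) {
        if (outstanding.remove(brokerId)) {
            logInfoStore.put(brokerId, logEndOffset);
        }
    }

    /** Returns true if the request should be retried, false if we give up on the broker. */
    boolean onFailure(int brokerId) {
        if (!outstanding.contains(brokerId)) {
            return false;
        }
        int used = attempts.get(brokerId);
        if (used < maxAttempts) {
            attempts.put(brokerId, used + 1);
            return true;
        }
        outstanding.remove(brokerId); // stop waiting on this broker
        return false;
    }

    boolean isComplete() {
        return outstanding.isEmpty();
    }

    int outstandingCount() {
        return outstanding.size();
    }

    Map<Integer, Long> logInfo() {
        return Map.copyOf(logInfoStore);
    }
}
```

Because all events are handled on the controller's single event queue, a tracker like this would not need its own synchronization.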
Once all expected responses have been received, or a timeout expires, the RecoveryManager will begin to run leadership elections for batches of TopicPartitions (respecting the max batch size configuration). These elections will have access to the store, which contains enriched log information to assist them in making leadership decisions.
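For illustration, splitting the pending partitions into election batches might be sketched as follows. `ElectionBatcher` and the `maxBatchSize` parameter are hypothetical stand-ins for whatever the PR uses to honor the max batch size configuration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not a class in this PR.
final class ElectionBatcher {
    private ElectionBatcher() {}

    /** Splits items into consecutive batches of at most maxBatchSize elements. */
    static <T> List<List<T>> batches(List<T> items, int maxBatchSize) {
        if (maxBatchSize <= 0) {
            throw new IllegalArgumentException("maxBatchSize must be positive");
        }
        List<List<T>> result = new ArrayList<>();
        for (int start = 0; start < items.size(); start += maxBatchSize) {
            int end = Math.min(start + maxBatchSize, items.size());
            // Copy the view so each batch is independent of the input list.
            result.add(List.copyOf(items.subList(start, end)));
        }
        return result;
    }
}
```

Each batch would then be handed to one election event, so a large recovery does not monopolize the controller's event queue.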
Most of the tracking is performed in a class called "StateMachine" within RecoveryManager. This is deliberate so that eventually the RecoveryManager can supervise multiple unclean election requests at the same time.