-
Notifications
You must be signed in to change notification settings - Fork 2.9k
NIFI-15226 Allow ConsumeKafka to use static partition mapping #10538
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
NIFI-15226 Allow ConsumeKafka to use static partition mapping #10538
Conversation
| </dependency> | ||
| <dependency> | ||
| <groupId>org.apache.nifi</groupId> | ||
| <artifactId>nifi-framework-nar-utils</artifactId> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This dependency appears to be unused, can it be removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
org.apache.nifi.mock.MockComponentLogger is referenced in TestConsumerPartitionsUtil
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for clarifying. In that case, this dependency should be removed and MockComponentLogger should be replaced with MockComponentLog from nifi-mock to avoid referencing framework modules in extension modules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I preserved this class from the older change. But actually we don't need either. We can just mock the logger with Mockito.
exceptionfactory
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The initial build fails on the following tests:
Error: ConsumeKafkaTest.testVerifyFailed:116 » NullPointer Cannot invoke "org.apache.nifi.kafka.service.api.consumer.PollingContext.getTopics()" because "this.pollingContext" is null
Error: ConsumeKafkaTest.testVerifySuccessful:99 » NullPointer Cannot invoke "org.apache.nifi.kafka.service.api.consumer.PollingContext.getTopics()" because "this.pollingContext" is null
…nstead of just having it commented out)
| consumerServiceToPartitionedPollingContext.clear(); | ||
|
|
||
| final boolean isExplicitPartitionMapping = ConsumerPartitionsUtil.isPartitionAssignmentExplicit(context.getAllProperties()); | ||
|
|
||
| if (isExplicitPartitionMapping) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The assignment of the partitions must happen exactly once after the processor is started.
This code is called from onTrigger() which can cause concurrency issues when Concurrent Tasks > 1:
- Multiple threads are trying to initialize the consumers in parallel
- If there are configured more threads than partitions, then a thread can reinitalize the assignment if there is no available consumer in the pool
Ideally, the assigment should happen in onScheduled(). Isn't it possible?
Also, consumerServiceToPartitionedPollingContext should be cleared in onStopped().
| return connectionService.getConsumerService(pollingContext); | ||
| } | ||
|
|
||
| public int getPartitionCount(final KafkaConnectionService connectionService) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor:
| public int getPartitionCount(final KafkaConnectionService connectionService) { | |
| private int getPartitionCount(final KafkaConnectionService connectionService) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case of static partition mapping, the consumers should be created eagerly for each partition in availablePartitionedPollingContexts (similar to recreateAssignedConsumers() in the old 2.6 implementation).
Without that, a consumer is created for the first partition, and additional consumers are created for the other partitions only when parallel task execution triggers it. This is because a consumer is always available in the pool until multiple threads request consumers in parallel (which is not guaranteed to happen, e.g. Concurrent Tasks = 1).
| private final AtomicInteger activeConsumerCount = new AtomicInteger(); | ||
|
|
||
| private final Queue<PollingContext> availablePartitionedPollingContexts = new LinkedBlockingQueue<>(); | ||
| private final Map<KafkaConsumerService, PollingContext> consumerServiceToPartitionedPollingContext = new HashMap<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The map can be accessed by parallel threads so it should be ConcurrentHashMap or similar.
| KafkaConsumerService service; | ||
| while ((service = consumerServices.poll()) != null) { | ||
| close(service, "Not all partitions are assigned"); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it can never happen because no consumers have been created so far.
| Integer partition = pollingContext.getPartition(); | ||
| if (partition != null) { | ||
| List<TopicPartition> topicPartitions = pollingContext.getTopics().stream() | ||
| .map(topic -> new TopicPartition(topic, partition)) | ||
| .collect(Collectors.toList()); | ||
| consumer.assign(topicPartitions); | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest moving it to Kafka3ConsumerService's constructor. That way, the assign/subscribe logic would be more straightforward (a single if-else instead of the separate if statements at different points in the code).
The partition id can be passed in the Subscription parameter.
Summary
NIFI-15226
Tracking
Please complete the following tracking steps prior to pull request creation.
Issue Tracking
Pull Request Tracking
NIFI-00000NIFI-00000Pull Request Formatting
mainbranchVerification
Please indicate the verification steps performed prior to pull request creation.
Build
./mvnw clean install -P contrib-checkLicensing
LICENSEandNOTICEfilesDocumentation