
Consumers start getting null from .Consume() without an obvious error #2476

@Skyair59

Description


I am seeing the following intermittent error.
We have several different .NET applications that use confluent-kafka-dotnet (in various versions).
From time to time (sometimes several times a day, sometimes once a week), the consumers stop reading messages from Kafka.
The consumer does not throw any exception; it simply returns null when .Consume() is called.
All the consumers in the consumer group stop reading at once.
On the broker side, the consumers leave the group and the group switches to the EMPTY state.
The applications are deployed as pods in Kubernetes (k8s).
The only thing that helps is monitoring the lag and manually restarting the pod.
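Since the only working remedy is watching lag and restarting the pod by hand, that detection step could in principle be automated. The following is a minimal, language-agnostic sketch in Python (not confluent-kafka-dotnet code) of a stall detector. It assumes a hypothetical StallWatchdog helper whose on_poll() is called after every .Consume() call, and it assumes the current group lag is supplied by some external monitor. A null result with zero lag is treated as healthy (the consumer is caught up); a stall is flagged only when lag stays positive and no message arrives for longer than a threshold. The result could, for example, back a Kubernetes liveness probe so the pod is restarted automatically.

```python
import time
from typing import Callable, Optional


class StallWatchdog:
    """Hypothetical helper: flags 'lag > 0 but .Consume() keeps returning null'."""

    def __init__(self, stall_after_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.stall_after_s = stall_after_s
        self._clock = clock                 # injectable so the logic is testable
        self._last_progress = clock()

    def on_poll(self, message: Optional[object], lag: int) -> None:
        # A delivered message is progress. A null result with zero lag is also
        # healthy: the consumer is simply caught up and there is nothing to read.
        # `lag` is assumed to come from an external lag monitor.
        if message is not None or lag == 0:
            self._last_progress = self._clock()

    def is_stalled(self) -> bool:
        # Stalled = a backlog exists but there has been no progress
        # for longer than the configured threshold.
        return self._clock() - self._last_progress > self.stall_after_s


if __name__ == "__main__":
    # Simulated clock so the example runs instantly.
    now = [0.0]
    wd = StallWatchdog(stall_after_s=60, clock=lambda: now[0])
    now[0] = 30
    wd.on_poll(None, lag=0)      # caught up: counts as healthy progress
    now[0] = 100
    wd.on_poll(None, lag=500)    # backlog exists, but nothing was returned
    print("stalled:", wd.is_stalled())  # prints: stalled: True
```

Keying the check on lag rather than on null alone matters, because .Consume() with a timeout also returns null in the normal case where there is simply nothing new to read.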

What we tried that did not help:
- Updating the brokers and the client libraries
- Changing session timeouts (6-30 seconds)
- Using a static member id
- Having each consumer group read only one topic
- Calling .Consume() both with a CancellationToken and with a TimeSpan timeout

I tried to reproduce this as a network problem by running Kafka and the consumer locally in Docker containers and disabling the network on the Docker side. The consumer logged error messages, and once the network was restored, it resumed reading.

I do not see this problem in our applications written in other programming languages.

Checklist:

  • Apache Kafka 2.6.3 without authorization on virtual machines, and Strimzi Kafka in k8s with Keycloak authorization
  • confluent-kafka-dotnet 2.8.0 and newer
  • k8s
