Observed behavior
When a JetStream consumer receives ErrConsumerLeadershipChanged (status code 409, "leadership change"), the client resets pending message counts but does not send a new pull request. This leaves the consumer in a state where:
- The connection is healthy
- The subscription is valid
- Messages are pending on the server (NumPending > 0)
- The consumer is not receiving any messages
The consumer remains stalled indefinitely until the application is restarted.
In our production JetStream cluster, leadership changes occur quite often, which led us to spot the issue. For us, this caused a few message processing delays.
Expected behavior
When ErrConsumerLeadershipChanged is received, the client should immediately send a new pull request to resume message delivery, similar to how ErrNoHeartbeat is handled.
Current code:
    if errors.Is(msgErr, ErrConsumerLeadershipChanged) {
        s.pending.msgCount = 0
        s.pending.byteCount = 0
    }
    return nil // Does NOT trigger new pull request
Expected behavior (similar to ErrNoHeartbeat handling at lines 394-414):
    if errors.Is(msgErr, ErrConsumerLeadershipChanged) {
        s.pending.msgCount = 0
        s.pending.byteCount = 0
        // Should trigger a new pull request here
        s.fetchNext <- &pullRequest{
            Expires:   s.consumeOpts.Expires,
            Batch:     s.consumeOpts.MaxMessages,
            MaxBytes:  s.consumeOpts.MaxBytes,
            Heartbeat: s.consumeOpts.Heartbeat,
            // ... other fields
        }
        if s.hbMonitor != nil {
            s.hbMonitor.Reset(2 * s.consumeOpts.Heartbeat)
        }
    }
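To make the difference between the two snippets concrete, here is a small stdlib-only simulation. It is a toy model, not nats.go code: fakeServer, fakeClient, simulate and the "refill at half a batch" rule are all hypothetical, and the model assumes that a leader step-down discards the outstanding pull request and that the client only refills when a message arrives.

```go
package main

import (
	"errors"
	"fmt"
)

// errLeadershipChanged stands in for ErrConsumerLeadershipChanged.
var errLeadershipChanged = errors.New("leadership change")

// fakeServer models a message backlog plus the number of messages the
// current leader still owes the client for its last pull request.
type fakeServer struct {
	backlog     int
	outstanding int
}

func (s *fakeServer) pull(batch int) { s.outstanding = batch }

// stepDown models a leader change: the outstanding pull is forgotten.
func (s *fakeServer) stepDown() { s.outstanding = 0 }

// deliver hands over one message if a pull is outstanding and data exists.
func (s *fakeServer) deliver() bool {
	if s.outstanding == 0 || s.backlog == 0 {
		return false
	}
	s.outstanding--
	s.backlog--
	return true
}

// fakeClient mirrors the pending counter from the issue; repull selects
// between the current behavior (false) and the proposed one (true).
type fakeClient struct {
	srv     *fakeServer
	batch   int
	pending int
	repull  bool
}

func (c *fakeClient) requestBatch() {
	c.srv.pull(c.batch)
	c.pending = c.batch
}

// onMessage refills only when a message arrives (assumed threshold: batch/2).
func (c *fakeClient) onMessage() {
	c.pending--
	if c.pending <= c.batch/2 {
		c.requestBatch()
	}
}

// onStatus resets the counters and, if enabled, pulls again immediately.
func (c *fakeClient) onStatus(err error) {
	if errors.Is(err, errLeadershipChanged) {
		c.pending = 0
		if c.repull {
			c.requestBatch()
		}
	}
}

// simulate delivers one message, forces a step-down, then drains what it can.
func simulate(repull bool) (delivered, backlog int) {
	srv := &fakeServer{backlog: 10}
	c := &fakeClient{srv: srv, batch: 4, repull: repull}
	c.requestBatch()
	if srv.deliver() {
		c.onMessage()
		delivered++
	}
	srv.stepDown()
	c.onStatus(errLeadershipChanged)
	for srv.deliver() {
		c.onMessage()
		delivered++
	}
	return delivered, srv.backlog
}

func main() {
	for _, repull := range []bool{false, true} {
		delivered, backlog := simulate(repull)
		fmt.Printf("re-pull=%v delivered=%d backlog=%d\n", repull, delivered, backlog)
	}
}
```

In this model, the variant without the re-pull stops after the single message delivered before the step-down, because nothing arrives to trigger a refill. The variant with the re-pull drains the whole backlog.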
Server and client version
- nats.go: v1.51.0
- nats-server: v2.12.4, clustered JetStream, 3 replicas, file storage
Host environment
- Linux/amd64, Go 1.26.2
- GKE, consumers run as long-lived pods
Steps to reproduce
- 3-node JetStream cluster, stream + durable pull consumer (R=3).
- Client calls consumer.Consume(handler) with default Heartbeat.
- Publish a steady stream of messages.
- Force a leadership change: nats consumer cluster step-down STREAM CONSUMER.
- Observe: handler stops being called; consumer.Info().NumPending climbs; no errors on the connection.
Triggers in production: cluster scale events and partial network blips. Leadership changes in these cases happen without a client-side disconnect, so reconnection logic doesn't help.