Problem
When the context passed to startSubscriber is cancelled, messages that are in the process of being pushed to the t.incoming channel are silently dropped without calling Ack() or Nack() on them.
This causes the Pub/Sub server to wait for the full ack_deadline (default 10s, configurable up to 600s) before redelivering the message, even though the subscriber has already terminated and will never process it.
Current Behavior
In protocol.go, the startSubscriber function contains:
	return conn.Receive(ctx, func(ctx context.Context, m *pubsub.Message) {
		select {
		case t.incoming <- *m:
		case <-ctx.Done():
			// Message is silently dropped - no Ack() or Nack() called
		}
	})
When ctx.Done() is selected:
- The message m is not sent to t.incoming
- Neither m.Ack() nor m.Nack() is called
- The Pub/Sub server considers the message "in flight" until ack_deadline expires
- With exactly_once_delivery = true, the message is strictly locked for the entire ack_deadline duration
Impact
This behavior causes significant message delivery delays in the following scenario:
- Application receives messages from Pub/Sub via Cloud Events SDK
- Context is cancelled (e.g., graceful shutdown, stream timeout, HTTP server timeout)
- Any messages currently in the conn.Receive callback are dropped without Nack
- Pub/Sub server waits for ack_deadline (e.g., 300 seconds) before redelivering
- If the application restarts quickly, it cannot receive these messages until ack_deadline expires
- With exponential backoff configured, redelivery may be delayed even further
In our production environment, this caused messages to be delayed for approximately 1 hour when combined with:
- ack_deadline_seconds = 300 (5 minutes)
- exactly_once_delivery = true
- retry_policy with exponential backoff (30s to 600s)
- Short-lived stream connections (~60-120 seconds)
Expected Behavior
When the context is cancelled, any message still waiting in the callback should be explicitly Nack'd so that Pub/Sub can redeliver it without waiting for ack_deadline to expire:
	return conn.Receive(ctx, func(ctx context.Context, m *pubsub.Message) {
		select {
		case t.incoming <- *m:
		case <-ctx.Done():
			m.Nack() // Explicitly Nack to allow immediate redelivery
		}
	})
Environment
- Cloud Events SDK version: v2.12.0 (also affects current main)
- Go version: 1.21+
- Google Cloud Pub/Sub with exactly_once_delivery = true
Workaround
Currently, the only workarounds are:
- Set a very short ack_deadline_seconds (e.g., 30 seconds or less)
- Avoid using exactly_once_delivery = true
- Use the Google Cloud Pub/Sub SDK directly instead of Cloud Events SDK
None of these are ideal solutions.
Related