pubsub: Messages are not Nack'd when context is cancelled, causing ack deadline exhaustion #1234

@kerokerogeorge

Problem

When the context passed to startSubscriber is cancelled, any message still waiting to be sent on the t.incoming channel is silently dropped: neither Ack() nor Nack() is called on it.

As a result, the Pub/Sub server waits for the full ack_deadline (default 10s, configurable up to 600s) before redelivering the message, even though the subscriber has already terminated and will never process it.

Current Behavior

In protocol.go, the startSubscriber function contains:

return conn.Receive(ctx, func(ctx context.Context, m *pubsub.Message) {
    select {
    case t.incoming <- *m:
    case <-ctx.Done():
        // Message is silently dropped - no Ack() or Nack() called
    }
})

When ctx.Done() is selected:

  1. The message m is not sent to t.incoming
  2. Neither m.Ack() nor m.Nack() is called
  3. The Pub/Sub server considers the message "in flight" until ack_deadline expires
  4. With exactly_once_delivery = true, the message is strictly locked for the entire ack_deadline duration
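
The sequence above can be reproduced without a Pub/Sub connection. What follows is a minimal sketch, not the SDK's code: fakeMessage, deliver, and runDropDemo are hypothetical names introduced for illustration, and the message is passed by pointer here rather than copied as in protocol.go. It only shows that the ctx.Done() branch leaves the message neither acked nor nacked.

```go
package main

import (
	"context"
	"fmt"
)

// fakeMessage is a hypothetical stand-in for *pubsub.Message that records
// whether Ack or Nack was ever called on it.
type fakeMessage struct {
	ID     string
	acked  bool
	nacked bool
}

func (m *fakeMessage) Ack()  { m.acked = true }
func (m *fakeMessage) Nack() { m.nacked = true }

// deliver mirrors the select from startSubscriber: it either forwards the
// message or gives up when the context is done. It reports whether the
// message was forwarded.
func deliver(ctx context.Context, incoming chan<- *fakeMessage, m *fakeMessage) bool {
	select {
	case incoming <- m:
		return true
	case <-ctx.Done():
		// Same as the current behavior: nothing is called on m.
		return false
	}
}

// runDropDemo cancels the context before delivery and uses an unbuffered
// channel with no reader, so only the ctx.Done() branch can be selected.
func runDropDemo() (forwarded, acked, nacked bool) {
	incoming := make(chan *fakeMessage)
	ctx, cancel := context.WithCancel(context.Background())
	cancel()

	m := &fakeMessage{ID: "msg-1"}
	forwarded = deliver(ctx, incoming, m)
	return forwarded, m.acked, m.nacked
}

func main() {
	forwarded, acked, nacked := runDropDemo()
	fmt.Printf("forwarded=%v acked=%v nacked=%v\n", forwarded, acked, nacked)
}
```

All three values come back false: the message was not forwarded, and from the server's point of view it is still outstanding.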

Impact

This behavior causes significant message delivery delays in the following scenario:

  1. Application receives messages from Pub/Sub via Cloud Events SDK
  2. Context is cancelled (e.g., graceful shutdown, stream timeout, HTTP server timeout)
  3. Any messages currently in the conn.Receive callback are dropped without Nack
  4. Pub/Sub server waits for ack_deadline (e.g., 300 seconds) before redelivering
  5. If the application restarts quickly, it cannot receive these messages until ack_deadline expires
  6. With exponential backoff configured, redelivery may be delayed even further

In our production environment, this caused messages to be delayed for approximately 1 hour when combined with:

  • ack_deadline_seconds = 300 (5 minutes)
  • exactly_once_delivery = true
  • retry_policy with exponential backoff (30s to 600s)
  • Short-lived stream connections (~60-120 seconds)
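
To give a feel for how these settings can add up to roughly an hour, here is a rough back-of-the-envelope sketch. It rests on assumptions that are not confirmed by the Pub/Sub documentation or by our logs: that every dropped delivery costs one full ack deadline plus one backoff interval, and that the backoff doubles from the minimum up to the maximum. cumulativeDelay and exampleDelaySeconds are hypothetical helpers.

```go
package main

import (
	"fmt"
	"time"
)

// cumulativeDelay estimates the total wait after `drops` consecutive dropped
// deliveries. ASSUMPTION: each drop costs one ack deadline plus one backoff
// interval, and the backoff doubles each time, capped at maxBackoff. The
// real server-side backoff schedule may differ.
func cumulativeDelay(ackDeadline, minBackoff, maxBackoff time.Duration, drops int) time.Duration {
	var total time.Duration
	backoff := minBackoff
	for i := 0; i < drops; i++ {
		total += ackDeadline + backoff
		backoff *= 2
		if backoff > maxBackoff {
			backoff = maxBackoff
		}
	}
	return total
}

// exampleDelaySeconds applies the settings from this issue:
// ack_deadline_seconds = 300 and a retry_policy backoff of 30s to 600s.
func exampleDelaySeconds(drops int) int {
	d := cumulativeDelay(300*time.Second, 30*time.Second, 600*time.Second, drops)
	return int(d / time.Second)
}

func main() {
	for drops := 1; drops <= 6; drops++ {
		d := time.Duration(exampleDelaySeconds(drops)) * time.Second
		fmt.Printf("after %d dropped deliveries: %v\n", drops, d)
	}
}
```

Under these assumptions, six consecutive dropped deliveries add up to 55m30s, which is in the same range as the delay we observed. With stream connections lasting only 60-120 seconds, several consecutive drops of the same message are plausible.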

Expected Behavior

When the context is cancelled, the message should be explicitly Nack'd so that it becomes eligible for redelivery without waiting for the ack deadline to expire:

return conn.Receive(ctx, func(ctx context.Context, m *pubsub.Message) {
    select {
    case t.incoming <- *m:
    case <-ctx.Done():
        m.Nack() // Explicitly Nack to allow immediate redelivery
    }
})
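
The effect of the proposed change can be checked in isolation. The sketch below is a hedged illustration rather than a patch against protocol.go: recordingMessage, forwardOrNack, and runNackDemo are hypothetical names, and the fake type only counts Nack calls instead of talking to Pub/Sub.

```go
package main

import (
	"context"
	"fmt"
)

// recordingMessage is a hypothetical stand-in for *pubsub.Message that
// counts how many times Nack was called.
type recordingMessage struct {
	nacks int
}

func (m *recordingMessage) Nack() { m.nacks++ }

// forwardOrNack follows the proposed select: forward the message if possible,
// otherwise Nack it when the context is done. It reports whether the message
// was forwarded.
func forwardOrNack(ctx context.Context, incoming chan<- *recordingMessage, m *recordingMessage) bool {
	select {
	case incoming <- m:
		return true
	case <-ctx.Done():
		m.Nack() // hand the message back instead of dropping it
		return false
	}
}

// runNackDemo exercises one branch at a time. With cancelled=true the channel
// is unbuffered and has no reader, so only ctx.Done() is ready. With
// cancelled=false the channel has room and the context is live, so only the
// send is ready.
func runNackDemo(cancelled bool) (forwarded bool, nacks int) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	size := 1
	if cancelled {
		cancel()
		size = 0
	}
	incoming := make(chan *recordingMessage, size)

	m := &recordingMessage{}
	forwarded = forwardOrNack(ctx, incoming, m)
	return forwarded, m.nacks
}

func main() {
	f, n := runNackDemo(true)
	fmt.Printf("cancelled: forwarded=%v nacks=%d\n", f, n)
	f, n = runNackDemo(false)
	fmt.Printf("live: forwarded=%v nacks=%d\n", f, n)
}
```

In the cancelled case the message is nacked exactly once and not forwarded; in the live case it is forwarded and never nacked, so the change does not affect normal delivery.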

Environment

  • Cloud Events SDK version: v2.12.0 (also affects current main)
  • Go version: 1.21+
  • Google Cloud Pub/Sub with exactly_once_delivery = true

Workaround

Currently, the only workarounds are:

  1. Set a very short ack_deadline_seconds (e.g., 30 seconds or less)
  2. Avoid using exactly_once_delivery = true
  3. Use the Google Cloud Pub/Sub SDK directly instead of Cloud Events SDK

None of these are ideal solutions.
