Description
Problem
We're experiencing high consumption of the "Exactly once subscriber pull operations per minute per region" GCP Pub/Sub quota due to excessive idle streaming pull connections. Our quota usage is primarily driven by connection maintenance overhead rather than actual message processing.
Misleading Configuration
MaxConcurrency is used to set GCP's MaxOutstandingMessages, but it does NOT control GCP's NumGoroutines (the number of StreamingPull connections). The naming is confusing and led us to believe we could control the connection count via MaxConcurrency; we have since figured this out.
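For reference, this is how the two knobs map onto the GCP Go client's ReceiveSettings. A minimal sketch using the public cloud.google.com/go/pubsub API (placeholder project and subscription IDs; this is not Encore's runtime code):

```go
package main

import (
	"context"
	"log"

	"cloud.google.com/go/pubsub"
)

func main() {
	ctx := context.Background()

	// Placeholder project ID for illustration.
	client, err := pubsub.NewClient(ctx, "my-project")
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// Placeholder subscription ID for illustration.
	sub := client.Subscription("my-subscription")

	// This is what MaxConcurrency controls: the flow-control cap on how many
	// messages may be processed at once.
	sub.ReceiveSettings.MaxOutstandingMessages = 100

	// This is the separate knob that actually controls the number of
	// StreamingPull streams. It defaults to 10 and is unaffected by the
	// flow-control setting above.
	sub.ReceiveSettings.NumGoroutines = 2

	if err := sub.Receive(ctx, func(ctx context.Context, m *pubsub.Message) {
		m.Ack()
	}); err != nil {
		log.Fatal(err)
	}
}
```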
Default Behavior Creates Too Many Connections
With the default GCP Pub/Sub client settings:
- NumGoroutines = 10 StreamingPull connections per subscription per instance
- Each idle connection generates ~55 operations/min of maintenance traffic (heartbeats? exactly-once delivery state?)
- We have 18 subscriptions
- Each instance therefore maintains ~180 pull streams (18 subscriptions × 10 connections)
Quota details:
- Quota limit: 180,000 operations/min
- Baseline usage: ~55,000 ops/min (~30% of quota) with 5 instances (our minimum) during off-peak hours
- Peak usage: >100% (we get throttled)
- Problem: >80% of operations are connection overhead, not actual message processing
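Sanity check on that baseline with the numbers above: 180 streams × ~55 ops/min ≈ 9,900 ops/min per instance, and 5 instances × ~9,900 ≈ 49,500 ops/min, consistent with the observed ~55,000/min.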
Adaptive experimental feature is useless
We enabled the "adaptive-gcp-pubsub-goroutines" experimental flag, which reduced connections from ~180 per instance to ~171 per instance (~5% reduction).
This is insufficient for our needs: the feature targets 90 connections per CPU, and since we autoscale to 30 instances × 2 CPUs = 60 CPUs, it would still allow ~5,400 pull streams, or ~180% of our quota spent on maintaining connections alone.
Proposed Solution
Add explicit NumGoroutines configuration per subscription:
```go
type PubsubSubscriptionConfig struct {
	// ... existing fields ...

	// NumGoroutines sets the number of StreamingPull connections to maintain
	// for this subscription. Each goroutine maintains one persistent gRPC stream.
	//
	// If not set, defaults to 10 (the GCP client library default).
	NumGoroutines *int
}
```
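Wiring this into the runtime should then be a small change wherever the GCP subscription's ReceiveSettings are built. A minimal sketch, assuming illustrative variable and field names rather than Encore's actual runtime code:

```go
// Sketch: apply the proposed field to the GCP client's ReceiveSettings.
sub := client.Subscription(subscriptionID)
sub.ReceiveSettings.MaxOutstandingMessages = cfg.MaxConcurrency
if cfg.NumGoroutines != nil {
	// An explicit per-subscription stream count overrides the client default of 10.
	sub.ReceiveSettings.NumGoroutines = *cfg.NumGoroutines
}
```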
Benefits
- Predictable quota consumption: Operators can calculate exact baseline overhead
- Right-sized for traffic: Low-traffic subscriptions don't waste connections
- Cost optimization: Could reduce idle connection overhead by 70-80% (see the example after this list)
- Clear semantics: Separate concerns (processing concurrency vs. connection count)
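For example, using the numbers above: setting NumGoroutines = 2 on all 18 subscriptions would take an instance from 180 idle pull streams down to 36, an ~80% reduction in connection maintenance operations.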
Workaround
Currently, we have no way to control NumGoroutines per subscription without modifying Encore runtime code. The adaptive feature doesn't provide sufficient control for our use case.
Environment
- GCP Pub/Sub with exactly-once delivery enabled
- 2 CPUs per instance
- 5-30 instances