## Summary

Setting `DB_POOL_SIZE=15` and `DB_MAX_OVERFLOW=30` results in the gateway establishing ~200 database connections instead of the expected ~45. This is because each gunicorn worker gets its own independent connection pool, and additionally the observability pattern opens many simultaneous sessions per request.
## Root Cause

Two compounding factors make the actual connection count unintuitive:

### 1. Per-worker pool multiplication (architecture)
With gunicorn multi-worker deployment:

- Each forked worker gets its own independent copy of the SQLAlchemy `QueuePool` (via OS copy-on-write semantics after fork)
- Pool size limits are enforced per-worker, not per-gateway
- `run-gunicorn.sh` auto-detects workers: `min(2×CPU+1, 16)`

Actual max connections = `GUNICORN_WORKERS × (DB_POOL_SIZE + DB_MAX_OVERFLOW)`
| Workers | DB_POOL_SIZE | DB_MAX_OVERFLOW | Expected (intuitive) | Actual Max |
|---------|--------------|-----------------|----------------------|------------|
| 4       | 15           | 30              | 45                   | 180        |
| 5       | 15           | 30              | 45                   | 225        |
| 16      | 15           | 30              | 45                   | 720        |
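The table values follow directly from the formula above. As a quick sanity check, here is a minimal sketch of the arithmetic; `max_db_connections` is an illustrative helper (not part of the codebase) and the worker-count heuristic simply mirrors what `run-gunicorn.sh` is described as doing:

```python
import os

def max_db_connections(pool_size: int, max_overflow: int, cpu_count: int | None = None) -> int:
    """Worst-case connection count the gateway can open against PostgreSQL.

    Mirrors the run-gunicorn.sh heuristic: workers = min(2*CPU + 1, 16).
    Each worker holds its own QueuePool, so the limits multiply per worker.
    """
    cpus = cpu_count or os.cpu_count() or 1
    workers = min(2 * cpus + 1, 16)
    return workers * (pool_size + max_overflow)

# On an 8-core host with the settings from this report:
# workers = min(17, 16) = 16, so 16 * (15 + 30) = 720 potential connections.
print(max_db_connections(pool_size=15, max_overflow=30, cpu_count=8))
```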
### 2. Observability session proliferation (amplifier)

When `OBSERVABILITY_ENABLED=true`, each traced request creates 4-6 independent sessions via `_get_or_create_observability_session()` in `observability_service.py`:

- `start_trace()` → 1 session
- `start_span()` → 1 session
- `end_span()` → 1 session
- `end_trace()` → 1 session
- Optional: `add_event()`, `record_metric()` → 1-2 more
Additionally, SQL instrumentation (`instrumentation/sqlalchemy.py`) opens 3 sessions per SQL query (start_span + end_span + duration update). A single request running 10 queries → 30 additional sessions from instrumentation alone.

Under even modest concurrency (3-5 concurrent requests per worker), a single worker can saturate all 15 pool connections and eat through all 30 overflow connections.
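To make the amplification concrete, here is a simplified sketch of the per-call pattern described above. It is not the actual code from `observability_service.py`; the engine URL, `SessionLocal` factory, and function bodies are illustrative assumptions showing how each lifecycle hook checks out its own pooled connection rather than sharing one:

```python
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

# Illustrative engine/session factory; the real ones live in mcpgateway/db.py.
engine = create_engine(
    "postgresql://user:pass@localhost/mcpgateway",
    pool_size=15,
    max_overflow=30,
)
SessionLocal = sessionmaker(bind=engine)

def start_trace(trace):
    # 1st pool checkout for this request
    with SessionLocal() as session:
        session.add(trace)
        session.commit()

def start_span(span):
    # 2nd pool checkout, fully independent of the first
    with SessionLocal() as session:
        session.add(span)
        session.commit()

# end_span(), end_trace(), add_event(), and record_metric() repeat the same
# open-commit-close pattern, so one traced request performs 4-6 independent
# checkouts before the SQL instrumentation spans (3 more per query) are counted.
```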
### 3. Missing `engine.dispose()` in `post_fork` (minor)

`gunicorn.config.py:post_fork()` resets Redis but does NOT call `engine.dispose()`. When `preload_app=True` (Linux default), the parent process creates pool connections that become invalid in children. SQLAlchemy handles this gracefully (invalidates stale connections by PID check), but calling `dispose()` would be cleaner and follow SQLAlchemy's official multi-process guidance.
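A sketch of what the hook could look like, assuming the module-level engine is importable from `mcpgateway.db` as indicated in the affected files below (the existing Redis reset is omitted here):

```python
# gunicorn.config.py (sketch of the proposed addition)
def post_fork(server, worker):
    """Runs in each worker process immediately after fork."""
    from mcpgateway.db import engine  # module-level engine created in the parent

    # Per SQLAlchemy's multi-process guidance: discard connections inherited
    # from the parent without closing the parent's sockets (close=False), so
    # the worker lazily builds its own clean pool on first use.
    engine.dispose(close=False)
```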
## Reproduction

```bash
# Set small pool size
export DB_POOL_SIZE=15
export DB_MAX_OVERFLOW=30
export OBSERVABILITY_ENABLED=true

# Start with multiple workers
make serve  # auto-detects workers

# Check PostgreSQL connection count
psql -c "SELECT count(*) FROM pg_stat_activity WHERE application_name LIKE '%mcpgateway%';"
# Shows ~150-200+ instead of expected 45
```
## Proposed Fixes

- Add a startup log warning showing the actual max connection formula (see the sketch after this list):
  `WARNING: Total max DB connections = workers(N) × (pool_size + max_overflow) = N × 45 = M`
  `Ensure PostgreSQL max_connections >= M`
- Reduce observability session proliferation: batch `start_trace`/`start_span`/`end_span`/`end_trace` into fewer sessions (reuse a single session per observability lifecycle)
- Fix SQL instrumentation: `_write_span_to_db()` opens 3 sessions; consolidate into 1
- Add `engine.dispose()` in `post_fork` for clean pool separation per worker (SQLAlchemy best practice)
- Consider a dedicated smaller pool for observability writes, or rate-limit observability session creation when the pool is near capacity
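For the first fix, a minimal sketch of how the warning could be emitted at startup; `warn_about_total_connections` and the reliance on a `GUNICORN_WORKERS` environment variable exported by `run-gunicorn.sh` are assumptions for illustration, not existing code:

```python
import logging
import os

logger = logging.getLogger(__name__)

def warn_about_total_connections(pool_size: int, max_overflow: int) -> None:
    """Log the per-gateway worst case so operators can size max_connections."""
    # Assumes run-gunicorn.sh exports GUNICORN_WORKERS; fall back to 1 worker.
    workers = int(os.getenv("GUNICORN_WORKERS", "1"))
    per_worker = pool_size + max_overflow
    total = workers * per_worker
    logger.warning(
        "Total max DB connections = workers(%d) × (pool_size + max_overflow) "
        "= %d × %d = %d. Ensure PostgreSQL max_connections >= %d.",
        workers, workers, per_worker, total, total,
    )

# Example: warn_about_total_connections(pool_size=15, max_overflow=30)
```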
## Affected Files

- `mcpgateway/config.py:1920-1921` — default pool size settings
- `mcpgateway/db.py:226` — module-level engine creation
- `mcpgateway/services/observability_service.py:192-218` — independent session creation
- `mcpgateway/instrumentation/sqlalchemy.py:43-88` — 3 sessions per SQL query span
- `gunicorn.config.py:118-128` — `post_fork` missing `engine.dispose()`
- `run-gunicorn.sh:235-248` — worker count auto-detection
## Environment
- SQLAlchemy >= 2.0.49
- Gunicorn with UvicornWorker
- PostgreSQL backend with QueuePool (non-PgBouncer)