
[BUG]: DB_POOL_SIZE and DB_MAX_OVERFLOW are per-worker, not per-gateway — actual connections = workers × (pool_size + max_overflow) #4645

@madhav165

Description


Summary

Setting DB_POOL_SIZE=15 and DB_MAX_OVERFLOW=30 results in the gateway establishing ~200 database connections instead of the expected ~45. This is because each gunicorn worker gets its own independent connection pool, and additionally the observability pattern opens many simultaneous sessions per request.

Root Cause

Two compounding factors make the actual connection count unintuitive:

1. Per-worker pool multiplication (architecture)

With gunicorn multi-worker deployment:

  • Each forked worker gets its own independent copy of the SQLAlchemy QueuePool (via OS copy-on-write semantics after fork)
  • Pool size limits are enforced per-worker, not per-gateway
  • run-gunicorn.sh auto-detects workers: min(2×CPU+1, 16)

Actual max connections = GUNICORN_WORKERS × (DB_POOL_SIZE + DB_MAX_OVERFLOW)

| Workers | DB_POOL_SIZE | DB_MAX_OVERFLOW | Expected (intuitive) | Actual Max |
|---------|--------------|-----------------|----------------------|------------|
| 4       | 15           | 30              | 45                   | 180        |
| 5       | 15           | 30              | 45                   | 225        |
| 16      | 15           | 30              | 45                   | 720        |
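The formula is easy to sanity-check in a few lines. A minimal sketch — the `min(2×CPU+1, 16)` default mirrors run-gunicorn.sh's auto-detection, but the function name is illustrative:

```python
import os

def max_db_connections(pool_size, max_overflow, workers=None):
    """Worst-case connections a multi-worker gateway can open to PostgreSQL."""
    if workers is None:
        # Mirrors run-gunicorn.sh's auto-detection: min(2*CPU + 1, 16)
        workers = min(2 * (os.cpu_count() or 1) + 1, 16)
    return workers * (pool_size + max_overflow)

print(max_db_connections(15, 30, workers=4))   # 180
print(max_db_connections(15, 30, workers=16))  # 720
```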

2. Observability session proliferation (amplifier)

When OBSERVABILITY_ENABLED=true, each traced request creates 4-6 independent sessions via _get_or_create_observability_session() in observability_service.py:

  • start_trace() → 1 session
  • start_span() → 1 session
  • end_span() → 1 session
  • end_trace() → 1 session
  • Optional: add_event(), record_metric() → 1-2 more

Additionally, SQL instrumentation (instrumentation/sqlalchemy.py) opens 3 sessions per SQL query (start_span + end_span + duration update). A single request running 10 queries → 30 additional sessions from instrumentation alone.

Under even modest concurrency (3-5 concurrent requests per worker), a single worker can saturate all 15 pooled connections and exhaust all 30 overflow connections.
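A back-of-envelope estimate makes the amplification concrete. This sketch simply encodes the per-call counts listed above; the function name is hypothetical:

```python
def sessions_per_request(sql_queries, optional_events=0):
    """Rough count of observability sessions one traced request opens."""
    # start_trace, start_span, end_span, end_trace (+ add_event/record_metric)
    lifecycle = 4 + optional_events
    # start_span + end_span + duration update per instrumented SQL query
    instrumentation = 3 * sql_queries
    return lifecycle + instrumentation

# A single request running 10 queries opens 34 sessions -- more than an
# entire worker's pool_size=15 before overflow even kicks in.
print(sessions_per_request(sql_queries=10))  # 34
```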

3. Missing engine.dispose() in post_fork (minor)

gunicorn.config.py:post_fork() resets Redis but does NOT call engine.dispose(). When preload_app=True (Linux default), the parent process creates pool connections that become invalid in children. SQLAlchemy handles this gracefully (invalidates stale connections by PID check), but calling dispose() would be cleaner and follow SQLAlchemy's official multi-process guidance.
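The missing hook is essentially a one-liner. A minimal sketch, assuming the module-level engine is importable from mcpgateway/db.py as listed in the affected files; the surrounding hook body in gunicorn.config.py will differ:

```python
# gunicorn.config.py (sketch) -- discard the pool inherited across fork.

def post_fork(server, worker):
    from mcpgateway.db import engine  # module-level engine created before fork

    # Drop connections the worker inherited from the parent process.
    # close=False leaves the parent's sockets untouched; each worker then
    # lazily opens its own connections. This is SQLAlchemy's documented
    # pattern for using one Engine across os.fork / gunicorn preload_app.
    engine.dispose(close=False)
```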

Reproduction

# Set small pool size
export DB_POOL_SIZE=15
export DB_MAX_OVERFLOW=30
export OBSERVABILITY_ENABLED=true

# Start with multiple workers
make serve  # auto-detects workers

# Check PostgreSQL connection count
psql -c "SELECT count(*) FROM pg_stat_activity WHERE application_name LIKE '%mcpgateway%';"
# Shows ~150-200+ instead of expected 45

Proposed Fixes

  1. Add startup log warning showing the actual max connection formula:

    WARNING: Total max DB connections = workers(N) × (pool_size + max_overflow) = N × 45 = M
    Ensure PostgreSQL max_connections >= M
    
  2. Reduce observability session proliferation: Batch start_trace/start_span/end_span/end_trace into fewer sessions (reuse a single session per observability lifecycle)

  3. Fix SQL instrumentation: _write_span_to_db() opens 3 sessions — consolidate into 1

  4. Add engine.dispose() in post_fork for clean pool separation per worker (SQLAlchemy best practice)

  5. Consider a dedicated smaller pool for observability writes or rate-limit observability session creation when pool is near capacity
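Proposed fix 1 is cheap to implement. A minimal sketch of the startup warning, assuming a logger is already configured; `warn_total_pool_capacity` is a hypothetical name:

```python
import logging

logger = logging.getLogger("mcpgateway")

def warn_total_pool_capacity(workers, pool_size, max_overflow):
    """Log the real worst-case DB connection count at startup (proposed fix 1)."""
    total = workers * (pool_size + max_overflow)
    logger.warning(
        "Total max DB connections = workers(%d) x (pool_size %d + max_overflow %d) = %d; "
        "ensure PostgreSQL max_connections >= %d",
        workers, pool_size, max_overflow, total, total,
    )
    return total

warn_total_pool_capacity(workers=4, pool_size=15, max_overflow=30)  # logs 180
```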

Affected Files

  • mcpgateway/config.py:1920-1921 — default pool size settings
  • mcpgateway/db.py:226 — module-level engine creation
  • mcpgateway/services/observability_service.py:192-218 — independent session creation
  • mcpgateway/instrumentation/sqlalchemy.py:43-88 — 3 sessions per SQL query span
  • gunicorn.config.py:118-128 — post_fork missing engine.dispose()
  • run-gunicorn.sh:235-248 — worker count auto-detection

Environment

  • SQLAlchemy >= 2.0.49
  • Gunicorn with UvicornWorker
  • PostgreSQL backend with QueuePool (non-PgBouncer)


Labels: bug, performance, triage
