🐞 Bug Summary
PR #4696 partially addressed the database connection pool multiplication issue (#4645) by reducing SQL instrumentation from 3 sessions per query to 1. However, the main observability lifecycle still creates 4-6 independent sessions per traced request via _get_or_create_observability_session(), which can saturate connection pools under modest concurrency.
Impact: A traced request with 10 SQL queries now uses 14-16 sessions (10 from SQL instrumentation plus 4-6 from the lifecycle), down from 34-36 (30 plus 4-6), a reduction of roughly 58%. The observability lifecycle itself (start_trace, end_trace, start_span, end_span) still accounts for 4-6 sessions per request and was not optimized in PR #4696.
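The session counts above can be reproduced with a little arithmetic. This is a minimal sketch, not code from the repository: the function name is hypothetical, and the last line assumes that SQL instrumentation keeps its one session per query once the lifecycle is reduced to a single session.

```python
def sessions_per_request(sql_queries, sessions_per_query, lifecycle_sessions):
    """Total sessions opened by one traced request."""
    return sql_queries * sessions_per_query + lifecycle_sessions


LIFECYCLE_RANGE = (4, 6)  # lifecycle sessions per request, as reported in this issue

# Before PR #4696: 3 sessions per SQL query.
before = [sessions_per_request(10, 3, n) for n in LIFECYCLE_RANGE]
# After PR #4696: 1 session per SQL query.
after = [sessions_per_request(10, 1, n) for n in LIFECYCLE_RANGE]
# Relative reduction at each end of the range, in percent.
reduction_pct = [round(100 * (b - a) / b, 1) for b, a in zip(before, after)]
# Assumption: lifecycle reduced to 1 session, SQL instrumentation unchanged.
target = sessions_per_request(10, 1, 1)

print(before, after, reduction_pct, target)
```

The reduction comes out between about 56% and 59% depending on which end of the 4-6 range applies, which is consistent with the 58% figure quoted above.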
🔁 Steps to Reproduce
Start the server:
make serve # auto-detects workers
Send traced requests with moderate concurrency (3-5 concurrent requests per worker)
Monitor database connections:
psql -c "SELECT count(*) FROM pg_stat_activity WHERE application_name LIKE '%mcpgateway%';"
Observe that the connection count approaches the pool limit even under modest load, because each traced request opens 4-6 sessions
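The concurrent load in the steps above could be generated with a short script. The following is only a sketch: GATEWAY_URL is a placeholder rather than a real gateway route, and send_request and run_concurrent are hypothetical helper names. Substitute an endpoint that is actually traced in your deployment.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Assumption: placeholder URL, not a documented gateway route.
GATEWAY_URL = "http://localhost:8000/your-traced-endpoint"


def send_request(_index, url=GATEWAY_URL, timeout=10.0):
    """Issue one GET request and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status


def run_concurrent(send, concurrency=5, total_requests=50):
    """Call send(i) for i in range(total_requests) on `concurrency` threads.

    Results are returned in request order.
    """
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(send, range(total_requests)))
```

Calling run_concurrent(send_request, concurrency=5) while the psql query runs in another terminal should make the connection growth visible. Keeping the sender as a parameter lets the driver be exercised without a running server.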
🤔 Expected Behavior
Each traced request should reuse a single database session for its entire observability lifecycle (start_trace → start_span → end_span → end_trace), rather than creating 4-6 independent sessions.
Target: Reduce from 4-6 sessions per traced request to 1 session per traced request.
📓 Logs / Error Output
Under load, workers may log pool saturation warnings:
WARNING: QueuePool limit of size 15 overflow 30 reached, connection timed out
Current session creation pattern in mcpgateway/services/observability_service.py:
start_trace() → creates 1 independent session
start_span() → creates 1 independent session
end_span() → creates 1 independent session
end_trace() → creates 1 independent session
Optional: add_event(), record_metric() → 1-2 more sessions
Root Cause:
The observability service uses _get_or_create_observability_session(), which creates an independent session for each operation. PR #4696 added session reuse infrastructure via the obs_db parameter, but the main observability lifecycle methods (start_trace, end_trace, start_span, end_span) were not updated to use it.
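The difference between the two patterns can be illustrated with in-memory stand-ins. This is a sketch under stated assumptions, not the mcpgateway implementation: FakeSession, CountingSessionFactory, ObservabilitySketch and _session_scope are hypothetical names, and only the obs_db parameter name and the four lifecycle method names come from this issue.

```python
from contextlib import contextmanager


class FakeSession:
    """Stand-in for a database session; records writes in memory."""

    def __init__(self):
        self.records = []
        self.closed = False

    def add(self, record):
        self.records.append(record)

    def commit(self):
        pass  # a real session would flush to the database here

    def close(self):
        self.closed = True


class CountingSessionFactory:
    """Stand-in for a session factory that counts the sessions it creates."""

    def __init__(self):
        self.created = 0

    def __call__(self):
        self.created += 1
        return FakeSession()


class ObservabilitySketch:
    """Hypothetical lifecycle methods that accept an optional obs_db session."""

    def __init__(self, session_factory):
        self._session_factory = session_factory

    @contextmanager
    def _session_scope(self, obs_db=None):
        if obs_db is not None:
            yield obs_db  # caller owns the session: no commit or close here
            return
        session = self._session_factory()  # fallback: one session per call
        try:
            yield session
            session.commit()
        finally:
            session.close()

    def _write(self, event, name, obs_db):
        with self._session_scope(obs_db) as db:
            db.add((event, name))

    def start_trace(self, name, obs_db=None):
        self._write("trace_start", name, obs_db)

    def start_span(self, name, obs_db=None):
        self._write("span_start", name, obs_db)

    def end_span(self, name, obs_db=None):
        self._write("span_end", name, obs_db)

    def end_trace(self, name, obs_db=None):
        self._write("trace_end", name, obs_db)


def run_lifecycle(service, obs_db=None):
    service.start_trace("request", obs_db)
    service.start_span("db.query", obs_db)
    service.end_span("db.query", obs_db)
    service.end_trace("request", obs_db)


# Current pattern: every lifecycle call opens its own session.
independent_factory = CountingSessionFactory()
run_lifecycle(ObservabilitySketch(independent_factory))

# Proposed pattern: one session created at trace start and passed through.
shared_factory = CountingSessionFactory()
shared_session = shared_factory()
try:
    run_lifecycle(ObservabilitySketch(shared_factory), obs_db=shared_session)
    shared_session.commit()
finally:
    shared_session.close()
```

In this sketch the first run creates four sessions and the second creates one. Keeping obs_db optional means existing callers that do not pass a session continue to work unchanged.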
Proposed Solutions:
Implement session reuse for observability lifecycle (High Priority):
Modify start_trace(), end_trace(), start_span(), end_span() to accept and reuse an obs_db session parameter
Create a single session at trace start and pass it through the entire lifecycle
Target: 4-6 sessions → 1 session per traced request
Add batching for observability operations (High Priority):
Batch multiple observability writes into a single transaction
Reduce commit overhead and session churn
Consider dedicated observability pool (Nice to Have):
Separate smaller connection pool for observability writes
Prevents observability from saturating main application pool
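The batching proposal above could look roughly like the following. It is a sketch using the standard-library sqlite3 module so that it runs without a database server. The ObservabilityBatch class and the observability_events table are hypothetical and do not reflect the gateway's actual schema.

```python
import sqlite3


class ObservabilityBatch:
    """Buffers observability writes and persists them in one transaction."""

    def __init__(self, connection):
        self._connection = connection
        self._pending = []

    def record(self, trace_id, kind, name):
        """Queue one row; nothing touches the database yet."""
        self._pending.append((trace_id, kind, name))

    def flush(self):
        """Write all buffered rows with a single commit; return the row count."""
        if not self._pending:
            return 0
        with self._connection:  # commits on success, rolls back on error
            self._connection.executemany(
                "INSERT INTO observability_events (trace_id, kind, name) "
                "VALUES (?, ?, ?)",
                self._pending,
            )
        count = len(self._pending)
        self._pending.clear()  # only cleared after a successful commit
        return count


connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE observability_events (trace_id TEXT, kind TEXT, name TEXT)"
)

batch = ObservabilityBatch(connection)
batch.record("t1", "trace_start", "request")
batch.record("t1", "span_start", "db.query")
batch.record("t1", "span_end", "db.query")
batch.record("t1", "trace_end", "request")
written = batch.flush()  # four rows, one transaction
```

Flushing once at the end of the trace replaces four separate commits with one. The trade-off is that buffered rows are lost if the process exits before flush() runs.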
🧩 Affected Component
mcpgateway - API
mcpgateway - UI (admin panel)
mcpgateway.wrapper - stdio wrapper
🧠 Environment Info
v1.0.2
Python 3.11+, Gunicorn
macOS
none
🧩 Additional Context
Related Issues:
Further proposed solution:
Add performance validation (High Priority)
Affected Files:
mcpgateway/services/observability_service.py:192-218 - Independent session creation
mcpgateway/instrumentation/sqlalchemy.py - Already optimized in PR #4696, "fix(db): resolve database connection pool multiplication" (reference implementation)
tests/unit/mcpgateway/services/test_observability_service.py - Needs new tests for session reuse