fix(observability): reduce session proliferation from 4-6 to 1 per request #5073
Open
bogdanmariusc10 wants to merge 2 commits into
…quest

Resolves #5072

- Modified start_trace() and end_trace() to accept obs_db and commit parameters
- Updated ObservabilityMiddleware to create single session for entire trace lifecycle
- All operations (start_trace, start_span, end_span, end_trace) now reuse same session
- Final end_trace() performs atomic commit of all batched operations
- Added 8 comprehensive unit tests for session reuse behavior
- Maintains backward compatibility when obs_db parameter not provided

Impact:
- Reduces database sessions from 4-6 to 1 per traced request (75-83% reduction)
- Prevents connection pool saturation under moderate concurrency
- Atomic batching improves transaction consistency

Test coverage:
- 8 new session reuse tests in test_observability_session_reuse.py
- All 102 observability tests pass
- Validates full lifecycle, backward compatibility, and error handling

Signed-off-by: Bogdan-Marius-Catanus <bogdan-marius.catanus@ibm.com>
Signed-off-by: Bogdan-Marius-Catanus <bogdan-marius.catanus@ibm.com>
🔗 Related Issue
Closes #5072
📝 Summary
This PR fixes the observability session proliferation bug where each traced request was creating 4-6 independent database sessions, causing connection pool saturation under moderate load.
Problem: After PR #4696 optimized SQL instrumentation, the main observability lifecycle (start_trace → start_span → end_span → end_trace) still created 4-6 sessions per request by calling `_get_or_create_observability_session()` independently for each operation.

Solution: Implemented session reuse pattern where `ObservabilityMiddleware` creates a single database session at trace start and passes it through the entire lifecycle. All operations use `commit=False` for batching, with a final atomic `commit=True` in `end_trace()`.

Impact:
🏷️ Type of Change
🧪 Verification
- `make lint`
- `make test`
- `make coverage`

Test Results:
✅ Checklist
make black isort pre-commit)
📓 Notes
Changes Made
1. Service Layer (`mcpgateway/services/observability_service.py`):
- `start_trace()` (lines 271-362): Added `commit` and `obs_db` parameters
- `end_trace()` (lines 364-432): Added `commit` and `obs_db` parameters
- `start_span()` and `end_span()` already had session reuse from PR fix(db): resolve database connection pool multiplication #4696

2. Middleware Layer (`mcpgateway/middleware/observability_middleware.py`):
- `dispatch()` method (lines 107-309)
- `SessionLocal()` session at line 161
- `start_trace(commit=False, obs_db=obs_db)` - lines 164-175
- `start_span(commit=False, obs_db=obs_db)` - lines 191-196
- `end_span(commit=False, obs_db=obs_db)` - lines 227-234
- `end_trace(commit=True, obs_db=obs_db)` - lines 240-249 (atomic commit)

3. Test Coverage
Session Reuse Tests (`tests/unit/mcpgateway/services/test_observability_session_reuse.py`):
- `test_start_trace_with_session_reuse` - Verifies start_trace accepts obs_db parameter and doesn't create new session
- `test_start_trace_without_session_creates_own` - Verifies backward compatibility when obs_db=None
- `test_end_trace_with_session_reuse` - Verifies end_trace accepts obs_db parameter and doesn't create new session
- `test_end_trace_without_session_creates_own` - Verifies backward compatibility when obs_db=None
- `test_start_span_with_session_reuse` - Verifies start_span session reuse (from PR fix(db): resolve database connection pool multiplication #4696)
- `test_end_span_with_session_reuse` - Verifies end_span session reuse (from PR fix(db): resolve database connection pool multiplication #4696)
- `test_full_trace_lifecycle_with_single_session` - Critical test: Verifies complete lifecycle (start_trace → start_span → end_span → end_trace) uses only 1 session with single atomic commit
- `test_add_event_with_session_reuse` - Verifies event handling with session reuse

Middleware Tests (`tests/unit/mcpgateway/middleware/test_observability_middleware.py`):
- `test_dispatch_disabled` - Verifies middleware skips when disabled
- `test_dispatch_health_check_skipped` - Verifies health check paths are skipped
- `test_dispatch_trace_setup_success` - Verifies successful trace lifecycle
- `test_dispatch_trace_setup_failure` - Verifies graceful handling of trace setup failures
- `test_dispatch_exception_during_request` - Verifies trace completion even when request fails
- `test_dispatch_trace_setup_cleanup_close_failure_logs_debug` - Verifies debug logging for cleanup failures
- `test_dispatch_end_span_failure_logs_warning` - Verifies warning logging for end_span failures
- `test_dispatch_session_close_failure_logs_debug` - Verifies debug logging for session close failures in finally block
- `test_dispatch_end_trace_failure_logs_warning` - Verifies warning logging for end_trace failures
- `test_dispatch_trace_setup_failure_with_session_close_failure` - New: Verifies graceful handling when both trace setup AND session close fail (covers lines 213-214)
- `test_dispatch_with_user_context` - Verifies user context extraction
- `test_dispatch_with_traceparent_header` - Verifies W3C trace context propagation
- `test_dispatch_response_headers` - Verifies trace ID in response headers
- `test_dispatch_status_code_handling` - Verifies status code tracking

Test Results:
Coverage: The test suite now provides comprehensive coverage including:
`obs_db=None`
Backward Compatibility
The fix maintains full backward compatibility:
- When `obs_db=None` (default), methods create their own independent sessions (original behavior)
- When `obs_db` is provided, methods reuse the provided session (new optimized behavior)

Design Decision: Atomic Batching
All observability operations within a trace lifecycle are batched into a single transaction with a final atomic commit. This provides:
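The session reuse and batching behavior described in this PR can be illustrated with a small, self-contained sketch. It does not use the gateway's code: `FakeSession`, `record_step`, `run_lifecycle`, and `measure` are hypothetical names invented for illustration, and `FakeSession` is only an in-memory stand-in for a real database session. The sketch assumes that a lifecycle step takes an optional `obs_db` session plus a `commit` flag, as the PR describes for `start_trace()` and `end_trace()`.

```python
from typing import List, Optional, Tuple

LIFECYCLE = ["start_trace", "start_span", "end_span", "end_trace"]


class FakeSession:
    """Hypothetical stand-in for a DB session; buffers rows until commit()."""

    opened = 0                 # sessions created so far
    commits = 0                # commit() calls so far
    committed: List[str] = []  # rows that reached the "database"

    def __init__(self) -> None:
        FakeSession.opened += 1
        self.pending: List[str] = []

    def add(self, row: str) -> None:
        self.pending.append(row)

    def commit(self) -> None:
        FakeSession.committed.extend(self.pending)
        FakeSession.commits += 1
        self.pending.clear()

    def rollback(self) -> None:
        self.pending.clear()

    def close(self) -> None:
        self.pending.clear()


def record_step(step: str, obs_db: Optional[FakeSession] = None, commit: bool = True) -> None:
    """Write one row; reuse obs_db if given, else open (and close) a private session."""
    owns_session = obs_db is None
    db = FakeSession() if owns_session else obs_db
    try:
        db.add(step)
        if commit:
            db.commit()
    finally:
        if owns_session:
            db.close()


def run_lifecycle(reuse: bool, fail_at: Optional[str] = None) -> None:
    """Simulate one traced request, with or without a shared session."""
    if not reuse:
        for step in LIFECYCLE:      # original behavior: one session per step
            record_step(step)
        return
    obs_db = FakeSession()          # single session for the whole trace
    try:
        for step in LIFECYCLE:
            if step == fail_at:
                raise RuntimeError(f"simulated failure in {step}")
            # Only the final step commits, so all rows land in one transaction.
            record_step(step, obs_db=obs_db, commit=(step == "end_trace"))
    except RuntimeError:
        obs_db.rollback()           # nothing from this trace is persisted
    finally:
        obs_db.close()


def measure(reuse: bool, fail_at: Optional[str] = None) -> Tuple[int, int, int]:
    """Return (sessions opened, commits, rows committed) for one simulated request."""
    FakeSession.opened, FakeSession.commits = 0, 0
    FakeSession.committed = []
    run_lifecycle(reuse, fail_at)
    return FakeSession.opened, FakeSession.commits, len(FakeSession.committed)
```

In this sketch, `measure(reuse=False)` opens four sessions and commits four times, while `measure(reuse=True)` opens one session and commits once. A simulated failure before `end_trace` leaves no rows committed. The actual middleware's error handling (for example, logging a warning and continuing) may differ from the rollback shown here.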
Connection Pool Sizing Guidance
With this fix, the observability lifecycle uses 1 session per traced request (down from 4-6). Default pool configuration (`DB_POOL_SIZE=200`, `DB_MAX_OVERFLOW=10`) now supports ~200 concurrent traced requests instead of ~35-50, a 4-5x improvement in capacity.
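The capacity figures above can be checked with a back-of-the-envelope calculation. The helper below is a hypothetical sketch, not part of the gateway. It assumes that every session holds one pooled connection for the duration of the request and that nothing else competes for the pool, which is a simplification of real workloads.

```python
def max_concurrent_traced_requests(pool_size: int, max_overflow: int,
                                   sessions_per_request: int) -> int:
    """Upper bound on simultaneous traced requests the pool can serve.

    Assumes each session pins one connection for the whole request.
    """
    total_connections = pool_size + max_overflow
    return total_connections // sessions_per_request


# Default pool settings quoted in this PR.
POOL_SIZE, MAX_OVERFLOW = 200, 10

before_worst = max_concurrent_traced_requests(POOL_SIZE, MAX_OVERFLOW, 6)
before_best = max_concurrent_traced_requests(POOL_SIZE, MAX_OVERFLOW, 4)
after = max_concurrent_traced_requests(POOL_SIZE, MAX_OVERFLOW, 1)
```

Under these assumptions the bound is 35 to 52 concurrent traced requests with 4-6 sessions each, and 210 with a single session, which is in line with the approximate figures quoted above.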