feat(ingestion): Add response time telemetry utility and telemetry for Looker client #14970
[WIP] UTs for response_time_telemetry.py pending design review
Summary
This PR adds a response time telemetry utility to DataHub's ingestion framework. It tracks API response times and computes percentiles with the t-digest algorithm, which keeps memory use low by not storing every data point.
Initial integration is demonstrated with the Looker source connector, tracking response times across all Looker API operations.
Sample Output
🎯 Motivation
Problem Statement
Business Value
🏗️ Architecture
Core Components
1. Response Time Telemetry Utility (response_time_telemetry.py)
New file: src/datahub/telemetry/response_time_telemetry.py (372 lines)
Key Classes:
ResponseTimeTelemetry: Tracks metrics for a single API type using t-digest
ResponseTimeMetrics: Aggregates metrics across multiple API types
ResponseTimeTracker: Context manager for automatic time tracking
Features:
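The context-manager pattern described for ResponseTimeTracker can be illustrated with a minimal sketch. This is not the PR's implementation: the class name `SimpleResponseTimeTracker`, its constructor arguments, and the plain dictionary of lists (standing in for the t-digest) are all assumptions made for illustration.

```python
import time
from collections import defaultdict


class SimpleResponseTimeTracker:
    """Hypothetical stand-in: times the wrapped block and stores the duration.

    The real utility is described as feeding a t-digest; a plain list per
    API type is used here only to keep the sketch dependency-free.
    """

    def __init__(self, samples, api_type):
        self._samples = samples      # mapping: api_type -> list of durations (seconds)
        self._api_type = api_type
        self._start = 0.0

    def __enter__(self):
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc, tb):
        # Record the duration even when the wrapped call raises.
        self._samples[self._api_type].append(time.perf_counter() - self._start)
        return False  # never swallow exceptions


samples = defaultdict(list)
with SimpleResponseTimeTracker(samples, "example_api_call"):
    time.sleep(0.01)  # placeholder for an API request
```

Recording in `__exit__` rather than after the call means failed requests are timed as well, which is usually what one wants when looking at latency distributions.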
2. Telemetry Configuration (telemetry_config.py)
Modified file: src/datahub/configuration/telemetry_config.py
New Configuration Options:
Allows users to:
3. Looker Integration (looker_lib_wrapper.py)
Modified file: src/datahub/ingestion/source/looker/looker_lib_wrapper.py
Changes:
📊 Performance Analysis
T-Digest vs Naive Approach
We evaluated the t-digest approach against a naive implementation that stores all data points:
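For orientation, a naive baseline of the kind described here might look like the sketch below. It is an assumption about what "stores all data points" means in practice, not the code used in the PR's evaluation; the class name `NaiveResponseTimes` and its methods are hypothetical.

```python
import statistics


class NaiveResponseTimes:
    """Hypothetical baseline: keeps every sample and computes exact percentiles."""

    def __init__(self):
        self._values = []  # grows by one entry per recorded response

    def record(self, seconds):
        self._values.append(seconds)

    def percentile(self, p):
        """Return the p-th percentile for an integer p in 1..99."""
        # quantiles(n=100) returns 99 cut points; index p - 1 is the p-th percentile.
        cuts = statistics.quantiles(self._values, n=100, method="inclusive")
        return cuts[p - 1]


baseline = NaiveResponseTimes()
for ms in (120, 95, 310, 101, 87, 450, 99):
    baseline.record(ms / 1000.0)
p95 = baseline.percentile(95)
```

Memory in this baseline grows linearly with the number of recorded responses, and each percentile query sorts the full list. A t-digest instead maintains a compact summary and returns approximate percentiles.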
Test Results: 50,000 API Response Times
Test Results: 100,000 API Response Times
Key Insights:
New Dependencies
(>=2.0.0): T-digest algorithm implementation
Compatibility
📚 References