-
Notifications
You must be signed in to change notification settings - Fork 2.5k
Add OpenTelemetry Native Metrics Support #3637
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Signed-off-by: Elena Kolevska <elena@kolevska.com>
Signed-off-by: Elena Kolevska <elena@kolevska.com>
5b7ff5f to
168f918
Compare
0940dd5 to
b8afd7f
Compare
…quest metric type from UpAndDownCounter to Async gauge.
ndyakov
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
submitting partial review
internal/pool/pool.go
Outdated
| // Global callback for connection state changes (set by otel package) | ||
| connectionStateChangeCallback func(ctx context.Context, cn *Conn, fromState, toState string) | ||
|
|
||
| // Global callback for connection creation time (set by otel package) | ||
| connectionCreateTimeCallback func(ctx context.Context, duration time.Duration, cn *Conn) | ||
|
|
||
| // Global callback for connection relaxed timeout changes (set by otel package) | ||
| // Parameters: ctx, delta (+1/-1), cn, poolName, notificationType | ||
| connectionRelaxedTimeoutCallback func(ctx context.Context, delta int, cn *Conn, poolName, notificationType string) | ||
|
|
||
| // Global callback for connection handoff (set by otel package) | ||
| // Parameters: ctx, cn, poolName | ||
| connectionHandoffCallback func(ctx context.Context, cn *Conn, poolName string) | ||
|
|
||
| // Global callback for error tracking (set by otel package) | ||
| // Parameters: ctx, errorType, cn, statusCode, isInternal, retryAttempts | ||
| errorCallback func(ctx context.Context, errorType string, cn *Conn, statusCode string, isInternal bool, retryAttempts int) | ||
|
|
||
| // Global callback for maintenance notifications (set by otel package) | ||
| // Parameters: ctx, cn, notificationType | ||
| maintenanceNotificationCallback func(ctx context.Context, cn *Conn, notificationType string) | ||
|
|
||
| // Global callback for connection wait time (set by otel package) | ||
| // Parameters: ctx, duration, cn | ||
| connectionWaitTimeCallback func(ctx context.Context, duration time.Duration, cn *Conn) | ||
|
|
||
| // Global callback for connection use time (set by otel package) | ||
| // Parameters: ctx, duration, cn | ||
| connectionUseTimeCallback func(ctx context.Context, duration time.Duration, cn *Conn) | ||
|
|
||
| // Global callback for connection timeouts (set by otel package) | ||
| // Parameters: ctx, cn, timeoutType | ||
| connectionTimeoutCallback func(ctx context.Context, cn *Conn, timeoutType string) | ||
|
|
||
| // Global callback for connection closed (set by otel package) | ||
| // Parameters: ctx, cn, reason, err | ||
| connectionClosedCallback func(ctx context.Context, cn *Conn, reason string, err error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This global callbacks are not thread safe. Someone may set (or set nil - remove) the callback while a check on its existence is happening. In the best case scenario - we will end up having undefined behaviour, worst case - nil pointer panics. We should fix this with either a global callback mutex or atomic values here. I do think a RWMutex will be an easier and cleaner approach.
internal/pool/pool.go
Outdated
| if err := p.waitTurn(ctx); err != nil { | ||
| // Record timeout if applicable | ||
| if err == ErrPoolTimeout && connectionTimeoutCallback != nil { | ||
| connectionTimeoutCallback(ctx, nil, "pool") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also record general error metric here?
internal/pool/pool.go
Outdated
|
|
||
| // Global callback for error tracking (set by otel package) | ||
| // Parameters: ctx, errorType, cn, statusCode, isInternal, retryAttempts | ||
| errorCallback func(ctx context.Context, errorType string, cn *Conn, statusCode string, isInternal bool, retryAttempts int) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that there's only two places where this metric is exposed. The metric should expose ALL errors in a client including command execution failure.
otel.go
Outdated
| // toConnInfo converts internal pool.Conn to public ConnInfo interface | ||
| func toConnInfo(cn *pool.Conn) ConnInfo { | ||
| if cn != nil { | ||
| return cn | ||
| } | ||
| return nil | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not needed.
ndyakov
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
left two comments that can be considered for all get/set operations. other than that the thread safety looks ok if we are willing to accept that there is a time window for which getters may have returned an old value. can we make sure if we remove the telemetry, this won't break the app? In the case when the old value is nil I assume we can accept a single event not being tracked.
internal/pool/pool.go
Outdated
| func getConnectionStateChangeCallback() func(ctx context.Context, cn *Conn, fromState, toState string) { | ||
| callbackMu.RLock() | ||
| cb := connectionStateChangeCallback | ||
| callbackMu.RUnlock() | ||
| return cb | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is a slight chance that connectionStateChangeCallback can be changed after the runlick but before the use. If we agree that this timeframe is something we are willing to tolerate, this is fine.
internal/pool/pool.go
Outdated
| func SetConnectionTimeoutCallback(fn func(ctx context.Context, cn *Conn, timeoutType string)) { | ||
| callbackMu.Lock() | ||
| connectionTimeoutCallback = fn | ||
| callbackMu.Unlock() | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we are setting multiple callbacks (as we should in most cases) wouldn't it be better to have a single lock -> set all callbacks -> unlock block for them?
This PR implements Observability (Metrics) for go-redis, adding comprehensive OpenTelemetry metrics support following the OpenTelemetry Database Client Semantic Conventions.
The implementation uses a Bridge Pattern to keep the core library dependency-free while providing optional, metrics instrumentation through the new extra/redisotel-native package.
What's Included:
Metric groups covering aspects of Redis operations: