-
Notifications
You must be signed in to change notification settings - Fork 4.6k
rls: update rls cache metrics to use async gauge framework #8808
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
rls: update rls cache metrics to use async gauge framework #8808
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #8808 +/- ##
==========================================
+ Coverage 83.26% 83.40% +0.14%
==========================================
Files 418 418
Lines 33004 33014 +10
==========================================
+ Hits 27480 27535 +55
+ Misses 4109 4075 -34
+ Partials 1415 1404 -11
🚀 New features to boost your workflow:
|
arjan-bal
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can clarify the meaning of asynchronous in the release notes by adding something like: "The metric is recorded once per collection cycle, rather than every time its value changes."
| lb.logger = internalgrpclog.NewPrefixLogger(logger, fmt.Sprintf("[rls-experimental-lb %p] ", lb)) | ||
| lb.dataCache = newDataCache(maxCacheSize, lb.logger, cc.MetricsRecorder(), opts.Target.String()) | ||
| lb.dataCache = newDataCache(maxCacheSize, lb.logger, opts.Target.String()) | ||
| if metricsRecorder := cc.MetricsRecorder(); metricsRecorder != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can remove the nil check here. cc.MetricsRecorder() must always return a non-nil metrics recorder. The ClientConn implementation provided by gRPC ensures this, and we enforce that LB policies embed a valid ClientConn when wrapping. A nil MetricsRecorder would indicate a bug in a custom LB policy that is explicitly returning nil.
| logger *internalgrpclog.PrefixLogger | ||
|
|
||
| // metricHandler is the function to deregister the async metric reporter. | ||
| metricHandler func() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Can we include unregister in the name to emphasize that it's a cleanup function? metricHandler seems a little generic.
| if b.ctrlCh != nil { | ||
| b.ctrlCh.close() | ||
| } | ||
| if b.metricHandler != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should omit the nil check. The constructor guarantees metricHandler is non-nil. If it is nil, the balancer is in an invalid state, and we should let it crash to surface the bug.
| b.cacheMu.Lock() | ||
| defer b.cacheMu.Unlock() | ||
|
|
||
| if b.dataCache == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, we should omit the nil check.
| func (r *testAsyncMetricsRecorder) RecordInt64AsyncGauge(h *estats.Int64AsyncGaugeHandle, v int64, _ ...string) { | ||
| if r.data == nil { | ||
| r.data = make(map[string]int64) | ||
| } | ||
| r.data[h.Descriptor().Name] = v | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this functionality be added to the existing NewTestMetricsRecorder so it can be reused by other tests in the future?
| if r.data == nil { | ||
| r.data = make(map[string]int64) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should ensure the object is initialized with a non-nil map during creation (via a constructor or otherwise). This makes it easier to reason about the state and avoids potential nil pointer dereferences.
| } | ||
| if got, _ := tmr.Metric(cacheSizeKey); got != 15 { | ||
| t.Fatalf("Unexpected data for metric %v, got: %v, want: %v", cacheSizeKey, got, 15) | ||
| verifyMetrics := func(wantEntries, wantSize int64) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This closure seems like a assertions helper which is discouraged by the styleguide: https://google.github.io/styleguide/go/best-practices#leave-testing-to-the-test-function
It does have useful error messages though. I would still recommend inlining the logic since it's just a few lines anyways.
| b.cacheMu.Lock() | ||
| defer b.cacheMu.Unlock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my opinion, we should avoid reporting metrics while holding the mutex. If the metrics collector is slow or hangs (due to slow I/O, for instance), it could potentially block the RLS from functioning. We can fetch the required data while holding the lock, but we should release it before calling the metrics reporter.
This PR changes the way rls cache metrics ("grpc.lb.rls.cache_entries" & "grpc.lb.rls.cache_size") to use async gauge framework instead of synchronoush capturing.
This also does cleanup to remove the NoopMetricsRecorder struct along with addressing some old nits.
RELEASE NOTES: