Deduplicate identical series labels and track their memory consumption in QueryLimiter.AddSeries #13806
Conversation
pkg/distributor/distributor.go (outdated)
  result := make([]labels.Labels, 0, len(metrics))
  for _, m := range metrics {
-     if err := queryLimiter.AddSeries(m); err != nil {
+     uniqueSeriesLabels, err := queryLimiter.AddSeries(m, mimir_limiter.NoopMemoryTracker{})
This is called from the /series endpoint, where we are not tracking memory consumption, hence we use NoopMemoryTracker here.
We need to use the memory consumption tracker from the context. If the /series endpoint doesn't provide one, then we'll need to add it in for that endpoint, just like we did for the other endpoints.
Updated in f75c488
pkg/util/limiter/query_limiter.go (outdated)
type MemoryTracker interface {
    IncreaseMemoryConsumptionForLabels(labels labels.Labels) error
}

type NoopMemoryTracker struct{}
This isn't needed; we can create a MemoryConsumptionTracker with the limit set to 0 to disable enforcing a memory consumption limit.
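For illustration, a minimal sketch of that convention with hypothetical type and field names (Mimir's actual MemoryConsumptionTracker in pkg/util/limiter differs in detail):

```go
package limiter

import (
	"fmt"

	"github.com/prometheus/prometheus/model/labels"
)

// trackerSketch is a hypothetical, simplified stand-in for the real
// MemoryConsumptionTracker, shown only to illustrate why a limit of 0
// can mean "track but never reject", making a no-op tracker unnecessary.
type trackerSketch struct {
	limitBytes   uint64
	currentBytes uint64
}

func (t *trackerSketch) IncreaseMemoryConsumptionForLabels(ls labels.Labels) error {
	// Estimate the labels' footprint by summing name and value lengths.
	var size uint64
	ls.Range(func(l labels.Label) {
		size += uint64(len(l.Name) + len(l.Value))
	})
	t.currentBytes += size
	// A zero limit disables enforcement entirely.
	if t.limitBytes > 0 && t.currentBytes > t.limitBytes {
		return fmt.Errorf("query exceeded memory limit of %d bytes", t.limitBytes)
	}
	return nil
}
```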
Updated in 3107044
- uniqueSeriesBefore := len(ql.uniqueSeries)
- ql.uniqueSeries[fingerprint] = struct{}{}
- uniqueSeriesAfter := len(ql.uniqueSeries)
+ uniqueSeriesBefore := len(ql.uniqueSeries) + countConflictSeries(ql.conflictSeries)
+ var found bool
+ var existing labels.Labels
+ var newSeriesButHashCollided bool
+ if existing, found = ql.uniqueSeries[fingerprint]; !found || labels.Equal(existing, seriesLabels) {
+     if !found {
+         // This is unique new series.
+         ql.uniqueSeries[fingerprint] = seriesLabels
+         err := tracker.IncreaseMemoryConsumptionForLabels(seriesLabels)
+         if err != nil {
+             return labels.EmptyLabels(), err
+         }
+     }
+ } else {
+     // Conflicted hash is found.
+     if ql.conflictSeries == nil {
+         ql.conflictSeries = make(map[uint64][]labels.Labels)
+     }
+     l := ql.conflictSeries[fingerprint]
+     for _, prev := range l {
+         // Labels matches with previous series, return the same labels instance.
+         if labels.Equal(prev, seriesLabels) {
+             return prev, nil
+         }
+     }
+     newSeriesButHashCollided = true
+     ql.conflictSeries[fingerprint] = append(l, seriesLabels)
+     err := tracker.IncreaseMemoryConsumptionForLabels(seriesLabels)
+     if err != nil {
+         return labels.EmptyLabels(), err
+     }
+ }
I think we can simplify this a bit given hash conflicts are rare and we were previously OK with undercounting in the event of a hash conflict:
- if the hash has never been seen before: add it to uniqueSeries and return the labels passed in
- if the hash has been seen before:
  - check if the corresponding labels in uniqueSeries are the same, and if so, return the labels from uniqueSeries
  - if they're not the same (i.e. this is a conflict where two different sets of labels return the same hash), return the labels passed in
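A minimal sketch of the simplified flow described above (not the PR's final code; it assumes the surrounding query_limiter.go definitions such as QueryLimiter, MemoryTracker, and NewMaxSeriesHitLimitError, and that the fingerprint is the labels hash):

```go
func (ql *QueryLimiter) AddSeries(seriesLabels labels.Labels, tracker MemoryTracker) (labels.Labels, error) {
	// Per the later discussion, the PR tracks memory before the duplicate
	// check on purpose, so the matching decrease during iteration stays balanced.
	if err := tracker.IncreaseMemoryConsumptionForLabels(seriesLabels); err != nil {
		return labels.EmptyLabels(), err
	}

	fingerprint := seriesLabels.Hash()
	if existing, found := ql.uniqueSeries[fingerprint]; found {
		if labels.Equal(existing, seriesLabels) {
			// Duplicate: return the canonical labels so callers share one instance.
			return existing, nil
		}
		// Hash conflict: two different label sets share one hash. Conflicts are
		// rare and slight undercounting is acceptable, so return the input labels.
		return seriesLabels, nil
	}

	// First time this hash is seen: remember the labels as the canonical instance.
	ql.uniqueSeries[fingerprint] = seriesLabels
	if ql.maxSeriesPerQuery != 0 && len(ql.uniqueSeries) > ql.maxSeriesPerQuery {
		return labels.EmptyLabels(), NewMaxSeriesHitLimitError(uint64(ql.maxSeriesPerQuery))
	}
	return seriesLabels, nil
}
```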
Applied as suggested 370860f
// Manually set up the collision state in the limiter
limiter.conflictSeries = make(map[uint64][]labels.Labels)
// The first series is in uniqueSeries map.
limiter.uniqueSeries[collisionHash] = series1
// If another series hash is colliding, AddSeries will add it to conflictSeries map.
limiter.conflictSeries[collisionHash] = []labels.Labels{series2}
Why do we need to do this? Couldn't we call limiter.AddSeries(series1), limiter.AddSeries(series2) etc.?
In that case we won't be testing hash collisions, will we? Nevertheless, I refactored the hash check implementation to simplify it as suggested here.
// Try stringlabels implementation (default)
// stringlabels stores data in a "data" field of type string
if aData := aVal.FieldByName("data"); aData.IsValid() && aData.Kind() == reflect.String {
Rather than doing this runtime detection of which implementation is in use, what if we used the slicelabels build tag to include a different implementation based on which build tag is enabled?
This is how we handle the different implementations in other places.
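For context, Go build tags select which file gets compiled; a minimal sketch (the file name and package are illustrative):

```go
//go:build !slicelabels && !dedupelabels

package limiter

// This file is compiled by default, i.e. for the stringlabels implementation.
// A sibling file guarded by "//go:build slicelabels" (and another by
// "//go:build dedupelabels") would hold the implementation-specific variant
// of the same helper, so no runtime detection is needed.
```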
Refactored this into separate build-tagged files in 885b3d1. However, I still use reflection to get the internal field so that we can compare whether the internal references are the same.
Force-pushed from d0b33ee to ac6c728.
if ql.maxSeriesPerQuery != 0 && len(ql.uniqueSeries) > ql.maxSeriesPerQuery {
    return labels.EmptyLabels(), NewMaxSeriesHitLimitError(uint64(ql.maxSeriesPerQuery))
}
return existing, nil
Memory consumption tracked before duplicate check, causing over-counting
The IncreaseMemoryConsumptionForLabels call at line 87 occurs unconditionally BEFORE the duplicate label check at lines 94-102. The comment on lines 95-96 states this branch is "not counting up the memory consumption" for duplicates, but memory has already been counted. This causes duplicate series labels to be counted multiple times in the memory tracker, potentially causing queries to be rejected with memory limit errors even when actual unique memory usage is within limits. The memory consumption increase belongs after the duplicate check, only for newly-seen series.
This is intentional. If we tracked only the unique labels, there would be no easy way to reduce memory consumption as we later iterate over the series, since we couldn't tell which series correspond to duplicated labels.
Could you please add a comment near the IncreaseMemoryConsumptionForLabels call above explaining why we always call it, even if the series is a duplicate, and why this is important for handling hash collisions?
Alternatively, I think we can solve this by moving the wrapping of the SeriesSet in MemoryTrackingSeriesSet so that a single MemoryTrackingSeriesSet wraps the merged result from all ingesters and store-gateways, rather than separately wrapping the SeriesSet for all ingesters with a MemoryTrackingSeriesSet and then wrapping the SeriesSet for all store-gateways in a separate MemoryTrackingSeriesSet. If we do this, then we'll need to think about how we handle hash conflicts.
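For orientation, a rough sketch of what such a series-set wrapper could look like, inferred from this discussion rather than taken from the PR (the Decrease method and the exact types are assumptions):

```go
import (
	"github.com/prometheus/prometheus/model/labels"
	"github.com/prometheus/prometheus/storage"
	"github.com/prometheus/prometheus/util/annotations"
)

// memoryTrackingSeriesSet releases the memory accounted for each series'
// labels as the wrapped set is iterated, balancing the earlier increase
// made by QueryLimiter.AddSeries.
type memoryTrackingSeriesSet struct {
	inner   storage.SeriesSet
	tracker interface {
		// Assumed counterpart to IncreaseMemoryConsumptionForLabels.
		DecreaseMemoryConsumptionForLabels(labels.Labels)
	}
}

func (s *memoryTrackingSeriesSet) Next() bool { return s.inner.Next() }

func (s *memoryTrackingSeriesSet) At() storage.Series {
	series := s.inner.At()
	s.tracker.DecreaseMemoryConsumptionForLabels(series.Labels())
	return series
}

func (s *memoryTrackingSeriesSet) Err() error                        { return s.inner.Err() }
func (s *memoryTrackingSeriesSet) Warnings() annotations.Annotations { return s.inner.Warnings() }
```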
pkg/util/limiter/query_limiter.go (outdated)
// labels and not counting up the memory consumption.
if found && labels.Equal(existing, seriesLabels) {
    // Still return error if the duplicated labels had been exceeding the limit.
    if ql.maxSeriesPerQuery != 0 && len(ql.uniqueSeries) > ql.maxSeriesPerQuery {
[nit] You could re-use the uniqueSeriesBefore rather than call len(ql.uniqueSeries) again.
Updated in 00b9117
    aPtr := aData.Pointer()
    bPtr := bData.Pointer()
    assert.Equal(t, aPtr, bPtr, "labels should share the same data slice (dedupelabels)")
}
Is it worth adding a failure assertion to the end of each of the assertSameLabels functions, so that if another user calls these expecting a full label comparison the function will fail if the given labels have different lengths/values etc?
Replaced assert with require.
if aData.Len() > 0 && bData.Len() > 0 && aData.Len() == bData.Len() {
    aPtr := aData.Pointer()
    bPtr := bData.Pointer()
    assert.Equal(t, aPtr, bPtr, "labels should share the same data slice (dedupelabels)")
Should this be a require.Equal ?
if aSlice.Len() > 0 && bSlice.Len() > 0 && aSlice.Len() == bSlice.Len() {
    aPtr := aSlice.Pointer()
    bPtr := bSlice.Pointer()
    assert.Equal(t, aPtr, bPtr, "labels should share the same slice backing array (slicelabels)")
As above
Yes, I think require is the better choice here. Updated in 4ebb43e
if len(aStr) > 0 && len(bStr) > 0 && len(aStr) == len(bStr) {
    aPtr := unsafe.Pointer(unsafe.StringData(aStr))
    bPtr := unsafe.Pointer(unsafe.StringData(bStr))
    assert.Equal(t, aPtr, bPtr, "labels should share the same internal data pointer (stringlabels)")
As above
Yes, I think require is the better choice here. Updated in 4ebb43e
returnedSeries1Dup, err := limiter.AddSeries(series1, memoryTracker)
assert.NoError(t, err)
assertSameLabels(t, returnedSeries1Dup, series1)
assert.Equal(t, 2, limiter.uniqueSeriesCount())
Should these be require. ?
Yes, I think require is the better choice here. Updated in 4ebb43e
Hey @lamida - A few comments and questions added ... also note the CHANGELOG.md conflict.
if aData.Len() == 0 && bData.Len() == 0 {
    return
}
If we do this, then we can drop the if condition on line 24 below, and it also covers Andrew's suggestion:
if aData.Len() == 0 && bData.Len() == 0 {
    return
}
require.Equal(t, aData.Len(), bData.Len())
(similar feedback applies for other implementations for slicelabels and stringlabels)
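Putting this suggestion together, the stringlabels variant of the helper could end up looking roughly like this (a sketch, not the PR's exact code):

```go
import (
	"reflect"
	"testing"
	"unsafe"

	"github.com/prometheus/prometheus/model/labels"
	"github.com/stretchr/testify/require"
)

func requireSameLabels(t *testing.T, a, b labels.Labels) {
	// stringlabels stores all data in an unexported "data" string field;
	// reflect.Value.String works even on unexported string fields.
	aStr := reflect.ValueOf(a).FieldByName("data").String()
	bStr := reflect.ValueOf(b).FieldByName("data").String()

	if len(aStr) == 0 && len(bStr) == 0 {
		return
	}
	// Fails (and stops the test) if the labels differ in length, so callers
	// expecting a full comparison don't silently pass.
	require.Equal(t, len(aStr), len(bStr))

	// Same length and same backing pointer means the same string instance.
	aPtr := unsafe.Pointer(unsafe.StringData(aStr))
	bPtr := unsafe.Pointer(unsafe.StringData(bStr))
	require.Equal(t, aPtr, bPtr, "labels should share the same internal data pointer (stringlabels)")
}
```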
Updated in 547b4e9
pkg/util/limiter/query_limiter.go (outdated)
| "github.com/grafana/mimir/pkg/util/validation" | ||
| ) | ||
|
|
||
| type LabelsMemoryTrackerIncreaser interface { |
Is this interface still needed?
Removed in 074265a. I also removed the other memoryTracker interfaces, which I found unnecessary.
| "github.com/prometheus/prometheus/model/labels" | ||
| ) | ||
|
|
||
| func assertSameLabels(t *testing.T, a, b labels.Labels) { |
Do we need to check the value of the syms field as well?
Just removed compat_dedupelabels_test.go and compat_slicelabels_test.go as suggested in #13806 (comment)
aSlice := aVal.FieldByName("labels")
bSlice := bVal.FieldByName("labels")
Does this work? labels.Labels is a slice for slicelabels.
Given we only ever use stringlabels for Mimir now, we could also just drop this file and compat_dedupelabels_test.go - if we ever build Mimir with the other implementations, we can add these methods then.
Let's just remove those files for now 4956d88.
- return series.LabelsToSeriesSet(ms)
+ return series.NewMemoryTrackingSeriesSet(series.LabelsToSeriesSet(ms), memoryTracker)
Is this change needed? Don't /series requests always use an unlimited memory consumption tracker?
if ql.maxSeriesPerQuery != 0 && len(ql.uniqueSeries) > ql.maxSeriesPerQuery {
    return labels.EmptyLabels(), NewMaxSeriesHitLimitError(uint64(ql.maxSeriesPerQuery))
}
return existing, nil
Memory consumption tracked before duplicate check causes over-counting (High Severity)
In AddSeries, tracker.IncreaseMemoryConsumptionForLabels(seriesLabels) is called at line 87 before checking if the series is a duplicate at lines 94-102. This means memory consumption is increased for every call, even when the series is a duplicate and the existing labels are returned. The comment at line 95-96 states "not counting up the memory consumption" for duplicates, but that's incorrect since memory was already tracked before the check. This causes memory to be over-counted for duplicate series, potentially causing queries to fail prematurely with memory limit errors when they shouldn't.
if ql.maxSeriesPerQuery != 0 && len(ql.uniqueSeries) > ql.maxSeriesPerQuery {
    return labels.EmptyLabels(), NewMaxSeriesHitLimitError(uint64(ql.maxSeriesPerQuery))
}
return existing, nil
Memory tracked for duplicates despite comment saying otherwise (High Severity)
The tracker.IncreaseMemoryConsumptionForLabels(seriesLabels) call at line 87 is executed before the duplicate check at lines 94-102. The comment at lines 95-96 states duplicates should not count toward memory consumption, but the memory has already been tracked by that point. This causes memory to be counted for every call including duplicates, potentially triggering premature memory limit errors when the goal was to deduplicate and save memory. The memory tracking call needs to happen only for non-duplicate series.
aStr := aVal.FieldByName("data").String()
bStr := bVal.FieldByName("data").String()

require.Equal(t, len(aStr), len(aStr))
Test compares string length to itself instead of other (Low Severity)
The assertion require.Equal(t, len(aStr), len(aStr)) compares aStr length to itself rather than comparing len(aStr) to len(bStr). This appears to be a copy-paste error that causes the test to always pass regardless of whether the two label strings have the same length, making this assertion ineffective at catching length mismatches.
if ql.maxSeriesPerQuery != 0 && uniqueSeriesBefore > ql.maxSeriesPerQuery {
    return labels.EmptyLabels(), NewMaxSeriesHitLimitError(uint64(ql.maxSeriesPerQuery))
}
return existing, nil
Memory consumption increased before duplicate check, causing over-counting (High Severity)
In AddSeries, tracker.IncreaseMemoryConsumptionForLabels(seriesLabels) is called at line 83 before checking if the labels are duplicates at lines 90-93. For duplicates, the function returns existing labels (not seriesLabels), but memory was already increased for seriesLabels. The comment at lines 91-92 explicitly states duplicates should NOT count memory consumption, but the code contradicts this. When MemoryTrackingSeriesSet.At() later decreases memory for the returned existing labels, the increase for the unused seriesLabels is never offset. This causes memory over-counting proportional to duplicate series count, potentially causing queries to be rejected prematurely for exceeding memory limits.
aStr := aVal.FieldByName("data").String()
bStr := bVal.FieldByName("data").String()

require.Equal(t, len(aStr), len(aStr))
Test assertion compares variable to itself (Medium Severity)
The requireSameLabels test helper function has a typo at line 25: require.Equal(t, len(aStr), len(aStr)) compares len(aStr) to itself instead of comparing len(aStr) to len(bStr). This assertion always passes, which means the test will never catch cases where two labels have different lengths. Combined with the conditional check on line 27, if labels have different lengths, the function silently returns without any assertion failure, allowing incorrect deduplication behavior to go undetected.
# Conflicts:
#	CHANGELOG.md
It will be difficult, in the decrease call, to find which labels are the duplicated ones.
…bels memory consumption is decreased as the series iterated
…ll memory tracking cases
Force-pushed from 00b9117 to 3534363.
err := tracker.IncreaseMemoryConsumptionForLabels(seriesLabels)
if err != nil {
    return labels.EmptyLabels(), err
}
Memory consumption counted for duplicate series despite comment (Medium Severity)
The IncreaseMemoryConsumptionForLabels call on line 83 runs unconditionally before checking if the series is a duplicate. However, the comment on lines 91-92 explicitly states the intent is "not counting up the memory consumption" for duplicates. Since duplicate series reuse existing labels rather than allocating new ones, memory consumption is over-counted when duplicates are encountered. The IncreaseMemoryConsumptionForLabels call needs to be moved after the duplicate check to only count memory for new unique series.
Additional Locations (1)
aStr := aVal.FieldByName("data").String()
bStr := bVal.FieldByName("data").String()

require.Equal(t, len(aStr), len(aStr))
Test assertion compares variable to itself, always passes (Low Severity)
The assertion require.Equal(t, len(aStr), len(aStr)) compares len(aStr) to itself rather than comparing len(aStr) to len(bStr). This check always passes regardless of whether the two labels have the same length, making it a meaningless assertion. The requireSameLabels helper function is meant to verify two labels share the same internal data, but this broken assertion won't catch length mismatches between a and b.
if ql.maxSeriesPerQuery != 0 && uniqueSeriesBefore > ql.maxSeriesPerQuery {
    return labels.EmptyLabels(), NewMaxSeriesHitLimitError(uint64(ql.maxSeriesPerQuery))
}
return existing, nil
Memory tracked for duplicate labels before duplicate check (High Severity)
The IncreaseMemoryConsumptionForLabels call on line 83 happens unconditionally before checking if the labels are duplicates (lines 90-98). When a duplicate is found, the function returns the existing labels but memory was already tracked for the input labels. The comment on line 91-92 states "not counting up the memory consumption" for duplicates, but the code has already counted it. This causes memory over-counting for duplicate series, defeating the PR's optimization goal of properly tracking memory consumption for deduplicated labels. The IncreaseMemoryConsumptionForLabels call needs to be moved after the duplicate check, only for newly-seen labels.
https://github.com/grafana/mimir-squad/issues/3280
Reuse the labels when a new series uses the same labels seen before.
Benchmark

The benchmark shows that with more duplicated labels, the B/op metric decreases.
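For reference, a sketch of the kind of benchmark that demonstrates this (all names, the QueryLimiter constructor signature, and the no-op tracker are assumptions, not the PR's actual benchmark code):

```go
// noopTracker satisfies the MemoryTracker interface without enforcing a limit.
type noopTracker struct{}

func (noopTracker) IncreaseMemoryConsumptionForLabels(labels.Labels) error { return nil }

func BenchmarkAddSeriesDuplicates(b *testing.B) {
	for _, dupPercent := range []int{0, 50, 90} {
		b.Run(fmt.Sprintf("dup=%d%%", dupPercent), func(b *testing.B) {
			const total = 1000
			unique := total * (100 - dupPercent) / 100
			if unique == 0 {
				unique = 1
			}
			// Build a series list where only `unique` distinct label sets exist.
			input := make([]labels.Labels, total)
			for i := range input {
				input[i] = labels.FromStrings("__name__", "metric", "idx", strconv.Itoa(i%unique))
			}
			b.ReportAllocs()
			for i := 0; i < b.N; i++ {
				limiter := NewQueryLimiter(0, 0, 0, 0, nil) // assumed signature; zero values disable the limits
				for _, s := range input {
					_, _ = limiter.AddSeries(s, noopTracker{})
				}
			}
		})
	}
}
```

With deduplication, duplicate series return the canonical labels instance instead of retaining their own copy, which is what drives B/op down as the duplicate percentage rises.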
What this PR does
Which issue(s) this PR fixes or relates to
Fixes #
Checklist
- CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]. If a changelog entry is not needed, please add the changelog-not-needed label to the PR.
- about-versioning.md updated with experimental features.

Note
Introduces label de-duplication and memory tracking in query limiting and propagates it through the read path.
- Changes QueryLimiter.AddSeries to AddSeries(lbls, tracker) (labels.Labels, error), storing canonical labels per fingerprint, enforcing limits, and accounting memory via MemoryConsumptionTracker
- Propagates MemoryConsumptionTracker through contexts and callers: distributor (MetricsForLabelMatchers, QueryStream), querier (distributor and store-gateway streaming, series set wrapping), and ingester/store-gateway stream readers (now take *limiter.MemoryConsumptionTracker)
- compactor.upload-sparse-index-headers

Written by Cursor Bugbot for commit dcf8840. This will update automatically on new commits.