Commit c220554

TSDB head too far in future improvements (#11961)
#### What this PR does

#### Which issue(s) this PR fixes or relates to

Related to: #11928

#### Checklist

- [x] Tests updated.
- [ ] Documentation added.
- [x] `CHANGELOG.md` updated - the order of entries should be `[CHANGE]`, `[FEATURE]`, `[ENHANCEMENT]`, `[BUGFIX]`. If changelog entry is not needed, please add the `changelog-not-needed` label to the PR.
- [ ] [`about-versioning.md`](https://github.com/grafana/mimir/blob/main/docs/sources/mimir/configure/about-versioning.md) updated with experimental features.
1 parent e5a1645 commit c220554

5 files changed: 33 additions and 6 deletions
CHANGELOG.md

Lines changed: 2 additions & 0 deletions
@@ -38,6 +38,7 @@
 * [FEATURE] You can configure Mimir to export traces in OTLP exposition format through the standard `OTEL_` environment variables. #11618
 * [FEATURE] distributor: Allow configuring tenant-specific HA tracker failover timeouts. #11774
 * [FEATURE] OTLP: Add experimental support for promoting OTel scope metadata (name, version, schema URL, attributes) to metric labels, prefixed with `otel_scope_`. Enable via the `-distributor.otel-promote-scope-metadata` flag. #11795
+* [ENHANCEMENT] Ingester: Display user grace interval in the tenant list obtained through the `/ingester/tenants` endpoint. #11961
 * [ENHANCEMENT] Dashboards: Add "Influx write requests" row to Writes Dashboard. #11731
 * [ENHANCEMENT] Mixin: Add `MimirHighVolumeLevel1BlocksQueried` alert that fires when level 1 blocks are queried for more than 6 hours, indicating potential compactor performance issues. #11803
 * [ENHANCEMENT] Querier: Make the maximum series limit for cardinality API requests configurable on a per-tenant basis with the `cardinality_analysis_max_results` option. #11456
@@ -191,6 +192,7 @@

 ### Documentation

+* [ENHANCEMENT] Update the `MimirIngestedDataTooFarInTheFuture` runbook with a note about false positives and the endpoint to flush TSDB blocks by user. #11961
 * [ENHANCEMENT] Update Thanos to Mimir migration guide with a tip to add the `__tenant_id__` label. #11584

 ### Tools

docs/sources/mimir/manage/mimir-runbooks/_index.md

Lines changed: 9 additions & 3 deletions
@@ -1416,12 +1416,18 @@ How it **works**:

 - The metric exported by the ingester computes the maximum timestamp from all TSDBs open in the ingester.
 - The alert checks the metric and fires if the maximum timestamp is more than 1h in the future.
+- This alert doesn't respect the per-user creation grace period.

 How to **investigate**

-- Find the tenant with a bad sample on an affected ingester's tenants list (obtained via the `/ingester/tenants` endpoint), where a warning "TSDB Head max timestamp too far in the future" is displayed.
-- Flush the tenant's data to blocks storage.
-- Remove the tenant's directory on disk and the restart ingester.
+- Find an affected ingester pod and connect to its http API. For example, using `kubectl port-forward <pod> 8080:80`.
+- Find the tenant with a bad sample on the ingester's tenants list, which you can get through the `/ingester/tenants` endpoint. The sample should display the warning "TSDB Head max timestamp too far in the future".
+- If there are no warnings, the sample is likely within the tenant's grace interval and the alert is a false positive.
+
+If it's not a false positive:
+
+- Flush the tenant's data to blocks storage through the `/ingester/flush?wait=true&tenant=foo` endpoint.
+- Remove the tenant's directory on disk and restart the ingester.

 ### MimirStoreGatewayTooManyFailedOperations
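
The flush step in the runbook hunk above targets `/ingester/flush?wait=true&tenant=foo`. As a minimal sketch of how that URL could be assembled in a script, assuming the port-forward from the runbook exposes the ingester on `localhost:8080`: the helper name `buildFlushURL` is hypothetical and not part of Mimir, and `url.Values.Encode` sorts parameters by key, so `tenant` appears before `wait`.

```go
package main

import (
	"fmt"
	"net/url"
)

// buildFlushURL is a hypothetical helper (not part of Mimir). It builds the
// flush URL from the runbook for one tenant, escaping the tenant ID so that
// unusual characters do not break the query string.
func buildFlushURL(base, tenant string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = "/ingester/flush"

	q := url.Values{}
	q.Set("wait", "true")    // same parameters as the runbook example
	q.Set("tenant", tenant)
	u.RawQuery = q.Encode() // Encode sorts keys: tenant first, then wait

	return u.String(), nil
}

func main() {
	// Assumes `kubectl port-forward <pod> 8080:80` is running, as in the runbook.
	flushURL, err := buildFlushURL("http://localhost:8080", "foo")
	if err != nil {
		panic(err)
	}
	fmt.Println(flushURL)
}
```

The sketch only builds the URL; issuing the request is left to the operator's preferred HTTP client.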

pkg/ingester/tenants.gohtml

Lines changed: 2 additions & 0 deletions
@@ -15,6 +15,7 @@
 <th>Blocks</th>
 <th>Head MinT</th>
 <th>Head MaxT</th>
+<th>Grace interval</th>
 <th>Warning</th>
 </tr>
 </thead>
@@ -25,6 +26,7 @@
 <td>{{.Blocks}}</td>
 <td>{{.MinTime}}</td>
 <td>{{.MaxTime}}</td>
+<td>(-{{.PastGracePeriod}}, +{{.FutureGracePeriod}})</td>
 <td>{{.Warning}}</td>
 </tr>
 {{ end }}

pkg/ingester/tenants_http.go

Lines changed: 18 additions & 3 deletions
@@ -9,6 +9,7 @@ import (
 	"math"
 	"net/http"
 	"slices"
+	"strings"
 	"time"

 	"github.com/gorilla/mux"
@@ -28,6 +29,12 @@ type tenantStats struct {
 	MinTime string
 	MaxTime string

+	// PastGracePeriod defines how far in the past we accept samples.
+	// This value includes OutOfOrderTimeWindow.
+	PastGracePeriod time.Duration
+	// FutureGracePeriod defines how far into the future we accept samples.
+	FutureGracePeriod time.Duration
+
 	Warning string
 }

@@ -79,21 +86,29 @@ func (i *Ingester) TenantsHandler(w http.ResponseWriter, req *http.Request) {
 			continue
 		}

+		var warnings []string
 		s := tenantStats{}
 		s.Tenant = t
 		s.Blocks = len(db.Blocks())
+		s.PastGracePeriod = i.limits.PastGracePeriod(t) + i.limits.OutOfOrderTimeWindow(t)
+		s.FutureGracePeriod = i.limits.CreationGracePeriod(t)
+
 		minMillis := db.Head().MinTime()
 		s.MinTime = formatMillisTime(db.Head().MinTime())
 		maxMillis := db.Head().MaxTime()
 		s.MaxTime = formatMillisTime(maxMillis)

-		if maxMillis-nowMillis > i.limits.CreationGracePeriod(t).Milliseconds() {
-			s.Warning = "TSDB Head max timestamp too far in the future"
+		if delta := maxMillis - nowMillis; delta > s.FutureGracePeriod.Milliseconds() {
+			deltaDuration := time.Duration(delta) * time.Millisecond
+			warning := fmt.Sprintf("TSDB Head max timestamp too far in the future: %v", deltaDuration)
+			warnings = append(warnings, warning)
 		}
 		if i.limits.PastGracePeriod(t) > 0 && nowMillis-minMillis > (i.limits.PastGracePeriod(t)+i.limits.OutOfOrderTimeWindow(t)).Milliseconds() {
-			s.Warning = "TSDB Head min timestamp too far in the past"
+			warnings = append(warnings, "TSDB Head min timestamp too far in the past")
 		}

+		s.Warning = strings.Join(warnings, ", ")
+
 		tss = append(tss, s)
 	}

pkg/ingester/tenants_http_test.go

Lines changed: 2 additions & 0 deletions
@@ -44,6 +44,8 @@ func TestIngester_TenantsHandlers(t *testing.T) {
 		require.Equal(t, http.StatusOK, rec.Code)
 		// Check if link to user's TSDB was generated
 		require.Contains(t, rec.Body.String(), fmt.Sprintf(`<a href="tsdb/%s">%s</a>`, userID, userID))
+		// Check if grace interval is present
+		require.Contains(t, rec.Body.String(), `<td>(-0s, +10m0s)</td>`)
 	})

 	t.Run("tenant TSDB for valid tenant", func(t *testing.T) {
