Thanos, Prometheus and Golang version used:
thanos, version 0.35.0 (branch: HEAD, revision: d7f45f7c10abde7b466c4e10b1c0ae03a8e775e6)
build user: root@b03d7f8ef1d2
build date: 20240507-08:03:05
go version: go1.21.13 (Red Hat 1.21.13-9.el9_4) X:strictfipsruntime
platform: linux/amd64
tags: netgo,strictfipsruntime
prometheus, version 2.52.0 (branch: HEAD, revision: 1e4704175adabea94a7dbf25d9ba16c1c79a592c)
build user: root@95d268d5aaf7
build date: 20250528-09:34:01
go version: go1.21.13 (Red Hat 1.21.13-9.el9_4) X:strictfipsruntime
platform: linux/amd64
tags: netgo,builtinassets,stringlabels,strictfipsruntime
Object Storage Provider: Azure (Blob storage)
What happened:
We have observed gaps in our metrics, which we suspect are caused by the sidecar component. Specifically, the sidecar sometimes uploads multiple blocks within the same 2-hour window, and shortly afterwards we observe that some blocks are deleted.
(Screenshot: missing data highlighted in red)
What you expected to happen:
We expected the sidecar to upload blocks sequentially and not to have them deleted shortly after upload. The metrics should be continuous, without any gaps.
How to reproduce it (as minimally and precisely as possible):
We run an HA Prometheus pair (deployed via the Prometheus Operator) that receives metrics from several Prometheus and OpenTelemetry (OTel) collectors via remote write. The Thanos sidecar is configured to upload blocks from Prometheus to object storage, and Thanos Compact compacts the uploaded blocks.
Below are the flags used for each component:
Prometheus:
- '--web.console.templates=/etc/prometheus/consoles'
- '--web.console.libraries=/etc/prometheus/console_libraries'
- '--config.file=/etc/prometheus/config_out/prometheus.env.yaml'
- '--web.enable-lifecycle'
- '--web.enable-remote-write-receiver'
- '--web.route-prefix=/'
- '--storage.tsdb.retention.time=6h'
- '--storage.tsdb.path=/prometheus'
- '--web.enable-admin-api'
- '--no-storage.tsdb.wal-compression'
- '--web.config.file=/etc/prometheus/web_config/web-config.yaml'
- '--storage.tsdb.max-block-duration=2h'
- '--storage.tsdb.min-block-duration=2h'
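Setting `--storage.tsdb.min-block-duration` equal to `--storage.tsdb.max-block-duration` disables Prometheus's local compaction, so every block on disk should span exactly 2 hours. As a quick sanity check on our setup, a sketch like the following (the function names are ours, not part of any tooling) can read each block's `meta.json`, which stores `minTime`/`maxTime` as millisecond Unix timestamps:

```python
import json
from pathlib import Path

def block_duration_hours(meta_path):
    """Return the time span covered by a TSDB block, in hours.

    meta.json stores minTime/maxTime as millisecond Unix timestamps.
    """
    meta = json.loads(Path(meta_path).read_text())
    return (meta["maxTime"] - meta["minTime"]) / 3_600_000

def check_blocks(tsdb_dir="/prometheus"):
    """Print any block whose span is not the expected 2 hours."""
    for meta_path in sorted(Path(tsdb_dir).glob("*/meta.json")):
        hours = block_duration_hours(meta_path)
        if abs(hours - 2.0) > 0.01:
            print(f"{meta_path.parent.name}: {hours:.2f}h (expected 2h)")
```

Running this against the Prometheus data directory showed only 2h blocks, so the overlapping uploads do not come from oddly sized local blocks.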
Sidecar:
- sidecar
- '--log.level=info'
- '--prometheus.url=http://localhost:9090/'
- '--tsdb.path=/prometheus'
- '--grpc-address=0.0.0.0:10901'
- '--http-address=0.0.0.0:10902'
- '--objstore.config-file=/etc/prometheus/azure-storage-secret.yaml'
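To rule out the sidecar re-uploading the same block, we inspected its upload state. As far as we understand, the shipper keeps a `thanos.shipper.json` file next to the TSDB blocks listing the ULIDs it has already uploaded; assuming that layout (an `"uploaded"` array of ULIDs), a sketch like this lists blocks on disk that the shipper has not yet recorded:

```python
import json
from pathlib import Path

def pending_uploads(tsdb_dir):
    """Blocks on disk that the shipper has not recorded as uploaded.

    Assumes the thanos.shipper.json layout ({"uploaded": [ULID, ...]})
    that the sidecar keeps next to the TSDB blocks.
    """
    state_file = Path(tsdb_dir) / "thanos.shipper.json"
    uploaded = set()
    if state_file.exists():
        uploaded = set(json.loads(state_file.read_text()).get("uploaded", []))
    # Every block directory contains a meta.json; use it to enumerate blocks.
    on_disk = {p.parent.name for p in Path(tsdb_dir).glob("*/meta.json")}
    return sorted(on_disk - uploaded)
```

In our case each block appeared in the state file exactly once, so the two "upload new block" log lines below are for distinct blocks, not retries.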
Compact:
- compact
- '--data-dir=/data'
- '--log.level=info'
- '--log.format=logfmt'
- '--objstore.config-file=/etc/prometheus/azure-storage-secret.yaml'
- '--retention.resolution-raw=180d'
- '--retention.resolution-5m=180d'
- '--retention.resolution-1h=180d'
- '--wait'
- '--wait-interval=5m'
- '--debug.accept-malformed-index'
- '--no-debug.halt-on-error'
- '--compact.enable-vertical-compaction'
- '--deduplication.replica-label=prometheus_replica'
- '--compact.concurrency=2'
- '--downsample.concurrency=1'
- '--delete-delay=48h'
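With `--delete-delay=48h`, blocks marked for deletion by the compactor are physically removed roughly 48 hours later. The timestamps in the logs below are consistent with this: the blocks uploaded on Oct 7 are cleaned on Oct 9. A small sketch (helper name is ours) to compute the gap from the `ts=` fields:

```python
from datetime import datetime

def hours_between(upload_ts, delete_ts):
    """Hours elapsed between a sidecar upload and the compactor's cleanup.

    Timestamps are the RFC 3339 values from the ts= fields in the logs.
    """
    def parse(ts):
        head, frac_z = ts.split(".")
        # Python's %f accepts at most 6 fractional digits; trim nanoseconds.
        frac = frac_z.rstrip("Z")[:6]
        return datetime.strptime(f"{head}.{frac}+0000",
                                 "%Y-%m-%dT%H:%M:%S.%f%z")
    return (parse(delete_ts) - parse(upload_ts)).total_seconds() / 3600

# Block 01K6YYC8YY15G9SW9MKTTXMXRZ: uploaded Oct 7, deleted Oct 9.
gap = hours_between("2025-10-07T09:00:14.192786091Z",
                    "2025-10-09T09:32:08.09941393Z")
```

The gap is just over 48 hours, so the deletions themselves match the configured delete delay; the open question is whether the deleted blocks were fully covered by compacted output before removal.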
Full logs to relevant components:
Sidecar:
ts=2025-10-07T09:00:14.192786091Z caller=shipper.go:372 level=info msg="upload new block" id=01K6YYC8YY15G9SW9MKTTXMXRZ
ts=2025-10-07T09:00:17.547277351Z caller=shipper.go:372 level=info msg="upload new block" id=01K6YYCHDPEV3EYZFPXA0SN60F
Compact:
ts=2025-10-09T09:32:07.747910564Z caller=blocks_cleaner.go:44 level=info msg="started cleaning of blocks marked for deletion"
ts=2025-10-09T09:32:07.906702511Z caller=blocks_cleaner.go:54 level=info msg="deleted block marked for deletion" block=01K6YYCHZ40X8H173A3QC2SEJP
ts=2025-10-09T09:32:08.006361507Z caller=blocks_cleaner.go:54 level=info msg="deleted block marked for deletion" block=01K6YQH1GMSZJZQF4BBF57K2AZ
ts=2025-10-09T09:32:08.09941393Z caller=blocks_cleaner.go:54 level=info msg="deleted block marked for deletion" block=01K6YYC8YY15G9SW9MKTTXMXRZ
ts=2025-10-09T09:32:08.171086818Z caller=blocks_cleaner.go:54 level=info msg="deleted block marked for deletion" block=01K6YYCHDPEV3EYZFPXA0SN60F
ts=2025-10-09T09:32:08.171125719Z caller=blocks_cleaner.go:58 level=info msg="cleaning of blocks marked for deletion done"
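To check whether the deleted blocks are exactly the ones the sidecar uploaded (i.e. routine post-compaction cleanup rather than unrelated data loss), we cross-referenced the ULIDs from the two logs. A sketch, assuming the logfmt fields shown above (`id=` for the shipper, `block=` for the cleaner); the sample lines are excerpts from the logs:

```python
import re

# ULIDs are 26 characters from an uppercase alphanumeric alphabet.
ULID = re.compile(r'(?:id|block)=([0-9A-Z]{26})')

def extract_ulids(log_lines):
    """Collect block ULIDs from shipper 'id=' / cleaner 'block=' fields."""
    return {m.group(1) for line in log_lines for m in ULID.finditer(line)}

sidecar_log = [
    'msg="upload new block" id=01K6YYC8YY15G9SW9MKTTXMXRZ',
    'msg="upload new block" id=01K6YYCHDPEV3EYZFPXA0SN60F',
]
compact_log = [
    'msg="deleted block marked for deletion" block=01K6YYC8YY15G9SW9MKTTXMXRZ',
    'msg="deleted block marked for deletion" block=01K6YQH1GMSZJZQF4BBF57K2AZ',
]

uploaded = extract_ulids(sidecar_log)
deleted = extract_ulids(compact_log)
# Uploaded blocks that were later deleted by the compactor:
overlap = uploaded & deleted
```

Both blocks uploaded on Oct 7 appear in the Oct 9 deletion log, plus one block (01K6YQH1GMSZJZQF4BBF57K2AZ) we never saw uploaded by this replica's sidecar, presumably from the other HA replica.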
Anything else we need to know:
Environment:
- openshift4