Thanos sidecar / Compact : Data metric holes observed #8513

@YassineChargui

Thanos, Prometheus and Golang version used:

thanos, version 0.35.0 (branch: HEAD, revision: d7f45f7c10abde7b466c4e10b1c0ae03a8e775e6)
  build user:       root@b03d7f8ef1d2
  build date:       20240507-08:03:05
  go version:       go1.21.13 (Red Hat 1.21.13-9.el9_4) X:strictfipsruntime
  platform:         linux/amd64
  tags:             netgo,strictfipsruntime
prometheus, version 2.52.0 (branch: HEAD, revision: 1e4704175adabea94a7dbf25d9ba16c1c79a592c)
  build user:       root@95d268d5aaf7
  build date:       20250528-09:34:01
  go version:       go1.21.13 (Red Hat 1.21.13-9.el9_4) X:strictfipsruntime
  platform:         linux/amd64
  tags:             netgo,builtinassets,stringlabels,strictfipsruntime

Object Storage Provider: Azure (Blob storage)

What happened:

We have observed gaps in our metrics, which we suspect are caused by the sidecar component. Specifically, the sidecar sometimes uploads multiple blocks at almost the same time within a single 2-hour window. Shortly afterwards, some blocks are deleted.

(Screenshot: missing data highlighted in red)

What you expected to happen:

We expected the sidecar to upload blocks sequentially, and we did not expect those blocks to be deleted shortly after upload. The metrics should be continuous, without any gaps.
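
One way to check for gaps independently of a dashboard is to compare the time ranges recorded in each block's meta.json. The following is a minimal sketch, assuming the meta.json files have already been fetched from the bucket and parsed into dictionaries. The minTime and maxTime fields (milliseconds since epoch) come from the block metadata; the find_gaps helper and the sample values are hypothetical. In an HA setup the check would need to be run per replica, since blocks from different replicas cover the same time range.

```python
from typing import Dict, List, Tuple


def find_gaps(metas: List[Dict]) -> List[Tuple[int, int]]:
    """Return (start_ms, end_ms) ranges that no block covers.

    Each item in `metas` is a parsed meta.json with "minTime" and
    "maxTime" in milliseconds since the Unix epoch.
    """
    ranges = sorted((m["minTime"], m["maxTime"]) for m in metas)
    gaps: List[Tuple[int, int]] = []
    covered_until = None
    for start, end in ranges:
        # A gap exists when the next block starts after everything seen so far ends.
        if covered_until is not None and start > covered_until:
            gaps.append((covered_until, start))
        covered_until = end if covered_until is None else max(covered_until, end)
    return gaps


# Hypothetical sample: two 2-hour blocks with one 2-hour window missing between them.
TWO_HOURS_MS = 2 * 60 * 60 * 1000
sample = [
    {"ulid": "block-a", "minTime": 0, "maxTime": TWO_HOURS_MS},
    {"ulid": "block-b", "minTime": 2 * TWO_HOURS_MS, "maxTime": 3 * TWO_HOURS_MS},
]
print(find_gaps(sample))
```

Overlapping blocks are merged by the running `covered_until` value, so only time ranges with no block at all are reported.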

How to reproduce it (as minimally and precisely as possible):

We run Prometheus in HA mode (deployed by the Prometheus Operator). It receives metrics via remote write from several Prometheus instances and OpenTelemetry (OTel) collectors. The Thanos sidecar is configured to upload the blocks produced by Prometheus, and Thanos compact compacts them.

Below are the flags used for each component:

Prometheus:

- '--web.console.templates=/etc/prometheus/consoles'
- '--web.console.libraries=/etc/prometheus/console_libraries'
- '--config.file=/etc/prometheus/config_out/prometheus.env.yaml'
- '--web.enable-lifecycle'
- '--web.enable-remote-write-receiver'
- '--web.route-prefix=/'
- '--storage.tsdb.retention.time=6h'
- '--storage.tsdb.path=/prometheus'
- '--web.enable-admin-api'
- '--no-storage.tsdb.wal-compression'
- '--web.config.file=/etc/prometheus/web_config/web-config.yaml'
- '--storage.tsdb.max-block-duration=2h'
- '--storage.tsdb.min-block-duration=2h'

Sidecar

- sidecar
- '--log.level=info'
- '--prometheus.url=http://localhost:9090/'
- '--tsdb.path=/prometheus'
- '--grpc-address=0.0.0.0:10901'
- '--http-address=0.0.0.0:10902'
- '--objstore.config-file=/etc/prometheus/azure-storage-secret.yaml'

Compact

- compact
- '--data-dir=/data'
- '--log.level=info'
- '--log.format=logfmt'
- '--objstore.config-file=/etc/prometheus/azure-storage-secret.yaml'
- '--retention.resolution-raw=180d'
- '--retention.resolution-5m=180d'
- '--retention.resolution-1h=180d'
- '--wait'
- '--wait-interval=5m'
- '--debug.accept-malformed-index'
- '--no-debug.halt-on-error'
- '--compact.enable-vertical-compaction'
- '--deduplication.replica-label=prometheus_replica'
- '--compact.concurrency=2'
- '--downsample.concurrency=1'
- '--delete-delay=48h'

Full logs to relevant components:

Sidecar

ts=2025-10-07T09:00:14.192786091Z caller=shipper.go:372 level=info msg="upload new block" id=01K6YYC8YY15G9SW9MKTTXMXRZ
ts=2025-10-07T09:00:17.547277351Z caller=shipper.go:372 level=info msg="upload new block" id=01K6YYCHDPEV3EYZFPXA0SN60F
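
Block IDs are ULIDs, whose first 10 characters encode a creation timestamp in milliseconds using Crockford base32. The sketch below decodes the two IDs from the sidecar log above to show when each block was created. The ulid_time helper is a hypothetical name; only the Python standard library is used.

```python
from datetime import datetime, timedelta, timezone

# Crockford base32 alphabet used by ULIDs (no I, L, O, U).
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def ulid_time(ulid: str) -> datetime:
    """Decode the 48-bit millisecond timestamp in the first 10 ULID characters."""
    ms = 0
    for ch in ulid[:10]:
        ms = ms * 32 + CROCKFORD.index(ch)
    # Split into seconds and milliseconds to avoid float rounding.
    return datetime.fromtimestamp(ms // 1000, tz=timezone.utc) + timedelta(
        milliseconds=ms % 1000
    )


for block_id in ("01K6YYC8YY15G9SW9MKTTXMXRZ", "01K6YYCHDPEV3EYZFPXA0SN60F"):
    print(block_id, ulid_time(block_id).isoformat())
```

Both IDs decode to 2025-10-07 shortly after 09:00:00 UTC, about 9 seconds apart, so the two blocks were created within the same minute and uploaded within the following few seconds.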

Compact

ts=2025-10-09T09:32:07.747910564Z caller=blocks_cleaner.go:44 level=info msg="started cleaning of blocks marked for deletion"
ts=2025-10-09T09:32:07.906702511Z caller=blocks_cleaner.go:54 level=info msg="deleted block marked for deletion" block=01K6YYCHZ40X8H173A3QC2SEJP
ts=2025-10-09T09:32:08.006361507Z caller=blocks_cleaner.go:54 level=info msg="deleted block marked for deletion" block=01K6YQH1GMSZJZQF4BBF57K2AZ
ts=2025-10-09T09:32:08.09941393Z caller=blocks_cleaner.go:54 level=info msg="deleted block marked for deletion" block=01K6YYC8YY15G9SW9MKTTXMXRZ
ts=2025-10-09T09:32:08.171086818Z caller=blocks_cleaner.go:54 level=info msg="deleted block marked for deletion" block=01K6YYCHDPEV3EYZFPXA0SN60F
ts=2025-10-09T09:32:08.171125719Z caller=blocks_cleaner.go:58 level=info msg="cleaning of blocks marked for deletion done"
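
To relate these deletions to the compactor's --delete-delay=48h flag, the sketch below computes the time between the sidecar's upload of block 01K6YYC8YY15G9SW9MKTTXMXRZ and its deletion by the blocks cleaner, using the two log timestamps shown above. The parse_ts helper is a hypothetical name. The logs shown do not include the moment the block was marked for deletion, so any conclusion about when that happened is an inference.

```python
from datetime import datetime, timedelta, timezone


def parse_ts(ts: str) -> datetime:
    """Parse a log timestamp such as 2025-10-07T09:00:14.192786091Z.

    Nanosecond precision is truncated to microseconds.
    """
    base, _, frac = ts.rstrip("Z").partition(".")
    micros = int((frac + "000000")[:6])
    parsed = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S")
    return parsed.replace(microsecond=micros, tzinfo=timezone.utc)


uploaded = parse_ts("2025-10-07T09:00:14.192786091Z")  # sidecar "upload new block"
deleted = parse_ts("2025-10-09T09:32:08.09941393Z")    # compact "deleted block marked for deletion"
delete_delay = timedelta(hours=48)                     # from --delete-delay=48h

elapsed = deleted - uploaded
print("upload to deletion:", elapsed)
print("time beyond delete-delay:", elapsed - delete_delay)
```

The block was deleted roughly 48.5 hours after upload, only about half an hour more than the configured delete delay. That would be consistent with the block having been marked for deletion within about half an hour of being uploaded.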

Anything else we need to know:

Environment:

  • OpenShift 4
