
Thanos compactor crashes on blocks with out-of-order labels #212

Open
@SushilSanjayBhile

Description

After applying this ObjectStore spec, the thanos-compactor pod started crashing:

apiVersion: monitoring.banzaicloud.io/v1alpha1
kind: ObjectStore
metadata:
  name: spektra-thanos
spec:
  config:
    mountFrom:
      secretKeyRef:
        name: spektra-thanos-objectstore
        key: spektra-thanos-s3-config.yaml
  bucketWeb:
    deploymentOverrides:
      spec:
        template:
          spec:
            containers:
              - name: bucket
                image: quay.io/thanos/thanos:v0.26.0
  compactor:
    retentionResolutionRaw: 5270400s # 61d
    retentionResolution5m: 5270400s # 61d
    retentionResolution1h: 5270400s # 61d
    dataVolume:
      pvc:
        spec:
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: MINIO_STORAGE_SIZE
    deploymentOverrides:
      spec:
        template:
          spec:
            containers:
              - name: compactor
                image: quay.io/thanos/thanos:v0.26.0

Here are some logs from the thanos-compactor pod:

level=warn ts=2023-11-09T07:00:36.915552097Z caller=index.go:267 group="0@{receive_replica=\"spektra-thanos-receiver-soft-tenant-0\", tenant_id=\"t1\"}" groupKey=0@12308314071310558143 msg="**out-of-order label set: known bug in Prometheus 2.8.0 and below**" labelset="{__measurement__=\"kubernetes_pod_volume\", __name__=\"fs_used_bytes\", app=\"unknown\", claim_name=\"unknown\", cluster=\"devtb7\", kubernetes_io_config_seen=\"2023-10-20T07:07:41.743485600-07:00\", kubernetes_io_config_source=\"api\", name=\"mongodb-kubernetes-operator\", namespace=\"spektra-system\", node_name=\"appserv85\", pod_template_hash=\"598cb5f96\", pod_name=\"mongodb-kubernetes-operator-598cb5f96-9t4nl\", project=\"unknown\", source=\"t1|devtb7|appserv85\", tenant=\"t1\", volume_name=\"kube-api-access-gdtk4\"}" series=10821
level=warn ts=2023-11-09T07:00:36.916184335Z caller=intrumentation.go:67 msg="changing probe status" status=not-ready reason="error executing compaction: compaction: group 0@12308314071310558143: block id 01HD6WW51TDS9NTVG9XA8BWGE5, try running with --debug.accept-malformed-index: index contains 1207 postings with out of order labels"
level=info ts=2023-11-09T07:00:36.916226751Z caller=http.go:84 service=http/server component=compact msg="internal server is shutting down" err="error executing compaction: compaction: group 0@12308314071310558143: block id 01HD6WW51TDS9NTVG9XA8BWGE5, try running with --debug.accept-malformed-index: index contains 1207 postings with out of order labels"
level=info ts=2023-11-09T07:00:36.917473056Z caller=http.go:103 service=http/server component=compact msg="internal server is shutdown gracefully" err="error executing compaction: compaction: group 0@12308314071310558143: block id 01HD6WW51TDS9NTVG9XA8BWGE5, try running with --debug.accept-malformed-index: index contains 1207 postings with out of order labels"
level=info ts=2023-11-09T07:00:36.917542971Z caller=intrumentation.go:81 msg="changing probe status" status=not-healthy reason="error executing compaction: compaction: group 0@12308314071310558143: block id 01HD6WW51TDS9NTVG9XA8BWGE5, try running with --debug.accept-malformed-index: index contains 1207 postings with out of order labels"
level=error ts=2023-11-09T07:00:36.917788817Z caller=main.go:158 err="group 0@12308314071310558143: block id 01HD6WW51TDS9NTVG9XA8BWGE5, **try running with --debug.accept-malformed-index**: index contains 1207 postings with out of order labels\ncompaction\nmain.runCompact.func7\n\t/app/cmd/thanos/compact.go:422\nmain.runCompact.func8.1\n\t/app/cmd/thanos/compact.go:476\ngithub.com/thanos-io/thanos/pkg/runutil.Repeat\n\t/app/pkg/runutil/runutil.go:75\nmain.runCompact.func8\n\t/app/cmd/thanos/compact.go:475\ngithub.com/oklog/run.(*Group).Run.func1\n\t/go/pkg/mod/github.com/oklog/[email protected]/group.go:38\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1581\nerror executing compaction\nmain.runCompact.func8.1\n\t/app/cmd/thanos/compact.go:503\ngithub.com/thanos-io/thanos/pkg/runutil.Repeat\n\t/app/pkg/runutil/runutil.go:75\nmain.runCompact.func8\n\t/app/cmd/thanos/compact.go:475\ngithub.com/oklog/run.(*Group).Run.func1\n\t/go/pkg/mod/github.com/oklog/[email protected]/group.go:38\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1581\ncompact command failed\nmain.main\n\t/app/cmd/thanos/main.go:158\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:255\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1581"

From these logs we can see that the pod crashed because of out-of-order labels.
The warning also notes that this is a known bug in Prometheus 2.8.0 and below.
The logs further suggest running with --debug.accept-malformed-index to work around the issue, but the ObjectStore spec above offers no way to set this flag. I also went through the code, and the compactor's deployment.go does not add this flag either (see the sketch below of where it would have to end up).
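
For reference, this is roughly where the flag would need to land in the generated compactor container. This is only a sketch: the actual argument list is produced by the operator, and I am just marking where the extra flag would go:

containers:
  - name: compactor
    image: quay.io/thanos/thanos:v0.26.0
    args:
      - compact
      # ...arguments generated by the operator...
      - --debug.accept-malformed-index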

Please either add an option for this to the ObjectStore CRD (apiVersion: monitoring.banzaicloud.io/v1alpha1, kind: ObjectStore), or set the flag on the compactor deployment here: https://github.com/banzaicloud/thanos-operator/blob/pkg/sdk/v0.3.7/pkg/resources/compactor/deployment.go#L54
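
To illustrate the request (the field names below are hypothetical and do not exist in the current CRD), either a dedicated toggle or a generic extra-args list under compactor would solve this:

spec:
  compactor:
    # option A: a dedicated toggle (hypothetical field name)
    acceptMalformedIndex: true
    # option B: a generic escape hatch for extra flags (hypothetical field name)
    additionalArgs:
      - --debug.accept-malformed-index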
