Description
After applying the ObjectStore spec below, the thanos-compactor pod started crashing.
```yaml
apiVersion: monitoring.banzaicloud.io/v1alpha1
kind: ObjectStore
metadata:
  name: spektra-thanos
spec:
  config:
    mountFrom:
      secretKeyRef:
        name: spektra-thanos-objectstore
        key: spektra-thanos-s3-config.yaml
  bucketWeb:
    deploymentOverrides:
      spec:
        template:
          spec:
            containers:
              - name: bucket
                image: quay.io/thanos/thanos:v0.26.0
  compactor:
    retentionResolutionRaw: 5270400s # 61d
    retentionResolution5m: 5270400s # 61d
    retentionResolution1h: 5270400s # 61d
    dataVolume:
      pvc:
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: MINIO_STORAGE_SIZE
    deploymentOverrides:
      spec:
        template:
          spec:
            containers:
              - name: compactor
                image: quay.io/thanos/thanos:v0.26.0
```
Here are some logs from the thanos-compactor pod:
level=warn ts=2023-11-09T07:00:36.915552097Z caller=index.go:267 group="0@{receive_replica=\"spektra-thanos-receiver-soft-tenant-0\", tenant_id=\"t1\"}" groupKey=0@12308314071310558143 msg="**out-of-order label set: known bug in Prometheus 2.8.0 and below**" labelset="{__measurement__=\"kubernetes_pod_volume\", __name__=\"fs_used_bytes\", app=\"unknown\", claim_name=\"unknown\", cluster=\"devtb7\", kubernetes_io_config_seen=\"2023-10-20T07:07:41.743485600-07:00\", kubernetes_io_config_source=\"api\", name=\"mongodb-kubernetes-operator\", namespace=\"spektra-system\", node_name=\"appserv85\", pod_template_hash=\"598cb5f96\", pod_name=\"mongodb-kubernetes-operator-598cb5f96-9t4nl\", project=\"unknown\", source=\"t1|devtb7|appserv85\", tenant=\"t1\", volume_name=\"kube-api-access-gdtk4\"}" series=10821
level=warn ts=2023-11-09T07:00:36.916184335Z caller=intrumentation.go:67 msg="changing probe status" status=not-ready reason="error executing compaction: compaction: group 0@12308314071310558143: block id 01HD6WW51TDS9NTVG9XA8BWGE5, try running with --debug.accept-malformed-index: index contains 1207 postings with out of order labels"
level=info ts=2023-11-09T07:00:36.916226751Z caller=http.go:84 service=http/server component=compact msg="internal server is shutting down" err="error executing compaction: compaction: group 0@12308314071310558143: block id 01HD6WW51TDS9NTVG9XA8BWGE5, try running with --debug.accept-malformed-index: index contains 1207 postings with out of order labels"
level=info ts=2023-11-09T07:00:36.917473056Z caller=http.go:103 service=http/server component=compact msg="internal server is shutdown gracefully" err="error executing compaction: compaction: group 0@12308314071310558143: block id 01HD6WW51TDS9NTVG9XA8BWGE5, try running with --debug.accept-malformed-index: index contains 1207 postings with out of order labels"
level=info ts=2023-11-09T07:00:36.917542971Z caller=intrumentation.go:81 msg="changing probe status" status=not-healthy reason="error executing compaction: compaction: group 0@12308314071310558143: block id 01HD6WW51TDS9NTVG9XA8BWGE5, try running with --debug.accept-malformed-index: index contains 1207 postings with out of order labels"
level=error ts=2023-11-09T07:00:36.917788817Z caller=main.go:158 err="group 0@12308314071310558143: block id 01HD6WW51TDS9NTVG9XA8BWGE5, **try running with --debug.accept-malformed-index**: index contains 1207 postings with out of order labels\ncompaction\nmain.runCompact.func7\n\t/app/cmd/thanos/compact.go:422\nmain.runCompact.func8.1\n\t/app/cmd/thanos/compact.go:476\ngithub.com/thanos-io/thanos/pkg/runutil.Repeat\n\t/app/pkg/runutil/runutil.go:75\nmain.runCompact.func8\n\t/app/cmd/thanos/compact.go:475\ngithub.com/oklog/run.(*Group).Run.func1\n\t/go/pkg/mod/github.com/oklog/[email protected]/group.go:38\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1581\nerror executing compaction\nmain.runCompact.func8.1\n\t/app/cmd/thanos/compact.go:503\ngithub.com/thanos-io/thanos/pkg/runutil.Repeat\n\t/app/pkg/runutil/runutil.go:75\nmain.runCompact.func8\n\t/app/cmd/thanos/compact.go:475\ngithub.com/oklog/run.(*Group).Run.func1\n\t/go/pkg/mod/github.com/oklog/[email protected]/group.go:38\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1581\ncompact command failed\nmain.main\n\t/app/cmd/thanos/main.go:158\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:255\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1581"
From the logs we can see the pod crashed because of out-of-order labels in a block index.

The warning also mentions that this is a known bug in Prometheus 2.8.0 and below.

The error message suggests running with `--debug.accept-malformed-index` to avoid this issue, but the ObjectStore spec above has no option to set this flag. I also went through the code, and the compactor's deployment.go does not expose the flag either.
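As a possible interim workaround, the flag might be injectable through the existing `deploymentOverrides`. This is an untested sketch: it assumes the override's `args` are applied to the generated compactor container, and the merge may well replace the operator-generated arguments rather than append to them, in which case all required args would have to be repeated here.

```yaml
compactor:
  deploymentOverrides:
    spec:
      template:
        spec:
          containers:
            - name: compactor
              image: quay.io/thanos/thanos:v0.26.0
              args:
                - --debug.accept-malformed-index  # hypothetical override; merge behavior unverified
```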
Please add an option for this flag to the ObjectStore CRD (apiVersion: monitoring.banzaicloud.io/v1alpha1, kind: ObjectStore), or add the flag to the compactor deployment here: https://github.com/banzaicloud/thanos-operator/blob/pkg/sdk/v0.3.7/pkg/resources/compactor/deployment.go#L54
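On the operator side, the change could be as small as an opt-in boolean field on the compactor spec that appends the flag to the generated container args. The sketch below is illustrative only: `CompactorSpec`, `AcceptMalformedIndex`, and `compactorArgs` are hypothetical stand-ins, not the actual thanos-operator types; only the Thanos flags themselves (`compact`, `--wait`, `--retention.resolution-raw`, `--debug.accept-malformed-index`) are real.

```go
package main

import "fmt"

// CompactorSpec is a hypothetical, trimmed-down stand-in for the operator's
// compactor spec; AcceptMalformedIndex is the proposed new field.
type CompactorSpec struct {
	RetentionResolutionRaw string
	AcceptMalformedIndex   bool
}

// compactorArgs sketches how deployment.go builds the compactor container
// args and where the new flag could be conditionally appended.
func compactorArgs(c CompactorSpec) []string {
	args := []string{
		"compact",
		"--wait",
		"--retention.resolution-raw=" + c.RetentionResolutionRaw,
	}
	if c.AcceptMalformedIndex {
		// Opt-in pass-through of the Thanos flag, so compaction can
		// proceed over blocks whose index has out-of-order labels.
		args = append(args, "--debug.accept-malformed-index")
	}
	return args
}

func main() {
	fmt.Println(compactorArgs(CompactorSpec{
		RetentionResolutionRaw: "5270400s",
		AcceptMalformedIndex:   true,
	}))
}
```

Keeping the field opt-in preserves today's default behavior, since the Thanos docs describe `--debug.accept-malformed-index` as a debug escape hatch rather than a recommended default.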