Description
Describe the bug
The Helm chart has no preStop lifecycle hook to flush the WAL on shutdown for singleBinary, so singleBinary does not flush data to the object store when the number of replicas is reduced. Unlike the microservice pods, the singleBinary pods have no lifecycle defined in the Helm templates or in values.yaml.
Given that enableStatefulSetAutoDeletePVC is set to true, it is dangerous to have neither a preStop lifecycle hook for flushing nor flush-on-shutdown enabled. If one manually scales down singleBinary using Helm, the unflushed WAL is lost along with its deleted PVC. This is confusing and error-prone.
For the scalable or microservice pods, the lifecycle hooks are enabled by default only when autoscaling is enabled. This configuration can also lead to data loss when one manually scales down the deployment.
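As a possible workaround for the write pods, the hook can be set explicitly in values.yaml so that it does not depend on autoscaling. This is a sketch, assuming the chart reads write.lifecycle as the excerpt from write-statefulset.yaml in this report suggests. I have not verified that it covers every scale-down path.

```yaml
# Workaround sketch: define the preStop hook explicitly for the write pods,
# so it is rendered even when write.autoscaling.enabled is false.
write:
  lifecycle:
    preStop:
      httpGet:
        # Same endpoint and port the chart uses when autoscaling is enabled.
        path: "/ingester/shutdown?terminate=false"
        port: http-metrics
```

There is no equivalent value for singleBinary, which is the gap this issue is about.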
The following is in write-statefulset.yaml but not in single-binary/stateful.yaml:
lifecycle:
  {{- toYaml .Values.write.lifecycle | nindent 12 }}
{{- else if .Values.write.autoscaling.enabled }}
lifecycle:
  preStop:
    httpGet:
      path: "/ingester/shutdown?terminate=false"
      port: http-metrics
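A fix could mirror this block in the singleBinary template. The sketch below assumes a new singleBinary.lifecycle value, which does not exist in chart 6.29.0. It also assumes the singleBinary container exposes a port named http-metrics and serves the same /ingester/shutdown endpoint. Unlike the write template, it renders the preStop hook unconditionally, because manual scale-downs are affected too.

```yaml
          # Hypothetical addition to the singleBinary container spec.
          # Values.singleBinary.lifecycle is an assumed new value, not an existing chart key.
          {{- if .Values.singleBinary.lifecycle }}
          lifecycle:
            {{- toYaml .Values.singleBinary.lifecycle | nindent 12 }}
          {{- else }}
          # Default: flush on every pod shutdown, not only when autoscaling is enabled.
          lifecycle:
            preStop:
              httpGet:
                path: "/ingester/shutdown?terminate=false"
                port: http-metrics
          {{- end }}
```

Making the hook the default rather than opt-in seems safer here, since enableStatefulSetAutoDeletePVC removes the PVC, and the unflushed WAL with it, on scale-down.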
To Reproduce
Steps to reproduce the behavior:
- Install helm chart 6.29.0 with singleBinary replicas set to 3
- Push some logs to Loki
- Upgrade the Helm release to reduce replicas to 1
- Query for the logs; some of them are gone
Expected behavior
All logs should be persisted to the object store before the scaled-down pods and their PVCs are removed.
Environment:
- Infrastructure: kubernetes (k3s v1.31.6+k3s1 (6ab750f9))
- Deployment tool: helm
Screenshots, Promtail config, or terminal output
My values.yaml:
deploymentMode: SingleBinary
singleBinary:
  replicas: 1
  extraArgs:
    - -config.expand-env
  extraEnvFrom:
    - secretRef:
        name: loki
loki:
  commonConfig:
    replication_factor: 1
  storage:
    bucketNames:
      ...
    use_thanos_objstore: true
    object_store:
      type: s3
      s3:
        ...
  compactor:
    retention_enabled: true
    delete_request_store: s3
  ingester:
    wal:
      flush_on_shutdown: true
  limits_config:
    retention_period: 200d
  schemaConfig:
    configs:
      - from: 2022-01-11
        store: boltdb-shipper
        object_store: s3
        schema: v12
        index:
          prefix: loki_index_
          period: 24h
      - from: 2024-06-17
        store: boltdb-shipper
        object_store: s3
        schema: v13
        index:
          prefix: loki_index_
          period: 24h
      - from: 2024-06-18
        store: tsdb
        object_store: s3
        schema: v13
        index:
          prefix: loki_index_
          period: 24h
  storage_config:
    use_thanos_objstore: true
lokiCanary:
  extraArgs:
    - -interval
    - 10s
    - -spot-check-query-rate
    - 10m
test:
  enabled: false
gateway:
  affinity:
    podAntiAffinity: null
read:
  replicas: 0
write:
  replicas: 0
backend:
  replicas: 0
chunksCache:
  enabled: false
resultsCache:
  enabled: false
sidecar:
  rules:
    enabled: false