
prometheus_remote_write drops metrics older than 1 hour #11405

@asg7443

Description

Bug Report

Describe the bug
The prometheus_remote_write output plugin drops metrics older than 1 hour.

To Reproduce

  • Create directories
mkdir -p /srv/logshipper/{conf,data,log}
  • Create a config that enables filesystem storage and points at an unreachable server
---
# /srv/logshipper/conf/fluent-bit.yaml
service:
  log_level: trace
  log_file: /fluent-bit/log/main.log
  flush: 300
  storage.path: /fluent-bit/data/storage
  scheduler.base: 60
  scheduler.cap: 3600

pipeline:
  inputs:
    - name: node_exporter_metrics
      tag:  node_metrics
      metrics: "loadavg"
      scrape_interval: 60
      storage.type: filesystem

  outputs:
    - name: prometheus_remote_write
      match: node_metrics
      host: unreachable.test

      storage.total_limit_size: 100M
      retry_limit: 1000
  • Run fluent-bit
    podman run --rm -d -v /srv/logshipper/conf:/fluent-bit/etc:ro -v /srv/logshipper/data:/fluent-bit/data:rw -v /srv/logshipper/log:/fluent-bit/log:rw docker.io/fluent/fluent-bit:4.2.2 -c /fluent-bit/etc/fluent-bit.yaml

  • Wait for 1.5-2 hours

  • See the chunk files being deleted from /srv/logshipper/data (a quick way to watch them is shown after this list), with messages like the following in /srv/logshipper/log/main.log:

[2026/01/27 11:16:45.224905057] [debug] [output:prometheus_remote_write:prometheus_remote_write.0] task_id=1 assigned to thread #1
[2026/01/27 11:16:45.224971777] [debug] [output:prometheus_remote_write:prometheus_remote_write.0] cmetrics msgpack size: 2110
[2026/01/27 11:16:45.225003737] [debug] [output:prometheus_remote_write:prometheus_remote_write.0] cmetric_id=0 decoded 0-422 payload_size=0
[2026/01/27 11:16:45.225012738] [debug] [output:prometheus_remote_write:prometheus_remote_write.0] cmetric_id=1 decoded 422-844 payload_size=0
[2026/01/27 11:16:45.225024378] [debug] [output:prometheus_remote_write:prometheus_remote_write.0] cmetric_id=2 decoded 844-1266 payload_size=0
[2026/01/27 11:16:45.225031738] [debug] [output:prometheus_remote_write:prometheus_remote_write.0] cmetric_id=3 decoded 1266-1688 payload_size=0
[2026/01/27 11:16:45.225040218] [debug] [output:prometheus_remote_write:prometheus_remote_write.0] cmetric_id=4 decoded 1688-2110 payload_size=0
[2026/01/27 11:16:45.225042938] [debug] [output:prometheus_remote_write:prometheus_remote_write.0] final payload size: 0
[2026/01/27 11:16:45.225058978] [debug] [out flush] cb_destroy coro_id=43
[2026/01/27 11:16:45.225092898] [ info] [engine] flush chunk '1-1769507997.474870989.flb' succeeded at retry 7: task_id=1, input=node_exporter_metrics.0 > output=prometheus_remote_write.0 (out_id=0)
[2026/01/27 11:16:45.225102658] [debug] [task] destroy task=0xfc3a1fa23960 (task_id=1)
[2026/01/27 11:16:45.225109058] [debug] [input chunk] remove chunk 1-1769507997.474870989.flb with 4096 bytes from plugin prometheus_remote_write.0, the updated fs_chunks_size is 65536 bytes
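
As a side note, here is a minimal way to watch the buffered chunks from the host while waiting. The host path comes from the volume mounts above, and the *.flb extension matches the chunk names in the log above; the exact per-input subdirectory layout under storage.path is not assumed here.

    # list every buffered chunk file under the filesystem storage path once a minute
    watch -n 60 "find /srv/logshipper/data/storage -name '*.flb' -exec ls -l {} +"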

Expected behavior
Chunk files are not deleted; they are kept for as long as the storage.total_limit_size and retry_limit parameters allow.
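
For scale, a back-of-the-envelope estimate of how long a chunk should remain retryable with the settings above, assuming retries back off from scheduler.base (60 s) up to scheduler.cap (3600 s) and then stay at the cap, and that storage.total_limit_size is never hit:

    # 1000 retries spaced at most 3600 s apart; result expressed in days
    echo $(( 1000 * 3600 / 86400 ))   # ~41 days, far longer than the ~1 hour after which chunks are actually dropped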

Your Environment

  • Version used: 4.2.2, official Docker image

Additional context
This makes it impossible to use Fluent Bit as a temporary buffer during a prolonged server outage; older metrics are simply lost.
