
Forward input partially missing data: getting Error "could not write forward header" #10079

Open
@vpshibin

Bug Report

Describe the bug
I'm running Fluent Bit in a forwarder/aggregator pattern. Data is collected from Windows VMs, and the aggregators are two Linux VMs behind a load balancer.

The forwarder (source) Fluent Bit logs show the error below, and data is partially missing: about 70% of it does not reach the destination.

[2025/03/14 15:45:40] [debug] [output:forward:forward.1] task_id=15 assigned to thread #0
[2025/03/14 15:45:40] [debug] [output:forward:forward.1] request 424 bytes to flush
[2025/03/14 15:45:40] [debug] [retry] re-using retry for task_id=15 attempts=5
[2025/03/14 15:45:40] [debug] [upstream] KA connection #4476 to 10.14.114.180:5174 has been assigned (recycled)
[2025/03/14 15:45:40] [ warn] [engine] failed to flush chunk '3096-1741927459.414588600.flb', retry in 71 seconds: task_id=15, input=tail.1 > output=forward.1 (out_id=1)
[2025/03/14 15:45:40] [error] [output:forward:forward.1] could not write forward header
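
To put a number on how often this happens, the forwarder log can be tallied with a small script. The following is a minimal sketch in Python, assuming every log line follows the `[timestamp] [level] [component] message` layout seen in the excerpt above. The function name `tally` and the counter keys are hypothetical names chosen for illustration, not part of Fluent Bit.

```python
import re
from collections import Counter

# Layout assumed from the excerpt: [YYYY/MM/DD HH:MM:SS] [level] [component] message
# The level may be left-padded with a space, as in "[ warn]".
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] \[\s*(?P<level>\w+)\] \[(?P<component>[^\]]+)\] (?P<msg>.*)$"
)

def tally(lines):
    """Count the three failure signatures that appear in this report."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that is not a Fluent Bit log line
        msg = m.group("msg")
        if "could not write forward header" in msg:
            counts["header_write_errors"] += 1
        elif "failed to flush chunk" in msg:
            counts["failed_flushes"] += 1
        elif "reached retry-attempts limit" in msg:
            counts["dropped_chunks"] += 1
    return counts
```

Calling `tally(open(path))` on a forwarder log file (the path is whatever the local setup uses) gives a rough count of dropped chunks versus failed flushes for that host.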

Thousands of VMs send data to each aggregator. However, the aggregators are not saturated: CPU usage is usually about 10%, and about 30% of memory remains free even at peak times.

My Retry_Limit is 5, so after 5 attempts the chunk is dropped:
[2025/03/14 15:45:40] [debug] [upstream] KA connection #4476 to 10.14.114.180:5174 is now available
[2025/03/14 15:45:40] [debug] [out flush] cb_destroy coro_id=3337
[2025/03/14 15:45:41] [debug] [output:forward:forward.1] task_id=18 assigned to thread #1
[2025/03/14 15:45:41] [debug] [task] task_id=18 reached retry-attempts limit 5/5
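
For a sense of how long a chunk survives before it is dropped, here is a rough sketch in Python of the retry wait bounds. It assumes the scheduler uses exponential backoff with jitter and the documented defaults `scheduler.base` 5 and `scheduler.cap` 2000; the exact formula and the attempt indexing are assumptions rather than something confirmed from the logs, and the function names are hypothetical.

```python
def backoff_window(attempt, base=5, cap=2000):
    """Return assumed (low, high) bounds in seconds for the wait before a retry.

    Assumption: the wait is drawn at random between `base` and
    min(cap, base * 2**attempt).
    """
    high = min(cap, base * 2 ** attempt)
    return base, high

def worst_case_total(retry_limit, base=5, cap=2000):
    """Upper bound on total seconds spent waiting across all retries."""
    return sum(backoff_window(n, base, cap)[1] for n in range(1, retry_limit + 1))
```

Under these assumptions, the "retry in 71 seconds" seen at attempts=5 falls inside the window returned by `backoff_window(5)`, and with Retry_Limit 5 a chunk spends only a few minutes at most in the retry queue before being dropped.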

On the aggregator (receiving) side, I see the error below, but I'm not sure whether it is related:
[2025/03/14 16:07:34] [error] [input:forward:forward.1] could not enqueue records into the ring buffer

To Reproduce

  • Use Fluent Bit in a forwarder/aggregator pattern with the forward input and output plugins, with about 1000 forwarder instances connected to a Fluent Bit instance acting as the aggregator.

  • Steps to reproduce the problem:

Expected behavior
The forward output should send all data to the aggregator.

Screenshots

Your Environment

  • Version used: Forwarder (3.1.7), Aggregator (3.2.1)
  • Configuration:
    Forwarder:
    [OUTPUT]
        Name                        Forward
        Match                       EdgeClient
        Host                        x.x.x.x
        Port                        5174
        storage.total_limit_size    5M
        tls                         Off
        net.connect_timeout         5
        net.keepalive               On
        net.keepalive_idle_timeout  30
        net.max_worker_connections  1
        Retry_Limit                 5

    Aggregator:
    - name: forward
      listen: 0.0.0.0
      port: 5174
      Tag_Prefix: Logs-
      Threaded: true
      storage.type: filesystem
      tls: off
      tls.verify: off

      processors:
        logs:
          - name: content_modifier
            action: insert
            key: AggNbr
            value: ${AGG_NBR}
  • Environment name and version (e.g. Kubernetes? What version?):
    Application logs from VMs.

  • Server type and version:
    Virtual machines, forwarders (on-prem)
    Aggregators (Azure cloud) - 2* Standard_B4s_v2 (4 vCPUs, 16 GB RAM each)

  • Operating System and version:
    Forwarders: Windows 10 Enterprise LTSC 2019
    Aggregators: Linux RedHat 8.10

  • Filters and plugins:

Additional context
