Description
Bug Report
Describe the bug
I'm using Fluent Bit in a forwarder-aggregator pattern: data is collected from Windows VMs (forwarders) and sent to two Linux VM aggregators sitting behind a load balancer.
In the forwarder's Fluent Bit logs I see the errors below, and data is partially missing: about 70% of it never reaches the destination.
[2025/03/14 15:45:40] [debug] [output:forward:forward.1] task_id=15 assigned to thread #0
[2025/03/14 15:45:40] [debug] [output:forward:forward.1] request 424 bytes to flush
[2025/03/14 15:45:40] [debug] [retry] re-using retry for task_id=15 attempts=5
[2025/03/14 15:45:40] [debug] [upstream] KA connection #4476 to 10.14.114.180:5174 has been assigned (recycled)
[2025/03/14 15:45:40] [ warn] [engine] failed to flush chunk '3096-1741927459.414588600.flb', retry in 71 seconds: task_id=15, input=tail.1 > output=forward.1 (out_id=1)
[2025/03/14 15:45:40] [error] [output:forward:forward.1] could not write forward header
[2025/03/14 15:45:40] [debug] [upstream] KA connection #4476 to 10.14.114.180:5174 is now available
[2025/03/14 15:45:40] [debug] [out flush] cb_destroy coro_id=3337
[2025/03/14 15:45:41] [debug] [output:forward:forward.1] task_id=18 assigned to thread #1
[2025/03/14 15:45:41] [debug] [task] task_id=18 reached retry-attempts limit 5/5
A very high number of VMs (in the thousands) send data to each aggregator, yet the aggregators are not saturated: CPU usage is typically around 10%, and about 30% of memory remains free even during peak times.
My Retry_Limit is 5, so after 5 failed attempts a chunk is dropped.
On the aggregator (receiving) side, I see the error below, but I'm not sure whether it is related.
[2025/03/14 16:07:34] [error] [input:forward:forward.1] could not enqueue records into the ring buffer
To Reproduce
- Use Fluent Bit in a forwarder/aggregator pattern with the forward input/output plugins, with roughly 1000 forwarder instances connected to a Fluent Bit instance acting as aggregator.
- Steps to reproduce the problem:
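As a rough single-machine approximation of this topology (tag, rate, and port here are illustrative placeholders, not my production values), the dummy input can stand in for the Windows forwarder traffic:

```ini
# forwarder-repro.conf (hypothetical local repro; generates synthetic records)
[INPUT]
    Name   dummy
    Tag    EdgeClient
    Rate   100

[OUTPUT]
    Name        Forward
    Match       EdgeClient
    Host        127.0.0.1
    Port        5174
    Retry_Limit 5
```

Point this at a local aggregator listening with the forward input on port 5174; the real issue presumably also needs the ~1000-forwarder connection count to manifest.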
Expected behavior
The forward output should successfully deliver all data to the aggregator.
Screenshots
Your Environment
- Version used: Forwarder (3.1.7), Aggregator (3.2.1)
- Configuration:
Forwarder:
[OUTPUT]
Name Forward
Match EdgeClient
Host x.x.x.x
Port 5174
storage.total_limit_size 5M
tls Off
net.connect_timeout 5
net.keepalive On
net.keepalive_idle_timeout 30
net.max_worker_connections 1
Retry_Limit 5
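Note that "could not write forward header" typically means the recycled keepalive connection had already been closed by the peer, and with net.max_worker_connections 1 every flush competes for a single socket. A hedged variant I could try (the values below are illustrative guesses, not verified fixes; net.keepalive_max_recycle and Workers are standard Fluent Bit options not present in my current config):

```ini
[OUTPUT]
    Name        Forward
    Match       EdgeClient
    Host        x.x.x.x
    Port        5174
    Retry_Limit 5
    Workers     2
    # Allow more than one concurrent connection per worker, and retire
    # keepalive connections before the aggregator or the Azure load balancer
    # (default idle timeout ~4 minutes) silently drops them, which otherwise
    # surfaces on reuse as "could not write forward header".
    net.keepalive               On
    net.keepalive_idle_timeout  10
    net.keepalive_max_recycle   100
    net.max_worker_connections  4
```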
Aggregator:
- name: forward
listen: 0.0.0.0
port: 5174
Tag_Prefix: Logs-
Threaded: true
storage.type: filesystem
tls: off
tls.verify: off
processors:
logs:
- name: content_modifier
action: insert
key: AggNbr
value: ${AGG_NBR}
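The aggregator-side error "could not enqueue records into the ring buffer" suggests the threaded forward input is producing records faster than the main engine drains them. A hedged variant to experiment with (buffer sizes are illustrative assumptions, not verified fixes; buffer_chunk_size and buffer_max_size are standard forward-input options):

```yaml
pipeline:
  inputs:
    - name: forward
      listen: 0.0.0.0
      port: 5174
      tag_prefix: Logs-
      threaded: true
      storage.type: filesystem
      # Larger per-connection buffers (illustrative values) to absorb bursts
      # from ~1000 forwarders before records are handed to the engine.
      buffer_chunk_size: 1M
      buffer_max_size: 6M
```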
- Environment name and version (e.g. Kubernetes? What version?): Application logs from VMs.
- Server type and version: Virtual machines; forwarders on-prem, aggregators in Azure cloud (2× Standard_B4s_v2, 4 vCPUs / 16 GB RAM each).
- Operating System and version: Forwarders: Windows 10 Enterprise LTSC 2019; Aggregators: Red Hat Enterprise Linux 8.10.
- Filters and plugins:
Additional context