Fluentd forwarder status page is displayed with a huge delay when aggregator node is responding slowly #2137

Open
@sergeyarl

Description

OS: CentOS 7
Fluentd version: td-agent-3.2.0-0.el7.x86_64

When the aggregator node is failing or responding very slowly under heavy load, it can take up to 1-2 minutes to get the status page /api/plugins.json from a forwarder node.

Steps to reproduce

Forwarder config

<source>
  @type monitor_agent
  bind 127.0.0.1
  port 24220
</source>

<source>
  @type forward
  bind 127.0.0.1
  port 24224
</source>

<match **>
  @type forward

  heartbeat_type tcp
  send_timeout 60s
  recover_wait 10s
  heartbeat_interval 1s
  # increased this while testing
  phi_threshold 160000
  hard_timeout 120s

  <server>
    name logs1
    host 172.31.3.5
    port 8889
    weight 60
  </server>

  flush_interval 10s

  buffer_type file
  buffer_path /var/log/fluentd/buffer/forward
  buffer_chunk_limit 4m
  buffer_queue_limit 4096
  num_threads 2
  expire_dns_cache 600
</match>

I have a service send logs to the forwarder.

Then, on the aggregator node, I execute:

# iptables -A INPUT -m statistic --mode random --probability 0.8 --source forwarder.node.ip.address -j DROP

On the forwarder node, I run the following curl request in a loop:

# while true; do timeout 2 curl -s http://localhost:24220/api/plugins.json > /dev/null && echo ok || echo failure; sleep 1; done

After some time it starts printing "failure".
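
The loop above only reports ok or failure against a 2-second limit. A variant that also prints how long each request took may make the slowdown easier to see. This is a minimal sketch, assuming curl is available on the forwarder node; the function names `probe_once` and `probe_loop` are hypothetical, and the URL and 2-second limit are taken from the loop above.

```shell
#!/bin/sh
# Sketch only: times requests to the monitor_agent endpoint from the forwarder config.
# probe_once and probe_loop are illustrative names, not part of Fluentd or td-agent.
URL="${URL:-http://localhost:24220/api/plugins.json}"

# Print the total request time in seconds, or "failure" if curl
# cannot complete the request within 2 seconds.
probe_once() {
  t=$(curl -s -o /dev/null --max-time 2 -w '%{time_total}' "$1") \
    && echo "$t" || echo "failure"
}

# Print one timestamped result per second until interrupted.
probe_loop() {
  while true; do
    printf '%s %s\n' "$(date +%T)" "$(probe_once "$URL")"
    sleep 1
  done
}
```

Running `probe_loop` on the forwarder node while the iptables rule is active should show response times growing before requests start failing, if the delay builds up gradually.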

When I flush the iptables rules on the aggregator node with

# iptables -F

it gets back to normal.
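
Note that `iptables -F` removes every rule in the filter table, not just the test rule. On a host that has other rules, deleting only the DROP rule from the reproduction step may be safer. The sketch below prints the matching add and delete commands so they can be reviewed before being run as root; `drop_rule_cmd` is a hypothetical helper and the address in the usage line is a placeholder. It assumes that `iptables -D` with the same rule specification matches the rule that was added.

```shell
#!/bin/sh
# Sketch only: builds the iptables command for the packet-drop rule used in the
# reproduction steps. drop_rule_cmd is an illustrative name.
# $1 = action (-A to add, -D to delete), $2 = forwarder IP, $3 = drop probability
drop_rule_cmd() {
  echo "iptables $1 INPUT -m statistic --mode random --probability $3 --source $2 -j DROP"
}

# Usage (placeholder address; pipe to sh as root to actually apply):
#   drop_rule_cmd -A 192.0.2.10 0.8
#   drop_rule_cmd -D 192.0.2.10 0.8
```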

It does not happen every time, but it does in a fairly large percentage of cases.

td-agent 2.5 is not affected.

I also noticed that Docker services that send logs to the forwarder sometimes stop responding as well, but I have not yet been able to reproduce this in my test environment.

Thanks.

Regards,
Sergey

Labels: bug, v1