Fluentd forwarder status page is displayed with a huge delay when the aggregator node is responding slowly #2137

OS: CentOS 7
Fluentd version: td-agent-3.2.0-0.el7.x86_64

When the aggregator node is failing or responding very slowly under heavy load, it can take up to 1-2 minutes to get the status page `/api/plugins.json` on a forwarder node.

Steps to reproduce

Forwarder config

```
<source>
  @type monitor_agent
  bind 127.0.0.1
  port 24220
</source>

<source>
  @type forward
  bind 127.0.0.1
  port 24224
</source>

<match **>
  @type forward

  heartbeat_type tcp
  send_timeout 60s
  recover_wait 10s
  heartbeat_interval 1s
  # increased this while testing
  phi_threshold 160000
  hard_timeout 120s

  <server>
    name logs1
    host 172.31.3.5
    port 8889
    weight 60
  </server>

  flush_interval 10s

  buffer_type file
  buffer_path /var/log/fluentd/buffer/forward
  buffer_chunk_limit 4m
  buffer_queue_limit 4096
  num_threads 2
  expire_dns_cache 600
</match>
```

I have a service send logs to the forwarder.

Then, on the aggregator node, I run

`# iptables -A INPUT -m statistic --mode random --probability 0.8 --source forwarder.node.ip.address -j DROP`

On the forwarder node, I run the following curl request in a loop

`# while true; do timeout 2 curl -s http://localhost:24220/api/plugins.json > /dev/null && echo ok || echo failure; sleep 1; done`
After some time, the loop starts printing "failure".
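
Because the loop above treats any response slower than 2 seconds as a failure, it does not show how long the status page actually takes. A minimal sketch of a timing variant follows; `timed_probe` is a hypothetical helper name (not part of Fluentd or curl), and the 150-second limit in the usage comment is only an assumption chosen to exceed the 1-2 minute delay.

```shell
# Run a command under a time limit and print "ok" or "failure"
# followed by the elapsed wall-clock time in whole seconds.
# `timed_probe` is a hypothetical helper, not a Fluentd or curl feature.
timed_probe() {
  local limit="$1"; shift
  local start end status
  start=$(date +%s)
  if timeout "$limit" "$@" > /dev/null 2>&1; then
    status=ok
  else
    status=failure   # non-zero exit, or killed by timeout
  fi
  end=$(date +%s)
  printf '%s %ss\n' "$status" "$((end - start))"
}

# Usage on the forwarder node (same endpoint as the loop above):
#   while true; do
#     timed_probe 150 curl -s http://localhost:24220/api/plugins.json
#     sleep 1
#   done
```

Each output line then carries the elapsed time, so slow responses can be told apart from requests that never complete.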

When I flush the iptables rules on the aggregator node with

`iptables -F`

the status page responds normally again.
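
Note that `iptables -F` clears every rule, which may be undesirable on a host that has other rules. A sketch of a narrower alternative is to delete only the rule added for the test, using `iptables -D` with the same rule specification. `drop_rule_spec` is a hypothetical helper, and this assumes the specification passed to `-D` matches the one passed to `-A` exactly.

```shell
# Hypothetical helper: print the rule specification used in the reproduction,
# so the identical text can be passed to both `iptables -A` and `iptables -D`.
drop_rule_spec() {
  printf 'INPUT -m statistic --mode random --probability 0.8 --source %s -j DROP\n' "$1"
}

# Usage on the aggregator node, as root (forwarder.node.ip.address is the
# placeholder from the report; the expansion is intentionally unquoted):
#   iptables -A $(drop_rule_spec forwarder.node.ip.address)   # start dropping packets
#   iptables -D $(drop_rule_spec forwarder.node.ip.address)   # remove only this rule
```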

This does not happen every time, but it does happen in a fairly large percentage of runs.

td-agent 2.5 is not affected. 

I also noticed that Docker services that send logs to the forwarder sometimes stop responding as well, but I have not yet been able to reproduce that in my test environment.


Thanks.

Regards,
Sergey
