Replies: 4 comments 5 replies
-
Several similar problems with K8s have been reported.
Basically, I understand the situation as follows.
Two solutions should be considered for this.
If this seems to be a K8s-specific problem or a Fluentd bug related to it, please let us know.
-
https://docs.fluentd.org/configuration/buffer-section#buffering-parameters
However, it is weird that this occurs with only 7-8 MB...
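For reference: as far as I know, the "buffer space has too many data" error is raised when the total amount of buffered data exceeds `total_limit_size`, not when a single chunk reaches `chunk_limit_size`, so it can help to set that limit (and `overflow_action`) explicitly inside the output's `<match>` block. A minimal sketch with illustrative values only (the path and sizes are placeholders, not recommendations):

```
<buffer>
  @type file
  path /var/log/fluent/buffer        # placeholder path
  chunk_limit_size 8m                # max size of a single chunk
  total_limit_size 512m              # overall cap; exceeding it triggers "buffer space has too many data"
  flush_interval 5s
  flush_thread_count 4
  overflow_action block              # alternatives: throw_exception (default), drop_oldest_chunk
</buffer>
```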
-
I am also struggling with this problem. In the beginning everything works like a charm: I see my desired logs in OpenSearch and resource usage everywhere is minimal. It seems that the output plugin has an issue in its retry or reconnect mechanism; I see a lot of output retries and errors in the metrics. The buffer usage is always rising and never decreasing, which indicates that the output is blocked. Healing is only possible by restarting the fluentd pods. Here is my metric:
These are my settings:
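(Not the settings from the attachment above.) For comparison, a hedged sketch of the output-side options that usually matter when retries pile up, assuming the fluent-plugin-opensearch output; the match pattern, host, and values are placeholders, and the reconnect/reload options come from the plugin's fluent-plugin-elasticsearch heritage, so double-check them against your plugin version:

```
<match app.**>
  @type opensearch
  host opensearch.example.internal   # placeholder endpoint
  port 9200
  scheme https
  # reconnect/reload behaviour that is often suggested when the output gets stuck
  reconnect_on_error true
  reload_on_failure true
  reload_connections false
  request_timeout 30s
  <buffer>
    @type file
    path /var/log/fluent/os-buffer   # placeholder path
    chunk_limit_size 8m
    total_limit_size 512m
    flush_interval 5s
    flush_thread_count 2
    retry_max_interval 30
    retry_forever true               # keep retrying instead of dropping chunks
  </buffer>
</match>
```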
-
I have analysed this a little further. Perhaps it's simply not a buffer problem but really a connection issue on the Alpine image itself. I reduced my log input, which didn't bring any significant improvement. While my previous error "buffer space has too many data" no longer occurs, the outcome is still the same: after a while, fluentd stops sending logs to OpenSearch. Right now I am struggling with this issue: fluent/fluent-plugin-opensearch#147 What I also find interesting is that when I connect to the fluentd container via an exec shell at the time the buffer queue rises and send a request to my OpenSearch cluster, I actually get a 503.
After restarting, the communication looks fine:
So it seems to be a problem on the OS layer and not in the plugin itself.
-
What is the problem?
We are running fluentd on our k8s clusters to forward application logs to our Splunk instance. To handle Splunk downtime we configured a buffer in fluentd; however, on a few clusters that generate a lot of logs (~180'000 entries per 15 min), a lot of "buffer space has too many data" errors start appearing after the fluentd pods have been running for a while. We tried different buffer configurations, from a memory buffer with an 8m chunk_limit_size to a file buffer with a 256m chunk_limit_size, flush_thread_count from 1 to 8, and a flush_interval of 5s. The problem seems to persist through all configurations.
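For illustration, a file-buffer variant along the lines described above, assuming the Splunk HEC output plugin (fluent-plugin-splunk-hec); the match pattern, path, and HEC parameters are placeholders, not the actual configuration in use:

```
<match kubernetes.**>
  @type splunk_hec
  hec_host splunk.example.internal       # placeholder; assumed HEC endpoint
  hec_port 8088
  hec_token xxxxxxxx                     # placeholder token
  <buffer>
    @type file
    path /var/log/fluent/splunk-buffer   # placeholder path
    chunk_limit_size 256m
    flush_interval 5s
    flush_thread_count 8
    # total_limit_size is the limit that "buffer space has too many data" is checked against
    total_limit_size 2g
  </buffer>
</match>
```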
I tried to observe the actual size of the incoming logs by watching the size of the buffer directory, and interestingly it never seems to grow past 7-8 MB before the flush happens. So I don't understand how a "too many data" error can occur when there are 8 MB of data and a chunk_limit_size of 256m is configured.
Interestingly, when I restart the fluentd pods the errors seem to vanish for up to a day before starting again.
Describe the configuration of Fluentd
Describe the logs of Fluentd
Environment