Description
Describe the bug
Running as fluentd-kubernetes-daemonset on modest EKS clusters (6-12 nodes, *.large instances), each fluentd pod steadily grows in memory use until it hits its allocated memory limit (currently 750MB) and restarts, on a timescale of roughly one week.
This occurs in multiple clusters with different workloads, message formats and usage patterns, even on very lightly used instances.
No obvious problems in fluentd logs. No obvious correlation to particular log messages or usage pattern.
To Reproduce
Deploy fluentd-kubernetes-daemonset 1.17.1 (the version shown in the gem paths of the error log below) with the matching debian-s3 and debian-elasticsearch8 backends, the tail parser plugin and a small number of match rules for JSON log formats. Watch memory use via cluster metrics.
Expected behavior
Memory use should stabilise well below 750MB per instance.
Your Environment
- Fluentd version: 1.17.1
- Package version:
- Operating system: k8s 1.31 with AL2023 nodes (Amazon Linux 2023.6.20250115)
- Kernel version: 6.1.119-129.201.amzn2023.x86_64
Your Configuration
<match kubernetes.log.hydrology-data-explorer.**>
  @id json_hydrology_data_explorer
  @type rewrite_tag_filter
  <rule>
    key stream
    pattern ^(.+)$
    tag kubernetes.log.json.$1
  </rule>
</match>
<match kubernetes.log.hydro-api.**>
  @id json_hydro_api
  @type rewrite_tag_filter
  <rule>
    key stream
    pattern ^(.+)$
    tag kubernetes.log.json.$1.ts
  </rule>
</match>
Your Error Log
Most instances log very little. Others show (successful) retries when pushing to the Elasticsearch backend once or twice a day:
2025-04-26 12:45:25 +0000 [warn]: #0 [out_es7] failed to flush the buffer. retry_times=0 next_retry_time=2025-04-26 12:45:27 +0000 chunk="633add28dae5764171d00d39ffc11934" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch-master\", :port=>9200, :scheme=>\"https\", :user=>\"elastic\", :password=>\"obfuscated\", :path=>\"\"}): read timeout reached"
2025-04-26 12:45:25 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluent-plugin-elasticsearch-5.3.0/lib/fluent/plugin/out_elasticsearch.rb:1148:in `rescue in send_bulk'
2025-04-26 12:45:25 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluent-plugin-elasticsearch-5.3.0/lib/fluent/plugin/out_elasticsearch.rb:1110:in `send_bulk'
2025-04-26 12:45:25 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluent-plugin-elasticsearch-5.3.0/lib/fluent/plugin/out_elasticsearch.rb:886:in `block in write'
2025-04-26 12:45:25 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluent-plugin-elasticsearch-5.3.0/lib/fluent/plugin/out_elasticsearch.rb:885:in `each'
2025-04-26 12:45:25 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluent-plugin-elasticsearch-5.3.0/lib/fluent/plugin/out_elasticsearch.rb:885:in `write'
2025-04-26 12:45:25 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluentd-1.17.1/lib/fluent/plugin/output.rb:1225:in `try_flush'
2025-04-26 12:45:25 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluentd-1.17.1/lib/fluent/plugin/output.rb:1538:in `flush_thread_run'
2025-04-26 12:45:25 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluentd-1.17.1/lib/fluent/plugin/output.rb:510:in `block (2 levels) in start'
2025-04-26 12:45:25 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluentd-1.17.1/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2025-04-26 12:45:26 +0000 [warn]: #0 [out_es7] retry succeeded. chunk_id="633add2e01810b613843ebaa2caa91aa"
2025-04-26 15:06:06 +0000 [warn]: #0 [out_es7] failed to flush the buffer. retry_times=0 next_retry_time=2025-04-26 15:06:07 +0000 chunk="633afc9ad3b10c13ec620ea3304c84d8" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch-master\", :port=>9200, :scheme=>\"https\", :user=>\"elastic\", :password=>\"obfuscated\", :path=>\"\"}): read timeout reached"
2025-04-26 15:06:06 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluent-plugin-elasticsearch-5.3.0/lib/fluent/plugin/out_elasticsearch.rb:1148:in `rescue in send_bulk'
2025-04-26 15:06:06 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluent-plugin-elasticsearch-5.3.0/lib/fluent/plugin/out_elasticsearch.rb:1110:in `send_bulk'
2025-04-26 15:06:06 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluent-plugin-elasticsearch-5.3.0/lib/fluent/plugin/out_elasticsearch.rb:886:in `block in write'
2025-04-26 15:06:06 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluent-plugin-elasticsearch-5.3.0/lib/fluent/plugin/out_elasticsearch.rb:885:in `each'
2025-04-26 15:06:06 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluent-plugin-elasticsearch-5.3.0/lib/fluent/plugin/out_elasticsearch.rb:885:in `write'
2025-04-26 15:06:06 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluentd-1.17.1/lib/fluent/plugin/output.rb:1225:in `try_flush'
2025-04-26 15:06:06 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluentd-1.17.1/lib/fluent/plugin/output.rb:1538:in `flush_thread_run'
2025-04-26 15:06:06 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluentd-1.17.1/lib/fluent/plugin/output.rb:510:in `block (2 levels) in start'
2025-04-26 15:06:06 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluentd-1.17.1/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2025-04-26 15:06:08 +0000 [warn]: #0 [out_es7] retry succeeded. chunk_id="633afc9f9bfdff286397141c43de483c"
2025-04-27 01:21:11 +0000 [warn]: #0 [out_es7] failed to flush the buffer. retry_times=0 next_retry_time=2025-04-27 01:21:12 +0000 chunk="633b861655fe9a1718104b9411c3f151" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch-master\", :port=>9200, :scheme=>\"https\", :user=>\"elastic\", :password=>\"obfuscated\", :path=>\"\"}): read timeout reached"
2025-04-27 01:21:11 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluent-plugin-elasticsearch-5.3.0/lib/fluent/plugin/out_elasticsearch.rb:1148:in `rescue in send_bulk'
2025-04-27 01:21:11 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluent-plugin-elasticsearch-5.3.0/lib/fluent/plugin/out_elasticsearch.rb:1110:in `send_bulk'
2025-04-27 01:21:11 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluent-plugin-elasticsearch-5.3.0/lib/fluent/plugin/out_elasticsearch.rb:886:in `block in write'
2025-04-27 01:21:11 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluent-plugin-elasticsearch-5.3.0/lib/fluent/plugin/out_elasticsearch.rb:885:in `each'
2025-04-27 01:21:11 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluent-plugin-elasticsearch-5.3.0/lib/fluent/plugin/out_elasticsearch.rb:885:in `write'
2025-04-27 01:21:11 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluentd-1.17.1/lib/fluent/plugin/output.rb:1225:in `try_flush'
2025-04-27 01:21:11 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluentd-1.17.1/lib/fluent/plugin/output.rb:1538:in `flush_thread_run'
2025-04-27 01:21:11 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluentd-1.17.1/lib/fluent/plugin/output.rb:510:in `block (2 levels) in start'
2025-04-27 01:21:11 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluentd-1.17.1/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2025-04-27 01:21:13 +0000 [warn]: #0 [out_es7] retry succeeded. chunk_id="633b861b1ba05e2e41d3a813d04a3061"
2025-04-27 04:36:17 +0000 [warn]: #0 [out_es7] failed to flush the buffer. retry_times=0 next_retry_time=2025-04-27 04:36:18 +0000 chunk="633bb1b124e3e35e90244a93581b001c" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch-master\", :port=>9200, :scheme=>\"https\", :user=>\"elastic\", :password=>\"obfuscated\", :path=>\"\"}): read timeout reached"
2025-04-27 04:36:17 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluent-plugin-elasticsearch-5.3.0/lib/fluent/plugin/out_elasticsearch.rb:1148:in `rescue in send_bulk'
2025-04-27 04:36:17 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluent-plugin-elasticsearch-5.3.0/lib/fluent/plugin/out_elasticsearch.rb:1110:in `send_bulk'
2025-04-27 04:36:17 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluent-plugin-elasticsearch-5.3.0/lib/fluent/plugin/out_elasticsearch.rb:886:in `block in write'
2025-04-27 04:36:17 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluent-plugin-elasticsearch-5.3.0/lib/fluent/plugin/out_elasticsearch.rb:885:in `each'
2025-04-27 04:36:17 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluent-plugin-elasticsearch-5.3.0/lib/fluent/plugin/out_elasticsearch.rb:885:in `write'
2025-04-27 04:36:17 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluentd-1.17.1/lib/fluent/plugin/output.rb:1225:in `try_flush'
2025-04-27 04:36:17 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluentd-1.17.1/lib/fluent/plugin/output.rb:1538:in `flush_thread_run'
2025-04-27 04:36:17 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluentd-1.17.1/lib/fluent/plugin/output.rb:510:in `block (2 levels) in start'
2025-04-27 04:36:17 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluentd-1.17.1/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2025-04-27 04:36:18 +0000 [warn]: #0 [out_es7] retry succeeded. chunk_id="633bb1b6df310f483235bdf467bc487b"
2025-04-28 11:52:20 +0000 [warn]: #0 [in_tail_container_logs] /var/log/containers/publish-telemetry-15min-1745840160-ingest-telemetry-1370545147_hydro-production_init-a2d394ce175ea71c68eeffad5bd99d474efbde4c9599237ae658223f2cacf136.log unreadable. It is excluded and would be examined next time.
Additional context
Have attempted Ruby GC tuning in case that helped, including setting RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR to 1.2, to no apparent effect.
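A sketch of one possible diagnostic for separating Ruby heap growth from native memory growth, using only the standard `GC.stat` call and the Linux `/proc/self/status` file. The helper names `memory_snapshot` and `snapshot_delta` are hypothetical and not part of fluentd; how the code would be loaded into a running fluentd worker is left open and would be an assumption on our part.

```ruby
# Hypothetical helper (not part of fluentd): capture a few Ruby GC counters
# together with the process resident set size, so the two can be compared.
def memory_snapshot
  stat = GC.stat
  rss_kb = nil
  # VmRSS is only available on Linux; rss_kb stays nil elsewhere.
  if File.readable?("/proc/self/status")
    line = File.foreach("/proc/self/status").find { |l| l.start_with?("VmRSS:") }
    rss_kb = line.split[1].to_i if line
  end
  {
    heap_live_slots: stat[:heap_live_slots],             # live Ruby objects
    old_objects: stat[:old_objects],                     # objects promoted to the old generation
    major_gc_count: stat[:major_gc_count],               # full GC runs so far
    malloc_increase_bytes: stat[:malloc_increase_bytes], # malloc growth since the last GC
    rss_kb: rss_kb                                       # resident set size in kB, or nil
  }
end

# Hypothetical helper: per-key difference between two snapshots.
# A key is nil in the result if it is nil or missing in either snapshot.
def snapshot_delta(before, after)
  after.to_h do |key, value|
    [key, value && before[key] ? value - before[key] : nil]
  end
end
```

If `heap_live_slots` and `old_objects` stay roughly flat between snapshots taken hours apart while `rss_kb` keeps climbing, the growth is probably outside the Ruby object heap (native allocations or allocator fragmentation), which would be consistent with GC tuning having no visible effect. That reading is an inference, not something confirmed for these clusters.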
Appreciate this is likely hard to reproduce, especially without matching cluster workloads, but we can't find a pattern that would provide a minimal complete test case. Just hoping this either matches a known past issue (that we've missed) or will match other future reports.