Description
Describe the bug
Occasionally (rarely), the fluentd buffer ends up containing an empty chunk, usually with its .meta file missing. With only one worker/flush thread and retry_forever, fluentd repeatedly tries to flush that empty chunk and fails with:
2025-04-10 16:33:04 +0000 [warn]: [opensearch] failed to flush the buffer. retry_times=32 next_retry_time=2025-04-10 16:33:33 +0000 chunk="6324310443eec1c01824af9ad547fba4" error_class=Fluent::Plugin::OpenSearchOutput::RecoverableRequestFailure error="could not push logs to OpenSearch cluster (system): [400] {"error":{"root_cause":[{"type":"parse_exception","reason":"request body is required"}],"type":"parse_exception","reason":"request body is required"},"status":400}"
This blocks the whole buffer, and it does not recover until we manually remove the empty chunk. The buffer section of the config is as follows.
<buffer>
@type file
path /var/log/fluentd-buffers/system.buffer
flush_mode interval
retry_type exponential_backoff
flush_thread_count 1
flush_interval 15s
retry_forever
retry_max_interval 30
chunk_limit_size "#{ENV['OUTPUT_BUFFER_CHUNK_LIMIT']}"
total_limit_size "#{ENV['OUTPUT_BUFFER_TOTAL_LIMIT']}"
overflow_action block
</buffer>
We know that retry_forever + 'overflow_action block' + 'flush_thread_count 1' means the empty chunk is never discarded and blocks all logs behind it. However, we do not want to use retry_max_times, even though it would drop the empty chunk, because it would not drop only bad chunks: chunks would also be dropped whenever OpenSearch is down or cannot accept logs. We want a chunk to be dropped only if the chunk itself is the problem. Can you recommend any solution?
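One possible stopgap outside fluentd is a small cleanup script that moves zero-byte chunks out of the buffer directory. The following is only a sketch under some assumptions: chunk files match buffer.*.log (as in the reproduction command in this report), the function names and the quarantine directory are hypothetical, and GNU find/mv are available. It does not use any fluentd API, and it is probably safest to run it while fluentd is stopped, since a freshly created chunk could be empty for a moment.

```shell
#!/bin/sh
# Hypothetical helper: print zero-byte chunk files directly under a buffer directory.
# Assumes chunk files are named buffer.*.log.
list_empty_chunks() {
    find "$1" -maxdepth 1 -type f -name 'buffer.*.log' -size 0
}

# Hypothetical helper: move empty chunks (and any leftover .meta file)
# into a quarantine directory instead of deleting them outright.
quarantine_empty_chunks() {
    buffer_dir=$1
    quarantine_dir=$2
    mkdir -p "$quarantine_dir" || return 1
    list_empty_chunks "$buffer_dir" | while IFS= read -r chunk; do
        mv -v -- "$chunk" "$quarantine_dir"/
        if [ -e "$chunk.meta" ]; then
            mv -v -- "$chunk.meta" "$quarantine_dir"/
        fi
    done
}
```

For example, `quarantine_empty_chunks /var/log/fluentd-buffers/system.buffer /var/log/fluentd-quarantine` (the second path is made up). Moving rather than deleting keeps the files around for later inspection.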
To Reproduce
Running the following in the buffer folder can help reproduce the issue, but it does not happen every time; it usually has to be run multiple times to get a reproduction.
file=$(ls buffer.*.log | shuf -n1); echo "Selected: $file"; rm -v "${file}.meta"; truncate -s 0 "$file"
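To confirm which chunks ended up in the broken state after running the command above, a check along these lines may help. It is a rough sketch: the function name is hypothetical, and it assumes the same buffer.*.log naming with the metadata stored next to each chunk as <chunk>.meta.

```shell
#!/bin/sh
# Hypothetical helper: report chunks that are empty and/or missing their .meta file.
# Prints one line per affected chunk: "<problems><TAB><path>".
report_broken_chunks() {
    for chunk in "$1"/buffer.*.log; do
        [ -f "$chunk" ] || continue   # skips the literal pattern when nothing matches
        problems=""
        # -s is true only for files with a size greater than zero
        [ -s "$chunk" ] || problems="empty"
        [ -e "$chunk.meta" ] || problems="${problems:+$problems,}no-meta"
        if [ -n "$problems" ]; then
            printf '%s\t%s\n' "$problems" "$chunk"
        fi
    done
}
```

Calling `report_broken_chunks .` from the buffer folder lists only the affected chunks; healthy chunks produce no output.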
Expected behavior
It would be nice if fluentd could detect an empty chunk and discard it.
Your Environment
- Fluentd version: 1.18
- Operating system: Debian 12
- Kernel version: 5.15.0-102-generic
Your Configuration
<buffer>
@type file
path /var/log/fluentd-buffers/system.buffer
flush_mode interval
retry_type exponential_backoff
flush_thread_count 1
flush_interval 15s
retry_forever
retry_max_interval 30
chunk_limit_size "#{ENV['OUTPUT_BUFFER_CHUNK_LIMIT']}"
total_limit_size "#{ENV['OUTPUT_BUFFER_TOTAL_LIMIT']}"
overflow_action block
</buffer>
Your Error Log
2025-04-10 16:33:04 +0000 [warn]: [opensearch] failed to flush the buffer. retry_times=32 next_retry_time=2025-04-10 16:33:33 +0000 chunk="6324310443eec1c01824af9ad547fba4" error_class=Fluent::Plugin::OpenSearchOutput::RecoverableRequestFailure error="could not push logs to OpenSearch cluster (system): [400] {\"error\":{\"root_cause\":[{\"type\":\"parse_exception\",\"reason\":\"request body is required\"}],\"type\":\"parse_exception\",\"reason\":\"request body is required\"},\"status\":400}"
Additional context
No response