Open
Description
Describe the bug
Hi Team,
We observed that when Fluentd agent is overwhelmed with "pattern not matched:" warning after some time it gets stuck and stopped processing. Both tailing and flushing buffers are stuck.
To Reproduce
To reproduce this issue we run an experiment on shared configuration.
Case1:
Fluentd agent running fine and able to process incoming logs file heartbeat.log.
Case 2:
In addition to case 1, we started appending logs to file pattern_not_match.log which will result in pattern not match warning and overwhelmed files with new logs. After some time agent got stuck and stopped processing logs from the heartbeat.log file.
we stopped adding log line to pattern_not_match.log but the agent didn't recover from the stuck state.
Case3: manually commented https://github.com/fluent/fluentd/blob/master/lib/fluent/plugin/in_tail.rb#L640 to stop printing pattern not match warning, and restarted agent. and repeated case 2. The agent was running fine and pattern not warning not impacting fluentd now.
For each case, the sig dump is attached.
Case 1 : 6:55 AM to 8:10AM
Case 2: 8:10 AM to 9:15AM
case 3 : 9:15 AM to continue
Sigdump atteched below.
case 1
[root@kashokum-downloader tmp]# ps -ef | grep unified
root 2534 1 0 06:40 ? 00:00:00 /opt/unified-monitoring-agent/embedded/bin/ruby /opt/unified-monitoring-agent/embedded/bin/fluentd --log /var/log/unified-monitoring-agent/unified-monitoring-agent.log --daemon /var/run/unified-monitoring-agent/unified-monitoring-agent.pid --log-rotate-size 1048576 --log-rotate-age 10
root 2537 2534 0 06:40 ? 00:00:02 /opt/unified-monitoring-agent/embedded/bin/ruby -Eascii-8bit:ascii-8bit /opt/unified-monitoring-agent/embedded/bin/fluentd --log /var/log/unified-monitoring-agent/unified-monitoring-agent.log --daemon /var/run/unified-monitoring-agent/unified-monitoring-agent.pid --log-rotate-size 1048576 --log-rotate-age 10 --under-supervisor
case 2
[root@kashokum-downloader opt]# ps -ef | grep unified
root 22200 1 0 08:10 ? 00:00:00 /opt/unified-monitoring-agent/embedded/bin/ruby /opt/unified-monitoring-agent/embedded/bin/fluentd --log /var/log/unified-monitoring-agent/unified-monitoring-agent.log --daemon /var/run/unified-monitoring-agent/unified-monitoring-agent.pid --log-rotate-size 1048576 --log-rotate-age 10
root 22203 22200 39 08:10 ? 00:21:00 /opt/unified-monitoring-agent/embedded/bin/ruby -Eascii-8bit:ascii-8bit /opt/unified-monitoring-agent/embedded/bin/fluentd --log /var/log/unified-monitoring-agent/unified-monitoring-agent.log --daemon /var/run/unified-monitoring-agent/unified-monitoring-agent.pid --log-rotate-size 1048576 --log-rotate-age 10 --under-supervisor
sigdump-22200.log
sigdump-22203.log
case 3
[root@kashokum-downloader tmp]# ps -ef | grep unified
root 3975 1 0 09:15 ? 00:00:00 /opt/unified-monitoring-agent/embedded/bin/ruby /opt/unified-monitoring-agent/embedded/bin/fluentd --log /var/log/unified-monitoring-agent/unified-monitoring-agent.log --daemon /var/run/unified-monitoring-agent/unified-monitoring-agent.pid --log-rotate-size 1048576 --log-rotate-age 10
root 3978 3975 8 09:15 ? 00:05:46 /opt/unified-monitoring-agent/embedded/bin/ruby -Eascii-8bit:ascii-8bit /opt/unified-monitoring-agent/embedded/bin/fluentd --log /var/log/unified-monitoring-agent/unified-monitoring-agent.log --daemon /var/run/unified-monitoring-agent/unified-monitoring-agent.pid --log-rotate-size 1048576 --log-rotate-age 10 --under-supervisor
sigdump-3975.log
sigdump-3978.log
Expected behavior
The agent should not go into a stuck state.
Why print log line with pattern not match warning resulting in this issue? @ashie can you look into this?
https://github.com/fluent/fluentd/blob/master/lib/fluent/plugin/in_tail.rb#L640
Your Environment
- Fluentd version: 1.14.4
- TD Agent version:
- Operating system: Linux
- Kernel version:
Your Configuration
<source>
@type tail
tag 6c6561.ashoklinux2heartbeat
path /var/log/heartbeat/hosts/*/heartbeat.log
pos_file /etc/unifiedmonitoringagent/pos/6c6561-ashoklinux2heartbeat.pos
path_key tailed_path
<parse>
@type none
types {}
keep_time_key false
null_empty_string false
estimate_current_event true
timeout 0
</parse>
</source>
<match 6c6561.**>
@type xxx-xxx
log_object_id testid
<buffer tag>
@type file
retry_timeout 3h
path /opt/unifiedmonitoringagent/run/buffer/6c6561
disable_chunk_backup true
chunk_limit_size 5MB
flush_interval 60s
total_limit_size 1GB
overflow_action throw_exception
retry_type exponential_backoff
</buffer>
</match>
<source>
@type tail
tag 687861.patternnotmatch
path /var/log/pattern_not_match/pattern_not_match.log
pos_file /etc/unifiedmonitoringagent/pos/687861-patternnotmatch.pos
path_key tailed_path
<parse>
@type none
types {}
keep_time_key false
null_empty_string false
estimate_current_event false
timeout 0
</parse>
</source>
<match 687861.**>
@type oci_logging
log_object_id ocid1.log.oc1.phx.amaaaaaa4p7m3syaoyyu65n7x5l6tgcjd3bls74vhlwl6pumxooizrax3g3q
<buffer tag>
@type file
retry_timeout 3h
path /opt/unifiedmonitoringagent/run/buffer/687861
disable_chunk_backup true
chunk_limit_size 5MB
flush_interval 180s
total_limit_size 1GB
overflow_action throw_exception
retry_type exponential_backoff
</buffer>
</match>
Your Error Log
No error log, sigdump attached
Additional context
No response