Skip to content

Fluentd agent completely stuck due to printing "pattern not match warning" #3949

Open
@kashok7474

Description

@kashok7474

Describe the bug

Hi Team,

We observed that when Fluentd agent is overwhelmed with "pattern not matched:" warning after some time it gets stuck and stopped processing. Both tailing and flushing buffers are stuck.

To Reproduce

To reproduce this issue we run an experiment on shared configuration.

Case1:
Fluentd agent running fine and able to process incoming logs file heartbeat.log.
Case 2:
In addition to case 1, we started appending logs to file pattern_not_match.log which will result in pattern not match warning and overwhelmed files with new logs. After some time agent got stuck and stopped processing logs from the heartbeat.log file.
we stopped adding log line to pattern_not_match.log but the agent didn't recover from the stuck state.
Case3: manually commented https://github.com/fluent/fluentd/blob/master/lib/fluent/plugin/in_tail.rb#L640 to stop printing pattern not match warning, and restarted agent. and repeated case 2. The agent was running fine and pattern not warning not impacting fluentd now.

For each case, the sig dump is attached.

Screenshot 2022-11-07 at 3 50 38 PM

Case 1 : 6:55 AM to 8:10AM
Case 2: 8:10 AM to 9:15AM
case 3 : 9:15 AM to continue

Sigdump atteched below.
case 1

[root@kashokum-downloader tmp]# ps -ef | grep unified
root      2534     1  0 06:40 ?        00:00:00 /opt/unified-monitoring-agent/embedded/bin/ruby /opt/unified-monitoring-agent/embedded/bin/fluentd --log /var/log/unified-monitoring-agent/unified-monitoring-agent.log --daemon /var/run/unified-monitoring-agent/unified-monitoring-agent.pid --log-rotate-size 1048576 --log-rotate-age 10
root      2537  2534  0 06:40 ?        00:00:02 /opt/unified-monitoring-agent/embedded/bin/ruby -Eascii-8bit:ascii-8bit /opt/unified-monitoring-agent/embedded/bin/fluentd --log /var/log/unified-monitoring-agent/unified-monitoring-agent.log --daemon /var/run/unified-monitoring-agent/unified-monitoring-agent.pid --log-rotate-size 1048576 --log-rotate-age 10 --under-supervisor

sigdump-2534.log

sigdump-2537.log

case 2

[root@kashokum-downloader opt]# ps -ef | grep unified
root     22200     1  0 08:10 ?        00:00:00 /opt/unified-monitoring-agent/embedded/bin/ruby /opt/unified-monitoring-agent/embedded/bin/fluentd --log /var/log/unified-monitoring-agent/unified-monitoring-agent.log --daemon /var/run/unified-monitoring-agent/unified-monitoring-agent.pid --log-rotate-size 1048576 --log-rotate-age 10
root     22203 22200 39 08:10 ?        00:21:00 /opt/unified-monitoring-agent/embedded/bin/ruby -Eascii-8bit:ascii-8bit /opt/unified-monitoring-agent/embedded/bin/fluentd --log /var/log/unified-monitoring-agent/unified-monitoring-agent.log --daemon /var/run/unified-monitoring-agent/unified-monitoring-agent.pid --log-rotate-size 1048576 --log-rotate-age 10 --under-supervisor

sigdump-22200.log
sigdump-22203.log

case 3

[root@kashokum-downloader tmp]# ps -ef | grep unified
root      3975     1  0 09:15 ?        00:00:00 /opt/unified-monitoring-agent/embedded/bin/ruby /opt/unified-monitoring-agent/embedded/bin/fluentd --log /var/log/unified-monitoring-agent/unified-monitoring-agent.log --daemon /var/run/unified-monitoring-agent/unified-monitoring-agent.pid --log-rotate-size 1048576 --log-rotate-age 10
root      3978  3975  8 09:15 ?        00:05:46 /opt/unified-monitoring-agent/embedded/bin/ruby -Eascii-8bit:ascii-8bit /opt/unified-monitoring-agent/embedded/bin/fluentd --log /var/log/unified-monitoring-agent/unified-monitoring-agent.log --daemon /var/run/unified-monitoring-agent/unified-monitoring-agent.pid --log-rotate-size 1048576 --log-rotate-age 10 --under-supervisor

sigdump-3975.log
sigdump-3978.log

Expected behavior

The agent should not go into a stuck state.
Why print log line with pattern not match warning resulting in this issue? @ashie can you look into this?
https://github.com/fluent/fluentd/blob/master/lib/fluent/plugin/in_tail.rb#L640

Your Environment

- Fluentd version: 1.14.4
- TD Agent version:
- Operating system: Linux
- Kernel version:

Your Configuration

<source>
     @type tail
     tag 6c6561.ashoklinux2heartbeat
     path  /var/log/heartbeat/hosts/*/heartbeat.log
     pos_file  /etc/unifiedmonitoringagent/pos/6c6561-ashoklinux2heartbeat.pos
     path_key  tailed_path
     <parse>
         @type none
         types  {}
         keep_time_key  false
         null_empty_string  false
         estimate_current_event  true
         timeout  0
     </parse>
 </source>

 <match 6c6561.**>
     @type xxx-xxx
     log_object_id  testid
     <buffer tag>
         @type file
         retry_timeout  3h
         path  /opt/unifiedmonitoringagent/run/buffer/6c6561
         disable_chunk_backup  true
         chunk_limit_size  5MB
         flush_interval  60s
         total_limit_size  1GB
         overflow_action  throw_exception
         retry_type  exponential_backoff
     </buffer>
 </match>
 <source>
     @type tail
     tag 687861.patternnotmatch
     path  /var/log/pattern_not_match/pattern_not_match.log
     pos_file  /etc/unifiedmonitoringagent/pos/687861-patternnotmatch.pos
     path_key  tailed_path
     <parse>
         @type none
         types  {}
         keep_time_key  false
         null_empty_string  false
         estimate_current_event  false
         timeout  0
     </parse>
 </source>

 <match 687861.**>
     @type oci_logging
     log_object_id  ocid1.log.oc1.phx.amaaaaaa4p7m3syaoyyu65n7x5l6tgcjd3bls74vhlwl6pumxooizrax3g3q
     <buffer tag>
         @type file
         retry_timeout  3h
         path  /opt/unifiedmonitoringagent/run/buffer/687861
         disable_chunk_backup  true
         chunk_limit_size  5MB
         flush_interval  180s
         total_limit_size  1GB
         overflow_action  throw_exception
         retry_type  exponential_backoff
     </buffer>
 </match>

Your Error Log

No error log, sigdump attached

Additional context

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions