Description
Describe the bug
If the worker process receives a TERM signal between:
https://github.com/fluent/fluentd/blob/master/lib/fluent/supervisor.rb#L603
and
https://github.com/fluent/fluentd/blob/master/lib/fluent/supervisor.rb#L615
The signal will be lost and the process will not terminate. This comes up when using systemd if the fluentd is started and then stopped very soon after. In this case systemd will not send another signal to clean up for a configurable amount of time and the process will hang in a "deactivating" state.
To Reproduce
It is difficult to reproduce because it is a very tight race (however we see it in practice multiple times a month), but if you put a sleep statement here: https://github.com/fluent/fluentd/blob/master/lib/fluent/supervisor.rb#L604 and then send a TERM signal while it is in that sleep you will see it.
Expected behavior
I would expect the process to shutdown
Your Environment
- Fluentd version: v1.13.3 and v1.16.1
- TD Agent version: N/A
- Operating system: Linux
- Kernel version: 5.15.0
Your Configuration
<system>
log_level debug
</system>
Your Error Log
023-07-06 20:32:45 +0000 [info]: starting fluentd-1.13.3 pid=45636 ruby="3.1.2"
2023-07-06 20:32:45 +0000 [info]: spawn command to main: cmdline=["/opt/unified-monitoring-agent/embedded/bin/ruby", "-Eascii-8bit:ascii-8bit", "/opt/unified-monitoring-agent/embedded/bin/fluentd", "--log", "/var/log/unified-monitoring-agent/unified-monitoring-agent.log", "--log-rotate-size", "1048576", "--log-rotate-age", "10", "--under-supervisor"]
2023-07-06 20:32:46 +0000 [info]: #0 waiting to setup signals
2023-07-06 20:32:56 +0000 [info]: #0 setting up signals
2023-07-06 20:32:56 +0000 [info]: #0 setting up handlers
2023-07-06 20:32:56 +0000 [info]: #0 waiting after signal setup
2023-07-06 20:34:07 +0000 [info]: #0 done waiting
2023-07-06 20:34:07 +0000 [debug]: #0 No fluent logger for internal event
2023-07-06 20:34:07 +0000 [info]: #0 starting fluentd worker pid=45650 ppid=45636 worker=0
2023-07-06 20:34:07 +0000 [info]: #0 fluentd worker is now running worker=0
2023-07-06 20:39:11 +0000 [info]: fluent/log.rb:330:info: Received graceful stop
2023-07-06 20:39:11 +0000 [info]: fluent/log.rb:330:info: Received graceful stop
2023-07-06 20:39:12 +0000 [info]: #0 fluentd worker is now stopping worker=0
2023-07-06 20:39:12 +0000 [info]: #0 shutting down fluentd worker worker=0
2023-07-06 20:39:12 +0000 [info]: fluent/log.rb:330:info: Worker 0 finished with status 0
2023-07-06 20:39:14 +0000 [info]: parsing config file is succeeded path="/etc/unified-monitoring-agent/unified-monitoring-agent.conf"
2023-07-06 20:39:14 +0000 [info]: gem 'fluentd' version '1.13.3'
2023-07-06 20:39:14 +0000 [info]: gem 'fluent-plugin-oracle-cri-parser' version '1.0.20230222164221'
2023-07-06 20:39:14 +0000 [info]: gem 'fluent-plugin-oracle-nested-json-parser' version '1.0.20230222164221'
2023-07-06 20:39:14 +0000 [info]: gem 'fluent-plugin-oracle-netflowipfix' version '1.0.20230222164221'
2023-07-06 20:39:14 +0000 [info]: gem 'fluent-plugin-oracle-out-datamesh' version '1.0.20230222164221'
2023-07-06 20:39:14 +0000 [info]: gem 'fluent-plugin-oracle-telemetry-common' version '1.0.20230222164221'
2023-07-06 20:39:14 +0000 [info]: gem 'fluent-plugin-oracle-telemetry-common-public' version '1.0.20230222164221'
2023-07-06 20:39:14 +0000 [info]: gem 'fluent-plugin-oracle-telemetry-grok-parser' version '1.0.20230222164221'
2023-07-06 20:39:14 +0000 [info]: gem 'fluent-plugin-oracle-telemetry-in-agent-metrics' version '1.0.20230222164221'
2023-07-06 20:39:14 +0000 [info]: gem 'fluent-plugin-oracle-telemetry-in-g1gc' version '1.0.20230222164221'
2023-07-06 20:39:14 +0000 [info]: gem 'fluent-plugin-oracle-telemetry-in-openmetrics' version '1.0.20230222164221'
2023-07-06 20:39:14 +0000 [info]: gem 'fluent-plugin-oracle-telemetry-in-openmetrics-k8s' version '1.0.20230222164221'
2023-07-06 20:39:14 +0000 [info]: gem 'fluent-plugin-oracle-telemetry-metrics-formatter' version '1.0.20230222164221'
2023-07-06 20:39:14 +0000 [info]: gem 'fluent-plugin-oracle-telemetry-out-logging' version '1.0.20230222164221'
2023-07-06 20:39:14 +0000 [info]: gem 'fluent-plugin-oracle-telemetry-out-lumberjack' version '1.0.20230222164221'
2023-07-06 20:39:14 +0000 [info]: gem 'fluent-plugin-oracle-telemetry-out-monitoring' version '1.0.20230222164221'
2023-07-06 20:39:14 +0000 [info]: gem 'fluent-plugin-oracle-telemetry-out-objectstorage' version '1.0.20230222164221'
2023-07-06 20:39:14 +0000 [info]: gem 'fluent-plugin-oracle-telemetry-out-telemetry' version '1.0.20230222164221'
2023-07-06 20:39:14 +0000 [info]: gem 'fluent-plugin-rewrite-tag-filter' version '2.4.0'
2023-07-06 20:39:14 +0000 [info]: gem 'fluent-plugin-rewrite-tag-filter' version '2.3.0'
2023-07-06 20:39:14 +0000 [debug]: No fluent logger for internal event
2023-07-06 20:39:14 +0000 [info]: using configuration file: <ROOT>
<system>
log_level debug
</system>
</ROOT>
2023-07-06 20:39:14 +0000 [info]: starting fluentd-1.13.3 pid=47009 ruby="3.1.2"
2023-07-06 20:39:14 +0000 [info]: spawn command to main: cmdline=["/opt/unified-monitoring-agent/embedded/bin/ruby", "-Eascii-8bit:ascii-8bit", "/opt/unified-monitoring-agent/embedded/bin/fluentd", "--log", "/var/log/unified-monitoring-agent/unified-monitoring-agent.log", "--log-rotate-size", "1048576", "--log-rotate-age", "10", "--under-supervisor"]
2023-07-06 20:39:16 +0000 [info]: #0 waiting to setup signals
2023-07-06 20:39:26 +0000 [info]: #0 setting up signals
2023-07-06 20:39:26 +0000 [info]: #0 setting up handlers
2023-07-06 20:39:26 +0000 [info]: #0 waiting after signal setup
2023-07-06 20:39:32 +0000 [info]: fluent/log.rb:330:info: Received graceful stop
2023-07-06 20:39:32 +0000 [info]: fluent/log.rb:330:info: Received graceful stop
^C
[root@delete-me unified-monitoring-agent]# systemctl status unified-monitoring-agent | cat
● unified-monitoring-agent.service - unified-monitoring-agent: Fluentd based data collector for Oracle Cloud Infrastructure
Loaded: loaded (/usr/lib/systemd/system/unified-monitoring-agent.service; enabled; vendor preset: disabled)
Active: deactivating (stop-sigterm) since Thu 2023-07-06 20:39:32 GMT; 12s ago
Docs: https://docs.cloud.oracle.com/
Process: 47121 ExecStop=/bin/kill -TERM ${MAINPID} (code=exited, status=0/SUCCESS)
Main PID: 47009 (fluentd)
Tasks: 4 (limit: 99872)
Memory: 68.7M (limit: 5.0G)
CGroup: /system.slice/unified-monitoring-agent.service
├─47009 /opt/unified-monitoring-agent/embedded/bin/ruby /opt/unified-monitoring-agent/embedded/bin/fluentd --log /var/log/unified-monitoring-agent/unified-monitoring-agent.log --log-rotate-size 1048576 --log-rotate-age 10
└─47016 /opt/unified-monitoring-agent/embedded/bin/ruby -Eascii-8bit:ascii-8bit /opt/unified-monitoring-agent/embedded/bin/fluentd --log /var/log/unified-monitoring-agent/unified-monitoring-agent.log --log-rotate-size 1048576 --log-rotate-age 10 --under-supervisor
Additional context
No response