Description
Expected Behaviour
When we send a SIGTERM to fwatchdog, it should exit after fprocess has been gracefully shut down.
Current Behaviour
When we send a SIGTERM to fwatchdog, it keeps running after fprocess has gracefully shut down (the exit code of fprocess is 0). In the other case, when fprocess exits with an error (exit code is not 0), the watchdog logs Forked function has terminated: ... and kills itself.
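That asymmetry would be consistent with the watchdog waiting on the forked process in a goroutine and exiting only when the wait returns an error. A simplified, self-contained illustration of that pattern (an assumption for clarity, not the actual of-watchdog source):

```go
package main

import (
	"log"
	"os/exec"
)

func main() {
	// Stand-in for fprocess; "true" exits immediately with code 0.
	cmd := exec.Command("true")
	if err := cmd.Start(); err != nil {
		log.Fatalf("Could not start fprocess: %s", err)
	}

	// If the watchdog only exits when Wait() returns an error,
	// a clean shutdown of fprocess (exit code 0) leaves it running.
	go func() {
		if err := cmd.Wait(); err != nil {
			log.Fatalf("Forked function has terminated: %s", err)
		}
		// exit code 0: nothing happens here, the watchdog keeps running
	}()

	select {} // stand-in for the HTTP server loop
}
```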
Possible Solution
Exit fwatchdog itself when fprocess exits. It seems related to https://github.com/openfaas-incubator/of-watchdog/blob/c796e1b714d703c90bf6e3f392471746e0aeab2d/executor/http_runner.go#L70-L76
One idea is to exit fwatchdog regardless of whether fprocess died with an error or not, but I am not sure whether this change has any side effects.
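A minimal sketch of that idea, assuming the watchdog starts fprocess with os/exec and waits on it in a goroutine (simplified for illustration, not a patch against the linked file):

```go
package main

import (
	"log"
	"os"
	"os/exec"
	"strings"
)

func main() {
	// fprocess is read from the environment, as in the reproduction below.
	parts := strings.Fields(os.Getenv("fprocess"))
	if len(parts) == 0 {
		log.Fatal("fprocess environment variable is not set")
	}

	cmd := exec.Command(parts[0], parts[1:]...)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Start(); err != nil {
		log.Fatalf("Could not start fprocess: %s", err)
	}

	// Proposed change: terminate the watchdog whenever fprocess exits,
	// whether or not its exit code was zero.
	go func() {
		if err := cmd.Wait(); err != nil {
			log.Printf("Forked function has terminated: %s", err)
			os.Exit(1)
		}
		log.Println("Forked function has terminated.")
		os.Exit(0)
	}()

	select {} // stand-in for the HTTP server loop
}
```

With something like this, both the clean-shutdown case and the error case bring the watchdog down, so the container exits as soon as fprocess finishes handling SIGTERM.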
Steps to Reproduce (for bugs)
I am using the python3-flask template, and fprocess is gunicorn.
- Pull the faas template:
faas template store pull python3-flask
- Generate a new project named python3-flask-gracefully-shutdown:
faas new python3-flask-gracefully-shutdown --lang python3-flask
- Add gunicorn to requirements.txt:
echo 'gunicorn' > python3-flask-gracefully-shutdown/requirements.txt
- Build images locally:
faas build -f python3-flask-gracefully-shutdown.yml
- Run the container:
docker run --rm -ti -e fprocess='gunicorn index:app' -p 8080:8080 --name python3-flask-gracefully-shutdown python3-flask-gracefully-shutdown
- Send SIGTERM to the container:
docker kill --signal=SIGTERM python3-flask-gracefully-shutdown
- The log shows that the watchdog waits for 10 seconds (the default timeout limit) to exit after it receives SIGTERM, but it should exit immediately:
Forking - gunicorn [index:app]
2020/06/30 05:50:27 Started logging stderr from function.
2020/06/30 05:50:27 Started logging stdout from function.
2020/06/30 05:50:27 OperationalMode: http
2020/06/30 05:50:27 Timeouts: read: 10s, write: 10s hard: 10s.
2020/06/30 05:50:27 Listening on port: 8080
2020/06/30 05:50:27 Writing lock-file to: /tmp/.lock
2020/06/30 05:50:27 Metrics listening on port: 8081
2020/06/30 05:50:27 stderr: [2020-06-30 05:50:27 +0000] [11] [INFO] Starting gunicorn 20.0.4
2020/06/30 05:50:27 stderr: [2020-06-30 05:50:27 +0000] [11] [INFO] Listening at: http://127.0.0.1:8000 (11)
2020/06/30 05:50:27 stderr: [2020-06-30 05:50:27 +0000] [11] [INFO] Using worker: sync
2020/06/30 05:50:27 stderr: [2020-06-30 05:50:27 +0000] [16] [INFO] Booting worker with pid: 16
2020/06/30 05:50:32 SIGTERM received.. shutting down server in 10s
2020/06/30 05:50:32 Removing lock-file : /tmp/.lock
2020/06/30 05:50:32 stderr: [2020-06-30 05:50:32 +0000] [11] [INFO] Handling signal: term
2020/06/30 05:50:32 stderr: [2020-06-30 05:50:32 +0000] [16] [INFO] Worker exiting (pid: 16)
2020/06/30 05:50:32 stderr: [2020-06-30 05:50:32 +0000] [11] [INFO] Shutting down: Master
2020/06/30 05:50:42 No new connections allowed. Exiting in: 10s
Context
I am using OpenFaaS for some long-running tasks with auto-scaling on Kubernetes. One day I found a lot of Pods that did not terminate correctly (stuck in the Terminating status), which wastes resources.
Your Environment
- Docker version (e.g. Docker 17.0.05): Docker version 19.03.8, build afacb8b
- Are you using Docker Swarm or Kubernetes (FaaS-netes)? Kubernetes
- Operating System and version (e.g. Linux, Windows, MacOS): MacOS 10.13.6
- Link to your project or a code example to reproduce issue: