
Support issue with Kubernetes #101

Open
@conan-o-chang

Description

Expected Behaviour

When we send a SIGTERM to fwatchdog, it should exit after fprocess has shut down gracefully.

Current Behaviour

When we send a SIGTERM to fwatchdog, it keeps running even after fprocess has shut down gracefully (fprocess exit code is 0).
In the other case, where fprocess exits with an error (non-zero exit code), the watchdog logs Forked function has terminated: ... and kills itself.
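
For context, my understanding of the cause: in Go, exec.Cmd.Wait returns a nil error when the child exits with code 0, so a handler that only reacts to a non-nil error never fires on a clean shutdown. A quick standalone demo of that behaviour (not of-watchdog code, just an illustration):

package main

import (
	"fmt"
	"log"
	"os/exec"
)

func main() {
	// Child that exits with code 0, like a gracefully stopped gunicorn.
	clean := exec.Command("true")
	if err := clean.Start(); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("clean exit  -> Wait() error: %v\n", clean.Wait()) // <nil>

	// Child that exits with a non-zero code.
	failing := exec.Command("false")
	if err := failing.Start(); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("failed exit -> Wait() error: %v\n", failing.Wait()) // exit status 1
}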

Possible Solution

Exit fwatchdog itself when fprocess exits. This seems related to https://github.com/openfaas-incubator/of-watchdog/blob/c796e1b714d703c90bf6e3f392471746e0aeab2d/executor/http_runner.go#L70-L76
One idea is to exit fwatchdog whether fprocess died with an error or not, but I am not sure whether that change has any side effects (see the sketch below).
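
A minimal sketch of that idea, assuming the watchdog's fork-monitoring goroutine wraps cmd.Wait() roughly as below. The function name watchForkedProcess and the surrounding main are mine, purely for illustration; only the os.Exit(0) branch is the proposed change:

package main

import (
	"log"
	"os"
	"os/exec"
	"time"
)

// Illustrative stand-in for the goroutine in http_runner.go, not the actual
// of-watchdog source.
func watchForkedProcess(cmd *exec.Cmd) {
	go func() {
		err := cmd.Wait()

		if err != nil {
			// Current behaviour: exit only when fprocess dies with an error.
			log.Fatalf("Forked function has terminated: %s", err.Error())
		}

		// Proposed addition: also exit on a clean shutdown, so fwatchdog
		// does not linger until the HTTP timeout expires.
		log.Println("Forked function has terminated cleanly, exiting")
		os.Exit(0)
	}()
}

func main() {
	// Stand-in for fprocess: a child that exits cleanly after one second.
	cmd := exec.Command("sh", "-c", "sleep 1; exit 0")
	if err := cmd.Start(); err != nil {
		log.Fatalf("could not start child: %s", err)
	}

	watchForkedProcess(cmd)

	// With the os.Exit above, this program ends as soon as the child does,
	// instead of sleeping out this whole window.
	time.Sleep(10 * time.Second)
}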

Steps to Reproduce (for bugs)

I am using the python3-flask template, and the fprocess is gunicorn.

  1. Pull faas template: faas template store pull python3-flask
  2. Generate a new project named python3-flask-gracefully-shutdown: faas new python3-flask-gracefully-shutdown --lang python3-flask
  3. Add gunicorn to requirements.txt: echo 'gunicorn' > python3-flask-gracefully-shutdown/requirements.txt
  4. Build images locally: faas build -f python3-flask-gracefully-shutdown.yml
  5. Run container: docker run --rm -ti -e fprocess='gunicorn index:app' -p 8080:8080 --name python3-flask-gracefully-shutdown python3-flask-gracefully-shutdown
  6. Send SIGTERM to the container: docker kill --signal=SIGTERM python3-flask-gracefully-shutdown
  7. The log shows that the watchdog waits 10 seconds (the default timeout) to exit after it receives SIGTERM, but it should exit immediately.
Forking - gunicorn [index:app]
2020/06/30 05:50:27 Started logging stderr from function.
2020/06/30 05:50:27 Started logging stdout from function.
2020/06/30 05:50:27 OperationalMode: http
2020/06/30 05:50:27 Timeouts: read: 10s, write: 10s hard: 10s.
2020/06/30 05:50:27 Listening on port: 8080
2020/06/30 05:50:27 Writing lock-file to: /tmp/.lock
2020/06/30 05:50:27 Metrics listening on port: 8081
2020/06/30 05:50:27 stderr: [2020-06-30 05:50:27 +0000] [11] [INFO] Starting gunicorn 20.0.4
2020/06/30 05:50:27 stderr: [2020-06-30 05:50:27 +0000] [11] [INFO] Listening at: http://127.0.0.1:8000 (11)
2020/06/30 05:50:27 stderr: [2020-06-30 05:50:27 +0000] [11] [INFO] Using worker: sync
2020/06/30 05:50:27 stderr: [2020-06-30 05:50:27 +0000] [16] [INFO] Booting worker with pid: 16
2020/06/30 05:50:32 SIGTERM received.. shutting down server in 10s
2020/06/30 05:50:32 Removing lock-file : /tmp/.lock
2020/06/30 05:50:32 stderr: [2020-06-30 05:50:32 +0000] [11] [INFO] Handling signal: term
2020/06/30 05:50:32 stderr: [2020-06-30 05:50:32 +0000] [16] [INFO] Worker exiting (pid: 16)
2020/06/30 05:50:32 stderr: [2020-06-30 05:50:32 +0000] [11] [INFO] Shutting down: Master
2020/06/30 05:50:42 No new connections allowed. Exiting in: 10s

Context

I am using OpenFaaS for some long-running tasks with auto-scaling on Kubernetes. One day I found a lot of Pods that did not terminate correctly (status stuck at Terminating), which wastes resources.

Your Environment

  • Docker version:
    Docker version 19.03.8, build afacb8b
  • Are you using Docker Swarm or Kubernetes (FaaS-netes)?
    Kubernetes
  • Operating System and version (e.g. Linux, Windows, MacOS):
    MacOS 10.13.6
  • Link to your project or a code example to reproduce issue:
