Description
We're observing intermittent HTTP 502s in production, which seem to be correlated with the "autorestarting worker after current request" log line, and are less frequent as we increase max_requests
. I've reproduced this on 21.2.0 and 20.1.0, but it doesn't seem to happen in 20.0.4.
I've produced a minimal reproduction case following the gunicorn.org example as closely as possible, but please let me know if there are other changes you'd recommend:
Application
def app(environ, start_response):
data = b"Hello, World!\n"
start_response("200 OK", [
("Content-Type", "text/plain"),
("Content-Length", str(len(data)))
])
return iter([data])
Configuration
gunicorn --max-requests 10 --worker-class gthread myapp:app
Reproduction
#!/usr/bin/env bash
set -euo pipefail
# start gunicorn
gunicorn --max-requests 10 --log-level debug --worker-class gthread myapp:app &
# wait for gunicorn to be ready
sleep 5
# send 20 sequential HTTP requests with minimal output
for i in {1..20}; do
curl -S -s -o /dev/null -w "[REPRO]: Received HTTP %{http_code} from GET /${i}\n" "127.0.0.1:8000/${i}" || true
done
Quickstart
For convenience, I've packaged this into a single command that consistently reproduces the problem on my machine. If you have Docker installed, this should Just Work™️.
cat <<DOCKER_EOF | docker build -t gunicorn-repro-max-requests-rst - && docker run --rm gunicorn-repro-max-requests-rst
FROM python:3.11
RUN pip install --disable-pip-version-check --no-cache gunicorn
RUN cat <<EOF > myapp.py
def app(environ, start_response):
data = b"Hello, World!\n"
start_response("200 OK", [
("Content-Type", "text/plain"),
("Content-Length", str(len(data)))
])
return iter([data])
EOF
RUN cat <<EOF > repro.sh
#!/usr/bin/env bash
set -euo pipefail
# start gunicorn
gunicorn --max-requests 10 --log-level debug --worker-class gthread myapp:app &
# wait for gunicorn to be ready
sleep 5
# send 20 sequential HTTP requests with minimal output
for i in {1..20}; do
curl -S -s -o /dev/null -w "[REPRO]: Received HTTP %{http_code} from GET /\\\${i}\n" "127.0.0.1:8000/\\\${i}" || true
done
EOF
RUN chmod +x repro.sh
CMD ["./repro.sh"]
DOCKER_EOF
Example
Logs
[2023-07-21 00:20:10 +0000] [6] [DEBUG] Current configuration:
config: ./gunicorn.conf.py
wsgi_app: None
bind: ['127.0.0.1:8000']
backlog: 2048
workers: 1
worker_class: gthread
threads: 1
worker_connections: 1000
max_requests: 10
max_requests_jitter: 0
timeout: 30
graceful_timeout: 30
keepalive: 2
limit_request_line: 4094
limit_request_fields: 100
limit_request_field_size: 8190
reload: False
reload_engine: auto
reload_extra_files: []
spew: False
check_config: False
print_config: False
preload_app: False
sendfile: None
reuse_port: False
chdir: /
daemon: False
raw_env: []
pidfile: None
worker_tmp_dir: None
user: 0
group: 0
umask: 0
initgroups: False
tmp_upload_dir: None
secure_scheme_headers: {'X-FORWARDED-PROTOCOL': 'ssl', 'X-FORWARDED-PROTO': 'https', 'X-FORWARDED-SSL': 'on'}
forwarded_allow_ips: ['127.0.0.1']
accesslog: None
disable_redirect_access_to_syslog: False
access_log_format: %(h)s %(l)s %(u)s %(t)s "%(r)s" %(s)s %(b)s "%(f)s" "%(a)s"
errorlog: -
loglevel: debug
capture_output: False
logger_class: gunicorn.glogging.Logger
logconfig: None
logconfig_dict: {}
logconfig_json: None
syslog_addr: udp://localhost:514
syslog: False
syslog_prefix: None
syslog_facility: user
enable_stdio_inheritance: False
statsd_host: None
dogstatsd_tags:
statsd_prefix:
proc_name: None
default_proc_name: myapp:app
pythonpath: None
paste: None
on_starting: <function OnStarting.on_starting at 0xffff935672e0>
on_reload: <function OnReload.on_reload at 0xffff93567420>
when_ready: <function WhenReady.when_ready at 0xffff93567560>
pre_fork: <function Prefork.pre_fork at 0xffff935676a0>
post_fork: <function Postfork.post_fork at 0xffff935677e0>
post_worker_init: <function PostWorkerInit.post_worker_init at 0xffff93567920>
worker_int: <function WorkerInt.worker_int at 0xffff93567a60>
worker_abort: <function WorkerAbort.worker_abort at 0xffff93567ba0>
pre_exec: <function PreExec.pre_exec at 0xffff93567ce0>
pre_request: <function PreRequest.pre_request at 0xffff93567e20>
post_request: <function PostRequest.post_request at 0xffff93567ec0>
child_exit: <function ChildExit.child_exit at 0xffff92db8040>
worker_exit: <function WorkerExit.worker_exit at 0xffff92db8180>
nworkers_changed: <function NumWorkersChanged.nworkers_changed at 0xffff92db82c0>
on_exit: <function OnExit.on_exit at 0xffff92db8400>
ssl_context: <function NewSSLContext.ssl_context at 0xffff92db85e0>
proxy_protocol: False
proxy_allow_ips: ['127.0.0.1']
keyfile: None
certfile: None
ssl_version: 2
cert_reqs: 0
ca_certs: None
suppress_ragged_eofs: True
do_handshake_on_connect: False
ciphers: None
raw_paste_global_conf: []
strip_header_spaces: False
[2023-07-21 00:20:10 +0000] [6] [INFO] Starting gunicorn 21.2.0
[2023-07-21 00:20:10 +0000] [6] [DEBUG] Arbiter booted
[2023-07-21 00:20:10 +0000] [6] [INFO] Listening at: http://127.0.0.1:8000 (6)
[2023-07-21 00:20:10 +0000] [6] [INFO] Using worker: gthread
[2023-07-21 00:20:10 +0000] [8] [INFO] Booting worker with pid: 8
[2023-07-21 00:20:10 +0000] [6] [DEBUG] 1 workers
[2023-07-21 00:20:15 +0000] [8] [DEBUG] GET /1
[REPRO]: Received HTTP 200 from GET /1
[2023-07-21 00:20:15 +0000] [8] [DEBUG] Closing connection.
[2023-07-21 00:20:15 +0000] [8] [DEBUG] GET /2
[2023-07-21 00:20:15 +0000] [8] [DEBUG] Closing connection.
[REPRO]: Received HTTP 200 from GET /2
[2023-07-21 00:20:15 +0000] [8] [DEBUG] GET /3
[REPRO]: Received HTTP 200 from GET /3
[2023-07-21 00:20:15 +0000] [8] [DEBUG] Closing connection.
[2023-07-21 00:20:15 +0000] [8] [DEBUG] GET /4
[2023-07-21 00:20:15 +0000] [8] [DEBUG] Closing connection.
[REPRO]: Received HTTP 200 from GET /4
[2023-07-21 00:20:15 +0000] [8] [DEBUG] GET /5
[2023-07-21 00:20:15 +0000] [8] [DEBUG] Closing connection.
[REPRO]: Received HTTP 200 from GET /5
[2023-07-21 00:20:15 +0000] [8] [DEBUG] GET /6
[2023-07-21 00:20:15 +0000] [8] [DEBUG] Closing connection.
[REPRO]: Received HTTP 200 from GET /6
[2023-07-21 00:20:15 +0000] [8] [DEBUG] GET /7
[REPRO]: Received HTTP 200 from GET /7
[2023-07-21 00:20:15 +0000] [8] [DEBUG] Closing connection.
[2023-07-21 00:20:15 +0000] [8] [DEBUG] GET /8
[2023-07-21 00:20:15 +0000] [8] [DEBUG] Closing connection.
[REPRO]: Received HTTP 200 from GET /8
[2023-07-21 00:20:15 +0000] [8] [DEBUG] GET /9
[2023-07-21 00:20:15 +0000] [8] [DEBUG] Closing connection.
[REPRO]: Received HTTP 200 from GET /9
[2023-07-21 00:20:15 +0000] [8] [DEBUG] GET /10
[2023-07-21 00:20:15 +0000] [8] [INFO] Autorestarting worker after current request.
[2023-07-21 00:20:15 +0000] [8] [DEBUG] Closing connection.
[REPRO]: Received HTTP 200 from GET /10
curl: (56) Recv failure: Connection reset by peer
[2023-07-21 00:20:15 +0000] [8] [INFO] Worker exiting (pid: 8)
[REPRO]: Received HTTP 000 from GET /11
[2023-07-21 00:20:15 +0000] [22] [INFO] Booting worker with pid: 22
[2023-07-21 00:20:15 +0000] [22] [DEBUG] GET /12
[2023-07-21 00:20:15 +0000] [22] [DEBUG] Closing connection.
[REPRO]: Received HTTP 200 from GET /12
[2023-07-21 00:20:15 +0000] [22] [DEBUG] GET /13
[REPRO]: Received HTTP 200 from GET /13
[2023-07-21 00:20:15 +0000] [22] [DEBUG] Closing connection.
[2023-07-21 00:20:15 +0000] [22] [DEBUG] GET /14
[2023-07-21 00:20:15 +0000] [22] [DEBUG] Closing connection.
[REPRO]: Received HTTP 200 from GET /14
[2023-07-21 00:20:15 +0000] [22] [DEBUG] GET /15
[2023-07-21 00:20:15 +0000] [22] [DEBUG] Closing connection.
[REPRO]: Received HTTP 200 from GET /15
[2023-07-21 00:20:15 +0000] [22] [DEBUG] GET /16
[REPRO]: Received HTTP 200 from GET /16
[2023-07-21 00:20:15 +0000] [22] [DEBUG] Closing connection.
[2023-07-21 00:20:15 +0000] [22] [DEBUG] GET /17
[2023-07-21 00:20:15 +0000] [22] [DEBUG] Closing connection.
[REPRO]: Received HTTP 200 from GET /17
[2023-07-21 00:20:15 +0000] [22] [DEBUG] GET /18
[2023-07-21 00:20:15 +0000] [22] [DEBUG] Closing connection.
[REPRO]: Received HTTP 200 from GET /18
[2023-07-21 00:20:15 +0000] [22] [DEBUG] GET /19
[REPRO]: Received HTTP 200 from GET /19
[2023-07-21 00:20:15 +0000] [22] [DEBUG] Closing connection.
[2023-07-21 00:20:15 +0000] [22] [DEBUG] GET /20
[2023-07-21 00:20:15 +0000] [22] [DEBUG] Closing connection.
[REPRO]: Received HTTP 200 from GET /20
Expected
I'd expect to receive an HTTP 200 for each request, regardless of the max-requests restart. We should see [DEBUG] GET /11
when the worker handles the 11th request.
Actual
The reproduction script sends GET /11
, but the worker never sees it, and we see a connection reset instead. The repro script reports a status code of 000
, but that's just a quirk of libcurl. I've used tcpdump and can confirm the RST
.
In case it's useful, I've also seen curl: (52) Empty reply from server
, but it happens less frequently and I'm not 100% sure that it's the same problem.
Workaround
Increasing max-requests makes this happen less frequently, but the only way to resolve it is to disable max-requests (or maybe switch to a different worker type?). Increasing the number of workers or threads doesn't seem to resolve the problem either, from what I've seen.