I'm noticing the following behavior in some cases after a Redis failover has happened in the cluster.
The /metrics endpoint fails:
[2025-08-18 04:20:38,475] ERROR in app: Exception on /metrics [GET]
Traceback (most recent call last):
File "/app/.venv/lib/python3.13/site-packages/redis/connection.py", line 644, in read_response
response = self._parser.read_response(disable_decoding=disable_decoding)
File "/app/.venv/lib/python3.13/site-packages/redis/_parsers/resp2.py", line 15, in read_response
result = self._read_response(disable_decoding=disable_decoding)
File "/app/.venv/lib/python3.13/site-packages/redis/_parsers/resp2.py", line 25, in _read_response
raw = self._buffer.readline()
File "/app/.venv/lib/python3.13/site-packages/redis/_parsers/socket.py", line 115, in readline
self._read_from_socket()
~~~~~~~~~~~~~~~~~~~~~~^^
File "/app/.venv/lib/python3.13/site-packages/redis/_parsers/socket.py", line 65, in _read_from_socket
data = self._sock.recv(socket_read_size)
OSError: [Errno 113] No route to host
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/app/.venv/lib/python3.13/site-packages/flask/app.py", line 1511, in wsgi_app
response = self.full_dispatch_request()
File "/app/.venv/lib/python3.13/site-packages/flask/app.py", line 919, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/app/.venv/lib/python3.13/site-packages/flask/app.py", line 917, in full_dispatch_request
rv = self.dispatch_request()
File "/app/.venv/lib/python3.13/site-packages/flask/app.py", line 902, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args) # type: ignore[no-any-return]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
File "/app/src/http_server.py", line 32, in metrics
current_app.config["metrics_puller"]()
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^
File "/app/src/exporter.py", line 156, in scrape
self.track_queue_metrics()
~~~~~~~~~~~~~~~~~~~~~~~~^^
File "/app/src/exporter.py", line 238, in track_queue_metrics
for worker, stats in (self.app.control.inspect().stats() or {}).items()
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^
File "/app/.venv/lib/python3.13/site-packages/celery/app/control.py", line 243, in stats
return self._request('stats')
~~~~~~~~~~~~~^^^^^^^^^
File "/app/.venv/lib/python3.13/site-packages/celery/app/control.py", line 106, in _request
return self._prepare(self.app.control.broadcast(
~~~~~~~~~~~~~~~~~~~~~~~~~~^
command,
^^^^^^^^
...<6 lines>...
pattern=self.pattern, matcher=self.matcher,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
))
^
File "/app/.venv/lib/python3.13/site-packages/celery/app/control.py", line 777, in broadcast
return self.mailbox(conn)._broadcast(
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
command, arguments, destination, reply, timeout,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
limit, callback, channel=channel,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "/app/.venv/lib/python3.13/site-packages/kombu/pidbox.py", line 337, in _broadcast
self._publish(command, arguments, destination=destination,
~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
reply_ticket=reply_ticket,
^^^^^^^^^^^^^^^^^^^^^^^^^^
...<3 lines>...
pattern=pattern,
^^^^^^^^^^^^^^^^
matcher=matcher)
^^^^^^^^^^^^^^^^
File "/app/.venv/lib/python3.13/site-packages/kombu/pidbox.py", line 299, in _publish
maybe_declare(self.reply_queue(chan))
~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/.venv/lib/python3.13/site-packages/kombu/common.py", line 113, in maybe_declare
return _maybe_declare(entity, channel)
File "/app/.venv/lib/python3.13/site-packages/kombu/common.py", line 155, in _maybe_declare
entity.declare(channel=channel)
~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "/app/.venv/lib/python3.13/site-packages/kombu/entity.py", line 617, in declare
self._create_queue(nowait=nowait, channel=channel)
~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/.venv/lib/python3.13/site-packages/kombu/entity.py", line 626, in _create_queue
self.queue_declare(nowait=nowait, passive=False, channel=channel)
~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/.venv/lib/python3.13/site-packages/kombu/entity.py", line 655, in queue_declare
ret = channel.queue_declare(
queue=self.name,
...<5 lines>...
nowait=nowait,
)
File "/app/.venv/lib/python3.13/site-packages/kombu/transport/virtual/base.py", line 538, in queue_declare
return queue_declare_ok_t(queue, self._size(queue), 0)
~~~~~~~~~~^^^^^^^
File "/app/.venv/lib/python3.13/site-packages/kombu/transport/redis.py", line 1012, in _size
sizes = pipe.execute()
File "/app/.venv/lib/python3.13/site-packages/redis/client.py", line 1613, in execute
return conn.retry.call_with_retry(
~~~~~~~~~~~~~~~~~~~~~~~~~~^
lambda: execute(conn, stack, raise_on_error),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
lambda error: self._disconnect_raise_on_watching(conn, error),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "/app/.venv/lib/python3.13/site-packages/redis/retry.py", line 92, in call_with_retry
raise error
File "/app/.venv/lib/python3.13/site-packages/redis/retry.py", line 87, in call_with_retry
return do()
File "/app/.venv/lib/python3.13/site-packages/redis/client.py", line 1614, in <lambda>
lambda: execute(conn, stack, raise_on_error),
~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/.venv/lib/python3.13/site-packages/redis/client.py", line 1455, in _execute_transaction
connection.send_packed_command(all_cmds)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^
File "/app/.venv/lib/python3.13/site-packages/redis/connection.py", line 581, in send_packed_command
self.check_health()
~~~~~~~~~~~~~~~~~^^
File "/app/.venv/lib/python3.13/site-packages/redis/connection.py", line 573, in check_health
self.retry.call_with_retry(self._send_ping, self._ping_failed)
~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/.venv/lib/python3.13/site-packages/redis/retry.py", line 92, in call_with_retry
raise error
File "/app/.venv/lib/python3.13/site-packages/redis/retry.py", line 87, in call_with_retry
return do()
File "/app/.venv/lib/python3.13/site-packages/redis/connection.py", line 563, in _send_ping
if str_if_bytes(self.read_response()) != "PONG":
~~~~~~~~~~~~~~~~~~^^
File "/app/.venv/lib/python3.13/site-packages/redis/connection.py", line 652, in read_response
raise ConnectionError(f"Error while reading from {host_error} : {e.args}")
redis.exceptions.ConnectionError: Error while reading from <my-redis-service>:6379 : (113, 'No route to host')
Meanwhile, the liveness probe on /health still returns a 200:
$ curl -s 127.0.0.1:9808/health
Connected to the broker redis://<my-redis-service>:6379//
Restarting the pod manually fixes the issue.
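One way to get that restart automatically would be a liveness check that actually talks to the broker instead of just reporting the configured URL. A minimal sketch, assuming kombu's public Connection API; the route, config key and response text are illustrative, not the exporter's actual /health implementation:

# Hypothetical broker-aware liveness check (illustrative names, assuming
# kombu's Connection API; not the exporter's actual code).
from flask import Flask, current_app
from kombu import Connection

app = Flask(__name__)

@app.route("/health")
def health():
    broker_url = current_app.config["broker_url"]  # assumed config key
    try:
        # ensure_connection() opens a real socket to the broker, so a
        # failed-over / unreachable Redis makes the probe return 503.
        with Connection(broker_url, connect_timeout=3) as conn:
            conn.ensure_connection(max_retries=1)
    except Exception as exc:
        return f"Broker unreachable: {exc}", 503
    return f"Connected to the broker {broker_url}", 200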
Could we change this behavior? Maybe the process should exit instead of only logging an ERROR?
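For illustration, a minimal sketch of that exit-on-error idea; the class and method names mirror the traceback above, but the except/exit handling is an assumption, not the exporter's current code:

# Hypothetical sketch: scrape() and track_queue_metrics() are the names
# seen in the traceback; the error handling here is an assumption.
import logging
import os

from redis.exceptions import ConnectionError as RedisConnectionError

log = logging.getLogger(__name__)

class Exporter:
    def scrape(self):
        try:
            self.track_queue_metrics()
        except RedisConnectionError:
            log.exception("Lost connection to the broker, exiting so the pod gets restarted")
            # os._exit() terminates the whole process immediately; sys.exit()
            # raises SystemExit, which some WSGI servers catch and turn into
            # a worker restart instead of a pod restart.
            os._exit(1)

    def track_queue_metrics(self):
        ...  # the real implementation lives in src/exporter.py (see traceback)

Either approach would let Kubernetes bring up a fresh pod instead of leaving the exporter stuck on a dead connection.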