Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sensible log behavior when redis is unavailable #15466

Open
wants to merge 2 commits into
base: devel
Choose a base branch
from

Conversation

AlanCoding
Copy link
Member

@AlanCoding AlanCoding commented Aug 26, 2024

SUMMARY

This picks up on the "redis" part of a prior PR #12698 and goes further.

That prior PR was too unfocused, trying to solve both the redis and receptor problems. Backing up, why am I looking at this problem in the first place? Because it gets in my way (personally) trying to diagnose other problems. Because when I look at logs, those logs are swamped due to 2 main reasons:

  • Problems with install ordering mean that dispatcher and other stuff runs before receptor is fully running and available
  • When redis goes down (which we do intentionally sometimes) various services absolutely scream in the logs about this fact

This PR is only concerned with the 2nd bullet point.

So when you get the log file you want (finally, after wading through the rest of the SOS report), you find that 95% of that log is stuff you don't want. Even worse, the noise is all "Traceback:" entries... which isn't great when what you're looking for is a stack trace of that format.

With that segway, here's a demo of the log behavior after taking redis down:

tools_redis_1     | 1:M 26 Aug 2024 02:03:54.472 # Redis is now ready to exit, bye bye...
tools_awx_1       | 2024-08-26 02:03:54,475 ERROR    [-] awx.main.commands.run_callback_receiver redis connection error: Connection closed by server.
tools_awx_1       | 2024-08-26 02:03:54,474 ERROR    [-] awx.main.commands.run_callback_receiver redis connection error: Connection closed by server.
tools_awx_1       | 2024-08-26 02:03:54,474 ERROR    [-] awx.main.commands.run_callback_receiver redis connection error: Connection closed by server.
tools_awx_1       | 2024-08-26 02:03:54,474 ERROR    [-] awx.main.commands.run_callback_receiver redis connection error: Connection closed by server.
tools_redis_1 exited with code 0
tools_awx_1       | 2024-08-26 02:03:57,650 WARNING  [-] awx.analytics.broadcast_websocket Error 2 connecting to unix socket: /var/run/redis/redis.sock. No such file or directory.
tools_awx_1       | 2024-08-26 02:03:59,498 ERROR    [-] awx.main.commands.run_callback_receiver redis connection error: Error 2 connecting to unix socket: /var/run/redis/redis.sock. No such file or directory.
tools_awx_1       | 2024-08-26 02:03:59,498 ERROR    [-] awx.main.commands.run_callback_receiver redis connection error: Error 2 connecting to unix socket: /var/run/redis/redis.sock. No such file or directory.
tools_awx_1       | 2024-08-26 02:03:59,498 ERROR    [-] awx.main.commands.run_callback_receiver redis connection error: Error 2 connecting to unix socket: /var/run/redis/redis.sock. No such file or directory.
tools_awx_1       | 2024-08-26 02:03:59,502 ERROR    [-] awx.main.commands.run_callback_receiver redis connection error: Error 2 connecting to unix socket: /var/run/redis/redis.sock. No such file or directory.
tools_awx_1       | 2024-08-26 02:04:02,707 WARNING  [-] awx.analytics.broadcast_websocket Error 2 connecting to unix socket: /var/run/redis/redis.sock. No such file or directory.
tools_awx_1       | 2024-08-26 02:04:02,894 WARNING  [-] awx.main.dispatch Redis connection error saving status data:
tools_awx_1       | Error 2 connecting to unix socket: /var/run/redis/redis.sock. No such file or directory.
tools_awx_1       | missed data:
tools_awx_1       | Recorded at: 2024-08-26 02:04:02 UTC 
tools_awx_1       | awx-1[pid:3477] workers total=4 min=4 max=631 
tools_awx_1       | .  worker[pid:3484] sent=0 qsize=0 rss=152.594MB [IDLE]
tools_awx_1       | .  worker[pid:3485] sent=0 qsize=0 rss=152.621MB [IDLE]
tools_awx_1       | .  worker[pid:3486] sent=0 qsize=0 rss=152.625MB [IDLE]
tools_awx_1       | .  worker[pid:3487] sent=0 qsize=0 rss=152.633MB [IDLE]
tools_awx_1       | 2024-08-26 02:04:04,524 ERROR    [-] awx.main.commands.run_callback_receiver redis connection error: Error 2 connecting to unix socket: /var/run/redis/redis.sock. No such file or directory.
tools_awx_1       | 2024-08-26 02:04:04,523 ERROR    [-] awx.main.commands.run_callback_receiver redis connection error: Error 2 connecting to unix socket: /var/run/redis/redis.sock. No such file or directory.
tools_awx_1       | 2024-08-26 02:04:04,530 ERROR    [-] awx.main.commands.run_callback_receiver redis connection error: Error 2 connecting to unix socket: /var/run/redis/redis.sock. No such file or directory.
tools_awx_1       | 2024-08-26 02:04:04,534 ERROR    [-] awx.main.commands.run_callback_receiver redis connection error: Error 2 connecting to unix socket: /var/run/redis/redis.sock. No such file or directory.
tools_awx_1       | 2024-08-26 02:04:07,762 WARNING  [-] awx.analytics.broadcast_websocket Error 2 connecting to unix socket: /var/run/redis/redis.sock. No such file or directory.
tools_awx_1       | 2024-08-26 02:04:08,819 WARNING  [-] awx.main.dispatch Redis connection error saving status data:
tools_awx_1       | Error 2 connecting to unix socket: /var/run/redis/redis.sock. No such file or directory.
tools_awx_1       | missed data:
tools_awx_1       | Recorded at: 2024-08-26 02:04:08 UTC 
tools_awx_1       | awx-1[pid:3477] workers total=4 min=4 max=631 
tools_awx_1       | .  worker[pid:3484] sent=0 qsize=0 rss=152.594MB [IDLE]
tools_awx_1       | .  worker[pid:3485] sent=0 qsize=0 rss=152.621MB [IDLE]
tools_awx_1       | .  worker[pid:3486] sent=0 qsize=0 rss=152.625MB [IDLE]
tools_awx_1       | .  worker[pid:3487] sent=0 qsize=0 rss=152.633MB [IDLE]
tools_awx_1       | 2024-08-26 02:04:08,869 WARNING  [-] awx.main.scheduler Redis connection error saving metrics for task_manager, error: Error 2 connecting to unix socket: /var/run/redis/redis.sock. No such file or directory.
tools_awx_1       | 2024-08-26 02:04:09,546 ERROR    [-] awx.main.commands.run_callback_receiver redis connection error: Error 2 connecting to unix socket: /var/run/redis/redis.sock. No such file or directory.
tools_awx_1       | 2024-08-26 02:04:09,550 ERROR    [-] awx.main.commands.run_callback_receiver redis connection error: Error 2 connecting to unix socket: /var/run/redis/redis.sock. No such file or directory.
tools_awx_1       | 2024-08-26 02:04:09,550 ERROR    [-] awx.main.commands.run_callback_receiver redis connection error: Error 2 connecting to unix socket: /var/run/redis/redis.sock. No such file or directory.
tools_awx_1       | 2024-08-26 02:04:09,558 ERROR    [-] awx.main.commands.run_callback_receiver redis connection error: Error 2 connecting to unix socket: /var/run/redis/redis.sock. No such file or directory.
tools_awx_1       | 2024-08-26 02:04:09,853 WARNING  [-] awx.main.scheduler Redis connection error saving metrics for dependency_manager, error: Error 2 connecting to unix socket: /var/run/redis/redis.sock. No such file or directory.
tools_awx_1       | 2024-08-26 02:04:12,811 WARNING  [-] awx.analytics.broadcast_websocket Error 2 connecting to unix socket: /var/run/redis/redis.sock. No such file or directory.
tools_awx_1       | 2024-08-26 02:04:14,569 ERROR    [-] awx.main.commands.run_callback_receiver redis connection error: Error 2 connecting to unix socket: /var/run/redis/redis.sock. No such file or directory.
tools_awx_1       | 2024-08-26 02:04:14,572 ERROR    [-] awx.main.commands.run_callback_receiver redis connection error: Error 2 connecting to unix socket: /var/run/redis/redis.sock. No such file or directory.
tools_awx_1       | 2024-08-26 02:04:14,572 ERROR    [-] awx.main.commands.run_callback_receiver redis connection error: Error 2 connecting to unix socket: /var/run/redis/redis.sock. No such file or directory.
tools_awx_1       | 2024-08-26 02:04:14,583 ERROR    [-] awx.main.commands.run_callback_receiver redis connection error: Error 2 connecting to unix socket: /var/run/redis/redis.sock. No such file or directory.
tools_awx_1       | 2024-08-26 02:04:14,818 WARNING  [-] awx.main.dispatch Redis connection error saving status data:
tools_awx_1       | Error 2 connecting to unix socket: /var/run/redis/redis.sock. No such file or directory.
tools_awx_1       | missed data:
tools_awx_1       | Recorded at: 2024-08-26 02:04:14 UTC 
tools_awx_1       | awx-1[pid:3477] workers total=4 min=4 max=631 
tools_awx_1       | .  worker[pid:3484] sent=1 finished=1 qsize=0 rss=159.254MB [IDLE]
tools_awx_1       | .  worker[pid:3485] sent=0 qsize=0 rss=152.621MB [IDLE]
tools_awx_1       | .  worker[pid:3486] sent=1 qsize=1 rss=159.078MB
tools_awx_1       |      - running cf27e9ed-2fd6-4a32-b675-ae1403e1ce0e dependency_manager(*[])
tools_awx_1       | .  worker[pid:3487] sent=0 qsize=0 rss=152.633MB [IDLE]
tools_awx_1       | 2024-08-26 02:04:14,823 WARNING  [-] awx.main.analytics Connection error in send_metrics: Error 2 connecting to unix socket: /var/run/redis/redis.sock. No such file or directory.
tools_awx_1       | 2024-08-26 02:04:14,823 WARNING  [-] awx.main.analytics Connection error in send_metrics: Error 2 connecting to unix socket: /var/run/redis/redis.sock. No such file or directory.
tools_awx_1       | 2024-08-26 02:04:17,856 WARNING  [-] awx.analytics.broadcast_websocket Error 2 connecting to unix socket: /var/run/redis/redis.sock. No such file or directory.
tools_awx_1       | 2024-08-26 02:04:19,593 ERROR    [-] awx.main.commands.run_callback_receiver redis connection error: Error 2 connecting to unix socket: /var/run/redis/redis.sock. No such file or directory.
tools_awx_1       | 2024-08-26 02:04:19,595 ERROR    [-] awx.main.commands.run_callback_receiver redis connection error: Error 2 connecting to unix socket: /var/run/redis/redis.sock. No such file or directory.
tools_awx_1       | 2024-08-26 02:04:19,595 ERROR    [-] awx.main.commands.run_callback_receiver redis connection error: Error 2 connecting to unix socket: /var/run/redis/redis.sock. No such file or directory.
tools_awx_1       | 2024-08-26 02:04:19,609 ERROR    [-] awx.main.commands.run_callback_receiver redis connection error: Error 2 connecting to unix socket: /var/run/redis/redis.sock. No such file or directory.
tools_awx_1       | 2024-08-26 02:04:22,823 WARNING  [-] awx.main.dispatch Redis connection error saving status data:
tools_awx_1       | Error 2 connecting to unix socket: /var/run/redis/redis.sock. No such file or directory.
tools_awx_1       | missed data:
tools_awx_1       | Recorded at: 2024-08-26 02:04:22 UTC 
tools_awx_1       | awx-1[pid:3477] workers total=4 min=4 max=631 
tools_awx_1       | .  worker[pid:3484] sent=1 finished=1 qsize=0 rss=159.254MB [IDLE]
tools_awx_1       | .  worker[pid:3485] sent=1 qsize=1 rss=158.738MB
tools_awx_1       |      - running 4e5206e4-31f0-48ad-92fb-4957ed1f5d54 awx_periodic_scheduler(*[])
tools_awx_1       | .  worker[pid:3486] sent=2 finished=1 qsize=1 rss=159.078MB
tools_awx_1       |      - running 8d600d28-831e-4306-8b63-ffb33e491cba send_subsystem_metrics(*[])
tools_awx_1       | .  worker[pid:3487] sent=0 qsize=0 rss=152.633MB [IDLE]
tools_awx_1       | 2024-08-26 02:04:22,833 WARNING  [-] awx.main.dispatch Redis connection error saving dispatcher metrics, error:
tools_awx_1       | Error 2 connecting to unix socket: /var/run/redis/redis.sock. No such file or directory.
tools_awx_1       | 2024-08-26 02:04:22,910 WARNING  [-] awx.analytics.broadcast_websocket Error 2 connecting to unix socket: /var/run/redis/redis.sock. No such file or directory.
tools_awx_1       | 2024-08-26 02:04:24,620 ERROR    [-] awx.main.commands.run_callback_receiver redis connection error: Error 2 connecting to unix socket: /var/run/redis/redis.sock. No such file or directory.
tools_awx_1       | 2024-08-26 02:04:24,620 ERROR    [-] awx.main.commands.run_callback_receiver redis connection error: Error 2 connecting to unix socket: /var/run/redis/redis.sock. No such file or directory.
tools_awx_1       | 2024-08-26 02:04:24,620 ERROR    [-] awx.main.commands.run_callback_receiver redis connection error: Error 2 connecting to unix socket: /var/run/redis/redis.sock. No such file or directory.
tools_awx_1       | 2024-08-26 02:04:24,630 ERROR    [-] awx.main.commands.run_callback_receiver redis connection error: Error 2 connecting to unix socket: /var/run/redis/redis.sock. No such file or directory.
tools_awx_1       | 2024-08-26 02:04:27,939 WARNING  [-] awx.analytics.broadcast_websocket Error 2 connecting to unix socket: /var/run/redis/redis.sock. No such file or directory.
tools_awx_1       | 2024-08-26 02:04:28,820 WARNING  [-] awx.main.dispatch Redis connection error saving status data:
tools_awx_1       | Error 2 connecting to unix socket: /var/run/redis/redis.sock. No such file or directory.
tools_awx_1       | missed data:
tools_awx_1       | Recorded at: 2024-08-26 02:04:28 UTC 
tools_awx_1       | awx-1[pid:3477] workers total=4 min=4 max=631 
tools_awx_1       | .  worker[pid:3484] sent=1 finished=1 qsize=0 rss=159.254MB [IDLE]
tools_awx_1       | .  worker[pid:3485] sent=1 qsize=1 rss=158.738MB
tools_awx_1       |      - running 4e5206e4-31f0-48ad-92fb-4957ed1f5d54 awx_periodic_scheduler(*[])
tools_awx_1       | .  worker[pid:3486] sent=2 finished=1 qsize=1 rss=159.078MB
tools_awx_1       |      - running 8d600d28-831e-4306-8b63-ffb33e491cba send_subsystem_metrics(*[])
tools_awx_1       | .  worker[pid:3487] sent=0 qsize=0 rss=152.633MB [IDLE]
tools_awx_1       | 2024-08-26 02:04:28,866 WARNING  [-] awx.main.scheduler Redis connection error saving metrics for task_manager, error: Error 2 connecting to unix socket: /var/run/redis/redis.sock. No such file or directory.
tools_awx_1       | 2024-08-26 02:04:29,650 ERROR    [-] awx.main.commands.run_callback_receiver redis connection error: Error 2 connecting to unix socket: /var/run/redis/redis.sock. No such file or directory.
tools_awx_1       | 2024-08-26 02:04:29,655 ERROR    [-] awx.main.commands.run_callback_receiver redis connection error: Error 2 connecting to unix socket: /var/run/redis/redis.sock. No such file or directory.
tools_awx_1       | 2024-08-26 02:04:29,653 ERROR    [-] awx.main.commands.run_callback_receiver redis connection error: Error 2 connecting to unix socket: /var/run/redis/redis.sock. No such file or directory.
tools_awx_1       | 2024-08-26 02:04:29,651 ERROR    [-] awx.main.commands.run_callback_receiver redis connection error: Error 2 connecting to unix socket: /var/run/redis/redis.sock. No such file or directory.
tools_awx_1       | 2024-08-26 02:04:29,836 WARNING  [-] awx.main.scheduler Redis connection error saving metrics for dependency_manager, error: Error 2 connecting to unix socket: /var/run/redis/redis.sock. No such file or directory.
tools_awx_1       | 2024-08-26 02:04:32,995 WARNING  [-] awx.analytics.broadcast_websocket Error 2 connecting to unix socket: /var/run/redis/redis.sock. No such file or directory.
tools_awx_1       | 2024-08-26 02:04:34,670 ERROR    [-] awx.main.commands.run_callback_receiver redis connection error: Error 2 connecting to unix socket: /var/run/redis/redis.sock. No such file or directory.
tools_awx_1       | 2024-08-26 02:04:34,673 ERROR    [-] awx.main.commands.run_callback_receiver redis connection error: Error 2 connecting to unix socket: /var/run/redis/redis.sock. No such file or directory.
tools_awx_1       | 2024-08-26 02:04:34,678 ERROR    [-] awx.main.commands.run_callback_receiver redis connection error: Error 2 connecting to unix socket: /var/run/redis/redis.sock. No such file or directory.
tools_awx_1       | 2024-08-26 02:04:34,679 ERROR    [-] awx.main.commands.run_callback_receiver redis connection error: Error 2 connecting to unix socket: /var/run/redis/redis.sock. No such file or directory.
tools_awx_1       | 2024-08-26 02:04:34,823 WARNING  [-] awx.main.dispatch Redis connection error saving status data:
tools_awx_1       | Error 2 connecting to unix socket: /var/run/redis/redis.sock. No such file or directory.
tools_awx_1       | missed data:
tools_awx_1       | Recorded at: 2024-08-26 02:04:34 UTC 
tools_awx_1       | awx-1[pid:3477] workers total=4 min=4 max=631 
tools_awx_1       | .  worker[pid:3484] sent=1 finished=1 qsize=0 rss=159.254MB [IDLE]
tools_awx_1       | .  worker[pid:3485] sent=1 finished=1 qsize=0 rss=158.738MB [IDLE]
tools_awx_1       | .  worker[pid:3486] sent=4 finished=3 qsize=1 rss=159.488MB
tools_awx_1       |      - running c5ec804f-f846-4226-aa48-84104e56d129 dependency_manager(*[])
tools_awx_1       | .  worker[pid:3487] sent=0 qsize=0 rss=152.633MB [IDLE]
tools_awx_1       | 2024-08-26 02:04:34,876 WARNING  [-] awx.main.analytics Connection error in send_metrics: Error 2 connecting to unix socket: /var/run/redis/redis.sock. No such file or directory.
tools_awx_1       | 2024-08-26 02:04:34,877 WARNING  [-] awx.main.analytics Connection error in send_metrics: Error 2 connecting to unix socket: /var/run/redis/redis.sock. No such file or directory.
tools_awx_1       | 2024-08-26 02:04:38,047 WARNING  [-] awx.analytics.broadcast_websocket Error 2 connecting to unix socket: /var/run/redis/redis.sock. No such file or directory.
tools_awx_1       | 2024-08-26 02:04:39,703 ERROR    [-] awx.main.commands.run_callback_receiver redis connection error: Error 2 connecting to unix socket: /var/run/redis/redis.sock. No such file or directory.
tools_awx_1       | 2024-08-26 02:04:39,702 ERROR    [-] awx.main.commands.run_callback_receiver redis connection error: Error 2 connecting to unix socket: /var/run/redis/redis.sock. No such file or directory.
tools_awx_1       | 2024-08-26 02:04:39,703 ERROR    [-] awx.main.commands.run_callback_receiver redis connection error: Error 2 connecting to unix socket: /var/run/redis/redis.sock. No such file or directory.
tools_awx_1       | 2024-08-26 02:04:39,711 ERROR    [-] awx.main.commands.run_callback_receiver redis connection error: Error 2 connecting to unix socket: /var/run/redis/redis.sock. No such file or directory.
tools_awx_1       | 2024-08-26 02:04:45,860 WARNING  [-] awx.main.tasks.system Heartbeat skew - interval=62.9535, expected=60

This is noisy on a certain point, but that is actually an interesting point.

The dispatcher "statistics" are used for the --status command. So if we can't get the statistics to stash into redis... what do we do? Before, we would drop that data on the floor and then log a giant stack trace. But, since redis connection errors are a very well-known quality, better to show the details of the error, and then print the data that we're dropping. Right? There's a non-zero chance that we have a pool management bug while at the same time hitting this, confounding debugging even more.

ISSUE TYPE
  • Bug, Docs Fix or other nominal change
COMPONENT NAME
  • API

@AlanCoding
Copy link
Member Author

started YOLO 10231

@AlanCoding
Copy link
Member Author

From running a deployment, it looks like there's another loop not yet solved.

2024-08-28 15:00:04,792 INFO     [-] awx.main.dispatch Running worker dispatcher listening to queues ['tower_broadcast_all', 'tower_settings_change', 'automationcontroller-0']
2024-08-28 15:00:04,833 ERROR    [-] awx.main.dispatch Encountered unhandled error in dispatcher main loop
Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/awx/main/dispatch/worker/base.py", line 244, in run
    self.worker.on_start()
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/awx/main/dispatch/worker/task.py", line 141, in on_start
    dispatch_startup()
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/awx/main/tasks/system.py", line 117, in dispatch_startup
    cluster_node_heartbeat()
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/awx/main/tasks/system.py", line 575, in cluster_node_heartbeat
    inspect_execution_and_hop_nodes(instance_list)
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/awx/main/tasks/system.py", line 514, in inspect_execution_and_hop_nodes
    mesh_status = ctl.simple_command('status')
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/receptorctl/socket_interface.py", line 81, in simple_command
    self.connect()
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/receptorctl/socket_interface.py", line 99, in connect
    raise ValueError(f"Socket path does not exist: {path}")
ValueError: Socket path does not exist: /var/run/receptor/receptor.sock

I don't really want this either, so it'll probably take another commit. Let's see... it could be due to starting the dispatcher without redis available.

@AlanCoding
Copy link
Member Author

Wait, never mind about my last error. That's receptor, not redis. Not in scope here.

@AlanCoding AlanCoding marked this pull request as ready for review August 28, 2024 20:19
@AlanCoding
Copy link
Member Author

YOLO 10245

Copy link

sonarcloud bot commented Sep 23, 2024

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants