Health probe failures log useful message #746
What?
Catches exceptions in the `HealthController.index/2` function and logs the error, along with its location in our code, at level `:error`.
Why?
So that it is clear why reticulum is unhealthy, and the log points toward what needs to be fixed.
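The pattern described under What? can be sketched as a standalone Elixir script. This is a minimal illustration, not the PR's actual controller code: `HealthCheckSketch.run/2`, its `app_prefix` argument and the `MyApp.Checks` module are hypothetical names. It rescues any exception raised by a health check, walks `__STACKTRACE__` to find the first frame that belongs to the application's own modules (falling back to the top frame), and logs the exception with that file:line at level `:error`.

```elixir
defmodule HealthCheckSketch do
  # Hypothetical sketch, not the PR's actual code: wrap a health check,
  # rescue any exception, and log it with the location in our own code.
  require Logger

  # Runs `check_fun`; returns :ok, or {:error, message} after logging the
  # failure at level :error. `app_prefix` (an assumption) decides which
  # stack frames count as "our code", e.g. "MyApp.".
  def run(check_fun, app_prefix) when is_function(check_fun, 0) do
    check_fun.()
    :ok
  rescue
    exception ->
      location = location(__STACKTRACE__, app_prefix)
      message = "Health check failed at #{location}: #{inspect(exception)}"
      Logger.error(message)
      {:error, message}
  end

  # Prefer the first frame from our own modules; fall back to the top frame.
  defp location(stacktrace, app_prefix) do
    stacktrace
    |> Enum.find(List.first(stacktrace), fn
      {module, _fun, _arity, _info} -> String.starts_with?(inspect(module), app_prefix)
      _other -> false
    end)
    |> format_frame()
  end

  # Formats a stacktrace entry as "file:line" using its location keyword list.
  defp format_frame({_module, _fun, _arity, info}) do
    file = info |> Keyword.get(:file, "unknown") |> to_string() |> Path.basename()
    "#{file}:#{Keyword.get(info, :line, "?")}"
  end

  defp format_frame(_no_frame), do: "unknown location"
end

# Hypothetical broken check for demonstration: enumerating nil raises
# Protocol.UndefinedError, like the failure described in this PR.
defmodule MyApp.Checks do
  def count_rooms(rooms) do
    # Not a tail call, so this function's frame stays in the stacktrace.
    names = Enum.map(rooms, & &1)
    length(names)
  end
end

# Logs an :error line naming this script and the line of Enum.map above.
HealthCheckSketch.run(fn -> MyApp.Checks.count_rooms(nil) end, "MyApp.")
```

Choosing the first application frame rather than the top frame matters here: when `nil` is enumerated, the top frame is inside Elixir's own `Enumerable` protocol, which says little about which check actually failed.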
Examples
old log message
Sentry is an error reporting tool that is not configured; this message is unrelated to the actual problem.
new log message
In this case, we know that some variable is `nil` at health_controller.ex line 14, and something is trying to enumerate `nil`.
How to test
1. Run `mix test` (or look at CI) and observe that the new automated test for reticulum being unhealthy passes.
2. Run `curl -i http://ret:4001/health` and observe that "ok" is still returned.
3. Run `kubectl scale deploy spoke --replicas=0 -n hcce` to kill Spoke.
4. Run `kubectl scale deploy reticulum --replicas=0 -n hcce`, then `kubectl scale deploy reticulum --replicas=1 -n hcce`, to restart reticulum with empty caches.
5. Run `kubectl logs -l app=reticulum -f -n hcce` and observe the log message `[error] Health check failed at health_controller.ex:16: %Protocol.UndefinedError{protocol: Enumerable, value: nil, description: ""}`, which shows that the Hubs tests pass but the Spoke test fails.

Documentation of functionality
No documentation change; this just produces better logs when reticulum is not healthy.
Open questions
Could Cachex and RoomAssigner be mocked, so an automated test for reticulum being healthy could be written?
Additional details or related context
Written with the help of JetBrains' Junie LLM, but reworked by me.
The end result is similar to page_controller.ex lines 844–847.