@YAMISHKA02

Added one panel to the dashboard showing the node's healthy/unhealthy status.
It's based on messages produced by the node in the last 5 minutes.
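
The rule behind the panel can be sketched as: a node counts as healthy if at least one of its messages was seen in the last 5 minutes. This is a minimal Python sketch, not the panel's actual query; the function name and the idea of passing raw Unix timestamps are assumptions made for illustration.

```python
import time
from typing import Iterable, Optional

# Window used by the panel: 5 minutes.
WINDOW_SECONDS = 300.0


def is_healthy_by_messages(message_timestamps: Iterable[float],
                           now: Optional[float] = None,
                           window: float = WINDOW_SECONDS) -> bool:
    """Return True if at least one message falls within [now - window, now].

    `message_timestamps` are Unix timestamps (seconds) of messages attributed
    to the node (hypothetical input; how they are collected is out of scope).
    Timestamps in the future relative to `now` are ignored.
    """
    if now is None:
        now = time.time()
    cutoff = now - window
    return any(cutoff <= ts <= now for ts in message_timestamps)
```

On a Prometheus-backed dashboard the same rule is usually expressed with `increase()` over a `[5m]` range on a message counter; which counter to use depends on what the node actually exports.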

added node health status based on messages in the last 5 mins
@fryorcraken requested a review from a team on October 4, 2024 03:16
@fryorcraken
Contributor

@waku-org/nwaku, would it be possible to have a Prometheus entry that returns something similar to checkhlth.sh?

@NagyZoltanPeter
Contributor

Yes, I think we can start the metrics server ahead of initialization, just as we already do with the REST service.
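
The ordering proposed here (serve metrics first, then run the slow startup) can be sketched with Python's standard library. This is a minimal sketch, not nwaku code (nwaku is written in Nim): the `node_initialized` gauge, `initialize_node()` and `fetch_metrics()` are hypothetical stand-ins. The point is only that a scraper gets an answer, `0` at first, while initialization is still running.

```python
import threading
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical gauge: 0 while the node is still initializing, 1 once ready.
_state = {"initialized": 0}
_lock = threading.Lock()


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        with _lock:
            value = _state["initialized"]
        # Prometheus text exposition format.
        body = (
            "# HELP node_initialized Whether node initialization has finished (0 or 1).\n"
            "# TYPE node_initialized gauge\n"
            f"node_initialized {value}\n"
        ).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging in this sketch


def start_metrics_server(port: int = 0) -> ThreadingHTTPServer:
    """Start the metrics endpoint in a background thread, before any init work."""
    server = ThreadingHTTPServer(("127.0.0.1", port), MetricsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def initialize_node() -> None:
    """Hypothetical stand-in for slow startup work such as an RLN sync."""
    time.sleep(0.1)
    with _lock:
        _state["initialized"] = 1


def fetch_metrics(port: int) -> str:
    """Scrape the endpoint with a plain HTTP GET, as Prometheus would."""
    with urllib.request.urlopen(f"http://127.0.0.1:{port}/metrics", timeout=5) as resp:
        return resp.read().decode()
```

Passing port `0` lets the OS pick a free port; the bound port is then available as `server.server_address[1]`.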

@YAMISHKA02: Thank you for the initiative. I had been thinking about this as well. While the fact that the node can relay messages is a stronger indicator of healthy operation, we have instead been checking the mounted protocols and the discovered node count. These tell us that the node is up and ready to use. Relaying messages depends heavily on actual network traffic, which is independent of the current node.
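
The alternative described here, judging readiness by mounted protocols and discovered node count rather than by message traffic, could look roughly like this. A minimal Python sketch: `NodeSnapshot`, the protocol labels and the `min_peers` threshold are hypothetical, and how the values are read from a running node is left open. Returning the list of failed checks instead of a bare boolean makes it easier to see why a node is reported as not ready.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class NodeSnapshot:
    """Hypothetical view of node state at one point in time."""
    mounted_protocols: Set[str] = field(default_factory=set)
    discovered_peers: int = 0


def readiness_issues(snapshot: NodeSnapshot,
                     required_protocols: Set[str],
                     min_peers: int = 1) -> List[str]:
    """Return the reasons the node is not ready; an empty list means ready."""
    issues = []
    missing = sorted(required_protocols - snapshot.mounted_protocols)
    if missing:
        issues.append("missing protocols: " + ", ".join(missing))
    if snapshot.discovered_peers < min_peers:
        issues.append(
            f"only {snapshot.discovered_peers} peers discovered (need {min_peers})"
        )
    return issues


def is_ready(snapshot: NodeSnapshot,
             required_protocols: Set[str],
             min_peers: int = 1) -> bool:
    """True when every required protocol is mounted and enough peers were found."""
    return not readiness_issues(snapshot, required_protocols, min_peers)
```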

@YAMISHKA02
Author

Hello, the best way is of course to add something similar to checkhlth.sh.

Can you please send me a link to the file that serves as the reference for the metrics exporter? I can modify it to export the new metrics.

@NagyZoltanPeter
Contributor

@YAMISHKA02: Sorry for not answering sooner. I'm afraid there is no single link I can point to: if we think of a node's health status as a continuous report, it consists of several properties, and we need to decide what is worth measuring. Currently, chkhealth.sh mainly helps node operators see the boot status of the node, because the very first boot with RLN sync can take a while and that was misunderstood in many ways. So there is plenty of room for improvement, and I believe it will come into scope shortly.
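
One way to expose a health status made of several properties, including a boot phase such as the initial RLN sync, is one gauge sample per component and state. This is a minimal Python sketch: the `node_health_status` metric name, the component labels and the three states are hypothetical and not something nwaku is known to export. Keeping `initializing` distinct from `unhealthy` targets the slow-first-boot misunderstanding mentioned above.

```python
from enum import Enum
from typing import Dict


class Health(Enum):
    # Hypothetical status levels; a boot-time state is kept separate from
    # UNHEALTHY so a long initial sync is not reported as a failure.
    INITIALIZING = 0
    READY = 1
    UNHEALTHY = 2


def render_health_metrics(components: Dict[str, Health],
                          metric: str = "node_health_status") -> str:
    """Render per-component health in Prometheus text exposition format.

    Each component gets one sample per status level: 1 for its current
    status, 0 for the others (a common one-sample-per-state encoding).
    """
    lines = [
        f"# HELP {metric} Per-component health status (1 = current state).",
        f"# TYPE {metric} gauge",
    ]
    for component in sorted(components):
        current = components[component]
        for state in Health:
            value = 1 if state is current else 0
            lines.append(
                f'{metric}{{component="{component}",state="{state.name.lower()}"}} {value}'
            )
    return "\n".join(lines) + "\n"
```

A dashboard panel could then filter on `state="ready"` per component, while a boot in progress shows up as `state="initializing"` instead of as a failure.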

