-
Notifications
You must be signed in to change notification settings - Fork 3
chore: add Health Status to Waku API #92
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds a health status specification to the Waku API, defining three health states (Unhealthy, MinimallyHealthy, Healthy) and an event source mechanism for health status changes, based on the implementation currently used in js-waku.
Key Changes:
- Addition of
HealthStatusenum type with three health levels - Introduction of event source mechanism for health status change notifications
- Extended documentation describing each health state
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
Co-authored-by: Copilot <[email protected]>
fryorcraken
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You do have to specify how the event source is accessible (e.g. events property on WakuNode).
|
|
||
| ```yml | ||
| types: | ||
| HealthStatus: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of the things i have initially considered and while thinking again recently on this topic is to consider HealthStatus not just at node level but also at a more fine-grained level.
In case of relay health status makes more sense at shard level because a node can be healthily connected to 1 shard and not on the other.
When it comes to edge/light clients this becomes even more complicated as there are even finer grain concept of content-topics, i.e a node can have peers or subscriptions to certain content-topics and loose subscriptions to others.
If so, does it make more sense to consider health to be notified to a user based on content-topics? Ref discussion thread:
https://discord.com/channels/1110799176264056863/1414975481438011392/1428769536139857989
We can always aggregate the health of all shards or content-topics and depict it as node health, but i do think we will require granular health events to be reported to app devs. Consider status for example, if a user is part of 2 communities (which is nothing but 2 different content-topics) but has at some point only subscriptions to 1 content-topic then we can't indicate node is unhealthy rather reception on 1 community is unhealthy whereas on the other it is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
really good idea, @chaitanyaprem
I think this comes to the realm of Waku App API as now we are communicating about application concerns of fragmented network and it's application of content topics in an app
but we definitely can implement it in Waku API
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that health per content topic may make sense.
But at this point in time, I would KISS it, and then review whether an application developer would want to expose this information to their user. Aka, not do per content topic just yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
review whether an application developer would want to expose this information to their user
iirc status app already has this requirement unless there is some priority change. hence was suggesting this.
But i would leave it upto based on priorities.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
iirc status app already has this requirement unless there is some priority change. hence was suggesting this.
But i would leave it upto based on priorities.
Please provide more context. What do they do with this information? How does the per content-topic information impacts application behaviour?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember they wanted to show connectivity health at community level , but not sure if priority has changed now. Maybe @igor-sirotin or someone from status team can give more clarity on this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember they wanted to show connectivity health at community level , but not sure if priority has changed now. Maybe @igor-sirotin or someone from status team can give more clarity on this
Good to know, If that's something they have/want to keep then yes, we should do it from the get go.
cc @osmaczko @jrainville to confirm if this feature still exists
cc @plopezlpz @jazzz FYI, there is a request to show connectivity at conversation level (can be done later in chat sdk)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
UI
I remember they wanted to show connectivity health at community level
I only know about this feature from this ticket, but I don't know about our desire to implement this on Status UI level.
It makes sense on paper, but in reality this might unnecessarily complicate UI. And I don't think this is of any priority at the moment.
cc @jrainville
Backend
In backend we manually trigger discovery (or peer exchange) when we don't have enough peers.
- In full client, we run discovery periodically if we "have enough connected peers"
messaging/waku/gowaku.go#L1773-L1776 - In light client, we run peer exchange every 5 seconds
messaging/waku/gowaku.go#L605-L612
Instead, we want to:
- trigger discovery for specific pubsub topic, when it's unhealthy
- trigger peer exchange for specific content topic, when it's unhealhy
Co-authored-by: fryorcraken <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
standards/application/waku-api.md
Outdated
| `Unhealthy` indicates that the node has lost connectivity for message reception, | ||
| sending, or both, and as a result, it cannot reliably process or transmit messages. | ||
|
|
||
| `MinimallyHealthy` indicates that the node meets the minimum operational requirements: connect to at least one peer with a protocol to send messages ([LIGHT-PUSH]() or [RELAY]()) and one peer with a protocol to receive messages ([FILTER ] or [RELAY]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to use the proper names IMO (WAKU2-RELAY) and add the links please.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am using names as they were defined in originally in #65
do you want to change it across the file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤷 Not sure how much it matters. @jimstir ?
The links are missing tho
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@weboko It is good practice to use the actual title of the spec at the time of writing, e.g. 11/WAKU2-RELAY. In the waku/specs repo this is not required or enforced, but will be in the rfc-index.
|
Almost approved, still need to fix
|
chaitanyaprem
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once comments are addressed, LGTM
|
|
||
| ```yml | ||
| types: | ||
| HealthStatus: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
review whether an application developer would want to expose this information to their user
iirc status app already has this requirement unless there is some priority change. hence was suggesting this.
But i would leave it upto based on priorities.
@fryorcraken I'll merge this PR after #87 in which I define |
| fields: | ||
| eventType: | ||
| type: string | ||
| default: "health" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the default values does not make sense here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ivansete-status
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for it! 🙌
Adding my 2cs that I hope you find useful
Also, cc @fcecin as she will implement it on nwaku side 🥳
| HealthStatus: | ||
| type: enum | ||
| values: [Unhealthy, MinimallyHealthy, Healthy] | ||
| description: "Used to identify health of the operating node" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| HealthStatus: | |
| type: enum | |
| values: [Unhealthy, MinimallyHealthy, Healthy] | |
| description: "Used to identify health of the operating node" | |
| HealthState: | |
| type: enum | |
| values: [Unhealthy, Healthy] | |
| description: "Used to identify health of the operating node" | |
| HealthStateDetail: | |
| type: object | |
| fields: | |
| state: | |
| type: HealthState | |
| reason: | |
| type: string | |
| description: "Brings clear detail about why the node is in the given state." |
I think is very important/compulsory to allow precise detail about why a node is unhealthy, f.e. For now, it is enough returning this detail in a string so that the API consumer can better understand why the node is not getting healthy. That will also help us bring a better assistance (in case someone asks.)
The reason field will also give us freedom to properly inform about the topic health, as @chaitanyaprem suggested. Even, we can return a JSON in the future with better detail, which can help us to extend the API if needed.
| `MinimallyHealthy` indicates that the node meets the minimum operational requirements: | ||
| it is connected to at least one peer with a protocol to send messages ([LIGHTPUSH](https://github.com/vacp2p/rfc-index/blob/main/waku/standards/core/19/lightpush.md) or [RELAY](https://github.com/vacp2p/rfc-index/blob/main/waku/standards/core/11/relay.md)), | ||
| one peer with a protocol to receive messages ([FILTER](https://github.com/vacp2p/rfc-index/blob/main/waku/standards/core/12/filter.md) or [RELAY](https://github.com/vacp2p/rfc-index/blob/main/waku/standards/core/11/relay.md)), | ||
| and one peer with [STORE](https://github.com/vacp2p/rfc-index/blob/main/waku/standards/core/13/store.md) service capabilities, | ||
| although performance or reliability may still be impacted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would not use the MinimallyHealthy state. Instead, only Unhealthy and Healthy.
Much simpler and the API consumer doesn't need to bother about lightpush, etc.
| `MinimallyHealthy` indicates that the node meets the minimum operational requirements: | |
| it is connected to at least one peer with a protocol to send messages ([LIGHTPUSH](https://github.com/vacp2p/rfc-index/blob/main/waku/standards/core/19/lightpush.md) or [RELAY](https://github.com/vacp2p/rfc-index/blob/main/waku/standards/core/11/relay.md)), | |
| one peer with a protocol to receive messages ([FILTER](https://github.com/vacp2p/rfc-index/blob/main/waku/standards/core/12/filter.md) or [RELAY](https://github.com/vacp2p/rfc-index/blob/main/waku/standards/core/11/relay.md)), | |
| and one peer with [STORE](https://github.com/vacp2p/rfc-index/blob/main/waku/standards/core/13/store.md) service capabilities, | |
| although performance or reliability may still be impacted. |
This PR add spec for health status for Waku API and depicts what is used now in js-waku
This is re-use from the original attempt to define Messaging API
https://github.com/waku-org/specs/blob/17464d356a40cfa130c0df25b50769a4f38b7c45/standards/application/messaging-api.md#health-indicator
Resolves - logos-messaging/js-waku#2712