Skip to content

Conversation

@weboko
Copy link
Contributor

@weboko weboko commented Oct 22, 2025

This PR add spec for health status for Waku API and depicts what is used now in js-waku

This is re-use from the original attempt to define Messaging API
https://github.com/waku-org/specs/blob/17464d356a40cfa130c0df25b50769a4f38b7c45/standards/application/messaging-api.md#health-indicator

Resolves - logos-messaging/js-waku#2712

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds a health status specification to the Waku API, defining three health states (Unhealthy, MinimallyHealthy, Healthy) and an event source mechanism for health status changes, based on the implementation currently used in js-waku.

Key Changes:

  • Addition of HealthStatus enum type with three health levels
  • Introduction of event source mechanism for health status change notifications
  • Extended documentation describing each health state

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

weboko and others added 2 commits October 23, 2025 00:44
Copy link
Contributor

@fryorcraken fryorcraken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You do have to specify how the event source is accessible (e.g. events property on WakuNode).

@fryorcraken fryorcraken self-assigned this Oct 23, 2025

```yml
types:
HealthStatus:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the things i have initially considered and while thinking again recently on this topic is to consider HealthStatus not just at node level but also at a more fine-grained level.
In case of relay health status makes more sense at shard level because a node can be healthily connected to 1 shard and not on the other.
When it comes to edge/light clients this becomes even more complicated as there are even finer grain concept of content-topics, i.e a node can have peers or subscriptions to certain content-topics and loose subscriptions to others.

If so, does it make more sense to consider health to be notified to a user based on content-topics? Ref discussion thread:
https://discord.com/channels/1110799176264056863/1414975481438011392/1428769536139857989

We can always aggregate the health of all shards or content-topics and depict it as node health, but i do think we will require granular health events to be reported to app devs. Consider status for example, if a user is part of 2 communities (which is nothing but 2 different content-topics) but has at some point only subscriptions to 1 content-topic then we can't indicate node is unhealthy rather reception on 1 community is unhealthy whereas on the other it is fine.

WDYT @fryorcraken @weboko @Ivansete-status
cc @igor-sirotin

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

really good idea, @chaitanyaprem

I think this comes to the realm of Waku App API as now we are communicating about application concerns of fragmented network and it's application of content topics in an app

but we definitely can implement it in Waku API

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that health per content topic may make sense.

But at this point in time, I would KISS it, and then review whether an application developer would want to expose this information to their user. Aka, not do per content topic just yet.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

review whether an application developer would want to expose this information to their user

iirc status app already has this requirement unless there is some priority change. hence was suggesting this.
But i would leave it upto based on priorities.

Copy link
Contributor

@fryorcraken fryorcraken Oct 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iirc status app already has this requirement unless there is some priority change. hence was suggesting this.
But i would leave it upto based on priorities.

Please provide more context. What do they do with this information? How does the per content-topic information impacts application behaviour?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember they wanted to show connectivity health at community level , but not sure if priority has changed now. Maybe @igor-sirotin or someone from status team can give more clarity on this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember they wanted to show connectivity health at community level , but not sure if priority has changed now. Maybe @igor-sirotin or someone from status team can give more clarity on this

Good to know, If that's something they have/want to keep then yes, we should do it from the get go.
cc @osmaczko @jrainville to confirm if this feature still exists

cc @plopezlpz @jazzz FYI, there is a request to show connectivity at conversation level (can be done later in chat sdk)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UI

I remember they wanted to show connectivity health at community level

I only know about this feature from this ticket, but I don't know about our desire to implement this on Status UI level.

It makes sense on paper, but in reality this might unnecessarily complicate UI. And I don't think this is of any priority at the moment.

cc @jrainville

Backend

In backend we manually trigger discovery (or peer exchange) when we don't have enough peers.

Instead, we want to:

  • trigger discovery for specific pubsub topic, when it's unhealthy
  • trigger peer exchange for specific content topic, when it's unhealhy

cc @weboko @chaitanyaprem @fryorcraken

Co-authored-by: fryorcraken <[email protected]>
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.


Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

fryorcraken
fryorcraken previously approved these changes Oct 24, 2025
`Unhealthy` indicates that the node has lost connectivity for message reception,
sending, or both, and as a result, it cannot reliably process or transmit messages.

`MinimallyHealthy` indicates that the node meets the minimum operational requirements: connect to at least one peer with a protocol to send messages ([LIGHT-PUSH]() or [RELAY]()) and one peer with a protocol to receive messages ([FILTER ] or [RELAY])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to use the proper names IMO (WAKU2-RELAY) and add the links please.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am using names as they were defined in originally in #65

do you want to change it across the file?

Copy link
Contributor

@fryorcraken fryorcraken Oct 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤷 Not sure how much it matters. @jimstir ?

The links are missing tho

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@weboko It is good practice to use the actual title of the spec at the time of writing, e.g. 11/WAKU2-RELAY. In the waku/specs repo this is not required or enforced, but will be in the rfc-index.

@fryorcraken fryorcraken dismissed their stale review October 24, 2025 00:08

one comment was not tackled

@fryorcraken
Copy link
Contributor

fryorcraken commented Oct 24, 2025

Almost approved, still need to fix

You do have to specify how the event source is accessible (e.g. events property on WakuNode).

Copy link
Contributor

@chaitanyaprem chaitanyaprem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once comments are addressed, LGTM


```yml
types:
HealthStatus:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

review whether an application developer would want to expose this information to their user

iirc status app already has this requirement unless there is some priority change. hence was suggesting this.
But i would leave it upto based on priorities.

@weboko
Copy link
Contributor Author

weboko commented Oct 28, 2025

You do have to specify how the event source is accessible (e.g. events property on WakuNode).

@fryorcraken I'll merge this PR after #87 in which I define events property

fields:
eventType:
type: string
default: "health"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the default values does not make sense here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#94

Copy link
Contributor

@Ivansete-status Ivansete-status left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for it! 🙌
Adding my 2cs that I hope you find useful
Also, cc @fcecin as she will implement it on nwaku side 🥳

Comment on lines +324 to +327
HealthStatus:
type: enum
values: [Unhealthy, MinimallyHealthy, Healthy]
description: "Used to identify health of the operating node"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
HealthStatus:
type: enum
values: [Unhealthy, MinimallyHealthy, Healthy]
description: "Used to identify health of the operating node"
HealthState:
type: enum
values: [Unhealthy, Healthy]
description: "Used to identify health of the operating node"
HealthStateDetail:
type: object
fields:
state:
type: HealthState
reason:
type: string
description: "Brings clear detail about why the node is in the given state."

I think is very important/compulsory to allow precise detail about why a node is unhealthy, f.e. For now, it is enough returning this detail in a string so that the API consumer can better understand why the node is not getting healthy. That will also help us bring a better assistance (in case someone asks.)

The reason field will also give us freedom to properly inform about the topic health, as @chaitanyaprem suggested. Even, we can return a JSON in the future with better detail, which can help us to extend the API if needed.

Comment on lines +335 to +339
`MinimallyHealthy` indicates that the node meets the minimum operational requirements:
it is connected to at least one peer with a protocol to send messages ([LIGHTPUSH](https://github.com/vacp2p/rfc-index/blob/main/waku/standards/core/19/lightpush.md) or [RELAY](https://github.com/vacp2p/rfc-index/blob/main/waku/standards/core/11/relay.md)),
one peer with a protocol to receive messages ([FILTER](https://github.com/vacp2p/rfc-index/blob/main/waku/standards/core/12/filter.md) or [RELAY](https://github.com/vacp2p/rfc-index/blob/main/waku/standards/core/11/relay.md)),
and one peer with [STORE](https://github.com/vacp2p/rfc-index/blob/main/waku/standards/core/13/store.md) service capabilities,
although performance or reliability may still be impacted.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would not use the MinimallyHealthy state. Instead, only Unhealthy and Healthy.
Much simpler and the API consumer doesn't need to bother about lightpush, etc.

Suggested change
`MinimallyHealthy` indicates that the node meets the minimum operational requirements:
it is connected to at least one peer with a protocol to send messages ([LIGHTPUSH](https://github.com/vacp2p/rfc-index/blob/main/waku/standards/core/19/lightpush.md) or [RELAY](https://github.com/vacp2p/rfc-index/blob/main/waku/standards/core/11/relay.md)),
one peer with a protocol to receive messages ([FILTER](https://github.com/vacp2p/rfc-index/blob/main/waku/standards/core/12/filter.md) or [RELAY](https://github.com/vacp2p/rfc-index/blob/main/waku/standards/core/11/relay.md)),
and one peer with [STORE](https://github.com/vacp2p/rfc-index/blob/main/waku/standards/core/13/store.md) service capabilities,
although performance or reliability may still be impacted.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants