[Milestone] Waku Network Can Support 10K Users #12

@jm-clius

Description

Priority Tracks: Secure Scalability
Due date: 31 May 2023
Milestone: https://github.com/waku-org/pm/milestone/5

Note: this deadline assumes that the target of 1 million users by end-June 2023 can for the largest part build on the solutions designed for the problem space defined below.

Summary

  • Scale to 10K Status Community users, spread across ~10 to ~100 communities
  • This milestone focuses on 100% Desktop users, primarily using relay, but with experimental/beta support for filter and lightpush for clients with poor connectivity
  • Communities, private group chats and 1:1 chats should be considered. Public chats are excluded.

Tasks / Epics


Extracted questions

  • Are the number of users and number of communities realistic? Answer on 2023-01-19: yes, makes sense as an initial goal
  • What is the proportion (in message rate and bandwidth) of community messages vs community control messages vs store query-responses?
  • Does message rate increase linearly with network size? Answer on 2023-01-19: generally this should be the case (could have a multiplicative factor, but not combinatorial or exponential)
  • What bandwidth upper bound should we target for Desktop nodes? One possible answer: ADSL2+ limit of 3.5 Mb/s?
  • Can this MVP consider participation in only one Community at a time? Answer on 2023-01-24: nodes will be part of multiple communities from the beginning.
  • What store query rate should we target for 10K users?
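The bandwidth and message-rate questions above can be sanity-checked with a back-of-envelope estimate. The sketch below uses purely illustrative assumptions (active-user count, message rate, message size, and a gossipsub duplication factor are all made up); it only shows the shape of the calculation against the proposed ADSL2+ bound.

```python
# Back-of-envelope estimate of per-node relay bandwidth for one shard.
# All parameters are illustrative assumptions, not measured values.

def relay_bandwidth_mbps(users, msgs_per_user_per_hour, avg_msg_bytes, gossip_factor):
    """Estimate downstream relay bandwidth per node on a single shard.

    gossip_factor approximates duplicate delivery overhead in gossipsub
    (a message may be received more than once from different mesh peers).
    """
    msgs_per_sec = users * msgs_per_user_per_hour / 3600
    bytes_per_sec = msgs_per_sec * avg_msg_bytes * gossip_factor
    return bytes_per_sec * 8 / 1_000_000  # bytes/s -> Mb/s

# Hypothetical community: 1,000 active users, 10 msgs/user/hour,
# 1 KiB average envelope, ~4x gossip duplication.
estimate = relay_bandwidth_mbps(1000, 10, 1024, 4)
print(f"{estimate:.2f} Mb/s")  # prints "0.09 Mb/s"
```

Under these assumed numbers a single busy community stays well under a 3.5 Mb/s bound; the open question is how the total grows when a node relays for multiple communities and shards.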

Network requirements

Note: this gathers the minimal set of requirements the Waku network must meet to support Status Communities scaling to 10K users. It does not propose a design.

1. Message Delivery and Sharding

Note: this section, especially, depends on app-defined minimal user-experience requirements. E.g. the app knows what (sub)set of messages is necessary "for a consistent experience" and this will feed into a pubsub topic, content topic and sharding design that does not compromise on UX. This process should also define when messages should be received "live" (relay) or opportunistically via history queries (store).

  1. Nodes should be able to receive (via relay or store) all community messages of the community they're part of.
  2. Nodes should receive live (via relay) all chat messages that are necessary for a consistent experience. A chat message is content sent by users either in a community channel, 1:1 or private group.
  3. Nodes should receive live (via relay) all control messages that are necessary for a consistent experience. Control messages are mostly used for community reasons, with some for 1:1 and private groups (e.g. online presence and X3DH bundle).
  4. Each community can utilize a single or multiple shards for control and community messages, as long as requirements (1) - (3) still hold.
  5. Nodes should participate in shards in such a way that resource usage (especially bandwidth) is minimized, while requirements (1) - (3) still hold.
  6. Peer and connection management should be sufficient to allow nodes to maintain a healthy set of connections within each shard they participate in.

Assumptions:

  • connectivity, NAT traversal capability, NAT hole punching, etc. is similar to that described in Status MVP: Status Core Contributors use Status #7. No further work is required within the context of this MVP.
  • it is possible to be part of several communities simultaneously
  • we assume that community size is such that community desktop nodes can realistically be expected to relay the messages for all community traffic. That is, communities can be responsible for their own relay infrastructure.
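Requirements (4) and (5) leave open how communities map onto shards. One minimal scheme, sketched below, derives the shard deterministically from the community ID so all members arrive at the same pubsub topic without coordination. The `/waku/2/rs/<cluster>/<shard>` topic format follows Waku relay sharding; the cluster ID, shard count and hash-based assignment are illustrative assumptions, not a proposed design.

```python
# Sketch: deterministically map a community to a relay shard so that all
# members derive the same pubsub topic without coordination.
import hashlib

CLUSTER_ID = 16   # assumed cluster ID reserved for this network
NUM_SHARDS = 8    # assumed shard count; requirement (4) allows one or more

def shard_for_community(community_id: str) -> int:
    """Hash the community ID into a stable shard index."""
    digest = hashlib.sha256(community_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

def pubsub_topic(community_id: str) -> str:
    """Relay-sharding pubsub topic for the community's shard."""
    return f"/waku/2/rs/{CLUSTER_ID}/{shard_for_community(community_id)}"

# Every member computes the same topic for the same community.
print(pubsub_topic("example-community"))
```

A scheme like this trivially satisfies requirement (1) (all community traffic is on a known shard) but ignores load balancing; a real design would need to handle hot shards and multi-shard communities.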

2. Discovery

  1. Nodes should be able to discover peers within each shard they're interested in.
  2. Discovery method(s) can operate within a single or multiple shards, as long as:
  • requirement (1) still holds
  • nodes can bootstrap the chosen discovery method(s) for shards they're interested in
  • the chosen discovery method(s) does not add an unreasonable resource burden on nodes, especially if this method is shared between shards

Assumptions:

  • nodes are able to use discv5 as their main discovery method
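If a single discovery method is shared between shards, each node must filter discovered peers down to the shards it cares about. The sketch below models this with plain dicts; in practice the shard list would be read from a field in the peer's ENR populated via discv5, which is an assumption here, not a specified mechanism.

```python
# Sketch: select discovered peers that advertise a shard of interest.
# Peer records are modelled as plain dicts standing in for ENRs.

def peers_for_shard(discovered, shard):
    """Filter discovered peer records down to those serving `shard`."""
    return [p for p in discovered if shard in p.get("shards", [])]

discovered = [
    {"id": "peer-a", "shards": [0, 3]},
    {"id": "peer-b", "shards": [5]},
    {"id": "peer-c", "shards": [3, 5]},
]
print([p["id"] for p in peers_for_shard(discovered, 3)])  # ['peer-a', 'peer-c']
```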

3. Bootstrapping

  1. Nodes should be able to initiate connection to bootstrap nodes within the shards they're interested in.
  2. Bootstrap nodes can serve a single or multiple shards, as long as they can handle the added resource burden.

Assumptions:

  • Status initially provides bootstrapping infrastructure.
  • DNS discovery is sufficient to find initial bootstrap nodes.
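DNS discovery locators take the EIP-1459 `enrtree://<public-key>@<domain>` form. The sketch below only splits such a locator into its parts; the URL shown is a made-up placeholder, not a real fleet, and full enrtree resolution (DNS TXT lookups, signature checks) is out of scope.

```python
# Sketch: split an EIP-1459 enrtree locator into public key and domain.
# The example URL is a placeholder, not a real bootstrap fleet.

def parse_enrtree(url: str):
    prefix = "enrtree://"
    if not url.startswith(prefix) or "@" not in url[len(prefix):]:
        raise ValueError("not an enrtree URL")
    pubkey, domain = url[len(prefix):].split("@", 1)
    return pubkey, domain

pubkey, domain = parse_enrtree("enrtree://EXAMPLEKEY@nodes.example.org")
print(domain)  # nodes.example.org
```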

4. Store nodes (Waku Archive)

  1. Nodes should be able to find capable store nodes and query history within the shards they're interested in.
  2. Store nodes can serve a single or multiple shards, as long as:
  • they can handle the query rate and resource burden
  • they are discoverable as stated in requirement (1)

Assumptions:

  • Status provides initial store infrastructure, including a performant Waku Archive implementation.
  • PostgreSQL implementations exist for Waku Archive that can deal with the required rate of parallel queries to support 10K users
  • DNS discovery is sufficient to discover capable store nodes (these may or may not be the same nodes as used for bootstrapping, but discovery will be simpler if they are).
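The open question on store query rate for 10K users can be framed with the same kind of back-of-envelope arithmetic as the bandwidth question. The per-user query count below is a guessed placeholder (app starts, per-channel catch-up after reconnects, scroll-back), not a measurement.

```python
# Back-of-envelope store query rate for 10K users.
# The queries-per-user-per-day figure is an illustrative assumption.

def store_queries_per_sec(users, queries_per_user_per_day):
    """Average history-query rate across the whole store fleet."""
    return users * queries_per_user_per_day / 86_400  # seconds per day

# Hypothetical: each desktop node issues ~50 history queries a day.
rate = store_queries_per_sec(10_000, 50)
print(f"{rate:.1f} queries/s across the store fleet")
```

Averages hide the real risk: queries cluster around reconnection storms, so the PostgreSQL backend must be sized for peak parallel load, not the mean.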

5. Security

  1. Community members should not be vulnerable to simple DoS/spam attacks as defined in (3) and (4) below.
  2. Each community should be unaffected by failures and DoS/spam attacks in other communities. This implies some isolation/sharding in the messaging domain.
  3. Store/Archive:
    • store nodes for a community should only archive messages actually originating from the community
    • store nodes for a community should not be vulnerable to being taken down by a high rate of history queries
  4. Relay:
    • community relay nodes should only relay messages actually originating from the community.

Assumptions:

  • Community members (i.e. the application) are able to validate messages against community membership.
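Requirements 5(3) and 5(4) both reduce to a per-message membership check at relay and store nodes. The sketch below shows the shape of such a validator in the style of a gossipsub validation callback; the message layout, key set and accept/reject strings are illustrative assumptions layered on the stated assumption that the application can validate messages against community membership.

```python
# Sketch: a relay/store validator that accepts only messages from known
# community members. Message layout and lookup are illustrative; real
# validation would also verify the signature, which is elided here.

ACCEPT, REJECT = "accept", "reject"

def make_validator(member_pubkeys: set):
    """Build a validation callback closed over the community member set."""
    def validate(msg: dict) -> str:
        # Reject anything not attributed to a known community member.
        if msg.get("sender_pubkey") not in member_pubkeys:
            return REJECT
        # A real validator would verify msg's signature here.
        return ACCEPT
    return validate

validate = make_validator({"pk-alice", "pk-bob"})
print(validate({"sender_pubkey": "pk-alice"}))    # accept
print(validate({"sender_pubkey": "pk-mallory"}))  # reject
```

Gating both relay forwarding and archive writes on the same check gives the isolation requirement 5(2) asks for: spam from outside a community never propagates through, or persists on, that community's nodes.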

Other requirements

Note: this gathers the minimal set of requirements outside the Waku network (e.g. operational, testing, etc.) to support Status Communities scaling to 10K users.

1. Kurtosis network testing

A simulation framework and initial set of tests that can approximate:

  • the protocols
  • the discovery methods
  • the traffic rates for a typical community

in such a way as to prove the viability of any scaling design proposed to achieve the Network Requirements.

2. Community Protocol hardening

The Community Chat Protocol specifications are moved to the Vac RFC repo.

  • What else is required within this MVP time frame, e.g. including Community Chat in Kurtosis testing?

3. Nwaku integration testing

Nwaku requires integration testing and automated regression testing for releases to improve trust in the stability of each release.

4. Fleet ownership

Ownership of the infrastructure provided to Status communities should be established. This may require training and a transfer of responsibilities, which currently lie de facto with the nwaku team.
Fleet ownership comprises the responsibility for:

  • establishing a sensible upgrade process (may require some nodes for staging)
  • upgrading fleets
  • monitoring existing fleets and protocol behavior
  • providing support and logging bugs when noticed

Initial work

The requirements above will lead to a design and task breakdown. The rough order of work:

Ownership for all three items below is shared between Vac, Waku and Status teams:

(1) Agree on requirements above as the complete and minimal set to achieve the 10K scaling goal.
(2) A viable, KISS network design adhering to "Network requirements"
(3) Task breakdown of each item and ownership assignment
