---
layout: post
name: 'Scaling the Waku Protocol: A Performance Analysis with Wakurtosis'
title: 'Scaling the Waku Protocol: A Performance Analysis with Wakurtosis'
date: 2023-08-08 12:00:00
authors: Daimakaimura
published: true
slug: wakurtosis-waku-scallability-simulations
categories: waku, wakurtosis
---

# Scaling the Waku Protocol: A Performance Analysis with Wakurtosis

## Introduction

[Waku](https://waku.org/) is a family of P2P protocols enabling private, metadata-resistant messaging for Web3 by providing censorship resistance, adaptability, modular design, and a shared service network.
Waku is designed to enable communication between decentralized applications (dApps) in a peer-to-peer manner.
It serves as an improvement and successor to Ethereum's Whisper protocol, offering better scalability and efficiency.

[Waku Relay](https://rfc.vac.dev/spec/11/) is the core component within the broader Waku framework.
It is responsible for relaying messages between nodes in the Waku network, effectively serving as the message dissemination mechanism.
While Waku encompasses a variety of functionalities and improvements for decentralized messaging, Waku Relay specifically focuses on the message propagation aspect within this larger system.

Finally, [Discv5 (Discovery v5)](https://github.com/ethereum/devp2p/blob/master/discv5/discv5.md) is a peer-to-peer networking protocol designed to facilitate node discovery in decentralized networks.
It serves as an upgrade to earlier discovery protocols and is designed to be modular and extensible, allowing it to support various types of decentralized systems beyond just Ethereum.
The protocol enables nodes to find each other by maintaining a distributed hash table (DHT).

The scalability and performance of the Waku protocol are of critical importance.
To explore these facets with high granularity across a wide range of scenarios, we turned to [Wakurtosis](https://github.com/vacp2p/wakurtosis), a bespoke simulation framework developed internally.
By studying various network sizes, message rates, and peer discovery setups, we aimed to better understand the protocol's capabilities and limitations, and hence aspects that could benefit from further optimization.

Unfortunately, Wakurtosis did not fulfill many of our initial goals. We will delve into the specifics of these shortcomings in a retrospective article coming shortly.

## Understanding Wakurtosis

Wakurtosis is a simulation framework which integrates [Docker](https://www.docker.com/) and [Kurtosis](https://www.kurtosis.com/) to create a simulation environment that allows highly granular, large-scale simulations with a variety of traffic and network patterns.
At the core of Wakurtosis is Kurtosis — an orchestration tool responsible for managing containers, known as services, within isolated environments called enclaves.
These enclaves house virtual networks and their respective containers.
In addition to this, several external modules developed in-house address some of Kurtosis' limitations:

- Network Generation Module (Gennet): Initiates and configures networks for the simulation. It's highly modular, supporting the integration of multiple topologies, protocols, and node traits.

- Packet Injection Module (WLS): Allows for the insertion of custom data packets, thereby enabling varied traffic patterns and stress tests on the simulations.

- Analysis Module: Captures and provides insights into resource usage, network behaviors, and protocol interactions throughout the enclave.
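To make the division of labour between these modules concrete, here is a hypothetical sketch of a run configuration tying them together. The key names are illustrative assumptions for this article, not the actual schema of the Wakurtosis config file:

```python
import json

# Illustrative sketch only: key names are hypothetical and may not match
# the real Wakurtosis configuration schema.
simulation_config = {
    "gennet": {                      # Network Generation Module
        "num_nodes": 150,            # total Waku nodes to generate
        "num_subnets": 1,
        "node_type_distribution": {"nwaku": 100},  # percentage per client
    },
    "wls": {                         # Packet Injection Module (WLS)
        "simulation_time": 10800,    # seconds (the 3-hour steady-state runs)
        "message_rate": 1,           # messages per second injected network-wide
        "min_packet_size": 1024,     # payload bounds in bytes
        "max_packet_size": 1024,
    },
}

print(json.dumps(simulation_config, indent=2))
```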

### Data Collection

Wakurtosis ensures the accuracy of its data by leveraging multiple sources for hardware metrics:

- ##### cAdvisor (a Google tool)
[cAdvisor](https://github.com/google/cadvisor) provides detailed metrics on the resource usage and performance characteristics of Docker containers. It monitors application containers at the individual level by interfacing directly with Docker's daemon API.
While cAdvisor offers real-time metrics, it primarily focuses on container-specific metrics, which may neglect broader system-level insights.

- ##### Docker statistics
[Docker statistics](https://docs.docker.com/engine/reference/commandline/stats/) provides insights into Docker's overall performance and resource allocation.
This native Docker tool captures statistics about running containers using Docker's stats API, collecting cumulative CPU, memory, network, and block I/O metrics.
Docker statistics offer a bird's-eye view of the system, which can sometimes miss the granularity of performance fluctuations inside individual containers, particularly when dealing with multiple processes per container.

- ##### Process-level monitoring
Process-level monitoring offers detailed insights by tracking operational traits of processes within containers.
This method employs deep inspection of individual processes running inside a container by accessing */proc* kernel files to gather real-time process statistics.
Reading from the */proc* filesystem offers a direct window into kernel statistics, providing comprehensive metrics on each process within the containers.
However, while it offers granular insights, process-level monitoring can be more resource-intensive and might not always capture overarching system behavior.
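As an illustration of the last two collection paths, here is a minimal Python sketch (the helper names are ours, not part of any Wakurtosis API) that reads a process's resident memory from */proc* and takes a one-shot snapshot from `docker stats`:

```python
import os
import subprocess

def parse_vm_rss_kb(status_text: str) -> int:
    """Extract resident set size (VmRSS, in kB) from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])  # field format: "VmRSS:  20480 kB"
    raise ValueError("VmRSS field not found")

def parse_stats_line(line: str) -> dict:
    """Parse one 'name,cpu%,mem' line emitted by `docker stats --format`."""
    name, cpu, mem = line.strip().split(",")
    return {
        "container": name,
        "cpu_percent": float(cpu.rstrip("%")),
        "mem_usage": mem,
    }

# On Linux, per-process stats are read straight from the /proc filesystem:
if os.path.exists("/proc/self/status"):
    with open("/proc/self/status") as f:
        print("RSS of this process:", parse_vm_rss_kb(f.read()), "kB")

# A one-shot snapshot of all running containers (requires a Docker daemon):
def docker_snapshot() -> list:
    out = subprocess.run(
        ["docker", "stats", "--no-stream",
         "--format", "{{.Name}},{{.CPUPerc}},{{.MemUsage}}"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [parse_stats_line(l) for l in out.splitlines() if l]
```

The trade-off discussed above is visible here: the */proc* path gives per-process granularity but must be polled for every process, while the Docker stats path gives one cheap snapshot per container.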

### Performance Metrics

- Hardware Level Metrics: Emphasis on memory usage, CPU consumption, disk I/O, and network I/O.

- Topology Level Metrics: Focuses on the efficiency of message propagation across the network, including metrics like message injection, propagation times, and message loss.

### Scalability

To overcome scalability challenges, Wakurtosis employs a multi-node approach, running several nodes within one container.
This method supports simulations with over 1,000 nodes on a single machine.

However, this can introduce unforeseen network effects, potentially affecting some metrics. For instance, running several nodes per container can alter propagation times, as nodes grouped within the same container may exhibit different messaging behavior compared to a true node-to-node topology.
Additionally, employing a multi-node approach may result in losing node-level sampling granularity, depending on the metrics infrastructure used (e.g., cAdvisor).

Nevertheless, Wakurtosis offers the flexibility to choose between this and a 1-to-1 simulation, catering to the specific needs of each test scenario. The results presented in this study are all 1-to-1 simulations, i.e., one node per container.

## Examining the Waku Protocol

### Simulation Setup

To evaluate Waku under varied conditions, we conducted simulations across a range of network sizes, topologies, and message rates. Each simulation lasted 3 hours to reach a steady state.

The network sizes explored included 75, 150, 300, and 600 nodes.
We ran simulations both with a discovery mechanism (Discv5) and without one (a static network).

For Non-Discv5 simulations, we used static topologies with an average node degree of K=50.
In simulations with Discv5, we set the `max_peers` parameter to 50 to approximate a similar average degree.

To stress-test message throughput, we simulated message rates of 1 and 10 messages per second.
We initially ran simulations at up to 100 msg/s, but the results proved unreliable due to simulation hardware limitations, so we decided not to include them in this analysis.

We also included simulation batches with no load (i.e., 0 msg/s) to provide a clearer picture of Waku's baseline resource demands and the inherent overhead costs stemming from core protocol operations.
This combination of network sizes, topologies, message rates, and hardware configurations enabled us to comprehensively evaluate Waku's performance and scalability boundaries under diverse conditions.
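The full grid of conditions described above can be enumerated directly; each combination corresponds to one 3-hour simulation batch:

```python
from itertools import product

# Parameter grid from the simulation setup described in the text.
network_sizes = [75, 150, 300, 600]   # nodes
discovery = [False, True]             # static K=50 topology vs. Discv5
message_rates = [0, 1, 10]            # msg/s (0 = no-load baseline)

batches = [
    {"nodes": n, "discv5": d, "rate_msg_s": r}
    for n, d, r in product(network_sizes, discovery, message_rates)
]

print(len(batches), "simulation batches")  # 4 * 2 * 3 = 24
```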

#### Clarification on Non-Discv5 Scenarios

It's important to note that the Non-Discv5 scenarios presented in this study serve as a theoretical baseline for comparison and are not meant to represent real-world conditions.
In a live environment, some form of peer discovery mechanism would be necessary for the functioning of the network.
Mechanisms like ['rendezvous'](https://docs.libp2p.io/concepts/discovery-routing/rendezvous/) would also introduce additional bandwidth costs.

Therefore, the intention behind including Non-Discv5 simulations is not to compare their performance directly with Discv5 but rather to establish a fundamental baseline against which the added complexities and scaling costs of employing a discovery mechanism like Discv5 can be better understood.

### Simulation Results

We present the simulation results in two sets (without discovery and with discovery), each consisting of two plots: total bandwidth usage and average peak memory usage across different network sizes.

The first plot depicts the total bandwidth usage with the x-axis representing the number of nodes with separate series for the different traffic loads.

The second plot shows the average peak memory usage with the x-axis also indicating the number of nodes and different traffic loads.

Despite numerous scalability and stability challenges, Wakurtosis proved effective in simulating a Waku network of up to 600 nodes on a single machine; however, we encountered issues simulating higher message rates at those network sizes.

#### Without discovery mechanism (baseline)

The analysis reveals interesting trends in bandwidth and memory usage across different network sizes and loads.

Transmission bandwidth does not consistently increase with more nodes when messages are sent, while reception bandwidth exhibits even more variable behavior depending on scale and load.

The reduced bandwidth at high message rates stems from simulation infrastructure limits rather than protocol inefficiencies.
The no-load measurements provide insights into baseline protocol overhead costs that grow with network size.

Transmission overhead appears to increase linearly, while reception overhead accelerates sharply beyond 300 nodes.
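A crude way to see why near-flat transmission curves are the expected shape: in an idealised gossip flood (a simplification, not Waku Relay's exact wire behaviour), per-node transmit bandwidth depends on the node degree and the message rate, but not on the network size:

```python
# Back-of-envelope model of per-node relay bandwidth under an idealised
# gossip flood: each node forwards each message to roughly `degree` peers.
# This is a simplification and ignores control traffic and duplicates.
def per_node_tx_kbps(rate_msg_s: float, payload_bytes: int, degree: int) -> float:
    return rate_msg_s * payload_bytes * degree * 8 / 1000  # kbit/s

# e.g. 1 msg/s network-wide, 1 KB payloads, degree 50:
print(per_node_tx_kbps(1, 1024, 50))  # 409.6 kbit/s, independent of node count
```

Because the network size does not appear in this expression, deviations from flatness in the measured curves point either at protocol control-plane overhead or at testbed artifacts, not at payload dissemination itself.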

![Total average bandwidth usage without discovery mechanism](/static/img/wakurtosis_waku/Baseline_Bandwidth.png)

<figcaption>

***Total average bandwidth usage without discovery mechanism (baseline).***

</figcaption>

In addition to bandwidth, we also examined average peak memory usage under the different network sizes and messaging loads.

![Average peak memory usage without discovery mechanism](/static/img/wakurtosis_waku/Baseline_Memory.png)

<figcaption>

***Average peak memory usage without discovery mechanism (baseline).***

</figcaption>

With no load, memory usage remained consistent around 20-21 MB across all network scales.
Under 1 message per second, average memory usage increased slightly to 21-22 MB.
The highest memory usage of 23-25 MB was seen with 10 messages per second, especially at 150 nodes.

This aligns with expectations that higher messaging loads require more memory for message processing and routing.
However, the differences are relatively small, with only around a 20% increase from no load to 10 msgs/s.
This suggests the protocol has a fairly fixed memory overhead cost, with incremental increases as more messages are handled.
Overall, the memory usage appears stable and scales reasonably well, without any dramatic growth as network size expands.
This consistency indicates efficient memory management and low overhead from a protocol perspective.
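For transparency, the arithmetic behind the roughly 20% figure quoted above, using the midpoints of the reported memory ranges:

```python
# Relative memory overhead from the no-load baseline (~20-21 MB)
# to the 10 msg/s load (~23-25 MB), using range midpoints.
baseline_mb = (20 + 21) / 2   # 20.5 MB
loaded_mb = (23 + 25) / 2     # 24.0 MB

increase = (loaded_mb - baseline_mb) / baseline_mb
print(f"{increase:.0%}")  # 17%, i.e. on the order of the ~20% cited
```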

Overall, the baseline simulations displayed modest overhead and reasonable scaling for memory, transmission, and reception as message rates grew or network size expanded.

#### With discovery mechanism (Discv5)

With the discovery mechanism, we again see interesting trends in resource usage across different network sizes and traffic loads.

Transmission bandwidth remained fairly stable as the number of nodes increased.

We again observe reduced bandwidth at the highest rates and node counts, stemming from simulation infrastructure constraints rather than inherent protocol limitations.

![Total average bandwidth usage with discovery mechanism](/static/img/wakurtosis_waku/Discv5_Bandwidth.png)

<figcaption>

***Total average bandwidth usage with discovery mechanism.***

</figcaption>

Reception bandwidth grows much faster with the number of nodes compared to the baseline, most notably in the large spike from 300 to 600 nodes with discovery enabled.
This reflects the substantial additional overhead of neighbor discovery and tracking.

Similarly, memory usage also increases rapidly with number of nodes when discovery is enabled, aligning with the higher memory costs for node tracking and maintenance of discovery data structures.
Even with no load, the reception bandwidth and memory overheads of discovery are evident.

![Average peak memory usage with discovery mechanism](/static/img/wakurtosis_waku/Discv5_Memory.png)

<figcaption>

***Average peak memory usage with discovery mechanism.***

</figcaption>

Overall, the discovery mechanism adds significant reception bandwidth and memory overhead that both scale up more sharply with network size compared to the baseline. However, the transmission bandwidth impact appears relatively contained.

## Conclusion

This study underscores the Waku protocol’s resilience and scalability across varied conditions but also highlights the challenges and limitations of Wakurtosis and the need for a more robust simulation infrastructure for demanding scenarios.

The protocol's robustness, evidenced by the absence of message loss and good stability across network sizes and traffic loads, is a notable takeaway.
As expected, simulations with Discv5 generally led to higher resource usage across the majority of scenarios, with reception bandwidth seeing the largest overhead and the poorest scaling: nearly doubling baseline costs and growing substantially at larger network sizes.
While the simulations were limited by infrastructure constraints at high node counts and rates, they strongly demonstrate Waku's capabilities within those bounds. Moving forward, enhancing the simulation infrastructure will enable more rigorous testing of extreme scenarios.

Guided by these insights, our immediate priority is to continue studying Waku's behaviour, focusing on scalability and performance under larger networks, high-traffic situations, and different protocol configurations.

Stay updated with our progress!