# Scaling the Waku Protocol: A Performance Analysis with Wakurtosis
## Introduction
The scalability and performance of the Waku protocol are subjects of critical importance to the VAC-DST team.
To explore these facets with high granularity across a wide range of scenarios, we turned to Wakurtosis, a bespoke simulation framework developed internally.
By studying various network sizes, message rates, and peer discovery setups, we aimed to better understand the protocol's capabilities and limitations, and hence the aspects that could benefit from further optimisation.
## Understanding Wakurtosis
Wakurtosis is a robust simulation framework that integrates Docker and Kurtosis to create an environment for running highly granular, large-scale simulations with a variety of traffic and network patterns.
At the core of Wakurtosis is Kurtosis, an orchestration tool responsible for managing containers, known as services, within isolated environments called enclaves.
These enclaves house virtual networks and their respective containers. In addition, several external modules developed in-house extend some of Kurtosis's otherwise limited functionality (a configuration sketch follows this list):
- Network Generation Module (Gennet): Generates and configures networks for the simulation. It is highly modular, supporting the integration of multiple protocols and node traits.
- Packet Injection Module (WLS): Allows the insertion of custom data packets, thereby enabling varied traffic patterns and stress tests on the simulations.
- Analysis Module: Captures resource usage, network behaviour, and protocol interactions throughout the enclave and provides insights into them.
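
For illustration, a simulation run can be thought of as a single configuration document that these modules consume. The sketch below is a minimal Python rendition of such a configuration; the key names (`gennet`, `wls`, `num_nodes`, `message_rate`, etc.) are illustrative assumptions for explanatory purposes, not the actual Wakurtosis configuration schema.

```python
import json

# Illustrative sketch of a Wakurtosis-style run configuration.
# Key names are assumptions, not the exact Wakurtosis schema.
simulation_config = {
    "gennet": {                       # Network Generation Module
        "num_nodes": 300,             # total Waku nodes to instantiate
        "topology": "static",         # static graph vs. Discv5-driven discovery
        "avg_degree": 13,             # average number of peers per node (K)
    },
    "wls": {                          # Packet Injection Module
        "message_rate": 10,           # messages injected per second
        "payload_size_bytes": 1024,   # fixed payload size per message
        "duration_seconds": 3 * 3600, # 3-hour runs, as in the simulations below
    },
    "analysis": {                     # Analysis Module
        "sources": ["cadvisor", "docker_stats", "proc"],
    },
}

# Persist the configuration so each module can read the same run description.
with open("run_config.json", "w") as f:
    json.dump(simulation_config, f, indent=2)
```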
### Data Collection
Wakurtosis ensures the accuracy of its data by leveraging multiple sources for hardware metrics:
##### cAdvisor (a Google tool)
cAdvisor provides detailed metrics on the resource usage and performance characteristics of Docker containers.
It monitors application containers individually by interfacing directly with Docker's daemon API.
While cAdvisor offers real-time metrics, it focuses primarily on container-specific metrics and may therefore miss broader system-level insights.
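
As a minimal sketch of this data source, the snippet below scrapes cAdvisor's Prometheus-format endpoint and filters out per-container memory samples. It assumes cAdvisor is exposed on its default port 8080 on the local host; the URL and the selection of a single metric are illustrative choices.

```python
import requests

# Assumption: cAdvisor is reachable at its default port on the local machine.
CADVISOR_METRICS_URL = "http://localhost:8080/metrics"

def container_memory_samples():
    """Return the raw 'container_memory_usage_bytes' samples as text lines."""
    body = requests.get(CADVISOR_METRICS_URL, timeout=5).text
    return [
        line for line in body.splitlines()
        if line.startswith("container_memory_usage_bytes{")
    ]

if __name__ == "__main__":
    for sample in container_memory_samples():
        print(sample)
```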
##### Docker statistics
Docker statistics provide insights into Docker's overall performance and resource allocation.
This native Docker tool captures statistics about running containers through Docker's stats API, collecting cumulative CPU, memory, network, and block I/O metrics.
Docker statistics offer a bird's-eye view of the system, which can miss the granularity of performance fluctuations inside individual containers, particularly when several processes run per container.
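
A minimal sketch of this source is a one-shot snapshot taken through the Docker CLI, with one JSON object per running container. The field names printed at the end follow the CLI's documented Go-template fields for `docker stats`; how Wakurtosis itself consumes this API is not shown here.

```python
import json
import subprocess

def docker_stats_snapshot():
    """Take a single snapshot of per-container resource usage via the Docker CLI."""
    out = subprocess.run(
        ["docker", "stats", "--no-stream", "--format", "{{json .}}"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [json.loads(line) for line in out.splitlines() if line.strip()]

if __name__ == "__main__":
    for entry in docker_stats_snapshot():
        # 'Name', 'CPUPerc' and 'MemUsage' are standard docker stats format fields.
        print(entry["Name"], entry["CPUPerc"], entry["MemUsage"])
```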
##### Process-level monitoring
Process-level monitoring offers detailed insights by tracking the operational traits of the processes inside each container.
This method performs a deep inspection of the individual processes running inside a container by reading the */proc* kernel files to gather real-time process statistics.
Reading from the */proc* filesystem offers a direct window into kernel statistics, providing comprehensive metrics on each process within the containers.
However, while it offers granular insights, process-level monitoring can be more resource-intensive and might not always capture overarching system behaviour.
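
The sketch below illustrates the general technique on Linux: reading a process's resident memory from */proc/&lt;pid&gt;/status* and its accumulated CPU time from */proc/&lt;pid&gt;/stat*. It is a simplified stand-in for what a process-level monitor does, not the exact code used by Wakurtosis.

```python
import os

CLK_TCK = os.sysconf("SC_CLK_TCK")  # kernel clock ticks per second

def proc_sample(pid):
    """Return resident memory (kB) and cumulative CPU time (s) for one PID."""
    # VmRSS is reported in kB in /proc/<pid>/status.
    rss_kb = None
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                rss_kb = int(line.split()[1])
                break

    with open(f"/proc/{pid}/stat") as f:
        stat = f.read()
    # The comm field is wrapped in parentheses and may contain spaces,
    # so split after the closing parenthesis before indexing fields.
    fields = stat.rsplit(")", 1)[1].split()
    # utime and stime are the 14th and 15th stat fields, in clock ticks.
    cpu_seconds = (int(fields[11]) + int(fields[12])) / CLK_TCK

    return {"pid": pid, "rss_kb": rss_kb, "cpu_seconds": cpu_seconds}

if __name__ == "__main__":
    print(proc_sample(os.getpid()))
```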
### Performance Metrics
Combining data sources such as cAdvisor/Prometheus, Docker stats, and process-level monitoring enhances the reliability of Wakurtosis, offering a comprehensive view of system performance.

- Hardware-level metrics: emphasis on memory usage, CPU consumption, disk I/O, and network I/O.
- Topology-level metrics: focus on the efficiency of message propagation across the network, including metrics such as message injection, propagation times, and message loss (see the sketch below).
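
As a sketch of how topology-level metrics can be derived, assume each injected message carries an ID and an injection timestamp, and every node records when it received each ID. The data structures below are hypothetical stand-ins for the Analysis module's actual logs, chosen only to make the calculation concrete.

```python
def topology_metrics(injections, receipts, num_nodes):
    """Compute average propagation time and message-loss rate.

    injections: message_id -> injection timestamp (seconds)
    receipts:   message_id -> list of per-node receipt timestamps
    """
    propagation_times = []
    lost = 0
    for msg_id, t_injected in injections.items():
        arrivals = receipts.get(msg_id, [])
        if len(arrivals) < num_nodes:          # not every node saw the message
            lost += num_nodes - len(arrivals)
        if arrivals:
            # Propagation time: delay until the last node received the message.
            propagation_times.append(max(arrivals) - t_injected)
    loss_rate = lost / (len(injections) * num_nodes)
    avg_propagation = sum(propagation_times) / max(len(propagation_times), 1)
    return {"avg_propagation_s": avg_propagation, "loss_rate": loss_rate}

# Hypothetical example: two messages in a 3-node network; m2 is lost at one node.
print(topology_metrics(
    injections={"m1": 0.00, "m2": 1.00},
    receipts={"m1": [0.02, 0.05, 0.07], "m2": [1.01, 1.04]},
    num_nodes=3,
))  # -> avg_propagation_s ≈ 0.055, loss_rate ≈ 0.167
```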
### Scalability
To overcome scalability challenges, Wakurtosis can employ a multi-node approach, running several nodes within one container.
This method supports simulations with over 1,000 nodes on a single machine.
However, it can introduce unforeseen network effects, potentially affecting some metrics.
For instance, running several nodes per container can alter propagation times, as nodes grouped within the same container may exhibit different messaging behaviour compared to a true node-to-node topology.
Additionally, a multi-node approach may lose node-level sampling granularity, depending on the metrics infrastructure used, e.g. cAdvisor.
Nevertheless, Wakurtosis offers the flexibility to choose between this and a 1-to-1 (one node per container) simulation, catering to the specific needs of each test scenario.
The results presented here are from 1-to-1 simulations.
## Examining the Waku Protocol
### Simulation Setup
To evaluate Waku under varied conditions, we conducted simulations across a range of network sizes, topologies, and message rates.
Each simulation lasted 3 hours in order to reach a steady state.
The network sizes explored included *75*, *150*, *300*, and *600* nodes.
For non-Discv5 simulations, we used static topologies with average node degrees of *K=3*, *K=13*, and *K=50*.
In simulations with Discv5, we set the *max_peers* parameter to *12* and *50* to approximate similar average degrees.
To stress-test message throughput, we simulated message rates of *1*, *10*, and *100* messages per second.
This combination of network sizes, topologies, message rates, and hardware configurations enabled us to comprehensively evaluate Waku's performance and scalability boundaries under diverse conditions.
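
As a compact illustration of the resulting experiment matrix, the sketch below enumerates the static-topology (non-Discv5) combinations described above; the Discv5 runs with *max_peers* of 12 and 50 would be enumerated analogously.

```python
from itertools import product

# Parameter grid for the static-topology (non-Discv5) simulations described above.
network_sizes = [75, 150, 300, 600]   # nodes
avg_degrees = [3, 13, 50]             # K
message_rates = [1, 10, 100]          # messages per second

runs = list(product(network_sizes, avg_degrees, message_rates))
print(f"{len(runs)} static-topology runs")   # 4 * 3 * 3 = 36 combinations
for nodes, k, rate in runs[:3]:
    print(f"nodes={nodes}, K={k}, rate={rate} msg/s")
```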
### Results
To provide a clear visualization of bandwidth usage across all the cases we studied, we present the results as multipliers over the pure payload, for both reception (Rx) and transmission (Tx), in two heatmaps:
|  | ||
|  | ||
Despite certain scalability challenges, Wakurtosis proved effective in simulating a Waku network with over *600* nodes on a single machine.
Our analysis under baseline conditions (a static network with average degrees of *K=3*, *K=13*, and *K=50*) revealed good scalability and stability from a memory standpoint.
However, memory usage increased with higher message rates, indicating potential areas for optimisation.
Bandwidth utilization exhibited minor fluctuations as the network size increased, and bandwidth efficiency improved as the message rate increased.
Some inefficiencies were noted in larger networks, suggesting room for improvement, but overall Waku displayed very stable behaviour in terms of memory and bandwidth usage.
#### Impact of Discovery v5
To shed light on the effects of Discovery v5 (Discv5), we focus on the denser configurations for both the non-Discv5 and Discv5 cases with *K=50*.
Adding Discv5 leads to higher memory consumption, especially as the network size increases.
This is likely due to the additional routing information that needs to be stored and maintained.
The effects on bandwidth utilization are mixed.
For reception, at lower message rates Discv5 consistently uses more bandwidth than the corresponding non-Discv5 case.
However, these differences become less pronounced at higher message rates.
For transmission, both cases perform similarly, and both show large improvements in transmission efficiency at higher message loads.
## Conclusion
This study underscores the Waku protocol's resilience and scalability across varied conditions, but also highlights the limitations of Wakurtosis and the need for a more robust simulation infrastructure for demanding scenarios.
The protocol's robustness, evidenced by the absence of message loss and notable stability across network sizes and traffic loads, is a key takeaway.
The addition of Discv5 generally leads to higher memory and bandwidth usage in the majority of scenarios.
Guided by these insights, our immediate priority is to keep studying Waku's behaviour with a view to greater scalability and performance, especially under larger network loads, high-traffic situations, and different protocol configurations.
Stay updated with our progress!