Welcome to PDSP-Bench - Code and Documentation

PDSP-Bench is a novel benchmarking system designed for benchmarking parallel and distributed stream processing on heterogeneous hardware configurations.

Citation

Please cite our papers if you find this work useful or use it as a baseline in your paper.

@unpublished{Agnihotri_VLDB_PDSPBench_2024,
author      =     {Agnihotri, Pratyush and Koldehofe, Boris and Heinrich, Roman and Binnig, Carsten and Luthra, Manisha},
title       =     {PDSP-Bench: A Benchmarking System for Parallel and Distributed Stream Processing},
year        =     {2024},
pages       =     {1--22},
keywords    =     {C2},
note        =     {Accepted at TPCTC}
}

@inproceedings{Agnihotri_SIGMOD_Demo_PDSPBench_2025,
author      =     {Agnihotri, Pratyush and Binnig, Carsten},
title       =     {Demonstrating PDSP-Bench: A Benchmarking System for Parallel and Distributed Stream Processing},
year        =     {2025},
booktitle   =     {Companion of the 2025 International Conference on Management of Data},
pages       =     {7--10},
numpages    =     {4},
}

Dedicated Repository for Paper Submission:

This repository supports our paper submission titled "PDSP-Bench: A Benchmarking System for Parallel and Distributed Stream Processing" and showcases the capabilities of PDSP-Bench.

Exploring PDSP-Bench's Key Components:

pdsp-bench_Cloud_setup: The main instructions for setting up a CloudLab environment for benchmarking with PDSP-Bench. It provides scripts to install dependencies and set up CloudLab resources for performance benchmarking.
pdsp-bench_controller: The backend of the PDSP-Bench benchmarking system. It offers API endpoints to communicate with pdsp-bench_wui and automates tasks such as creating the cluster, submitting jobs to Flink, and saving cluster and user information in an SQLite database.
pdsp-bench_workload_generator: An Apache Flink client application that generates workloads and parallel query plans for synthetic and real-world benchmark applications. These plans are used for performance benchmarking and for performance forecasting through training and testing learned cost models.
pdsp-bench_wui: The web user interface of PDSP-Bench. It communicates with the controller and takes user input for resource provisioning on CloudLab, workload generation, and performance analysis and visualization.
pdsp-bench_ml_models: The main component of the ML manager. It contains learned components of DSP systems, such as cost models, that provide cost predictions for various workload and resource configurations in parallel and distributed processing environments.
pdsp-bench_experiment_data: A selected set of sample data from real-world and synthetic applications, collected for parallel query plans during performance benchmarking.

PDSP-Bench Description

PDSP-Bench benchmarks Distributed Stream Processing (DSP) systems. It lets you deploy a DSP system of your choice as the system under test (SUT), e.g., Apache Flink, and benchmark it with 14 real-world and 9 synthetic applications. Through the web user interface (pdsp-bench_wui), you can execute different parallel query plans (PQP) for these real-world and synthetic applications under varying workloads (data stream and query parameters) and resource configurations. Once a query is running, you can visualize its performance in real time, such as end-to-end latency, throughput, and resource utilization. Afterwards, you can visualize historical performance data of different query executions to compare them. In addition, PDSP-Bench collects these data so they can be used to train learned components of DSP systems. pdsp-bench_ml_models offers performance or cost prediction using different learned cost models that were trained on data collected with PDSP-Bench.
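To make "varying workload and resource configurations" concrete, the following is a minimal Python sketch of how a set of benchmark runs could be enumerated. It is not taken from the PDSP-Bench code base: the `RunConfig` class, its field names, the `enumerate_runs` helper, and the application names `app_a` and `app_b` are all hypothetical, and the exhaustive Cartesian product is only one possible enumeration strategy.

```python
from dataclasses import dataclass
from itertools import product
from typing import Iterator, Sequence


@dataclass(frozen=True)
class RunConfig:
    """One benchmark run. All field names are hypothetical, not PDSP-Bench's schema."""
    application: str   # name of the benchmarked application
    event_rate: int    # events per second emitted by the source
    parallelism: int   # parallelism degree used for the query plan
    duration_s: int    # how long the job runs, in seconds
    iteration: int     # repetition index, starting at 1


def enumerate_runs(
    applications: Sequence[str],
    event_rates: Sequence[int],
    parallelisms: Sequence[int],
    duration_s: int,
    iterations: int,
) -> Iterator[RunConfig]:
    """Yield every combination of the given parameters (exhaustive grid)."""
    for app, rate, par, it in product(
        applications, event_rates, parallelisms, range(1, iterations + 1)
    ):
        yield RunConfig(app, rate, par, duration_s, it)


# Placeholder application names; replace with the applications you benchmark.
runs = list(
    enumerate_runs(
        applications=["app_a", "app_b"],
        event_rates=[1000, 10000],
        parallelisms=[1, 2, 4],
        duration_s=60,
        iterations=2,
    )
)
print(len(runs))  # 2 applications * 2 rates * 3 parallelism degrees * 2 iterations = 24
```

An exhaustive grid grows multiplicatively with every added parameter value, so in practice a sampled or rule-based enumeration may be preferable for large parameter spaces.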

Overview of Step-wise Operations in PDSP-Bench

Create a CloudLab account and set up a cluster in CloudLab using pdsp-bench_Cloud_setup.
Clone PDSPBench into your home folder, or create a folder PDSPBench in your home directory, unzip the downloaded files, and copy the subfolders into PDSPBench, e.g., ~\PDSPBench\dsp_be
Set up and start pdsp-bench_controller.
Set up and start pdsp-bench_wui.
In the frontend, go to the Explore Node tab and enter the hostnames of all CloudLab nodes.
Go to the Create Cluster tab and create a distributed environment by creating and deploying Apache Flink on the cluster nodes. You can decide how many nodes you need for Flink while creating the cluster.
After Flink is deployed and the cluster is running, select the cluster and execute queries of real-world or synthetic applications by defining query-specific parameters such as event rate, parallelism, execution time, number of iterations, and enumeration strategy.
During execution, you can visualize real-time performance, or wait for the job to run for the specified duration and then analyze and visualize historical performance metrics of different queries from the same or different applications.
Query-execution-specific configurations and metrics are collected automatically and downloaded as JSON files and graph representations after the job has run for the specified time. These details are also stored in MongoDB and SQLite databases.
This benchmark data can be collected and used to train performance cost models. We provide different trained learned cost models in pdsp-bench_ml_models for performance prediction.
pdsp-bench_ml_models can be used to predict performance and compare the prediction accuracy of different learned cost models in the Learned Model tab.
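As an illustration of what comparing the accuracy of learned cost models can look like, the sketch below computes the q-error, a metric commonly used for evaluating learned cost models, for two sets of predictions against measured latencies. This is an assumption for illustration only: the README does not state which accuracy metric the Learned Model tab reports, and the model names, the `q_error` and `summarize` helpers, and all numbers are made up.

```python
from statistics import median
from typing import Dict, Sequence


def q_error(predicted: float, actual: float) -> float:
    """Return max(predicted/actual, actual/predicted); 1.0 means a perfect prediction."""
    if predicted <= 0 or actual <= 0:
        raise ValueError("q-error is defined for positive values only")
    return max(predicted / actual, actual / predicted)


def summarize(predictions: Sequence[float], actuals: Sequence[float]) -> Dict[str, float]:
    """Summarize the q-errors of one model as median and maximum."""
    errors = sorted(q_error(p, a) for p, a in zip(predictions, actuals))
    return {"median": median(errors), "max": errors[-1]}


# Made-up measured end-to-end latencies (ms) for three query executions.
measured_latency_ms = [100.0, 200.0, 400.0]

# Made-up predictions from two hypothetical cost models.
model_predictions = {
    "model_a": [110.0, 180.0, 400.0],
    "model_b": [50.0, 400.0, 800.0],
}

summaries = {
    name: summarize(preds, measured_latency_ms)
    for name, preds in model_predictions.items()
}

# Pick the model with the lowest median q-error.
best = min(summaries, key=lambda name: summaries[name]["median"])
print(best)  # model_a
```

The q-error treats over- and under-estimation symmetrically, which makes it convenient for ranking models whose predictions span several orders of magnitude.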

Note: Detailed steps for each component of PDSP-Bench are provided in their respective README.md files.
