This is a project to benchmark different implementations of queues (currently FIFO, LIFO, bounded or unbounded) to measure their output and performance.
This project is currently undergoing a rework. The basic functionality still works but some features don't work as described in this README. Some arguments have been reworked to support other data structures, so the description below is not currently valid. Use the --help command to get the up-to-date arguments. Some features don't work for certain data structures, for example, benchmark_core/memeory_tracking doesn't work for the priority_queue data structures. The output will also look slightly different, but this is planned to be fixed in the future.
# Using cargo run
cargo run --release -p <Queue type> --features <Optional features>-- <General optional flags> <Benchmark type> <Optional flags for benchmark type>
# Alternatively
cargo build --release -p <Queue type> --features <Optional features>
./target/release/<Queue type> <General optional flags> <Benchmark type> <Optional flags for benchmark type>To run for a basic lock-based queue:
# Basic queue, benchmark measures throughput
cargo run -p basic_queue --release -- prod-con
# Basic queue, benchmark measures throughput and memory allocation
cargo run --p basic_queue --features benchmark_core/memory_tracking --release -- prod-conThis will compile and run the benchmarking framework. It will run the prod-con benchmark on the basic_queue implementation and produce a csv file in the ./output with results from the benchmark, as well as a file with a name starting with mem containing information about total memory allocated during the running.
There are several useful scripts located inside the scripts folder, as well as a README which describes how to use them.
Implemented Rust queues are:
array_queue- A queue from the cratecrossbeam.atomic_queue- A queue from the crateatomic-queue.basic_queue- AVecDequewrapped in a mutex. Implementation.bounded_ringbuffer- A simple custom ringbuffer wrapped in a mutex. Implementationbounded_concurrent_queue- A bounded queue from the crateconcurrent-queue.unbounded_concurrent_queue- An unbounded queue from the crateconcurrent-queue.lf_queue- An unbounded lock-free queue from the cratelf-queuelockfree_queue- A (supposedly) lock-free unbounded queue from the cratelockfree.ms_queue- An unbounded lock-free queue based on the Michael & Scott queue. Implementation.scc_queue- An unbounded lock-free queue from the cratescc.scc2_queue- An unbounded lock-free queue from the cratescc2.seg_queue- An unbounded queue from the cratecrossbeam.wf_queue- A bounded lock-free queue from the cratewfqueue. Patched here to be able to be compiled.lcrq- Our Rust implementation of the scalable lock-free unbounded LCRQ. Implementation. Requires x86-64.lprq- Our Rust implementation of the scalable lock-free unbounded LPRQ (a portable extension of the LCRQ). Implementation.faaa_queue- Our Rust implementation of the scalable lock-free unbounded FAAArrayQueue. Implementation.tz_queue_hp- A lock-free bounded queue based on this paper. This implementation uses hazard pointers for memory reclamation. Implementation.tz_queue_leak- A lock-free bounded queue based on this paper. This implementation has no memory reclamation scheme. Implementation.bbq- A Block Based Bounded Queue based on this paper from the cratebbq-rs. This queue implements a blocking mechanism and thus does not work forenq-deq. We fixed this issue and patched it here. The patched version is the one implemented in the framework. Requires nightly.
Additionally, there are some C++ queues, which are included with C bindings to pre-existing implementations:
boost_queue_cpp- A bounded lock-free C++ queue from theboostC++ library. Requiresboostto be installed on your system.moodycamel_cpp- A fast but non-linearizable (only maintains per-thread order) C++ queue (GitHub).lcrq_cpp- A C++ implementation of the LCRQ (Github). Requires x86-64.lprq_cpp- The LPRQ C++ implementation (Zenodo).faaa_queue_cpp- The FAAArrayQueue C++ implementation.
There are also the following Rust stacks:
lockfree_stack- A lock-free unbounded stack from the cratelockfree.scc_stack- An unbounded lock-free stack from the cratescc.scc2_stack- An unbounded lock-free stack from the cratescc2.
You have to choose which type of benchmark you want to run for your queue. They have sub-commands specific to themselves. Use the --help flag to print a help text about the sub-commands.
prod-con- Measures throughput and fairness. Threads are either producers or consumers. You can choose the amount of producers and consumers using their respective flags.enq-deq- Measures throughput and fairness. Threads alternate between enqueueing and dequeueing randomly. You can choose the spread of enqueuers/dequeuers using the--spreadflag. Using the--thread-countflag you can decide how many threads you want to use for the benchmark.bfs- Measures execution time. Performs a parallell breadth-first search on a graph of your choosing. After the execution, the benchmark will also do a sequential search to verify the parallel solution. The verification can be turned off by passing the--no-verifyflag. Choose graph file by passing the--graph-fileflag and specifying the path. The benchmark supports.mtxfiles. You can run several iterations of BFS by passing the-iflag, just as in the other benchmarks. The graph file will only be loaded once, and the sequential solution will also only be generated once.enq-deq-pairs- Measures throughput and fairness. Threads first enqueue an item, then immediately dequeues an item. Use--thread-countto change the amount of threads.
To use specific values you can add different flags to the run command:
- General flags:
-t,--time-limitfor specific time values.-o,--one-socketto run on one socket (specific for our test environment).-i,--iterationsto specify how many iterations to run the benchmark.-e,--empty-popsif you want to include empty dequeue operations.-q,--queue-sizeto specify the sizes of bounded queues.-d,--delayto specify amount of floating points generated between each operation. [Default: 10]--write-stdout- If you want to output to stdout instead of a file.-h,--helpto print help.-V--versionto print the version of the benchmark.--pathto change where the output of the benchmark is put.--print-info- To create a file with hardware info and benchmark info
prod-conbenchmark type sub commands:-p,--producersfor specified amount of producers.-c,--consumersfor specified amount of consumers.
enq-deqbenchmark type sub commands:--spread- To specify the spread for theenq-deqbenchmark type.--thread-count- To specify the amount of threads in theenq-deqbenchmark type.
enq-deq-pairsbenchmark type sub commands:--thread-count- To specify the amount of threads in theenq-deq-pairsbenchmark type.
benchmark_core/memory_tracking- Writes to a file the memory allocated by the program during the execution. Requiresjemalloc, so should work on most UNIX systems.silent-release- Compiles the benchmarking tool without any logging. Need to pass the--no-default-featuresto work.verbose-release- Compiles the benchmarking tool with all log levels. Need to pass the--no-default-featuresto work.
To add your own queues to the benchmarking suite, it should be added as a workspace member, preferably in queues/. This requires adding it as a member in Cargo.toml, and then adding its package folder similarly to the packages in queues (it should have the main main.tex, with the exception of the selected queue).
Your queue will have to implement the traits found in src/traits.rs so that the framework can run it. The traits look like this:
/// The required queue trait.
pub trait ConcurrentQueue<T> {
/// Returns a thread-local handle for the queue.
fn register(&self) -> impl HandleQueue<T>;
/// Returns the name of the queue.
fn get_id(&self) -> String;
/// Used to create a new queue.
/// `size` is discarded for unbounded queues.
fn new(size: usize) -> Self;
}
/// The required queue handle trait.
pub trait HandleQueue<T> {
/// Pushes an item to the queue.
/// If it fails, returns the item pushed.
fn push(&mut self, item: T) -> Result<(), T>;
/// Pops an item from the queue.
fn pop(&mut self) -> Option<T>;
}You should then be able to run benchmarks on your queue by running for example:
cargo run -p new_queue_name -r -- -t 1 enq-deqIn the file order.rs, there are two functions that test that the queue dequeues items in the same order that they were enqueued. This function returning Ok(()) does not mean that the queue always dequeues in order, however it returning Err(()) does mean that the queue sometimes dequeues out of order. The way we have used these functions is by creating one test per queue that runs one of the two (depending on whether the queue uses Box or not). Example:
#[test]
fn test_order() {
let _ = env_logger::builder().is_test(true).try_init();
let q: BasicQueue<i32> = BasicQueue {
bqueue: BQueue::new()
};
if crate::order::benchmark_order_i32(q, 20, 5, true, 10).is_err() {
panic!();
}
}
#[test]
fn test_order() {
let _ = env_logger::builder().is_test(true).try_init();
let q: LCRQueue = LCRQueue::new(10);
if crate::order::benchmark_order_box(q, 20, 5, true, 10).is_err() {
panic!();
}
}The framework is capable of running benchmarks on C/C++ queues as well. Adding them is not as straightforward as Rust queues, and there probably exists a better way to do it than what is done here. We use bindgen to generate C bindings for the C/C++ queues. Below, we show how to add a C++ queue. We also recommend looking at the code in the pre-existing C++ queues.
As with all queues, first create a package folder in queues/. Then add all the C++ headers in a nested folder (e.g. cpp_src/). Then, create two wrapper files queue_wrapper.cpp and queue_wrapper.hpp. In queue_wrapper.hpp, add the functions you want. Typically create, destroy, push and pop.
#pragma once
#ifdef __cplusplus
extern "C" {
#endif
typedef struct YourQueueImpl* YourQueue;
YourQueue create();
void your_queue_destroy()
int your_queue_push(YourQueue q, void* item);
int your_queue_pop(YourQueue q, void** item);
#ifdef __cplusplus
}
#endifThen, in queue_wrapper.cpp create the functions, including the required C++ headers:
// Include everything you need.
#include <queue_wrapper.hpp>
// The actual implementation
struct YourQueueImpl {
your::actual::queue queue;
explicit YourQueueImpl()
: queue() {}
};
YourQueue your_queue_create() {
return new YourQueueImpl(capacity);
}
void your_queue_destroy(YourQueue queue) {
delete queue;
}
int your_queue_push(YourQueue queue, void* item) {
return queue->queue.push(item) ? 1 : 0;
}
int your_queue_pop(YourQueue queue, void** item) {
return queue->queue.pop(*item) ? 1 : 0;
}Add build dependencies to your project for the compilation and bindings:
[build-dependencies]
bindgen = { version = "0.71.1" }
cc = { version = "1.0" }Finally, create a build.rs in the new package to build the C++ code. We suggest copying the file queues/lcrq_cpp/build.rs as a starting point, adapting the following block with specific configuration for the LCRQ:
// Configure for LCRQ
{
let queue_location = "cpp_src";
println!("cargo:rerun-if-changed={}/lcrq_wrapper.hpp", queue_location);
println!("cargo:rerun-if-changed={}/lcrq_wrapper.cpp", queue_location);
println!("cargo:rerun-if-changed={}/LCRQueue.hpp", queue_location);
println!("cargo:rerun-if-changed={}/HazardPointers.hpp", queue_location);
build.file(format!("{}/lcrq_wrapper.cpp", queue_location))
.define("USE_LCRQUEUE", None);
bindgen = bindgen
.header(format!("{}/lcrq_wrapper.hpp", queue_location))
.allowlist_function("lcrq_.*")
.allowlist_type("LCRQ.*")
.opaque_type("LCRQImpl");
}This code tells cargo to recompile the C++ code if we change any of its files, and how to build and create bindings for the queue. When adapting for a new queue, you need to specify the files it is dependent on in the top part, and then change lcrq_wrapper to your_queue_wrapper and LCRQ to YOUR_QUEUE.
Now all that is left is to create the Rust queue files. Create lib.rs and main.rs files in the new package. The main.rs just uses the queue in the lib.rs, as normal rust implementations. The lib.rs file just connects to the C bindings, for example as follows:
#![allow(non_upper_case_globals)]
#![allow(non_camel_case_types)]
#![allow(non_snake_case)]
use benchmark_core::traits::{ConcurrentQueue, HandleQueue};
// Include the generated bindings
include!(concat!(env!("OUT_DIR"), "/bindings.rs"));
// A safe Rust wrapper around the C bindings
pub struct YourCppQueue<T> {
raw: YourQueue,
phantom_data: std::marker::PhantomData<T>,
}
unsafe impl<T> Send for YourQueue<T> {}
unsafe impl<T> Sync for YourpQueue<T> {}
#[allow(clippy::not_unsafe_ptr_arg_deref)]
impl<T> YourCppQueue<T> {
pub fn push(&self, item: *mut std::ffi::c_void) -> bool {
unsafe {
your_queue_push(self.raw, item) == 1
}
}
pub fn pop(&self) -> Option<*mut std::ffi::c_void> {
let mut item: *mut std::ffi::c_void = std::ptr::null_mut();
let success = unsafe { your_queue_pop(self.raw, &mut item) == 1 };
if success {
Some(item)
} else {
None
}
}
}
impl<T> Drop for YourCppQueue<T> {
fn drop(&mut self) {
unsafe { your_queue_destroy(self.raw) };
}
}
struct YourQueueHandle<'a,T> {
pub q: &'a YourCppQueue<T>
}
impl<T> HandleQueue<T> for YourCppQueueHandle<'_,T> {
fn push(&mut self, item: T) -> Result<(), T> {
let ptr: *mut std::ffi::c_void = Box::<T>::into_raw(Box::new(item)) as *mut std::ffi::c_void;
match self.q.push(ptr) {
true => Ok(()),
false => {
// Really unsure if this is possible
let reclaimed: Box<T> = unsafe { Box::from_raw(ptr as *mut T) };
Err(*reclaimed)
},
}
}
fn pop(&mut self) -> Option<T> {
let res = self.q.pop()?;
let val = unsafe { Box::from_raw(res as *const T as *mut T) };
Some(*val)
}
}
impl<T> ConcurrentQueue<T> for YourCppQueue<T> {
fn register(&self) -> impl HandleQueue<T> {
YourCppQueueHandle {
q: self,
}
}
fn get_id(&self) -> String {
String::from("your_queue")
}
fn new(capacity: usize) -> Self {
let raw = unsafe { your_queue_create(capacity as u32) };
YourCppQueue { raw, phantom_data: std::marker::PhantomData}
}
}If the --write-stdout flag is not set, the framework will produce a folder called ./output and in it will include a .csv file with the headers and results of the entire benchmark. For example, with the command:
cargo run --release --features basic_queue -- -t 1 -i 10 basic| Throughput | Enqueues | Dequeues | Consumers | Producers | Thread Count | Queuetype | Benchmark | Test ID | Fairness |
|---|---|---|---|---|---|---|---|---|---|
| 3836116 | 2022116 | 1814000 | 20 | 20 | -1 | BasicQueue | Basic | b820a6a3f925aa03 | 0.7928 |
| 3680283 | 1906235 | 1774048 | 20 | 20 | -1 | BasicQueue | Basic | b820a6a3f925aa03 | 0.7334 |
| 3797156 | 2156525 | 1640631 | 20 | 20 | -1 | BasicQueue | Basic | b820a6a3f925aa03 | 0.6659 |
| 3630639 | 1893518 | 1737121 | 20 | 20 | -1 | BasicQueue | Basic | b820a6a3f925aa03 | 0.6256 |
| 4054568 | 2193896 | 1860672 | 20 | 20 | -1 | BasicQueue | Basic | b820a6a3f925aa03 | 0.5884 |
| 3725101 | 1903091 | 1822010 | 20 | 20 | -1 | BasicQueue | Basic | b820a6a3f925aa03 | 0.7417 |
| 3439946 | 1719978 | 1719968 | 20 | 20 | -1 | BasicQueue | Basic | b820a6a3f925aa03 | 0.6608 |
| 3397534 | 1904483 | 1493051 | 20 | 20 | -1 | BasicQueue | Basic | b820a6a3f925aa03 | 0.7792 |
| 3611314 | 1807886 | 1803428 | 20 | 20 | -1 | BasicQueue | Basic | b820a6a3f925aa03 | 0.8447 |
| 3539239 | 1952269 | 1586970 | 20 | 20 | -1 | BasicQueue | Basic | b820a6a3f925aa03 | 0.8757 |
Furthermore, if the --print-info flag is set, you will get more specific information about your current test, including some hardware specifications. For example:
Benchmark done: Basic
With queue: Basic Queue
Arguments used in test:
Time limit: 1
One socket?: true
Iterations: 10
Queue size: 10000
Delay: 10
Output path: ./output
Benchmark: Basic(BasicArgs { producers: 20, consumers: 20 })
Write to stdout: false
prefill amount: 0
Test ran on hardware specs:
System name: x
System kernel version: x
System OS version: x
Total RAM (in GB): xThe output file for the BFS benchmark is a little bit different from the other benchmarks. It looks like the following:
| Milliseconds | Queuetype | Thread Count | Test ID |
|---|---|---|---|
| 3836116 | BasicQueue | 20 | b820a6af3925aa03 |
| 3680283 | BasicQueue | 20 | b820a6af3925aa03 |
| 3797156 | BasicQueue | 20 | b820a6af3925aa03 |
| 3630639 | BasicQueue | 20 | b820a6af3925aa03 |
| 4054568 | BasicQueue | 20 | b820a6af3925aa03 |
| 3725101 | BasicQueue | 20 | b820a6af3925aa03 |
| 3439946 | BasicQueue | 20 | b820a6af3925aa03 |
| 3397534 | BasicQueue | 20 | b820a6af3925aa03 |
| 3611314 | BasicQueue | 20 | b820a6af3925aa03 |
| 3539239 | BasicQueue | 20 | b820a6af3925aa03 |
The framework contains a logger, which you can change the level of by changing the environment variable RUST_LOG. When compiled in debug mode, there are 5 levels you can choose from (error will only print errors, warn will print warnings and errors etc.):
errorwarninfodebugtrace
# Example
RUST_LOG=info cargo run --feature basic_queue -- basicWhen compiling for debug, it will be able to log all these levels. When compiling for release it will by default only have the warn and error levels. If you want a release version with more logging you can compile with the verbose-release feature. If you want a release completely void of logging you can compile it with the feature silent-release. However, you will need to pass the --no-default-features flag to cargo as well.
# Example verbose
RUST_LOG=verbose cargo run --release --features verbose-release,basic_queue --no-default-features -- basic
# Or in two steps
cargo build --release --features verbose-release,basic_queue --no-default-features
RUST_LOG=trace ./target/release/${PACKAGE} basic