SACABench is a suffix array construction algorithms benchmark framework that allows for an easy and reproducible benchmarks of suffix array construction algorithms (SACAs). It contains (to our best knowledge) all publicly available SACAs and new SACAs can easily be included in a few straight forward steps. In addition, this framework contains multiple new implementations of SACAs where we tried out techniques to improve existing implementations.
Below, you find information on how to install and use SACABench, and also on how to include new SACAs in the framework.
On a fresh Ubuntu 18.04 LTS installation, the following dependencies need to be installed:
# REQUIRED: git, the gcc compiler, the cmake build system
sudo apt install git build-essential cmake
# REQUIRED for some dependencies: the autoconf build system
sudo apt install autoconf libtool
# RECOMMENDED for faster rebuilds during development: ccache
sudo apt install ccache
# OPTIONAL for some support scripts: python 3
sudo apt install python3
# OPTIONAL for the --rplot option: the R programming language
sudo apt install r-base# Clone this repo
git clone https://github.com/sacabench/sacabench.git
# Create build directory
cd sacabench
mkdir build
cd build
# Configure the build system for a optimized "Release" build
cmake .. -DCMAKE_BUILD_TYPE=Release
# Build the sacabench tool
make
# Build & Run unit tests:
make checkThere are also a few cmake options and build system targets that are useful for active development:
cmake .. -DCMAKE_BUILD_TYPE=Releaseconfigure for optimized release builds.cmake .. -DCMAKE_BUILD_TYPE=Debugconfigure for unoptimized debug builds with assertions.make build_checkonly build the unit tests, don't run them.cmake .. -DSACA_WERROR=ONturn gcc warnings into hard errors.cmake .. -DSACA_RELEASE_DEBUGINFO=ONactivate debug information (-g) in release builds.make datasetsdownload a few 200MiB datasets from different sources.
After you've build the sacabench tool via make, it can be found at <build directory>/sacabench/sacabench. It offers the following CLI options:
sacabench listlists all available suffix array construction algorithms contained in this project.sacabench demotests the correct functionality on the current system.sacabench constructexecutes a single SACA.sacabench batchallows to execute and compare multiple SACAs.sacabench plotgenerates plots from a previously measurement.
For more information about the commands and available options, use -h or --help, e.g. sacabench list --help
If you want to use our benchmark tool for your own SACA implementation, you first need to implement our SACA interface within the sacabench/saca directory.
// Either class or struct work
class your_saca {
// The amount of sentinels (null byte) your SACA needs at the end of the input text.
static constexpr EXTRA_SENTINELS = 0;
// Your SACA's name
static constexpr NAME = "Your_saca";
// The Description for your SACA
static constexpr DESCRIPTION = "Your SACA's description.";
/**
Member function to call your SACA from within our framework.
@param text The input text (with sentinels)
@param alphabet The input's alphabet (without sentinel)
@param out_sa The span containing your computed sa.
*/
template <typename sa_index>
static void construct_sa(util::string_span text,
util::alphabet const& alphabet,
util::span<sa_index> out_sa) {
/*
Your SACAs implementation will be called from here.
*/
}
};After you implement this interface, you still need to register your saca in the register.cmake file (within the same directory):
SACA_REGISTER("saca/path_to_your_saca_header/saca.hpp"
sacabench::your_namespace)It is recommended to begin your namespace with sacabench.
If you wish to implement a GPGPU-SACA, you are advised to add your SACA within
the IF(CUDA) environment.