Specialised Query Executor for TPC-H Q18

Prerequisites

Python 3.8+
duckdb
pandas
pyarrow

Install dependencies with:

pip install -r requirements.txt

Directory Structure

project_root/
├── benchmarks/            # Benchmark strategies
├── data/                  # TPC-H tables in Parquet format
├── engine/
│   ├── duckdb_engine.py
│   └── custom_engine.py
├── results/
│   ├── benchmark/         # Custom engine query results (CSV)
│   └── target/            # DuckDB query results (CSV)
├── init.py                # Data generation script
├── main.py                # Main entry point
└── summary.csv

1. Initialize Project (Generate TPC-H Data)

Run the init command to generate the TPC-H tables in Parquet format and initialize the result directories:

python main.py init

The following files are created for each of the scale factors 0.5, 1, 2, and 5:

data/sf{0.5, 1, 2, 5}/
├── customer.parquet
├── lineitem.parquet
├── nation.parquet
├── orders.parquet
├── part.parquet
├── partsupp.parquet
├── region.parquet
└── supplier.parquet
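
To confirm that init produced every table, the layout above can be checked with a few lines of Python. This is a minimal sketch, not part of the project: the helper names `expected_files` and `missing_files` are hypothetical, and it assumes the per-scale-factor directories are named `sf0.5`, `sf1`, `sf2`, and `sf5`, as the brace notation above suggests.

```python
from pathlib import Path

# Scale factors and table names as listed in this README.
SCALE_FACTORS = [0.5, 1, 2, 5]
TABLES = [
    "customer", "lineitem", "nation", "orders",
    "part", "partsupp", "region", "supplier",
]


def expected_files(data_root="data"):
    """Return the Parquet paths expected after init.

    Assumption: directories are named sf<scale factor>, e.g. data/sf0.5.
    """
    return [
        Path(data_root) / f"sf{sf}" / f"{table}.parquet"
        for sf in SCALE_FACTORS
        for table in TABLES
    ]


def missing_files(data_root="data"):
    """Return the expected Parquet paths that do not exist on disk."""
    return [path for path in expected_files(data_root) if not path.exists()]
```

Calling `missing_files()` from the project root returns an empty list when all 32 files (8 tables for each of 4 scale factors) are present.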

2. Run the Benchmark

python main.py benchmark

Optional arguments:

--out summary.csv: Output file for the benchmark results (default: summary.csv)
--benchmark 5: Number of timed repetitions per scale factor after one warm-up run (default: 5)
--strategy <strategy>: Benchmark execution strategy:
- interweave (default)
- duckdb_first
- custom_engine_first
--enable_profiling: Enable detailed profiling for the custom engine
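
The three strategies differ only in the order in which the two engines are run. The sketch below shows one plausible reading of them. It is an illustration, not the code in benchmarks/: the function `build_schedule` and the engine labels are hypothetical, and it assumes that interweave means alternating the engines on every repetition.

```python
def build_schedule(strategy, repetitions):
    """Return the order in which the engines run for one scale factor.

    Assumption: "interweave" alternates DuckDB and the custom engine,
    while the *_first strategies finish all runs of one engine before
    starting the other.
    """
    engines = ("duckdb", "custom")
    if strategy == "interweave":
        # duckdb, custom, duckdb, custom, ...
        return [engine for _ in range(repetitions) for engine in engines]
    if strategy == "duckdb_first":
        return ["duckdb"] * repetitions + ["custom"] * repetitions
    if strategy == "custom_engine_first":
        return ["custom"] * repetitions + ["duckdb"] * repetitions
    raise ValueError(f"unknown strategy: {strategy}")
```

Alternating the engines spreads cache and system-load effects across both of them, whereas the *_first strategies let each engine run in a contiguous block.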

Running the benchmark command will:

Clear previous results in results/benchmark and results/target
Run both engines for scale factors 0.5, 1, 2, 5
Run one warm-up iteration first, then average the next --benchmark timed runs
Save query results as CSVs in the results folders
Write benchmark results to summary.csv
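
The warm-up and averaging step can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: `time_query` is a hypothetical helper, and `run_query` stands for any zero-argument callable that executes Q18 on one engine.

```python
import time
from statistics import mean


def time_query(run_query, repetitions=5):
    """Run one untimed warm-up, then return the mean wall-clock time
    in seconds of `repetitions` timed runs."""
    run_query()  # warm-up run, excluded from the average

    timings = []
    for _ in range(repetitions):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return mean(timings)
```

With the default `--benchmark 5`, each engine therefore executes the query six times per scale factor, and only the last five runs contribute to the reported average.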

3. Data Correctness Check

python main.py check

This will compare all matching CSV files in results/benchmark and results/target and print any mismatches.
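
The comparison can be pictured with the sketch below. It is an illustration only and may differ from checker.py: the helpers `read_rows` and `compare_dirs` are hypothetical, files are assumed to match by name, and rows are compared as exact strings in file order. A real checker may need a tolerance for floating-point columns.

```python
import csv
from pathlib import Path


def read_rows(path):
    """Read a CSV file into a list of rows (each row a list of strings)."""
    with open(path, newline="") as handle:
        return list(csv.reader(handle))


def compare_dirs(benchmark_dir, target_dir):
    """Return the names of CSV files present in both directories
    whose contents differ."""
    benchmark = Path(benchmark_dir)
    target = Path(target_dir)
    mismatches = []
    for benchmark_file in sorted(benchmark.glob("*.csv")):
        target_file = target / benchmark_file.name
        if not target_file.exists():
            continue  # only files with a counterpart are compared
        if read_rows(benchmark_file) != read_rows(target_file):
            mismatches.append(benchmark_file.name)
    return mismatches
```

For example, `compare_dirs("results/benchmark", "results/target")` returns an empty list when every custom-engine result matches its DuckDB counterpart.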
