This reference design demonstrates how to use an FPGA to accelerate database queries for a data-warehouse schema derived from TPC-H.
Area | Description |
---|---|
What you will learn | How to accelerate database queries using an Intel FPGA |
Time to complete | 1 hour |
Category | Reference Designs and End to End |
The database query acceleration sample includes 8 tables and a set of 21 business-oriented queries with broad industry-wide relevance. This reference design shows how four of these queries can be accelerated using oneAPI. To do so, we create a set of common database operators (found in the `src/db_utils/` directory) that are combined in different ways to build the four queries.
Note that this design consumes a large amount of FPGA resources and is designed with Intel® Stratix® 10 FPGA capabilities in mind.
This sample is part of the FPGA code samples. It is categorized as a Tier 4 sample that demonstrates a reference design.
```mermaid
flowchart LR
   tier1("Tier 1: Get Started")
   tier2("Tier 2: Explore the Fundamentals")
   tier3("Tier 3: Explore the Advanced Techniques")
   tier4("Tier 4: Explore the Reference Designs")
   tier1 --> tier2 --> tier3 --> tier4
   style tier1 fill:#0071c1,stroke:#0071c1,stroke-width:1px,color:#fff
   style tier2 fill:#0071c1,stroke:#0071c1,stroke-width:1px,color:#fff
   style tier3 fill:#0071c1,stroke:#0071c1,stroke-width:1px,color:#fff
   style tier4 fill:#f96,stroke:#333,stroke-width:1px,color:#fff
```
Find more information about how to navigate this part of the code samples in the FPGA top-level README.md, which also covers troubleshooting build errors, links to selected documentation, and more.
Optimized for | Description |
---|---|
OS | Ubuntu* 20.04; RHEL*/CentOS* 8; SUSE* 15; Windows* 10, 11; Windows Server* 2019 |
Hardware | Intel® Agilex® 7, Arria® 10, and Stratix® 10 FPGAs |
Software | Intel® oneAPI DPC++/C++ Compiler |
Note: Although the Intel® oneAPI DPC++/C++ Compiler is sufficient to compile for emulation, generate reports, and generate RTL, there are additional software requirements for the simulation flow and FPGA compiles.
To use the simulation flow, Intel® Quartus® Prime Pro Edition and one of the following simulators must be installed and accessible through your PATH:
- Questa*-Intel® FPGA Edition
- Questa*-Intel® FPGA Starter Edition
- ModelSim® SE
When using the hardware compile flow, Intel® Quartus® Prime Pro Edition must be installed and accessible through your PATH.
⚠️ Make sure you add the device files associated with the FPGA that you are targeting to your Intel® Quartus® Prime installation.
Note: You will need a large FPGA part to fit the Query 9 variant of this design.
In this design, we accelerate four database queries as offload accelerators. In an offload accelerator scheme, the queries are performed by transferring the relevant data from the CPU host to the FPGA, starting the query kernel on the FPGA, and copying the results back. This means that the relevant performance number is the processing time (the wall clock time) from when the query is requested to the time the output data is accessible by the host. This includes the time to transfer data between the CPU and FPGA over PCIe (with an approximate read and write bandwidth of 6877 and 6582 MB/s, respectively). Most of the total query time is spent transferring the data between the CPU and FPGA, and the query kernels themselves are a small portion of the total latency.
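As a back-of-the-envelope model of this accounting, the sketch below (illustrative only; `ProcessingTimeMs` is not part of the sample, and the bandwidths are simply the approximate PCIe figures quoted above) shows why the transfers dominate the total query time:

```cpp
// Hypothetical accounting for an offload-accelerator query: total processing
// time = host-to-FPGA transfer + kernel execution + FPGA-to-host transfer.
inline double ProcessingTimeMs(double bytes_in, double bytes_out,
                               double kernel_ms) {
  const double write_bw = 6877e6;  // host -> FPGA, bytes/s (approximate)
  const double read_bw = 6582e6;   // FPGA -> host, bytes/s (approximate)
  double transfer_ms = (bytes_in / write_bw + bytes_out / read_bw) * 1e3;
  return transfer_ms + kernel_ms;
}
// Moving 256 MB of table data to the FPGA alone takes roughly 37 ms of PCIe
// time, dwarfing a kernel that runs for only a few milliseconds.
```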
Note: Refer to the Performance Disclaimers section for important performance information.
The key optimization techniques used in this design include:
- Accelerating complex database queries using an Intel FPGA and oneAPI.
- Improving code reuse, readability, and extendability using C++ templates for FPGA device code.
- Showcasing the usage of advanced FPGA optimizations listed above to improve the performance of a large design.
This design leverages concepts discussed in the FPGA tutorials to optimize the different database queries:
- Shannonization to improve Fmax/II (shannonization)
- Optimizing Inner Loop Throughput (optimize_inner_loop)
- Caching On-Chip Memory to Improve Loop Performance (onchip_memory_cache)
- Unrolling Loops (loop_unroll)
- Loop `ivdep` Attribute (loop_ivdep)
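To give a flavor of the first technique: shannonization precomputes the next value of a loop-carried variable so that each iteration's critical path reduces to a cheap select. A minimal host-side sketch (illustrative, not the tutorial's exact code):

```cpp
// Shannonization sketch: instead of computing `count + 1` on the critical
// path of every iteration, keep a precomputed `count_next` alongside `count`.
// In hardware, the update of `count_next` overlaps with the next comparison,
// shortening the loop-carried dependency chain and improving Fmax/II.
inline int CountLessThan(const int *data, int n, int threshold) {
  int count = 0;
  int count_next = 1;  // precomputed value of count + 1
  for (int i = 0; i < n; i++) {
    if (data[i] < threshold) {
      count = count_next;      // critical path: just a select
      count_next = count + 1;  // prepare the next value off the critical path
    }
  }
  return count;
}
```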
The following sections describe at a high level how queries 1, 9, 11, and 12 are implemented on the FPGA using a set of generalized database operators (found in `db_utils/`). In the block diagrams below, the blocks are oneAPI kernels, and the arrows represent pipes that show the flow of data from one kernel to another.
Query 1 is the simplest of the four queries and only uses the `Accumulator` database operator. The query streams in each row of the LINEITEM table and performs computation on each row.
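The accumulate-per-group pattern can be sketched on the host as follows (a simplified stand-in, not the actual API of `db_utils/Accumulator.hpp`; the bin layout is illustrative):

```cpp
#include <cstddef>

// Register-backed accumulator sketch: a small, statically sized array of
// bins that can be held entirely in FPGA registers. Query 1 streams rows
// and accumulates per-group aggregates (e.g. sums keyed by group index).
template <typename T, std::size_t kNumBins>
class RegisterAccumulator {
  T bins_[kNumBins] = {};

 public:
  void Accumulate(std::size_t bin, T value) { bins_[bin] += value; }
  T Get(std::size_t bin) const { return bins_[bin]; }
};
```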
Query 9 is the most complicated of the four queries and utilizes all of the database operators (`LikeRegex`, `Accumulator`, `MapJoin`, `MergeJoin`, `DuplicateMergeJoin`, and `FifoSort`). The block diagram of the design is shown below.
Query 11 showcases the `MapJoin` and `FifoSort` database operators. The block diagram of the design is shown below.
Query 12 showcases the `MergeJoin` database operator. The block diagram of the design is shown below.
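The merge-join idea behind this operator can be sketched in plain C++ (a simplified host-side illustration assuming both inputs are sorted with unique keys; the real operator in `db_utils/MergeJoin.hpp` streams windows of rows through pipes rather than working on whole vectors):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Merge join over two tables pre-sorted on the join key: a single forward
// pass over each input suffices, which maps well to streaming hardware.
inline std::vector<std::pair<int, int>> MergeJoin(
    const std::vector<int> &left_keys, const std::vector<int> &right_keys) {
  std::vector<std::pair<int, int>> out;
  std::size_t l = 0, r = 0;
  while (l < left_keys.size() && r < right_keys.size()) {
    if (left_keys[l] < right_keys[r]) {
      ++l;  // advance the side with the smaller key
    } else if (left_keys[l] > right_keys[r]) {
      ++r;
    } else {
      out.emplace_back(left_keys[l], right_keys[r]);
      ++l;
      ++r;  // assumes unique keys on both sides for simplicity
    }
  }
  return out;
}
```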
File | Description |
---|---|
`db.cpp` | Contains the `main()` function and the top-level interfaces to the database functions. |
`dbdata.cpp` | Contains code to parse the database input files and validate the query output. |
`dbdata.hpp` | Definitions of database-related data structures and parsing functions. |
`query1/query1_kernel.cpp` | Contains the kernel for Query 1. |
`query9/query9_kernel.cpp` | Contains the kernel for Query 9. |
`query9/pipe_types.cpp` | All data types and instantiations for pipes used in Query 9. |
`query11/query11_kernel.cpp` | Contains the kernel for Query 11. |
`query11/pipe_types.cpp` | All data types and instantiations for pipes used in Query 11. |
`query12/query12_kernel.cpp` | Contains the kernel for Query 12. |
`query12/pipe_types.cpp` | All data types and instantiations for pipes used in Query 12. |
`db_utils/Accumulator.hpp` | Generalized templated accumulators using registers or BRAMs. |
`db_utils/Date.hpp` | A class to represent dates within the database. |
`db_utils/fifo_sort.hpp` | An implementation of a FIFO-based merge sorter (based on: D. Koch and J. Torresen, "FPGASort: A High Performance Sorting Architecture Exploiting Run-time Reconfiguration on FPGAs for Large Problem Sorting", in FPGA '11: ACM/SIGDA International Symposium on Field Programmable Gate Arrays, Monterey, CA, USA, 2011. https://dl.acm.org/doi/10.1145/1950413.1950427). |
`db_utils/LikeRegex.hpp` | Simplified regex engine to determine whether a string 'Begins With', 'Contains', or 'Ends With'. |
`db_utils/MapJoin.hpp` | Implements the MapJoin operator. |
`db_utils/MergeJoin.hpp` | Implements the MergeJoin and DuplicateMergeJoin operators. |
`db_utils/Misc.hpp` | Miscellaneous utilities used by the operators and the queries. |
`db_utils/ShannonIterator.hpp` | A template-based iterator to improve Fmax/II for designs. |
`db_utils/StreamingData.hpp` | A generic data structure for streaming data between kernels. |
`db_utils/Tuple.hpp` | A templated tuple that behaves better on the FPGA than `std::tuple`. |
`db_utils/Unroller.hpp` | A template-based loop unroller that unrolls loops in the front end. |
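As an illustration of the last entry, a front-end loop unroller can be built with template recursion, as sketched below (illustrative only; the actual `db_utils/Unroller.hpp` interface may differ):

```cpp
// Front-end unroller sketch: template recursion forces the compiler to emit
// each iteration as straight-line code, so the FPGA backend sees an already
// unrolled loop body instead of a loop.
template <int kBegin, int kEnd>
struct Unroller {
  template <typename F>
  static void Step(F &&f) {
    f(kBegin);                          // this iteration
    Unroller<kBegin + 1, kEnd>::Step(f);  // recurse for the rest
  }
};

// Base case: empty range, nothing left to unroll.
template <int kEnd>
struct Unroller<kEnd, kEnd> {
  template <typename F>
  static void Step(F &&) {}
};
```

For example, `Unroller<0, 4>::Step(f)` expands at compile time to `f(0); f(1); f(2); f(3);`.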
Note: When working with the command-line interface (CLI), you should configure the oneAPI toolkits using environment variables. Set up your CLI environment by sourcing the `setvars` script located in the root of your oneAPI installation every time you open a new terminal window. This practice ensures that your compiler, libraries, and tools are ready for development.

Linux*:
- For system-wide installations: `. /opt/intel/oneapi/setvars.sh`
- For private installations: `. ~/intel/oneapi/setvars.sh`
- For non-POSIX shells, like csh, use the following command: `bash -c 'source <install-dir>/setvars.sh ; exec csh'`

Windows*:
- `C:\"Program Files (x86)"\Intel\oneAPI\setvars.bat`
- For Windows PowerShell*, use the following command: `cmd.exe "/K" '"C:\Program Files (x86)\Intel\oneAPI\setvars.bat" && powershell'`

For more information on configuring environment variables, see Use the setvars Script with Linux* or macOS* or Use the setvars Script with Windows*.
1. Change to the sample directory.

2. Configure the build system for the default target (the Agilex® 7 device family).

   ```
   mkdir build
   cd build
   cmake .. -DQUERY=1
   ```

   `-DQUERY=<QUERY_NUMBER>` can be any of the following query numbers: `1`, `9`, `11`, or `12`.

   Note: You can change the default target by using the command:

   ```
   cmake .. -DQUERY=<QUERY_NUMBER> -DFPGA_DEVICE=<FPGA device family or FPGA part number>
   ```

   Alternatively, you can target an explicit FPGA board variant and BSP by using the following command:

   ```
   cmake .. -DQUERY=<QUERY_NUMBER> -DFPGA_DEVICE=<board-support-package>:<board-variant>
   ```

   The build system will try to infer the FPGA family from the BSP name. If it cannot, an extra option needs to be passed to `cmake`: `-DDEVICE_FLAG=[A10|S10|Agilex7]`

   Note: You can poll your system for available BSPs using the `aoc -list-boards` command. The board list that is printed out will be of the form:

   ```
   $> aoc -list-boards
   Board list:
     <board-variant>
        Board Package: <path/to/board/package>/board-support-package
     <board-variant2>
        Board Package: <path/to/board/package>/board-support-package
   ```

   You will only be able to run an executable on the FPGA if you specified a BSP.
3. Compile the design. (The provided targets match the recommended development flow.)

   - Compile for emulation (fast compile time, targets emulated FPGA device):

     ```
     make fpga_emu
     ```

   - Compile for simulation (fast compile time, targets simulator FPGA device):

     ```
     make fpga_sim
     ```

   - Generate the HTML performance report:

     ```
     make report
     ```

     The report resides at `db.report.prj/reports/report.html`.

     Note: If you are compiling Query 9 (`-DQUERY=9`), expect a long report generation time.

   - Compile for FPGA hardware (longer compile time, targets FPGA device):

     ```
     make fpga
     ```

   When building for hardware, the default scale factor is 1. To use the smaller scale factor of 0.01, add the flag `-DSF_SMALL=1` to the original `cmake` command. For example: `cmake .. -DQUERY=11 -DSF_SMALL=1`. See the Database files section for more information.
1. Change to the sample directory.

2. Configure the build system for the default target (the Agilex® 7 device family).

   ```
   mkdir build
   cd build
   cmake -G "NMake Makefiles" .. -DQUERY=1
   ```

   `-DQUERY=<QUERY_NUMBER>` can be any of the following query numbers: `1`, `9`, `11`, or `12`.

   Note: You can change the default target by using the command:

   ```
   cmake -G "NMake Makefiles" .. -DQUERY=<QUERY_NUMBER> -DFPGA_DEVICE=<FPGA device family or FPGA part number>
   ```

   Alternatively, you can target an explicit FPGA board variant and BSP by using the following command:

   ```
   cmake -G "NMake Makefiles" .. -DQUERY=<QUERY_NUMBER> -DFPGA_DEVICE=<board-support-package>:<board-variant>
   ```

   The build system will try to infer the FPGA family from the BSP name. If it cannot, an extra option needs to be passed to `cmake`: `-DDEVICE_FLAG=[A10|S10|Agilex7]`

   Note: You can poll your system for available BSPs using the `aoc -list-boards` command. The board list that is printed out will be of the form:

   ```
   $> aoc -list-boards
   Board list:
     <board-variant>
        Board Package: <path/to/board/package>/board-support-package
     <board-variant2>
        Board Package: <path/to/board/package>/board-support-package
   ```

   You will only be able to run an executable on the FPGA if you specified a BSP.
3. Compile the design. (The provided targets match the recommended development flow.)

   - Compile for emulation (fast compile time, targets emulated FPGA device):

     ```
     nmake fpga_emu
     ```

   - Compile for simulation (fast compile time, targets simulator FPGA device):

     ```
     nmake fpga_sim
     ```

   - Generate the HTML performance report:

     ```
     nmake report
     ```

     The report resides at `db.report.prj/reports/report.html`.

     Note: If you are compiling Query 9 (`-DQUERY=9`), expect a long report generation time.

   - Compile for FPGA hardware (longer compile time, targets FPGA device):

     ```
     nmake fpga
     ```

Note: If you encounter any issues with long paths when compiling under Windows*, you may have to create your 'build' directory in a shorter path, for example `C:\samples\build`. You can then run cmake from that directory, and provide cmake with the full path to your sample directory, for example:

```
C:\samples\build> cmake -G "NMake Makefiles" C:\long\path\to\code\sample\CMakeLists.txt
```
Argument | Description | Default |
---|---|---|
`--help` | Print help. | `false` |
`--dbroot` | Specify the location of the database files (e.g. `--dbroot=../data/sf0.01`). | `.` |
`--test` | Specify whether to validate the output of the query. | `false` |
`--print` | Print the output of the query to `stdout`. | `false` |
`--args` | Pass custom arguments to the query. (See `--help` for more information.) | |
`--runs` | Define the number of query iterations to perform for throughput measurement (for example, `--runs=5`). | `1` for emulation, `5` for FPGA hardware |
- Run the design on the FPGA emulator (the kernel executes on the CPU):

  ```
  ./db.fpga_emu --dbroot=../data/sf0.01 --test
  ```

  (Optional) Run the design for queries `9`, `11`, and `12`.

- Run the sample on the FPGA simulator device:

  ```
  CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1 ./db.fpga_sim --dbroot=../data/sf0.01 --test
  ```

- Run the design on an FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=<board-support-package>:<board-variant>`):

  ```
  ./db.fpga --dbroot=../data/sf1 --test
  ```
- Run the sample on the FPGA emulator (the kernel executes on the CPU):

  ```
  db.fpga_emu.exe --dbroot=../data/sf0.01 --test
  ```

  (Optional) Run the design for queries `9`, `11`, and `12`.

- Run the sample on the FPGA simulator device:

  ```
  set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1
  db.fpga_sim.exe --dbroot=../data/sf0.01 --test
  set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=
  ```

Note: Hardware runs are not supported on Windows.

Note: The scale factor 1 (SF=1) database files (`../data/sf1`) are not shipped with this reference design. See the Database files section below for information on how to generate these files.
```
Parsing database files in: ../data/sf0.01
Parsing LINEITEM table from: ../data/sf0.01/lineitem.tbl
Finished parsing LINEITEM table with 60175 rows
Parsing ORDERS table from: ../data/sf0.01/orders.tbl
Finished parsing ORDERS table with 15000 rows
Parsing PARTS table from: ../data/sf0.01/part.tbl
Finished parsing PARTS table with 2000 rows
Parsing SUPPLIER table from: ../data/sf0.01/supplier.tbl
Finished parsing SUPPLIER table with 100 rows
Parsing PARTSUPPLIER table from: ../data/sf0.01/partsupp.tbl
Finished parsing PARTSUPPLIER table with 8000 rows
Parsing NATION table from: ../data/sf0.01/nation.tbl
Finished parsing NATION table with 25 rows
Database SF = 0.01
Running Q1 within 90 days of 1998-12-1
Validating query 1 test results
PASSED
```

```
Parsing database files in: ../data/sf1
Parsing LINEITEM table from: ../data/sf1/lineitem.tbl
Finished parsing LINEITEM table with 6001215 rows
Parsing ORDERS table from: ../data/sf1/orders.tbl
Finished parsing ORDERS table with 1500000 rows
Parsing PARTS table from: ../data/sf1/part.tbl
Finished parsing PARTS table with 200000 rows
Parsing SUPPLIER table from: ../data/sf1/supplier.tbl
Finished parsing SUPPLIER table with 10000 rows
Parsing PARTSUPPLIER table from: ../data/sf1/partsupp.tbl
Finished parsing PARTSUPPLIER table with 800000 rows
Parsing NATION table from: ../data/sf1/nation.tbl
Finished parsing NATION table with 25 rows
Database SF = 1
Running Q1 within 90 days of 1998-12-1
Validating query 1 test results
Running Q1 within 90 days of 1998-12-1
Validating query 1 test results
Running Q1 within 90 days of 1998-12-1
Validating query 1 test results
Running Q1 within 90 days of 1998-12-1
Validating query 1 test results
Running Q1 within 90 days of 1998-12-1
Validating query 1 test results
Processing time: 44.7673 ms
Kernel time: 4.4103 ms
Throughput: 226.742 queries/s
PASSED
```
In the `data/` directory, you will find database files for a scale factor of 0.01. These are manually generated files that you can use to verify the queries in emulation; however, the supplied files are too small to showcase the true performance of the FPGA hardware.
Note: This reference design currently only supports databases with scale factors of 0.01 or 1.
To generate larger database files to run on the hardware, you can use TPC's `dbgen` tool. Instructions for downloading, building, and running the `dbgen` tool can be found on the TPC-H website.
As of September 12, 2022, you should be able to perform the following steps:
- Clone the tpch-dbgen GitHub repository.
- Run `make`.
- Generate the files using a scale factor of 1: `./dbgen -s 1`.
- Copy all the generated `.tbl` files and the `answers` folder into a new `data/sf1` folder.
Code samples are licensed under the MIT license. See License.txt for details.
Third party program Licenses can be found here: third-party-programs.txt.