Commit 3860e6e

Added README
1 parent 8c5d03f commit 3860e6e

1 file changed: `README.md` (+115, -0 lines changed)

@@ -0,0 +1,115 @@
# Fast Arbitrary Precision Floating Point on FPGA

A detailed description of the approach implemented in this repository can be
found in our [FCCM'22
paper](https://spcl.inf.ethz.ch/Publications/.pdf/apfp.pdf) [1].

## Introduction

This repository implements an arbitrary precision floating point multiplier and
adder using Vitis HLS targeting XRT-enabled Xilinx FPGAs, exposing them through
a matrix multiplication primitive that allows running them at full throughput
without becoming memory bound. The design is _fully pipelined_, yielding a peak
multiply-accumulate (MAC) throughput equal to the clock frequency times the
number of compute units instantiated.
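
As a back-of-the-envelope illustration of this relation, the sketch below
computes the peak MAC rate for one configuration. The clock frequency and the
compute unit count are hypothetical values picked for the example, not
measurements from the paper.

```bash
# Peak MAC throughput of a fully pipelined design: one MAC per cycle per
# compute unit. Both inputs below are hypothetical example values.
frequency_mhz=300   # assumed clock frequency in MHz
compute_units=4     # assumed value of APFP_COMPUTE_UNITS

peak_mmacs=$((frequency_mhz * compute_units))
echo "Peak throughput: ${peak_mmacs} MMAC/s"
```

The achieved rate of a real build also depends on the frequency reached after
place and route, so this number is an upper bound rather than a prediction.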

Instantiations of the design on an Alveo U250 accelerator were shown to yield
2.0 GMAC/s of 512-bit matrix-matrix multiplication, an order of magnitude
higher than a 36-core dual-socket Xeon node and corresponding to the throughput
of 375 CPU cores [1].

## Configuration

The hardware design is configured using CMake. The target Xilinx XRT-enabled
platform must be specified with the `APFP_PLATFORM` parameter. The most
important configuration parameters are:
- The width of the floating point representation is fixed at compile time
using the `APFP_BITS` CMake parameter. Of these bits, 63 are used for the
exponent, 1 for the sign, and the remaining `APFP_BITS - 64` for the mantissa.
The value is currently expected to be a multiple of 512 so that numbers align
with the memory interface width.
- To scale the design beyond a single pipelined multiplier, the
`APFP_COMPUTE_UNITS` parameter can be used to replicate the full kernel. Each
instance runs a fully independent matrix multiplication unit. The units can
collaborate on a single matrix multiplication operation (see
`host/TestMatrixMultiplication.cpp` for an example).
- The floating point multiplier uses Karatsuba decomposition to reduce the
overall resource usage of the design. The decomposition bottoms out at
`APFP_MULT_BASE_BITS`, at which point it falls back on naive multiplication
using DSPs as generated by the HLS tool. Similarly, `APFP_ADD_BASE_BITS` sets
the number of bits dispatched to the HLS tool's addition implementation;
additions wider than this threshold are manually pipelined into multiple
stages.
- To avoid being memory bound, the matrix multiplication implementation is
tiled using the approach described in our [FPGA'20
paper](https://spcl.inf.ethz.ch/Publications/.pdf/gemm-fpga.pdf) [2]. The
tile sizes are exposed through the `APFP_TILE_SIZE_N` and `APFP_TILE_SIZE_M`
parameters. The highest arithmetic intensity is achieved when these two
quantities are equal and maximized, but relatively small tile sizes are
sufficient to overcome the memory bottleneck (e.g., 32×32). Larger tile sizes
increase arithmetic intensity at the cost of BRAM usage and of potential
overhead when the input matrix dimensions are not multiples of the tile size.
- `APFP_FREQUENCY` can be used to change the maximum frequency targeted by the
design. If unspecified, the default of the target platform will be used.
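
The padding overhead mentioned in the tiling item can be estimated with simple
integer arithmetic. The sketch below assumes that a dimension which is not a
multiple of the tile size is rounded up to the next multiple; this is an
assumption made for illustration, not a statement about the kernel's
internals, and the matrix dimension is a made-up example.

```bash
# Estimate the extra work caused by rounding a matrix dimension up to a
# multiple of the tile size (assumed behavior, see the note above).
n=1000          # hypothetical matrix dimension
tile_size=32    # e.g. the value of APFP_TILE_SIZE_N

tiles=$(( (n + tile_size - 1) / tile_size ))   # ceiling division
padded=$(( tiles * tile_size ))
wasted=$(( padded - n ))
echo "n=${n}: ${tiles} tiles, padded to ${padded} (${wasted} extra elements)"
```

Under this assumption the relative overhead shrinks as the matrix grows, which
is one reason moderate tile sizes are usually a comfortable choice.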

For more details on how to configure the project to achieve high throughput,
see our paper [1].
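
Putting these parameters together, a configure step might look like the
following sketch. The platform string is a placeholder and the numeric values
are illustrative choices rather than recommended settings; only the parameter
names are taken from this README. The command is printed instead of executed
so that it can be inspected first.

```bash
# Illustrative values only; adapt them to the target system.
platform="<your_xrt_platform>"   # placeholder, not a real platform name
bits=1024                        # APFP_BITS, expected to be a multiple of 512
compute_units=2                  # APFP_COMPUTE_UNITS
tile_size=32                     # used for both tile dimensions

# Check the width and derive the mantissa size:
# 63 exponent bits + 1 sign bit leave (bits - 64) bits for the mantissa.
if [ $((bits % 512)) -ne 0 ]; then
  echo "APFP_BITS must be a multiple of 512" >&2
  exit 1
fi
mantissa_bits=$((bits - 64))
echo "Mantissa bits: ${mantissa_bits}"

# Assemble the configure command; run it from the build directory once the
# placeholder has been replaced.
configure_cmd="cmake .. -DAPFP_PLATFORM=${platform} -DAPFP_BITS=${bits}"
configure_cmd="${configure_cmd} -DAPFP_COMPUTE_UNITS=${compute_units}"
configure_cmd="${configure_cmd} -DAPFP_TILE_SIZE_N=${tile_size}"
configure_cmd="${configure_cmd} -DAPFP_TILE_SIZE_M=${tile_size}"
echo "${configure_cmd}"
```

`APFP_MULT_BASE_BITS`, `APFP_ADD_BASE_BITS`, and `APFP_FREQUENCY` can be
appended in the same `-D<name>=<value>` form when the defaults are not
suitable.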

## Configuration and compilation

Please make sure you clone the repository with `git clone --recursive` or run
`git submodule update --init` after cloning to check out dependencies.

The minimum commands necessary to configure and build the code are:

```bash
mkdir build
cd build
cmake .. # Default parameters
make # Builds software components
make hw # Builds hardware accelerator
```

However, the accelerator should always be configured to match the target system
using the parameters described in the previous section and in our paper [1].
The CMake configuration flow uses
[hlslib](https://github.com/definelicht/hlslib) [3] to locate the Xilinx tools
and expose hardware build targets.

Configuring the project requires Vitis, GMP, and MPFR.

## Running the code

We provide example host code that runs the matrix multiplication accelerator
on a randomized input in `host/TestMatrixMultiplication.cpp`. See the executable
for usage. An example invocation could be:

```bash
./TestMatrixMultiplicationHardware hw 256 256 256
```

## Installation

To install the project, including both the software interface components and the
hardware accelerator itself (built with `make hw`), run `make install`.
The installation location is configured with the `CMAKE_INSTALL_PREFIX`
parameter.

## References

[1] Johannes de Fine Licht, Christopher A. Pattison, Alexandros Nikolaos
Ziogas, David Simmons-Duffin, and Torsten Hoefler, _"Fast Arbitrary Precision
Floating Point on FPGA"_, in Proceedings of the 2022 IEEE 30th Annual
International Symposium on Field-Programmable Custom Computing Machines
(FCCM'22). [🔗](https://spcl.inf.ethz.ch/Publications/.pdf/apfp.pdf)

[2] Johannes de Fine Licht, Grzegorz Kwasniewski, and Torsten Hoefler,
_"Flexible Communication Avoiding Matrix Multiplication on FPGA with High-Level
Synthesis"_, in Proceedings of the 28th ACM/SIGDA International Symposium on
Field-Programmable Gate Arrays (FPGA'20).
[🔗](https://spcl.inf.ethz.ch/Publications/.pdf/gemm-fpga.pdf)

[3] Johannes de Fine Licht and Torsten Hoefler, _"hlslib: Software Engineering
for Hardware Design"_, presented at the Fifth International Workshop on
Heterogeneous High-performance Reconfigurable Computing (H2RC'19).
[🔗](https://spcl.inf.ethz.ch/Publications/.pdf/hlslib.pdf)
