BLAS compatibility library #7

Open
wants to merge 74 commits into
base: main
Changes from 51 commits
Commits
7cbf4de
BLAS library WIP
ChrisPattison Nov 14, 2021
2d72035
Empty matrix is a noop
ChrisPattison Nov 26, 2021
08461cc
put mpf_t in one place
ChrisPattison Nov 26, 2021
b531adc
PackFloat ToMpfr and ToGmp now const
ChrisPattison Nov 26, 2021
a59dd65
mpf_t/mpfr_t wrapper type
ChrisPattison Nov 26, 2021
d11fc10
unsigned long for precision
ChrisPattison Nov 26, 2021
0745071
overload of ApfpInterfaceType constructor with precision specified
ChrisPattison Nov 26, 2021
1da49df
ApfpInterfaceWrapper move semantics
ChrisPattison Nov 27, 2021
dd0db45
Fix memory leaks in BLAS library
ChrisPattison Nov 27, 2021
6ecea7a
Merge commit 'b3c3232369122bda9a551eb6777cc90d7721124f' into blas
ChrisPattison Dec 14, 2021
2dd8b21
Matrix Addition dummy
ChrisPattison Dec 14, 2021
e4165cb
mpf_t |-> mpf_ptr in PackedFloat
ChrisPattison Dec 14, 2021
7da55be
const ToGmp
ChrisPattison Dec 14, 2021
aeb9ce1
Hostlib takes mpf_ptr. Host transpose/add syrk
ChrisPattison Dec 14, 2021
4c499ae
Merge commit 'cd2be5046e33205e11bf54814d96263f8e66efed' into blas
ChrisPattison Dec 18, 2021
bfcacd9
MPFR BLAS interface
ChrisPattison Dec 19, 2021
b50c80e
Add unsigned long init and mul to wrapper header
ChrisPattison Dec 20, 2021
4947c85
Generate takes mpfr_ptr
ChrisPattison Dec 20, 2021
d96de48
BLAS syrk unit test
ChrisPattison Dec 20, 2021
4eb4881
Blas unit tests in separate executable
ChrisPattison Dec 21, 2021
c49e272
Search for kernel in current working directory
ChrisPattison Dec 22, 2021
bdd9f35
Throw an exception if we can't find the kernel
ChrisPattison Dec 26, 2021
adaba04
Guard against calling uninitialized library
ChrisPattison Dec 26, 2021
13aa0e9
Add mechanism to get ApfpBlas error strings
ChrisPattison Dec 26, 2021
e2c32d8
Guard error code for ApfpInit in UnitTests
ChrisPattison Dec 26, 2021
41f75a8
More sophisticated kernel search routine
ChrisPattison Dec 26, 2021
4ab5d11
Setup/teardown test case
ChrisPattison Dec 26, 2021
3c37a15
Fix buffer size check on TransferToHost
ChrisPattison Dec 27, 2021
98f4721
CopyTransposeFromMatrix destination LDA
ChrisPattison Dec 27, 2021
ee113ba
Blas unit tests pass
ChrisPattison Dec 27, 2021
1d53578
Move interface type <gmp/mpfr> to Config.h
ChrisPattison Dec 28, 2021
c6a86a7
install kernels to lib
ChrisPattison Dec 28, 2021
4c22885
Compile under GMP interface type
ChrisPattison Dec 28, 2021
2aa28f5
Fix closeness check in BlasUnitTest for a=b=0
ChrisPattison Dec 28, 2021
47022fd
Use generators for SYRK test case
ChrisPattison Dec 28, 2021
8c8e2c2
Add config.h to install dirs
ChrisPattison Dec 28, 2021
b1449e1
Support 'T' argument in syrk
ChrisPattison Dec 28, 2021
572a4ca
Check upper/lower Syrk mode
ChrisPattison Dec 28, 2021
102c818
Fix MPFR wrapper argument order
ChrisPattison Dec 29, 2021
d22a4ee
Merge branch 'main' into blas
ChrisPattison Dec 29, 2021
b223108
Remove mystery character in CMakeLists.txt
ChrisPattison Dec 29, 2021
71a4cf8
Fix LD_LIBRARY_PATH search for FPGA kernel
ChrisPattison Dec 30, 2021
14274fc
Marginally more helpful error handling
ChrisPattison Dec 30, 2021
0b435be
GMP allows aliasing inputs
ChrisPattison Dec 30, 2021
1f43348
Install hw emu kernel
ChrisPattison Dec 30, 2021
9856042
Do SYRK addition on the FPGA
ChrisPattison Dec 30, 2021
1e8cffe
Move Apfp lib into namespace
ChrisPattison Dec 30, 2021
c1fdfa9
Make the interface type wrapping nicer
ChrisPattison Dec 30, 2021
0677b71
Rename ErrorDescription
ChrisPattison Dec 30, 2021
03859ee
Enum class Uplo/Trans
ChrisPattison Dec 30, 2021
b1a768d
Formatting because I keep forgetting
ChrisPattison Dec 30, 2021
78a4594
apfpHostlib naming convention
ChrisPattison Dec 31, 2021
119185c
Switch kernel to column major ordering
ChrisPattison Jan 3, 2022
8338d02
Remove extremely large volume simulation test cases
ChrisPattison Jan 3, 2022
1f8cd57
Merge branch 'main' into blas
ChrisPattison Jan 3, 2022
137d11d
ApfpIsInitialized |-> IsInitialized
ChrisPattison Jan 4, 2022
1cb5c90
Merge branch 'main' into col_major
ChrisPattison Jan 6, 2022
98434fc
Scale back directory search for kernel
ChrisPattison Jan 6, 2022
4e23918
Missing function renames
ChrisPattison Jan 6, 2022
7f1fa90
Add cwd to kernel search path
ChrisPattison Jan 6, 2022
6fd095e
Set INTERFACE_TYPE to SEMANTICS
ChrisPattison Jan 6, 2022
cacf283
Merge branch 'main' into blas
ChrisPattison Jan 6, 2022
9d1173c
BlasError is scoped enum
ChrisPattison Jan 6, 2022
9dbcb97
Merge branch 'col_major' into blas
ChrisPattison Jan 6, 2022
d0ab102
class Apfp -> Context
ChrisPattison Jan 6, 2022
155d0b8
Use RNDZ MPFR rounding mode everywhere
ChrisPattison Jan 6, 2022
55c9fe8
Throw KernelNotFoundException if APFP_KERNEL misset
ChrisPattison Jan 6, 2022
4e5c34d
Add comment about memory layout
ChrisPattison Jan 7, 2022
8b22b71
More descriptive SYRK unit tests
ChrisPattison Jan 8, 2022
ad5d63f
Add GEMM
ChrisPattison Jan 8, 2022
40a61a6
Missing syrk test case
ChrisPattison Jan 8, 2022
66ecc7e
Fix M and N in GEMM
ChrisPattison Jan 8, 2022
f9c7c3c
GEMM unit tests
ChrisPattison Jan 8, 2022
1685fd2
Go fast and break things - just not the unit tests!
ChrisPattison Jan 8, 2022
27 changes: 21 additions & 6 deletions CMakeLists.txt
@@ -1,6 +1,6 @@
cmake_minimum_required(VERSION 3.0)
project(apfp)
 

set(CMAKE_CXX_STANDARD 17)

# Target options
@@ -12,6 +12,8 @@ set(APFP_TILE_SIZE_M 32 CACHE STRING "Tile size in the M-dimension when running
set(APFP_SEMANTICS "MPFR" CACHE STRING "Which semantics to use for floating point operations [GMP/MPFR].")
set(APFP_PROFILING OFF CACHE BOOL "Enable profiling in the generated kernel.")
set_property(CACHE APFP_SEMANTICS PROPERTY STRINGS GMP MPFR)
+set(APFP_INTERFACE_TYPE "MPFR" CACHE STRING "Which data types to use for the interface [GMP/MPFR].")
+set_property(CACHE APFP_INTERFACE_TYPE PROPERTY STRINGS GMP MPFR)

# Validation and derived numbers
math(EXPR APFP_ALIGNED "${APFP_BITS} % 512")
@@ -28,7 +30,7 @@ find_package(GMP REQUIRED)
find_package(Threads REQUIRED)

set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wall -Wextra -Wpedantic -Wno-unused-label -DAPFP_${APFP_SEMANTICS}_SEMANTICS")
-include_directories(${CMAKE_BINARY_DIR} include SYSTEM hlslib/include ${Vitis_INCLUDE_DIRS} )
+include_directories(${CMAKE_BINARY_DIR} include SYSTEM hlslib/include ${Vitis_INCLUDE_DIRS} interface)

configure_file(include/Config.h.in Config.h)

@@ -38,7 +40,7 @@ set(APFP_KERNEL_FILES device/MatrixMultiplication.cpp

# Setup FPGA kernel targets
add_vitis_kernel(MatrixMultiplication FILES ${APFP_KERNEL_FILES}
-INCLUDE_DIRS include hlslib/include ${CMAKE_BINARY_DIR}
+INCLUDE_DIRS include hlslib/include ${CMAKE_BINARY_DIR} ${GMP_INCLUDES}
HLS_FLAGS "-DAP_INT_MAX_W=${APFP_MAX_BITS} -DAPFP_${APFP_SEMANTICS}_SEMANTICS"
HLS_CONFIG "config_compile -pipeline_style frp\nconfig_dataflow -fifo_depth 16"
DEPENDS ${CMAKE_BINARY_DIR}/Config.h
@@ -61,7 +63,7 @@ add_library(simulation ${APFP_KERNEL_FILES})
target_compile_options(simulation PRIVATE -Wno-unknown-pragmas -DAP_INT_MAX_W=${APFP_MAX_BITS})
target_link_libraries(simulation ${CMAKE_THREAD_LIBS_INIT})

-add_library(ApfpHostlib SHARED interface/Apfp.cpp)
+add_library(ApfpHostlib SHARED interface/Apfp.cpp interface/ApfpBlas.cpp interface/ApfpInterfaceType.cpp)
target_link_libraries(ApfpHostlib ${Vitis_LIBRARIES} ${GMP_LIBRARIES})
target_compile_definitions(ApfpHostlib PRIVATE HLSLIB_SIMULATE_OPENCL)

@@ -79,7 +81,20 @@ enable_testing()
add_test(TestSimulation TestSimulation 4 4 4)
add_library(Catch host/Catch.cpp)
add_executable(UnitTests host/UnitTests.cpp)
-target_link_libraries(UnitTests Catch ${GMP_LIBRARIES} ${MPFR_LIBRARIES} apfp simulation)
+target_link_libraries(UnitTests Catch ${GMP_LIBRARIES} ${MPFR_LIBRARIES} apfp ApfpHostlib simulation)
add_test(UnitTests UnitTests)

-install(TARGETS ApfpHostlib)
+add_executable(BlasUnitTests host/BlasUnitTests.cpp)
+target_link_libraries(BlasUnitTests Catch ${GMP_LIBRARIES} ${MPFR_LIBRARIES} apfp ApfpHostlib simulation)
+
+install(TARGETS ApfpHostlib)
+install(FILES
+interface/Apfp.h
+interface/ApfpBlas.h
+interface/ApfpInterfaceType.h
+${CMAKE_BINARY_DIR}/Config.h
+DESTINATION include/apfp)
+install(FILES
+${CMAKE_BINARY_DIR}/MatrixMultiplication_hw.xclbin
+${CMAKE_BINARY_DIR}/MatrixMultiplication_hw_emu.xclbin
+DESTINATION lib)
139 changes: 139 additions & 0 deletions host/BlasUnitTests.cpp
@@ -0,0 +1,139 @@
#include <catch.hpp>
#include <iostream>
#include <limits>
#include <vector>

#include "Config.h"

// #include "ArithmeticOperations.h"
// #include "Karatsuba.h"
// #include "PackedFloat.h"
#include "ApfpBlas.h"
#include "Random.h"

void ApfpSetup() {
#ifdef APFP_GMP_INTERFACE_TYPE
mpf_set_default_prec(kMantissaBits);
#else
mpfr_set_default_prec(kMantissaBits);
#endif
auto apfp_error_code = apfp::Init(kMantissaBits);
REQUIRE(apfp_error_code == apfp::BlasError::success);
}

void ApfpTeardown() {
apfp::Finalize();
}

bool IsZero(apfp::interface::ConstPtr a) {
#ifdef APFP_GMP_INTERFACE_TYPE
return mpf_sgn(a) == 0;
#else
return mpfr_sgn(a) == 0;
#endif
}

bool IsClose(apfp::interface::ConstPtr a, apfp::interface::ConstPtr b) {
// Avoids divide by zero if a = b = 0
if (IsZero(a) && IsZero(b)) {
return true;
}

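apfp::interface::Wrapper diff, sum, ratio;
#ifdef APFP_GMP_INTERFACE_TYPE
mpf_sub(diff.get(), a, b);
// Exactly equal inputs: mpf_get_d_2exp stores exponent 0 for a zero ratio, which would fail the check below
if (IsZero(diff.get())) {
return true;
}
mpf_add(sum.get(), a, b);
mpf_div(ratio.get(), diff.get(), sum.get());
long exp;
mpf_get_d_2exp(&exp, ratio.get());
#else
auto rounding_mode = mpfr_get_default_rounding_mode();
mpfr_sub(diff.get(), a, b, rounding_mode);
// Exactly equal inputs: mpfr_get_exp is undefined for a zero ratio
if (IsZero(diff.get())) {
return true;
}
mpfr_add(sum.get(), a, b, rounding_mode);
mpfr_div(ratio.get(), diff.get(), sum.get(), rounding_mode);
auto exp = mpfr_get_exp(ratio.get());
#endif
// Intended to require agreement in roughly the first 90% of digits:
// the binary exponent of the relative difference (a - b) / (a + b) must fall below this threshold
return exp < -((kMantissaBits * 3 * 9) / 10);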
}
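
Review note: the closeness check above bounds the binary exponent of the relative difference (a - b) / (a + b). A minimal double-precision sketch of the same idea follows. `IsCloseDouble` and `min_bits` are hypothetical names used only for illustration, `std::frexp` stands in for `mpfr_get_exp` / `mpf_get_d_2exp`, and the explicit handling of `a == b` and `a == -b` is an assumption about the intended behaviour rather than something the PR states.

```cpp
#include <cmath>

// Hypothetical double-precision analogue of IsClose; not part of the PR.
// Returns true when (a - b) / (a + b) has a binary exponent below -min_bits.
bool IsCloseDouble(double a, double b, int min_bits) {
    if (a == b) {
        return true;  // covers a = b = 0 and any exactly equal pair
    }
    const double sum = a + b;
    if (sum == 0.0) {
        return false;  // a = -b != 0: the values differ and the ratio is undefined
    }
    int exp = 0;
    std::frexp((a - b) / sum, &exp);  // ratio = m * 2^exp with 0.5 <= |m| < 1
    return exp < -min_bits;
}
```

With `min_bits = 40`, two doubles that differ only in the last few mantissa bits compare as close, while 1.0 and 1.001 do not.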

TEST_CASE("Init_Teardown") {
ApfpSetup();
ApfpTeardown();
}

TEST_CASE("SYRK") {
ApfpSetup();

auto rng = RandomNumberGenerator();

unsigned long N = GENERATE(0, 1, 2, 8, 15, 16, 31, 32, 33);
unsigned long K = GENERATE(0, 1, 2, 8, 15, 16, 31, 32, 33);
auto mode = GENERATE(apfp::BlasTrans::normal, apfp::BlasTrans::transpose);
auto uplo_mode = GENERATE(apfp::BlasUplo::upper, apfp::BlasUplo::lower);
// Test SYRK
// In 'N' mode, we perform AA^T + C
// A is NxK (A : R^K -> R^N)
// C is NxN
// Matrices are stored column major because BLAS
{
std::vector<apfp::interface::Wrapper> a_matrix;
a_matrix.resize(N * K);
for (auto& v : a_matrix) {
rng.Generate(v.get());
}

std::vector<apfp::interface::Wrapper> c_matrix;
c_matrix.resize(N * N);
for (auto& v : c_matrix) {
rng.Generate(v.get());
}

std::vector<apfp::interface::Wrapper> ref_result;
ref_result.resize(N * N);

// Compute reference result
apfp::interface::Wrapper prod_temp;
for (unsigned long j = 0; j < N; ++j) {
// all rows: the reference result is computed for the full matrix
for (unsigned long i = 0; i < N; ++i) {
auto r_idx = i + j * N;
apfp::interface::Set(ref_result.at(r_idx).get(), c_matrix.at(r_idx).get());

for (unsigned long k = 0; k < K; ++k) {
// A is NxK if N, KxN if T
if (mode == apfp::BlasTrans::normal) {
// (A A^T)_ij = sum_k A(i,k) A(j,k)
apfp::interface::Mul(prod_temp.get(), a_matrix.at(i + k * N).get(),
a_matrix.at(j + k * N).get());
} else {
// (A^T A)_ij = sum_k A(k,i) A(k,j)
apfp::interface::Mul(prod_temp.get(), a_matrix.at(k + i * K).get(),
a_matrix.at(k + j * K).get());
}
apfp::interface::Add(ref_result.at(r_idx).get(), prod_temp.get(), ref_result.at(r_idx).get());
}
}
}

// Use APFP BLAS library
auto error_code = apfp::Syrk(
uplo_mode, mode, N, K, [&](unsigned long i) { return a_matrix.at(i).get(); },
mode == apfp::BlasTrans::normal ? N : K, [&](unsigned long i) { return c_matrix.at(i).get(); }, N);
REQUIRE(error_code == apfp::BlasError::success);

// Check all entries are sufficiently close
apfp::interface::Wrapper diff;
for (unsigned long j = 0; j < N; ++j) {
// off-diagonal entries only (i < j); the diagonal is not checked here
for (unsigned long i = 0; i < j; ++i) {
auto ref_value = uplo_mode == apfp::BlasUplo::lower ? ref_result.at(i + j * N).get()
: ref_result.at(j + i * N).get();
auto test_value =
uplo_mode == apfp::BlasUplo::lower ? c_matrix.at(i + j * N).get() : c_matrix.at(j + i * N).get();
REQUIRE(IsClose(ref_value, test_value));
}
}
}

ApfpTeardown();
}
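
Review note: the reference loop in the SYRK test relies on column-major indexing, with leading dimension N for the 'N' case and K for the 'T' case. The same index arithmetic can be sketched on plain doubles as follows. `SyrkReference` is a hypothetical helper written for illustration and is not part of the library or the PR.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical double-precision reference for C <- A*A^T + C (transpose == false)
// or C <- A^T*A + C (transpose == true). All matrices are column major.
// A is n x k with leading dimension n when transpose == false,
// and k x n with leading dimension k when transpose == true.
std::vector<double> SyrkReference(bool transpose, std::size_t n, std::size_t k,
                                  const std::vector<double>& a, std::vector<double> c) {
    for (std::size_t j = 0; j < n; ++j) {
        for (std::size_t i = 0; i < n; ++i) {
            double acc = c[i + j * n];
            for (std::size_t l = 0; l < k; ++l) {
                acc += transpose ? a[l + i * k] * a[l + j * k]   // A(l,i) * A(l,j)
                                 : a[i + l * n] * a[j + l * n];  // A(i,l) * A(j,l)
            }
            c[i + j * n] = acc;
        }
    }
    return c;
}
```

For a 2x1 matrix A = (1, 2)^T and C = 0, the non-transposed call returns the column-major entries 1, 2, 2, 4.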
4 changes: 2 additions & 2 deletions host/Random.cpp
@@ -29,12 +29,12 @@ __mpfr_struct RandomNumberGenerator::GenerateMpfr() {
return num[0];
}

-void RandomNumberGenerator::Generate(mpfr_t &num) {
+void RandomNumberGenerator::Generate(mpfr_ptr num) {
std::unique_lock<std::mutex> lock(mutex_);
mpfr_urandom(num, state_, kRoundingMode);
}

-void RandomNumberGenerator::Generate(mpf_t &num) {
+void RandomNumberGenerator::Generate(mpf_ptr num) {
std::unique_lock<std::mutex> lock(mutex_);
mpf_urandomb(num, state_, kMantissaBits);
}
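
Review note: the switch from `mpfr_t &` / `mpf_t &` to `mpfr_ptr` / `mpf_ptr` matters because GMP and MPFR declare their value types as one-element arrays and their pointer types as plain pointers. A pointer parameter therefore accepts both a declared value (through array-to-pointer decay) and a pointer handed out by a wrapper, whereas a reference-to-array parameter accepts only the former. The sketch below mirrors that layout with hypothetical stand-in types (`NumStruct`, `num_t`, `num_ptr`, `Wrapper`), so it compiles without either library.

```cpp
// Hypothetical stand-ins mirroring how GMP/MPFR declare their handle types:
// the value type is a one-element array, the pointer type is a plain pointer.
struct NumStruct {
    int value;
};
typedef NumStruct num_t[1];
typedef NumStruct* num_ptr;

// Pointer parameter: accepts a num_t (array-to-pointer decay) or a raw pointer.
void SetByPtr(num_ptr num, int v) { num->value = v; }

// Reference-to-array parameter: only accepts an actual num_t object.
void SetByRef(num_t& num, int v) { num[0].value = v; }

// Hypothetical owning wrapper that exposes only a pointer to its value.
class Wrapper {
   public:
    num_ptr get() { return &data_; }

   private:
    NumStruct data_{0};
};
```

Calling `SetByRef(w.get(), 1)` on a `Wrapper w` would not compile, which is presumably why the generator signatures were relaxed to pointers.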
2 changes: 2 additions & 0 deletions include/Config.h.in
@@ -7,3 +7,5 @@ constexpr int kTileSizeN = ${APFP_TILE_SIZE_N};
constexpr int kTileSizeM = ${APFP_TILE_SIZE_M};
constexpr auto kBuildDir = "${CMAKE_BINARY_DIR}";
static_assert(kBits % 8 == 0, "Number of bits must be byte-aligned.");
+
+#define APFP_${APFP_INTERFACE_TYPE}_INTERFACE_TYPE
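
Review note: once CMake substitutes `APFP_INTERFACE_TYPE`, the generated Config.h defines exactly one of `APFP_GMP_INTERFACE_TYPE` or `APFP_MPFR_INTERFACE_TYPE`, and client code can branch on it at compile time. The sketch below shows that pattern with hypothetical placeholder types (`GmpValue`, `MpfrValue`, `InterfaceValue`, `kInterfaceName`). The `#define` at the top stands in for the generated line, and the `#error` fallback is an assumption rather than something the PR adds.

```cpp
#include <type_traits>

// Stand-in for the line the generated Config.h would contain when
// APFP_INTERFACE_TYPE=MPFR; in the real build CMake substitutes it.
#define APFP_MPFR_INTERFACE_TYPE

// Hypothetical placeholder types; the real code would use the GMP/MPFR types.
struct GmpValue {};
struct MpfrValue {};

#if defined(APFP_GMP_INTERFACE_TYPE)
using InterfaceValue = GmpValue;
constexpr const char* kInterfaceName = "GMP";
#elif defined(APFP_MPFR_INTERFACE_TYPE)
using InterfaceValue = MpfrValue;
constexpr const char* kInterfaceName = "MPFR";
#else
#error "Expected APFP_GMP_INTERFACE_TYPE or APFP_MPFR_INTERFACE_TYPE to be defined"
#endif

// Compile-time check that the selection matches the define above.
static_assert(std::is_same<InterfaceValue, MpfrValue>::value,
              "MPFR define should select the MPFR placeholder type");
```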
4 changes: 2 additions & 2 deletions include/PackedFloat.h
@@ -85,7 +85,7 @@ class PackedFloat {
return *this;
}

-inline void ToGmp(mpf_ptr num) {
+inline void ToGmp(mpf_ptr num) const {
const size_t gmp_limbs = (mpf_get_prec(num) + 8 * sizeof(mp_limb_t) - 1) / (8 * sizeof(mp_limb_t));
constexpr size_t kNumLimbs = kMantissaBytes / sizeof(Limb);
// GMP does not allow graceful rounding, so we cannot handle having insufficient bits in the target GMP number
@@ -104,7 +104,7 @@ class PackedFloat {
}
}

-inline void ToMpfr(mpfr_t num) {
+inline void ToMpfr(mpfr_t num) const {
// Copy the most significant bytes, padding zeros if necessary
const auto mpfr_limbs = (mpfr_get_prec(num) + 8 * sizeof(mp_limb_t) - 1) / (8 * sizeof(mp_limb_t));
const size_t mpfr_bytes = mpfr_limbs * sizeof(mp_limb_t);
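
Review note: both `ToGmp` and `ToMpfr` compute the limb count as a ceiling division of the precision in bits by the limb width in bits. A standalone sketch of that arithmetic is shown below. `LimbsForPrecision` is a hypothetical helper, and the limb size is passed in explicitly so the example does not depend on `mp_limb_t`.

```cpp
#include <cstddef>

// Hypothetical helper: number of limbs of limb_bytes bytes each needed to hold
// prec_bits bits, i.e. ceil(prec_bits / (8 * limb_bytes)) in integer arithmetic.
constexpr std::size_t LimbsForPrecision(std::size_t prec_bits, std::size_t limb_bytes) {
    return (prec_bits + 8 * limb_bytes - 1) / (8 * limb_bytes);
}

// With 8-byte limbs, 64 bits fit in one limb and 65 bits need two.
static_assert(LimbsForPrecision(64, 8) == 1, "64 bits fit in one 8-byte limb");
static_assert(LimbsForPrecision(65, 8) == 2, "65 bits need two 8-byte limbs");
```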
4 changes: 2 additions & 2 deletions include/Random.h
@@ -23,13 +23,13 @@ class RandomNumberGenerator {
__mpf_struct GenerateGmp();

/// Generate a random GMP number into the specified output variable.
-void Generate(mpf_t &);
+void Generate(mpf_ptr);

/// Generate a random MPFR number.
__mpfr_struct GenerateMpfr();

/// Generate a random MPFR into the specified output variable.
-void Generate(mpfr_t &);
+void Generate(mpfr_ptr);

private:
gmp_randstate_t state_;