Building AOCL-DLP (Deep Learning Primitives)

This document provides instructions for building the AOCL-DLP library from source code.

📋 System Requirements

Before building AOCL-DLP, ensure your system meets the following requirements:

Software

  • CMake (≥ 3.26)
  • C/C++ compiler with C11/C++17 support (e.g., GCC 11+, Clang 14+)
  • OpenMP and/or pthread libraries (for multi-threading)
  • ninja-build (optional, for Ninja generator support)

Note: GCC 11 introduced AVX512_BF16 support, which is required for bfloat16 GEMM.
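
To check installed tools against these minimums, a small version comparison helps. The sketch below assumes GNU sort with -V (version sort) is available. The helper names version_ge, first_version, and check_tool are hypothetical and not part of AOCL-DLP, and the banner parsing is a heuristic that may misread unusual vendor version strings.

```shell
# version_ge A B: succeeds when version A is greater than or equal to version B.
# Relies on GNU sort -V (version sort); helper names are illustrative only.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Read a --version banner on stdin and print its first dotted version number.
first_version() {
  grep -oE '[0-9]+(\.[0-9]+)+' | head -n1
}

# check_tool TOOL MIN: report whether an installed tool meets a minimum version.
check_tool() {
  local tool="$1" min="$2" found
  if ! command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: not found (need >= $min)"
    return 1
  fi
  found="$("$tool" --version 2>/dev/null | first_version)"
  if version_ge "$found" "$min"; then
    echo "$tool $found: OK (>= $min)"
  else
    echo "$tool $found: too old (need >= $min)"
    return 1
  fi
}
```

For example, check_tool cmake 3.26 and check_tool gcc 11 report whether each tool satisfies the versions listed above.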

Hardware

  • x86 CPU with AVX2/FMA3 support
  • AVX512 support for enhanced performance
  • AVX512_VNNI support for int8 GEMM
  • AVX512_BF16 support for bfloat16 GEMM
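
On Linux, the CPU features above can be checked against the flags line of /proc/cpuinfo. This is a minimal sketch: the function names are hypothetical, and mapping "AVX512 support" to the avx512f (foundation) flag is an assumption, since the list does not name specific AVX512 subsets beyond VNNI and BF16.

```shell
# has_cpu_flag FLAG [FLAGS_LINE]: succeeds when FLAG appears as a whole word
# in the CPU flags line (defaults to the first "flags" line of /proc/cpuinfo).
has_cpu_flag() {
  local flag="$1" flags="${2-}"
  if [ -z "$flags" ]; then
    flags="$(grep -m1 '^flags' /proc/cpuinfo 2>/dev/null)"
  fi
  printf '%s\n' "$flags" | grep -qw -- "$flag"
}

# report_cpu_features [FLAGS_LINE]: print yes/no for each feature listed
# under Hardware. avx512f standing in for "AVX512" is an assumption.
report_cpu_features() {
  local flags="${1-}" f
  for f in avx2 fma avx512f avx512_vnni avx512_bf16; do
    if has_cpu_flag "$f" "$flags"; then
      echo "$f: yes"
    else
      echo "$f: no"
    fi
  done
}
```

Calling report_cpu_features with no argument inspects the current machine. Passing a flags line explicitly makes it easy to check output copied from another host.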

Build Configuration Options

AOCL-DLP uses CMake for its build system with several configurable options:

| Option | Default | Description |
|--------|---------|-------------|
| General Build Options | | |
| BUILD_EXAMPLES | OFF | Build example programs |
| BUILD_BENCHMARKS | OFF | Build benchmark programs |
| BUILD_TESTING | OFF | Build test programs (requires DLP_CTEST_DISABLED=OFF for CTest) |
| BUILD_DOXYGEN | OFF | Build Doxygen documentation |
| BUILD_SPHINX | OFF | Build Sphinx documentation |
| CMAKE_EXPORT_COMPILE_COMMANDS | OFF | Generate compile_commands.json for tooling |
| CMAKE_BUILD_TYPE | Release | Build type ("Release", "Debug", "RelWithDebInfo", "Coverage") |
| CMAKE_INSTALL_PREFIX | /usr/local | Installation directory |
| Compiler Options | | |
| CMAKE_CXX_COMPILER | system | Specify C++ compiler (e.g., g++) |
| CMAKE_C_COMPILER | system | Specify C compiler (e.g., gcc) |
| Threading & Sanitizers | | |
| DLP_THREADING_MODEL | "none" | Threading model ("none", "openmp", "pthread") |
| DLP_ENABLE_OPENMP | ON | Override OpenMP support (auto-enabled by threading model) |
| DLP_OPENMP_ROOT | "" | Custom path to OpenMP installation |
| DLP_USE_LLVM_OPENMP | OFF | Force using LLVM OpenMP implementation |
| DLP_ENABLE_ASAN | OFF | Enable AddressSanitizer |
| DLP_ENABLE_TSAN | OFF | Enable ThreadSanitizer |
| DLP_ENABLE_UBSAN | OFF | Enable UndefinedBehaviorSanitizer |
| DLP_CTEST_DISABLED | ON | Disable CTest integration (set to OFF to enable with BUILD_TESTING) |
| Advanced Options | | |
| DLP_ENABLE_HIGH_PRECISION_FLOAT | OFF | Enable high precision float (double) support |

Note:

  • Options can be set via -D<option>=<value> when invoking cmake.
  • The generator is selected with the -G command-line flag (e.g., -G Ninja), not with a -D variable.
  • For a full list of options, see cmake/dlp_options.cmake and CMakeLists.txt.
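
Several options can be combined in a single configure step. The sketch below prints such a command for a chosen threading model and build type, and rejects threading models other than the three documented values. The function name, the build/ directory, and the particular mix of options are illustrative assumptions. The -S and -B flags are standard cmake options for naming the source and build directories.

```shell
# dlp_configure_cmd [THREADING] [BUILD_TYPE]: print a cmake configure command
# that combines several documented options. The function name and the build/
# directory are illustrative assumptions, not part of the project.
dlp_configure_cmd() {
  local threading="${1:-none}" build_type="${2:-Release}"
  case "$threading" in
    none|openmp|pthread) ;;
    *) echo "unknown threading model: $threading" >&2; return 1 ;;
  esac
  echo "cmake -S . -B build -G Ninja" \
       "-DCMAKE_BUILD_TYPE=$build_type" \
       "-DDLP_THREADING_MODEL=$threading" \
       "-DBUILD_EXAMPLES=ON" \
       "-DCMAKE_EXPORT_COMPILE_COMMANDS=ON"
}
```

Run dlp_configure_cmd openmp from the project root and paste the printed line into the shell, or drop -G Ninja to use the default generator.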

Quick Start Build

Linux

  1. Clone and enter project:

    git clone <repository-url> && cd aocl-dlp
  2. Create an out-of-source build directory:

    mkdir -p build && cd build
  3. Configure (choose generator):

    # Default (GNU Make)
    cmake ..
    
    # Ninja (fast incremental builds)
    cmake -G Ninja ..
  4. Build:

    # Make
    make -j$(nproc)
    
    # Ninja
    ninja
  5. For installation instructions, see INSTALL.md.


Advanced Build Configuration

Enabling Additional Components

To enable benchmarks:

cmake -DBUILD_BENCHMARKS=ON ..

To enable testing with full CTest integration:

cmake -DBUILD_TESTING=ON -DDLP_CTEST_DISABLED=OFF ..

Note: Both BUILD_TESTING=ON and DLP_CTEST_DISABLED=OFF are required for full CTest integration. Using only BUILD_TESTING=ON builds tests but uses traditional testing instead of Google Test discovery.

Threading Model Configuration

AOCL-DLP supports multiple threading models:

# No threading (default)
cmake -DDLP_THREADING_MODEL=none ..

# Enable OpenMP threading
cmake -DDLP_THREADING_MODEL=openmp ..

# Enable Pthread threading
cmake -DDLP_THREADING_MODEL=pthread ..

Note: Setting DLP_THREADING_MODEL=openmp automatically enables OpenMP support. The separate DLP_ENABLE_OPENMP option (default: ON) acts as an override: pass -DDLP_ENABLE_OPENMP=OFF to disable OpenMP entirely.

For custom OpenMP installation:

cmake -DDLP_THREADING_MODEL=openmp -DDLP_OPENMP_ROOT=/path/to/openmp ..

High Precision Float Support

Enable high precision float (double) support:

cmake -DDLP_ENABLE_HIGH_PRECISION_FLOAT=ON ..

Specifying Build Type

You can specify different build types:

# Debug build
cmake -DCMAKE_BUILD_TYPE=Debug ..

# Release build (default)
cmake -DCMAKE_BUILD_TYPE=Release ..

# Release with debug info
cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo ..

# Coverage build (for code coverage analysis)
cmake -DCMAKE_BUILD_TYPE=Coverage ..
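
With single-configuration generators such as Make and Ninja, the build type is fixed when the tree is configured, so switching types in one build directory means reconfiguring. One common approach is a separate build tree per type. The sketch below is an assumption about workflow, not a project convention: the build-<type> naming scheme and both function names are hypothetical.

```shell
# build_dir_for TYPE: map one of the build types listed in this document to
# its own build directory, e.g. Debug -> build-debug (naming is an assumption).
build_dir_for() {
  case "$1" in
    Release|Debug|RelWithDebInfo|Coverage)
      echo "build-$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" ;;
    *)
      echo "build type not listed in this document: $1" >&2; return 1 ;;
  esac
}

# configure_all: print one configure command per listed build type.
configure_all() {
  local t
  for t in Release Debug RelWithDebInfo Coverage; do
    echo "cmake -S . -B $(build_dir_for "$t") -DCMAKE_BUILD_TYPE=$t"
  done
}
```

Each printed command can be run from the project root. The resulting trees are independent, so a Debug build does not invalidate the Release one.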

Benchmarking

Enable and run tests and benchmarks in one place:
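
A minimal sketch that combines the flags documented above. The build directory name, the run helper, and the DRY_RUN switch are assumptions for illustration. Benchmark executable names are not given in this document, so the sketch stops after running the test suite with ctest. The --test-dir option requires CMake 3.20 or later, which the stated minimum of 3.26 satisfies.

```shell
# run CMD...: execute a command, or only print it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# build_tests_and_benchmarks [BUILD_DIR]: configure with tests, CTest
# integration, and benchmarks enabled, then build and run the test suite.
# Must be called from the project root; stops at the first failing step.
build_tests_and_benchmarks() {
  local dir="${1:-build}"
  run cmake -S . -B "$dir" \
      -DBUILD_TESTING=ON -DDLP_CTEST_DISABLED=OFF -DBUILD_BENCHMARKS=ON &&
  run cmake --build "$dir" --parallel &&
  run ctest --test-dir "$dir" --output-on-failure
}
```

Setting DRY_RUN=1 before the call prints the three commands without executing them, which is a convenient way to review them first.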

Developer Tips

  • Out-of-tree builds: Always build in a separate build/ directory to keep sources clean.
  • Custom install prefix:
    cmake -DCMAKE_INSTALL_PREFIX=/opt/aocl-dlp ..
  • Verbose output:
    # Make
    make VERBOSE=1
    
    # Ninja
    ninja -v
  • Clean cache (run from the project root; removes the whole build tree, including hidden files):
    rm -rf build && mkdir -p build && cd build && cmake ..
  • Parallel builds: Leverage all cores with -j$(nproc) (Make) or default Ninja parallelism.

CMake Build System Overview

AOCL-DLP uses a modern CMake build system structured as follows:

  • Main CMakeLists.txt: Orchestrates the overall build process
  • cmake/dlp_variables.cmake: Sets project variables, languages and standards
  • cmake/dlp_options.cmake: Defines build options and threading models
  • cmake/dlp_dependencies.cmake: Manages OpenMP and Pthread dependencies
  • cmake/dlp_compiler_flags_linux.cmake: Sets compiler flags for Linux
  • cmake/dlp_compiler_flags_windows.cmake: Sets compiler flags for Windows
  • cmake/dlp_install.cmake: Manages installation rules
  • cmake/dlp_extensions.cmake: Defines file extensions

Troubleshooting

Threading Model Issues

If you encounter issues with the selected threading model:

  1. Make sure the required libraries are installed on your system:

    • For OpenMP: OpenMP development libraries
    • For Pthread: POSIX threads library
  2. For OpenMP-specific issues, point CMake at your OpenMP installation explicitly:

cmake -DDLP_THREADING_MODEL=openmp -DDLP_OPENMP_ROOT=/path/to/openmp ..
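
Before changing CMake settings, it can help to confirm that the compiler itself accepts OpenMP. The sketch below assumes a GCC- or Clang-style driver that understands -fopenmp, -dM, and -E. The function name is hypothetical, and the check covers the compiler flag only, not whether the OpenMP runtime library is installed.

```shell
# compiler_has_openmp [CC]: succeeds when the compiler defines the _OPENMP
# macro under -fopenmp. Assumes GCC/Clang-style flags; it does not verify
# that the OpenMP runtime library can be linked.
compiler_has_openmp() {
  local cc="${1:-cc}"
  "$cc" -fopenmp -dM -E -x c /dev/null 2>/dev/null | grep -q '_OPENMP'
}
```

For example, compiler_has_openmp gcc && echo "gcc accepts -fopenmp" prints the message only when the macro is defined.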

Compiler Requirements

Make sure your compiler supports:

  • C11 standard for C code
  • C++17 standard for C++ code

Build Performance

To speed up the build process, use parallel compilation:

make -j$(nproc)  # Linux

Known Issues

  • Warnings may appear during compilation (-Werror is currently disabled)
  • Some platforms may require specific environment setup for threading model detection