Quick Start

Nallani Bhaskar edited this page Jun 16, 2026 · 3 revisions

Quick Start Guide

Get started with AOCL-DLP in 5 minutes!

Installation

For complete build and installation instructions, see BUILD.md and INSTALL.md.

Quick Install:

# Clone, build, and install
git clone <repository-url>
cd aocl-dlp
mkdir build && cd build
cmake -DCMAKE_INSTALL_PREFIX=/usr/local ..
make -j$(nproc)
sudo make install

For build options, threading models, and advanced configuration, refer to BUILD.md.

Your First AOCL-DLP Program

Step 1: Create Your Application

main.c:

#include <aocl_dlp.h>
#include <stdio.h>
#include <stdlib.h>

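int main(void) {
    // Matrix dimensions: C(128x128) = A(128x64) × B(64x128)
    md_t m = 128, n = 128, k = 64;

    // Allocate matrices (row-major, contiguous)
    float *a = (float*)malloc(m * k * sizeof(float));
    float *b = (float*)malloc(k * n * sizeof(float));
    float *c = (float*)malloc(m * n * sizeof(float));
    if (a == NULL || b == NULL || c == NULL) {
        fprintf(stderr, "Allocation failed\n");
        free(a); free(b); free(c);  // free(NULL) is a no-op
        return 1;
    }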
    
    // Initialize with simple values
    for (size_t i = 0; i < m * k; i++) a[i] = 1.0f;
    for (size_t i = 0; i < k * n; i++) b[i] = 2.0f;
    for (size_t i = 0; i < m * n; i++) c[i] = 0.0f;
    
    // Perform GEMM: C = A × B
    aocl_gemm_f32f32f32of32(
        'R',           // Row-major layout
        'N', 'N',      // No transpose
        m, n, k,       // Dimensions
        1.0f,          // alpha
        a, k, 'N',     // Matrix A
        b, n, 'N',     // Matrix B
        0.0f,          // beta
        c, n,          // Matrix C
        NULL           // No post-ops
    );
    
    printf("✓ GEMM completed: C[0] = %f (expected: 128.0)\n", c[0]);
    
    // Cleanup
    free(a); free(b); free(c);
    return 0;
}
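
To sanity-check the library result beyond the single `c[0]` printout, one option is to compare it against a naive triple-loop reference. The sketch below is only an illustration: `ref_gemm_f32` and `max_abs_diff` are hypothetical helper names, not part of AOCL-DLP, and the reference assumes row-major storage with no transposes, matching the call in main.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an AOCL-DLP API): naive row-major reference for
 * C = alpha * A x B + beta * C with no transposes. */
void ref_gemm_f32(size_t m, size_t n, size_t k,
                  float alpha, const float *a, size_t lda,
                  const float *b, size_t ldb,
                  float beta, float *c, size_t ldc)
{
    /* In row-major layout each leading dimension must cover a full row. */
    assert(lda >= k && ldb >= n && ldc >= n);

    for (size_t i = 0; i < m; i++) {
        for (size_t j = 0; j < n; j++) {
            float acc = 0.0f;
            for (size_t p = 0; p < k; p++)
                acc += a[i * lda + p] * b[p * ldb + j];

            /* With beta == 0 the old contents of C are ignored entirely. */
            if (beta == 0.0f)
                c[i * ldc + j] = alpha * acc;
            else
                c[i * ldc + j] = alpha * acc + beta * c[i * ldc + j];
        }
    }
}

/* Hypothetical helper: largest absolute element-wise difference. */
float max_abs_diff(const float *x, const float *y, size_t count)
{
    float worst = 0.0f;
    for (size_t i = 0; i < count; i++) {
        float d = x[i] - y[i];
        if (d < 0.0f) d = -d;
        if (d > worst) worst = d;
    }
    return worst;
}
```

In main.c, this could be used by allocating a second output buffer, calling `ref_gemm_f32(m, n, k, 1.0f, a, k, b, n, 0.0f, c_ref, n)`, and checking that `max_abs_diff(c, c_ref, m * n)` stays below a small tolerance.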

Step 2: Build with CMake

CMakeLists.txt (Shared Library - Recommended):

cmake_minimum_required(VERSION 3.26)
project(MyApp VERSION 1.0.0 LANGUAGES C)

# Find AOCL-DLP
find_package(AoclDlp REQUIRED)

# Create executable
add_executable(my_app main.c)

# Link with AOCL-DLP shared library (recommended)
target_link_libraries(my_app PRIVATE AoclDlp::aocl-dlp m)

CMakeLists.txt (Static Library - Better Performance):

cmake_minimum_required(VERSION 3.26)
project(MyApp VERSION 1.0.0 LANGUAGES C CXX)

# Find AOCL-DLP and OpenMP
find_package(AoclDlp REQUIRED)
find_package(OpenMP REQUIRED)

# Create executable
add_executable(my_app main.c)

# Link with AOCL-DLP static library (requires WHOLE_ARCHIVE)
target_link_libraries(my_app PRIVATE
    $<LINK_LIBRARY:WHOLE_ARCHIVE,AoclDlp::aocl-dlp_static>
    OpenMP::OpenMP_CXX
    m
)

Note: The $<LINK_LIBRARY:WHOLE_ARCHIVE,...> generator expression used above requires CMake 3.24 or newer, and the static library must be linked as a whole archive for proper performance.
See Integration-Guide for a detailed explanation and for alternatives on older CMake versions.

Build and Run:

mkdir build && cd build
cmake ..
make
./my_app

Alternative: Build Manually

Shared Library:

gcc -o my_app main.c -I/usr/local/include -L/usr/local/lib -laocl-dlp -lm
./my_app

Static Library:

gcc -o my_app main.c -I/usr/local/include -L/usr/local/lib \
    -Wl,--whole-archive -laocl-dlp_static -Wl,--no-whole-archive \
    -lstdc++ -lm -fopenmp
./my_app

Note: For static linking details and troubleshooting, see Integration-Guide.

What's Next?

Now that you have AOCL-DLP working, explore more features:

Essential Reading

  1. Integration Guide - Complete integration reference
    • Static vs dynamic linking with detailed explanations
    • Troubleshooting common issues
    • Performance optimization

  2. GEMM Guide - Learn about different GEMM variants

  3. Post-Ops Guide - Fuse operations for better performance

  4. BUILD.md - Detailed build configuration options

Code Examples

Explore the examples directory for more complete examples covering various data types, post-operations, and advanced features.

Common First-Time Issues

For detailed troubleshooting, see the Integration Guide - Troubleshooting & FAQ section.

Quick Fixes:

| Issue | Quick Solution |
|---|---|
| Library not found at runtime | export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH |
| CMake can't find AoclDlp | cmake -DCMAKE_PREFIX_PATH=/usr/local .. |
| Poor performance with static lib | Ensure you used --whole-archive flag |

For comprehensive solutions and explanations, refer to the Integration Guide.

Performance Tips

Enable Multi-Threading

dlp_thread_set_num_threads(8);  // Set before GEMM calls

Or via environment: OMP_NUM_THREADS=8 ./my_app
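
To see whether a given thread count actually helps, it can be useful to time the GEMM call and convert the result to GFLOP/s. The following is a minimal sketch under stated assumptions: `wall_seconds` and `gemm_gflops` are hypothetical helper names (not AOCL-DLP APIs), timing uses the C11 `timespec_get` function, and the operation count uses the usual convention of 2·m·n·k floating-point operations per GEMM.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical helper: wall-clock time in seconds (C11 timespec_get). */
double wall_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Hypothetical helper: throughput of one m x n x k GEMM that took `seconds`.
 * Each of the m*n outputs needs k multiplies and k adds, hence 2*m*n*k. */
double gemm_gflops(double m, double n, double k, double seconds)
{
    assert(seconds > 0.0);
    return (2.0 * m * n * k) / seconds / 1e9;
}
```

One way to use this is to read `wall_seconds()` before and after a loop of repeated `aocl_gemm_f32f32f32of32` calls, divide the elapsed time by the repetition count, and pass the per-call time to `gemm_gflops`. Running one untimed warm-up call first usually gives steadier numbers.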

Matrix Reordering

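When the same matrix is reused across many GEMM calls (for example, fixed weights), reorder it once up front with the reordering APIs rather than repeating that work on every call. See GEMM Guide for details.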

Data Type Selection

  • f32: High accuracy (baseline)
  • bf16: 2-4× faster than f32, a good balance of speed and accuracy
  • int8: 4-8× faster than f32, intended for quantized inference
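
To get a feel for the accuracy trade-off of bf16, it helps to see what the format keeps: a bf16 value is the top 16 bits of an IEEE-754 float32, so it has the same exponent range but only 7 explicit mantissa bits. The sketch below is a generic illustration of that conversion with round-to-nearest-even. The function names are hypothetical, and nothing here is claimed to be how AOCL-DLP converts or stores bf16 internally.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: float32 -> bf16 bit pattern, round-to-nearest-even. */
uint16_t f32_to_bf16_bits(float x)
{
    uint32_t bits;
    assert(sizeof bits == sizeof x);
    memcpy(&bits, &x, sizeof bits);          /* type-pun without UB */

    /* NaN: truncate and force a mantissa bit so it stays a NaN. */
    if ((bits & 0x7FFFFFFFu) > 0x7F800000u)
        return (uint16_t)((bits >> 16) | 0x0040u);

    /* Add half of the dropped range, plus 1 if the kept LSB is odd,
     * so exact ties round to the even neighbour. */
    bits += 0x7FFFu + ((bits >> 16) & 1u);
    return (uint16_t)(bits >> 16);
}

/* Hypothetical helper: bf16 bit pattern -> float32 (exact, no rounding). */
float bf16_bits_to_f32(uint16_t h)
{
    uint32_t bits = (uint32_t)h << 16;
    float x;
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

Round-tripping a value such as 3.14159f through these two helpers yields 3.140625, which shows the roughly 2-3 significant decimal digits that bf16 retains.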

For detailed performance optimization, see Performance-Guide and Integration-Guide.

API Quick Reference

// Basic GEMM: C = alpha * A × B + beta * C
aocl_gemm_f32f32f32of32('R', 'N', 'N', m, n, k, 
    alpha, a, lda, 'N', b, ldb, 'N', beta, c, ldc, NULL);

// Thread control
dlp_thread_set_num_threads(8);

For complete API documentation, see API Reference and GEMM-Guide.

Happy optimizing! 🚀
