Quick Start
Get started with AOCL-DLP in 5 minutes!
For complete build and installation instructions, see BUILD.md and INSTALL.md.
Quick Install:
# Clone, build, and install
git clone <repository-url>
cd aocl-dlp
mkdir build && cd build
cmake -DCMAKE_INSTALL_PREFIX=/usr/local ..
make -j$(nproc)
sudo make install
For build options, threading models, and advanced configuration, refer to BUILD.md.
main.c:
#include <aocl_dlp.h>
#include <stdio.h>
#include <stdlib.h>
int main(void) {
    // Matrix dimensions: C(128x128) = A(128x64) × B(64x128)
    md_t m = 128, n = 128, k = 64;
    // Allocate matrices
    float *a = (float*)malloc(m * k * sizeof(float));
    float *b = (float*)malloc(k * n * sizeof(float));
    float *c = (float*)malloc(m * n * sizeof(float));
    // Bail out if any allocation failed (free(NULL) is a no-op)
    if (a == NULL || b == NULL || c == NULL) {
        fprintf(stderr, "Allocation failed\n");
        free(a); free(b); free(c);
        return 1;
    }
    // Initialize with simple values
    for (size_t i = 0; i < m * k; i++) a[i] = 1.0f;
    for (size_t i = 0; i < k * n; i++) b[i] = 2.0f;
    for (size_t i = 0; i < m * n; i++) c[i] = 0.0f;
    // Perform GEMM: C = A × B
    aocl_gemm_f32f32f32of32(
        'R', // Row-major layout
        'N', 'N', // No transpose
        m, n, k, // Dimensions
        1.0f, // alpha
        a, k, 'N', // Matrix A (leading dimension k)
        b, n, 'N', // Matrix B (leading dimension n)
        0.0f, // beta
        c, n, // Matrix C (leading dimension n)
        NULL // No post-ops
    );
    // Each element of C sums k = 64 products of 1.0 × 2.0
    printf("✓ GEMM completed: C[0] = %f (expected: 128.0)\n", c[0]);
    // Cleanup
    free(a); free(b); free(c);
    return 0;
}
CMakeLists.txt (Shared Library - Recommended):
cmake_minimum_required(VERSION 3.26)
project(MyApp VERSION 1.0.0 LANGUAGES C)
# Find AOCL-DLP
find_package(AoclDlp REQUIRED)
# Create executable
add_executable(my_app main.c)
# Link with AOCL-DLP shared library (recommended)
target_link_libraries(my_app PRIVATE AoclDlp::aocl-dlp m)
CMakeLists.txt (Static Library - Better Performance):
cmake_minimum_required(VERSION 3.26)
project(MyApp VERSION 1.0.0 LANGUAGES C CXX)
# Find AOCL-DLP and OpenMP
find_package(AoclDlp REQUIRED)
find_package(OpenMP REQUIRED)
# Create executable
add_executable(my_app main.c)
# Link with AOCL-DLP static library (requires WHOLE_ARCHIVE)
target_link_libraries(my_app PRIVATE
    $<LINK_LIBRARY:WHOLE_ARCHIVE,AoclDlp::aocl-dlp_static>
    OpenMP::OpenMP_CXX
    m
)
Note: Static linking requires CMake 3.24+ and the WHOLE_ARCHIVE flag for proper performance. See Integration-Guide for a detailed explanation and alternatives for older CMake versions.
Build and Run:
mkdir build && cd build
cmake ..
make
./my_app
To build without CMake, compile and link directly with gcc.
Shared Library:
gcc -o my_app main.c -I/usr/local/include -L/usr/local/lib -laocl-dlp -lm
./my_app
Static Library:
gcc -o my_app main.c -I/usr/local/include -L/usr/local/lib \
-Wl,--whole-archive -laocl-dlp_static -Wl,--no-whole-archive \
-lstdc++ -lm -fopenmp
./my_app
Note: For static linking details and troubleshooting, see Integration-Guide.
Now that you have AOCL-DLP working, explore more features:
- Integration Guide - Complete integration reference
  - Static vs dynamic linking with detailed explanations
  - Troubleshooting common issues
  - Performance optimization
- GEMM Guide - Learn about different GEMM variants
- Post-Ops Guide - Fuse operations for better performance
- BUILD.md - Detailed build configuration options
Explore the examples directory for more complete examples covering various data types, post-operations, and advanced features.
For detailed troubleshooting, see the Integration Guide - Troubleshooting & FAQ section.
Quick Fixes:
| Issue | Quick Solution |
|---|---|
| Library not found at runtime | export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH |
| CMake can't find AoclDlp | cmake -DCMAKE_PREFIX_PATH=/usr/local .. |
| Poor performance with static lib | Ensure you link with the --whole-archive flag |
For comprehensive solutions and explanations, refer to the Integration Guide.
dlp_thread_set_num_threads(8); // Set before GEMM calls
Or via environment: OMP_NUM_THREADS=8 ./my_app
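If an application wants to choose the thread count itself while still honoring the environment, one option is to read OMP_NUM_THREADS and pass the result to dlp_thread_set_num_threads. The sketch below shows only the parsing step in plain C. parse_thread_count is a hypothetical helper, not an AOCL-DLP API, and it deliberately falls back on anything other than a single positive integer (including comma-separated lists such as "4,2").

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper, not part of AOCL-DLP: parse a thread-count string
 * such as the value of OMP_NUM_THREADS. Returns fallback when text is NULL,
 * empty, not a plain decimal integer, or outside [1, INT_MAX]. */
static int parse_thread_count(const char *text, int fallback)
{
    assert(fallback >= 1);
    if (text == NULL || *text == '\0') {
        return fallback;
    }

    char *end = NULL;
    errno = 0;
    long value = strtol(text, &end, 10);

    /* Reject overflow, strings with no digits, and trailing characters. */
    if (errno != 0 || end == text || *end != '\0') {
        return fallback;
    }
    if (value < 1 || value > INT_MAX) {
        return fallback;
    }
    return (int)value;
}
```

A caller could then write dlp_thread_set_num_threads(parse_thread_count(getenv("OMP_NUM_THREADS"), 1)); before the first GEMM call. The fallback of 1 is an arbitrary choice for this illustration.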
For repeated operations with the same matrix, use reordering APIs. See GEMM Guide for details.
- f32: High accuracy (baseline)
- bf16: 2-4× faster than f32, good balance of speed and accuracy
- int8: 4-8× faster than f32 for quantized inference
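To get a feel for the accuracy side of this trade-off, the sketch below shows how much precision a value loses when stored as bf16 or as symmetric int8. It is plain C and does not call AOCL-DLP. All four function names are hypothetical, and the library's own conversion and quantization behavior may differ (for example in rounding mode or NaN handling).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers for illustration only; not AOCL-DLP APIs. */

/* f32 -> bf16: keep the upper 16 bits, rounding to nearest even.
 * NaN and overflow handling are omitted for brevity. */
static uint16_t f32_to_bf16(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    uint32_t rounding = 0x7FFFu + ((bits >> 16) & 1u);
    return (uint16_t)((bits + rounding) >> 16);
}

/* bf16 -> f32: the 16 stored bits become the upper half of the float. */
static float bf16_to_f32(uint16_t h)
{
    uint32_t bits = (uint32_t)h << 16;
    float x;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Symmetric int8 quantization: round x / scale half away from zero,
 * then clamp to [-127, 127]. */
static int8_t quantize_s8(float x, float scale)
{
    assert(scale > 0.0f);
    float r = x / scale;
    r = (r >= 0.0f) ? r + 0.5f : r - 0.5f;
    if (r > 127.0f) r = 127.0f;
    if (r < -127.0f) r = -127.0f;
    return (int8_t)r; /* conversion truncates toward zero */
}

/* Map a quantized value back to its approximate real value. */
static float dequantize_s8(int8_t q, float scale)
{
    return (float)q * scale;
}
```

For example, 3.14159265f survives a bf16 round trip as 3.140625, and 3.2f quantized with a scale of 0.5 comes back as 3.0. Whether that error is acceptable depends on the model and the layer.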
For detailed performance optimization, see Performance-Guide and Integration-Guide.
// Basic GEMM: C = alpha * A × B + beta * C
aocl_gemm_f32f32f32of32('R', 'N', 'N', m, n, k,
alpha, a, lda, 'N', b, ldb, 'N', beta, c, ldc, NULL);
// Thread control
dlp_thread_set_num_threads(8);
For complete API documentation, see API Reference and GEMM-Guide.
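The formula in the quick reference, C = alpha * A × B + beta * C, can also be written as a naive plain-C loop, which is handy for checking library output on small inputs. This is only a sketch: ref_gemm_f32 and all_close_f32 are hypothetical helper names, not AOCL-DLP APIs, and the code assumes row-major storage with no transposition, mirroring the 'R', 'N', 'N' arguments used in this guide.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of AOCL-DLP: naive row-major reference for
 * C = alpha * (A x B) + beta * C with no transposition.
 * A is m x k (leading dimension lda), B is k x n (ldb), C is m x n (ldc). */
static void ref_gemm_f32(size_t m, size_t n, size_t k,
                         float alpha, const float *a, size_t lda,
                         const float *b, size_t ldb,
                         float beta, float *c, size_t ldc)
{
    /* In row-major storage each leading dimension must cover a full row. */
    assert(lda >= k && ldb >= n && ldc >= n);

    for (size_t i = 0; i < m; i++) {
        for (size_t j = 0; j < n; j++) {
            float acc = 0.0f;
            for (size_t p = 0; p < k; p++) {
                acc += a[i * lda + p] * b[p * ldb + j];
            }
            /* When beta is zero, C is overwritten without being read. */
            c[i * ldc + j] = (beta == 0.0f)
                                 ? alpha * acc
                                 : alpha * acc + beta * c[i * ldc + j];
        }
    }
}

/* Returns 1 if every element of the m x n matrix x (leading dimension ldx)
 * is within tol of expected, 0 otherwise. */
static int all_close_f32(const float *x, size_t m, size_t n, size_t ldx,
                         float expected, float tol)
{
    for (size_t i = 0; i < m; i++) {
        for (size_t j = 0; j < n; j++) {
            float diff = x[i * ldx + j] - expected;
            if (diff < 0.0f) diff = -diff;
            if (diff > tol) return 0;
        }
    }
    return 1;
}
```

For the Quick Start inputs (A filled with 1.0, B filled with 2.0, k = 64), calling ref_gemm_f32(m, n, k, 1.0f, a, k, b, n, 0.0f, ref, n) on a separate output buffer ref should yield 128.0 in every element, the same value main.c expects in C[0].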
- Integration Guide - Comprehensive integration reference
- FAQ - Frequently asked questions
- API Documentation - Complete API reference
- BUILD.md - Build system documentation
Happy optimizing! 🚀