Home

AOCL-DLP Documentation Hub

AOCL-DLP (AMD Optimizing CPU Libraries - Deep Learning Primitives) is a high-performance library of optimized deep learning primitives for AMD processors. It implements GEMM operations for machine learning applications and supports multiple data types, fused pre- and post-operations, and batch processing. The kernels are tuned for AMD hardware capabilities, including the AVX2, AVX512, AVX512_VNNI, AVX512_BF16, and AVX512_FP16 instruction sets.
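
For readers new to the terminology, the following is a minimal reference sketch in plain C of what a GEMM with fused post-operations computes, here a bias add followed by ReLU. The function name ref_gemm_bias_relu and its signature are hypothetical and chosen only for illustration; this is not the AOCL-DLP API and makes no use of the library's optimized kernels. The actual interface, including dlp_metadata_t, is described in the GEMM Guide and the Post-Operations Guide.

```c
#include <assert.h>
#include <stddef.h>

/* Reference (unoptimized) GEMM with a fused bias + ReLU epilogue.
 * Computes C = relu(A * B + bias) for row-major float matrices:
 *   A is m x k, B is k x n, C is m x n, bias has n entries (one per column).
 * Hypothetical helper for illustration only; NOT part of the AOCL-DLP API. */
void ref_gemm_bias_relu(size_t m, size_t n, size_t k,
                        const float *a, const float *b,
                        const float *bias, float *c)
{
    assert(a != NULL && b != NULL && bias != NULL && c != NULL);

    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            /* Plain dot product of row i of A with column j of B. */
            float acc = 0.0f;
            for (size_t p = 0; p < k; ++p) {
                acc += a[i * k + p] * b[p * n + j];
            }

            /* Post-ops are applied while the result is still in a local
             * variable, so C is written once instead of being re-read
             * by a separate bias pass and a separate activation pass. */
            acc += bias[j];                          /* post-op 1: bias add */
            c[i * n + j] = acc > 0.0f ? acc : 0.0f;  /* post-op 2: ReLU */
        }
    }
}
```

The point of fusing is that the bias and activation are applied before each output element is stored, which avoids extra passes over C. An optimized library would achieve the same effect inside blocked, vectorized kernels rather than a triple loop.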

New here? Start with the Quick Start Guide to build, install, and run your first GEMM in 5 minutes.

Getting Started

Quick Start -- Install, build your first program, and run it
Integration Guide -- CMake packages, manual linking, static vs dynamic, troubleshooting
Examples & Tutorials -- Annotated code examples for every feature

User Guides

Library Overview -- Architecture, components, data types, hardware abstraction
GEMM Guide -- Data type combinations, memory layouts, matrix reordering, choosing the right variant
Batch GEMM Guide -- Grouped batch interface, availability matrix, reordered B and post-ops in batch mode
Post-Operations Guide -- Fused post-ops (BIAS, activations, SCALE, MATRIX_ADD/MUL) via dlp_metadata_t
Eltwise Operations Guide -- Standalone element-wise operations (separate from GEMM post-ops)
Quantization Guide -- Symmetric quantization, mixed-precision workflows, scale/zero-point setup
API Lifecycle -- End-to-end flow: data prep, post-ops setup, compute, threading

Performance & Configuration

Performance Guide -- Threading, NUMA, memory layout, architecture-specific tips
Environment Variables -- Complete reference for DLP_NUM_THREADS, AOCL_DLP_ENABLE_INSTRUCTIONS, OpenMP tuning

Testing & Benchmarking

DLP Testing -- Google Test framework, YAML configs, running and writing tests
DLP Benchmarking -- Google Benchmark framework, YAML configs, performance analysis

Developer Guides

JIT Code Generation -- Just-In-Time compilation system, Xbyak assembler, kernel debugging

Reference

FAQ -- Common questions about threading, linking, data types, and performance
API Reference (Sphinx) -- Full generated API documentation

Project Links

README -- Feature summary and data type table
BUILD.md -- Build configuration and CMake options
INSTALL.md -- Installation steps
Contributing -- How to contribute
License -- BSD 3-Clause
