Nallani Bhaskar edited this page Jun 16, 2026 · 11 revisions

AOCL-DLP Documentation Hub

AOCL-DLP (AMD Optimizing CPU Libraries - Deep Learning Primitives) is a high-performance library of optimized deep learning primitives for AMD processors. It implements GEMM (general matrix multiplication) operations for machine learning applications, with support for multiple data types, fused pre- and post-operations, and batch processing. The kernels are tuned for the AVX2, AVX512, AVX512_VNNI, AVX512_BF16, and AVX512_FP16 instruction sets.
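
To make "GEMM with fused post-operations" concrete, here is a minimal, unoptimized reference sketch in plain C. It is not the AOCL-DLP API: the function name `gemm_bias_relu_ref`, the row-major layout, and the choice of a bias add followed by ReLU are assumptions made for illustration only. It shows the post-op being applied to each output element while it is still in the accumulator, instead of in a second pass over C.

```c
#include <stddef.h>

/* Hypothetical reference routine (not part of AOCL-DLP).
 * Computes C = relu(A * B + bias) for row-major float matrices:
 *   A is m x k, B is k x n, C is m x n, bias has n entries (one per column).
 */
static void gemm_bias_relu_ref(size_t m, size_t n, size_t k,
                               const float *a, const float *b,
                               const float *bias, float *c)
{
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            /* Plain dot product of row i of A with column j of B. */
            float acc = 0.0f;
            for (size_t p = 0; p < k; ++p) {
                acc += a[i * k + p] * b[p * n + j];
            }
            /* Fused post-ops: bias add, then ReLU, before the single store. */
            acc += bias[j];
            c[i * n + j] = acc > 0.0f ? acc : 0.0f;
        }
    }
}
```

An optimized library would replace the inner loops with blocked, vectorized kernels; the point of fusing is that each element of C is written once, which avoids extra memory traffic for the bias and activation steps.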

New here? Start with the Quick Start Guide to build, install, and run your first GEMM in 5 minutes.


Getting Started

User Guides

  • Library Overview -- Architecture, components, data types, hardware abstraction
  • GEMM Guide -- Data type combinations, memory layouts, matrix reordering, choosing the right variant
  • Batch GEMM Guide -- Grouped batch interface, availability matrix, reordered B and post-ops in batch mode
  • Post-Operations Guide -- Fused post-ops (BIAS, activations, SCALE, MATRIX_ADD/MUL) via dlp_metadata_t
  • Eltwise Operations Guide -- Standalone element-wise operations (separate from GEMM post-ops)
  • Quantization Guide -- Symmetric quantization, mixed-precision workflows, scale/zero-point setup
  • API Lifecycle -- End-to-end flow: data prep, post-ops setup, compute, threading

Performance & Configuration

  • Performance Guide -- Threading, NUMA, memory layout, architecture-specific tips
  • Environment Variables -- Complete reference for DLP_NUM_THREADS, AOCL_DLP_ENABLE_INSTRUCTIONS, OpenMP tuning

Testing & Benchmarking

  • DLP Testing -- Google Test framework, YAML configs, running and writing tests
  • DLP Benchmarking -- Google Benchmark framework, YAML configs, performance analysis

Developer Guides

Reference

  • FAQ -- Common questions about threading, linking, data types, and performance
  • API Reference (Sphinx) -- Full generated API documentation

Project Links
