Library Overview

High-level overview of AOCL-DLP architecture, components, and design goals.

Components

GEMM kernels and drivers
Post-operations framework (metadata-driven)
Element-wise utilities
Threading and parallelization controls

Data Types

Supported types: float32, bfloat16, int8/uint8, and int32, including mixed-precision flows.
For details, see the Types API Reference.

BFloat16 API Behavior

AOCL-DLP automatically handles BF16 operations on hardware lacking native AVX512_BF16 ISA support by transparently rerouting to F32 implementations.

Hardware Support:

Native BF16: Intel Cooper Lake/Sapphire Rapids+, AMD Zen4+ (uses AVX512_BF16 instructions)
F32 Fallback: Automatically activated on:
- AVX2 machines (uses AVX2 F32 kernels)
- AVX512 without BF16 support: Intel Skylake, Cascade Lake, Ice Lake (uses AVX512 F32 kernels)
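The selection between the native path and the two fallback paths listed above can be sketched as follows. This is an illustration of the decision logic only, not AOCL-DLP's actual dispatcher: the names `bf16_path`, `choose_bf16_path`, and `detect_bf16_path` are hypothetical, the CPUID query assumes GCC or Clang (`<cpuid.h>`), and a production check would also confirm that the OS has enabled AVX-512 register state, which is omitted here.

```c
#include <stdbool.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Hypothetical names for illustration; not part of the AOCL-DLP API. */
typedef enum {
    BF16_PATH_NATIVE,      /* AVX512_BF16 instructions available */
    BF16_PATH_F32_AVX512,  /* AVX512 without BF16: F32 fallback  */
    BF16_PATH_F32_AVX2,    /* AVX2 only: F32 fallback            */
    BF16_PATH_UNSUPPORTED
} bf16_path;

/* Pure decision logic, mirroring the hardware support list above. */
static bf16_path choose_bf16_path(bool avx2, bool avx512f, bool avx512_bf16)
{
    if (avx512f && avx512_bf16) return BF16_PATH_NATIVE;
    if (avx512f)                return BF16_PATH_F32_AVX512;
    if (avx2)                   return BF16_PATH_F32_AVX2;
    return BF16_PATH_UNSUPPORTED;
}

/* Reads the CPUID feature bits and picks a path.
 * Assumption: OS support for the vector state is not checked here. */
static bf16_path detect_bf16_path(void)
{
    bool avx2 = false, avx512f = false, avx512_bf16 = false;
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) {
        unsigned int max_subleaf = eax;
        avx2    = (ebx >> 5) & 1u;   /* CPUID.(7,0):EBX bit 5  */
        avx512f = (ebx >> 16) & 1u;  /* CPUID.(7,0):EBX bit 16 */
        if (max_subleaf >= 1 &&
            __get_cpuid_count(7, 1, &eax, &ebx, &ecx, &edx)) {
            avx512_bf16 = (eax >> 5) & 1u;  /* CPUID.(7,1):EAX bit 5 */
        }
    }
#endif
    return choose_bf16_path(avx2, avx512f, avx512_bf16);
}
```

Keeping the decision in a pure function separate from the CPUID query makes the selection easy to test on any machine, regardless of which ISA it actually supports.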

Key Points:

BF16 API calls work unchanged across all hardware
Library performs runtime detection and automatic rerouting
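When fallback is active: inputs are converted BF16→F32, computation runs in F32, and results are converted F32→BF16 if the output type requires it
Performance impact on fallback: conversion overhead and roughly 2x memory bandwidth usage, since each F32 element is twice the size of a BF16 element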
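To make the fallback sequence concrete, here is a minimal scalar sketch of the convert, compute, convert-back pattern. It is not the library's kernel code: the helper names (`bf16_bits`, `f32_to_bf16`, `bf16_to_f32`, `dot_bf16_via_f32`) are hypothetical, and the round-to-nearest-even conversion shown is one common choice that AOCL-DLP may or may not use internally.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers for illustration; not part of the AOCL-DLP API. */
typedef uint16_t bf16_bits;  /* raw BF16 bit pattern */

/* BF16 -> F32: BF16 is the upper 16 bits of an IEEE-754 float32. */
static float bf16_to_f32(bf16_bits h)
{
    uint32_t u = (uint32_t)h << 16;
    float x;
    memcpy(&x, &u, sizeof x);
    return x;
}

/* F32 -> BF16 with round-to-nearest-even on the discarded 16 bits. */
static bf16_bits f32_to_bf16(float x)
{
    uint32_t u;
    memcpy(&u, &x, sizeof u);
    if ((u & 0x7FFFFFFFu) > 0x7F800000u) {
        /* NaN: truncate and set a mantissa bit so the result stays NaN. */
        return (bf16_bits)((u >> 16) | 0x0040u);
    }
    uint32_t rounding_bias = 0x7FFFu + ((u >> 16) & 1u);
    return (bf16_bits)((u + rounding_bias) >> 16);
}

/* Fallback-style dot product: widen inputs, accumulate in F32,
 * narrow the result only at the end. */
static bf16_bits dot_bf16_via_f32(const bf16_bits *a, const bf16_bits *b,
                                  size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        acc += bf16_to_f32(a[i]) * bf16_to_f32(b[i]);
    }
    return f32_to_bf16(acc);
}
```

The widening step is exact because every BF16 value is representable in F32; only the final narrowing rounds. The widened operands are what account for the extra memory traffic mentioned above.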

Call Layers

Prepare data (layouts, leading dimensions)
Optional reordering for repeated use
Configure dlp_metadata_t for fused post-ops
Call GEMM or eltwise
Optional de/reordering for outputs
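As a rough illustration of what "fused post-ops" means in the flow above, the following reference code applies a bias add and ReLU to each output element inside the GEMM loop, instead of in a separate pass over C. Everything here is a hypothetical stand-in: `example_post_ops` is not `dlp_metadata_t`, `example_gemm_f32` is not an AOCL-DLP entry point, and the real library uses optimized kernels rather than a triple loop. Row-major layout with explicit leading dimensions is assumed.

```c
#include <stddef.h>

/* Hypothetical post-op description; a stand-in for the idea behind
 * metadata-driven post-ops, not the actual dlp_metadata_t layout. */
typedef struct {
    const float *bias;  /* per-column bias of length n, or NULL */
    int apply_relu;     /* nonzero: clamp negative outputs to 0 */
} example_post_ops;

/* Reference row-major GEMM, C = A * B, with post-ops fused into the
 * store of each output element. lda/ldb/ldc are leading dimensions. */
static void example_gemm_f32(size_t m, size_t n, size_t k,
                             const float *a, size_t lda,
                             const float *b, size_t ldb,
                             float *c, size_t ldc,
                             const example_post_ops *ops)
{
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (size_t p = 0; p < k; ++p) {
                acc += a[i * lda + p] * b[p * ldb + j];
            }
            /* Fused post-ops: applied while the value is still local. */
            if (ops && ops->bias) acc += ops->bias[j];
            if (ops && ops->apply_relu && acc < 0.0f) acc = 0.0f;
            c[i * ldc + j] = acc;
        }
    }
}
```

Fusing the post-ops avoids writing C to memory and reading it back for a second pass, which is the usual motivation for describing post-ops up front and handing them to the GEMM call.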

Hardware Features

Targets AVX2/FMA3, AVX512, AVX512_VNNI, AVX512_BF16 on supported AMD CPUs.
