Library Overview
Vishal edited this page Feb 11, 2026 · 6 revisions
High-level overview of AOCL-DLP architecture, components, and design goals.
- GEMM kernels and drivers
- Post-operations framework (metadata-driven)
- Element-wise utilities
- Threading and parallelization controls
- float32, bfloat16, int8/uint8, int32; mixed-precision flows
- Types: Types API Reference
AOCL-DLP automatically handles BF16 operations on hardware lacking native AVX512_BF16 ISA support by transparently rerouting to F32 implementations.
Hardware Support:
- Native BF16: Intel Cooper Lake/Sapphire Rapids+, AMD Zen4+ (uses AVX512_BF16 instructions)
- F32 Fallback: Automatically activated on:
  - AVX2 machines (uses AVX2 F32 kernels)
  - AVX512 without BF16 support: Intel Skylake, Cascade Lake, Ice Lake (uses AVX512 F32 kernels)
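The hardware matrix above can be read as a small decision procedure. The following is a minimal sketch of that selection logic, not the library's actual dispatcher: the function name `select_bf16_path`, the returned labels, and the `"unsupported"` case are hypothetical, and the flag spellings assume Linux `/proc/cpuinfo` naming.

```python
def select_bf16_path(cpu_flags):
    """Pick a kernel path for a BF16 call from a set of CPU feature flags.

    Hypothetical helper for illustration only; flag names follow the
    Linux /proc/cpuinfo spelling (an assumption of this sketch).
    """
    flags = set(cpu_flags)
    if "avx512_bf16" in flags:
        # Native BF16 instructions are available.
        return "native_bf16"
    if "avx512f" in flags:
        # AVX512 without BF16 support: reroute to AVX512 F32 kernels.
        return "f32_fallback_avx512"
    if "avx2" in flags:
        # AVX2-only machine: reroute to AVX2 F32 kernels.
        return "f32_fallback_avx2"
    # Behavior below AVX2 is not described on this page; this label is
    # a placeholder chosen for the sketch.
    return "unsupported"
```

The order of the checks matters: a CPU with AVX512_BF16 also reports `avx512f` and `avx2`, so the most capable path is tested first.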
Key Points:
- BF16 API calls work unchanged across all hardware
- Library performs runtime detection and automatic rerouting
- When fallback is active: BF16→F32 conversion, F32 computation, F32→BF16 conversion (if needed)
- Performance impact on fallback: conversion overhead plus roughly 2x memory bandwidth usage, since each F32 element occupies 4 bytes versus 2 bytes for BF16
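To make the fallback sequence concrete, here is a rough emulation of the three steps (BF16→F32 conversion, F32 computation, F32→BF16 conversion) on raw bit patterns. It is a sketch under stated assumptions, not the library's implementation: all function names are hypothetical, the rounding mode is assumed to be round-to-nearest-even, and NaN handling is left out.

```python
import struct


def f32_to_bf16_bits(x):
    """Round a float to bfloat16 and return its 16-bit pattern.

    Assumes round-to-nearest-even; NaN inputs are not handled here.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Adding 0x7FFF plus the lowest kept bit implements ties-to-even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_f32(b):
    """Widen a bfloat16 bit pattern to float32 (exact: pad with 16 zero bits)."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]


def _round_to_f32(x):
    """Round a Python float (double) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def dot_bf16_via_f32(a_bits, b_bits):
    """Dot product of two BF16 vectors using widened F32 arithmetic.

    Python floats are doubles, so the accumulator is rounded to float32
    after every step to approximate an F32 kernel.
    """
    acc = 0.0
    for a, b in zip(a_bits, b_bits):
        acc = _round_to_f32(acc + bf16_bits_to_f32(a) * bf16_bits_to_f32(b))
    return acc
```

Widening is lossless because a BF16 value is the upper 16 bits of the corresponding F32 value; only the narrowing step on output rounds.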
- Prepare data (layouts, leading dimensions)
- Optional reordering for repeated use
- Configure dlp_metadata_t for fused post-ops
- Call GEMM or eltwise
- Optional de/reordering for outputs
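As a reference for what the workflow above computes, the sketch below spells out a row-major GEMM with explicit leading dimensions and a post-operation applied to each output element before it is stored. It does not use or mirror the AOCL-DLP API: `gemm_ref`, its parameters, and the `post_op` callback are hypothetical stand-ins, with the callback playing the role that the page assigns to dlp_metadata_t.

```python
def gemm_ref(m, n, k, a, lda, b, ldb, c, ldc, post_op=None):
    """Reference C = post_op(A @ B) on flat, row-major buffers.

    lda/ldb/ldc are leading dimensions: the number of elements between
    the starts of consecutive rows, which may exceed the logical row
    length when rows are padded. Hypothetical helper, not a library call.
    """
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += a[i * lda + p] * b[p * ldb + j]
            # "Fused" here means the post-op runs on the accumulator
            # before the result is written, with no second pass over C.
            c[i * ldc + j] = post_op(acc, j) if post_op else acc
    return c


def bias_relu(bias):
    """Build a post-op that adds a per-column bias, then applies ReLU."""
    return lambda value, col: max(value + bias[col], 0.0)
```

For example, a 2x2 matrix A stored with `lda=3` keeps one padding element at the end of each row; the padding is never read because the inner loop only covers `k` columns.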
Targets AVX2/FMA3, AVX512, AVX512_VNNI, AVX512_BF16 on supported AMD CPUs.