Jit

JIT - Just-In-Time Code Generation (Developer Guide)

AOCL-DLP uses just-in-time (JIT) compilation to generate code optimized for specific matrix sizes and data types at runtime. The library can therefore tailor each kernel to the exact parameters of the GEMM operation being performed.

How JIT Works

Kernel Generation: When a specific operation is requested, AOCL-DLP analyzes the parameters and generates a tailored kernel optimized for those parameters.
Caching: Once a kernel is generated, it can be cached for future use, reducing the overhead of recompilation.
Dynamic Optimization: Based on the current execution context, the JIT compiler can apply optimizations such as loop unrolling and vectorization.
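
The generate-then-cache flow above can be sketched in C++ as follows. This is a minimal illustration, not AOCL-DLP's implementation: `KernelKey`, `KernelCache`, and the stand-in generator are hypothetical names, and a real JIT would hand back a pointer to generated machine code rather than a `std::function`.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <tuple>

// Hypothetical key: the parameters a kernel is specialized for.
struct KernelKey {
    std::size_t m, n, k;
    int dtype;  // hypothetical data-type tag
    bool operator<(const KernelKey& o) const {
        return std::tie(m, n, k, dtype) < std::tie(o.m, o.n, o.k, o.dtype);
    }
};

// Stand-in for a generated kernel (assumption: real kernels are raw code pointers).
using Kernel = std::function<std::size_t()>;

class KernelCache {
public:
    // Returns the cached kernel for `key`, generating it on first use only.
    const Kernel& get(const KernelKey& key) {
        auto it = cache_.find(key);
        if (it == cache_.end()) {
            ++generated_;  // counts how often the expensive generator runs
            it = cache_.emplace(key, generate(key)).first;
        }
        return it->second;
    }
    std::size_t generated() const { return generated_; }

private:
    // Hypothetical generator: bakes the key's sizes into the returned callable,
    // much as a JIT bakes them into the emitted instructions.
    static Kernel generate(const KernelKey& key) {
        const std::size_t flops = 2 * key.m * key.n * key.k;
        return [flops] { return flops; };
    }

    std::map<KernelKey, Kernel> cache_;
    std::size_t generated_ = 0;
};
```

Requesting the same key twice runs the generator once; a different key triggers a second generation.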

Benefits of JIT Compilation

Performance: By generating optimized code on the fly, JIT compilation can significantly improve the performance of operations, especially for non-standard configurations.
Flexibility: JIT allows for greater flexibility in supporting a wide range of hardware and software configurations without the need for extensive pre-compilation.
Reduced Latency: For workloads whose parameters repeat, cached kernels avoid regenerating code on every call; only the first call with a new configuration pays the generation cost.

Xbyak JIT Assembler

AOCL-DLP leverages the Xbyak JIT assembler (currently v7.36.1) to generate optimized assembly code on the fly. Xbyak provides a high-level C++ interface for writing JIT-compiled code, allowing developers to focus on the algorithm rather than the intricacies of assembly language.

How JIT Works in AOCL-DLP

Parameter Specification: When a GEMM operation is requested, the user specifies the matrix dimensions, data types, and any additional parameters (e.g., post-operations).
Code Generation: AOCL-DLP uses Xbyak to generate assembly code optimized for the specified parameters. This code is tailored to the capabilities of the underlying hardware (e.g., AVX2, AVX-512).
Compilation: Xbyak encodes the generated instructions directly into machine code in an executable memory buffer at runtime; no external assembler or compiler is invoked.
Execution: The compiled code is executed to perform the GEMM operation, providing high performance for the specific use case.
Caching: To avoid the overhead of regenerating code for the same parameters, AOCL-DLP caches the generated code for reuse in future operations with identical parameters.
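
To make the compilation and execution steps concrete, the sketch below does by hand what a JIT assembler automates: it copies a few machine-code bytes into memory, marks that memory executable, and calls it as a function. It assumes Linux (or another POSIX system) on x86-64. `run_tiny_kernel` is a hypothetical name and the bytes are hand-encoded for illustration, whereas AOCL-DLP emits its instructions through Xbyak.

```cpp
#include <cstddef>
#include <cstring>
#include <sys/mman.h>

// Emits and runs a tiny function equivalent to `int f() { return 42; }`.
// Returns -1 if executable memory cannot be obtained.
int run_tiny_kernel() {
    // x86-64 machine code for: mov eax, 42 ; ret
    const unsigned char code[] = {0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3};
    const std::size_t size = 4096;  // one page is plenty for this example

    // 1. Allocate a writable buffer.
    void* buf = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) return -1;

    // 2. "Generate" the code by writing its bytes into the buffer.
    std::memcpy(buf, code, sizeof(code));

    // 3. Switch the buffer from writable to executable.
    if (mprotect(buf, size, PROT_READ | PROT_EXEC) != 0) {
        munmap(buf, size);
        return -1;
    }

    // 4. Call the buffer through a function pointer.
    auto fn = reinterpret_cast<int (*)()>(buf);
    const int result = fn();

    munmap(buf, size);
    return result;
}
```

A cached kernel simply keeps the buffer from step 3 alive so that step 4 can be repeated without redoing steps 1 to 3.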

JIT pack-B kernels

In addition to the GEMM micro-kernels, F32 GEMM also uses JIT-generated pack-B (B-matrix packing) kernels, available for both AVX-512 and AVX2 paths. These complement the existing micro-kernel JIT so that the data-reordering step is also tailored to the runtime parameters and target ISA.
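
For readers unfamiliar with B-matrix packing, the scalar routine below shows the general idea: a row-major K x N matrix is copied into contiguous panels of NR columns so that a micro-kernel can read B with unit stride. This is a generic reference sketch. `pack_b`, its zero-padding of the last panel, and the panel layout are assumptions for illustration and do not describe AOCL-DLP's actual packed format or its JIT-generated code.

```cpp
#include <cstddef>
#include <vector>

// Packs row-major B (k x n, leading dimension ldb) into panels of nr columns.
// Within a panel, the nr values of row 0 come first, then row 1, and so on.
// The last panel is zero-padded when n is not a multiple of nr.
std::vector<float> pack_b(const float* b, std::size_t k, std::size_t n,
                          std::size_t ldb, std::size_t nr) {
    const std::size_t panels = (n + nr - 1) / nr;
    std::vector<float> packed(panels * k * nr, 0.0f);
    float* out = packed.data();
    for (std::size_t jp = 0; jp < n; jp += nr) {    // one panel per nr columns
        for (std::size_t p = 0; p < k; ++p) {       // every row of B
            for (std::size_t j = 0; j < nr; ++j) {  // columns inside the panel
                if (jp + j < n) *out = b[p * ldb + jp + j];
                ++out;  // padded slots keep their initial 0.0f
            }
        }
    }
    return packed;
}
```

Packing the 2 x 3 matrix [[1, 2, 3], [4, 5, 6]] with nr = 2 yields 1 2 4 5 3 0 6 0. A JIT-generated version could hard-code values such as nr and use vector loads and stores for the target ISA instead of the scalar inner loop.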

Frame pointer support

JIT-generated kernels maintain a frame pointer (RBP). This produces unwindable stacks for the runtime-generated code, so profilers such as perf can attribute samples and reconstruct correct call stacks through JIT kernels. This is useful when debugging or profiling the generated code described below.

How to dump JIT-generated code

To dump the JIT-generated code for inspection or debugging, define the DLP_DUMP_JIT_CODE macro using either of the following methods:

Method 1: Build flag

cmake -DCMAKE_CXX_FLAGS="-DDLP_DUMP_JIT_CODE" ...

Method 2: Source modification

Add #define DLP_DUMP_JIT_CODE at the top of src/jit/amdzen/amdzen_generator.cc before building.

Output files

Dumped files are created in the current working directory with names like:

jit_kernel_16x64.bin (GEMM kernel for MR=16, NR=64)
jit_gemv_n1_kernel_16x5.bin (GEMV N=1, MR=16, config index 5)
jit_gemv_m1_kernel_32x2.bin (GEMV M=1, NR=32, config index 2)

To disassemble:

objdump -D -b binary -m i386:x86-64 jit_kernel_16x64.bin
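
The objdump command above treats the file as raw bytes (-b binary) and is told the architecture explicitly (-m i386:x86-64), which suggests the dumps are plain machine code without an object-file header. The sketch below illustrates that kind of dump using only the GEMM naming pattern listed above. `dump_name` and `dump_kernel` are hypothetical helpers, not part of the library's API.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Builds a name following the GEMM pattern shown above: jit_kernel_<MR>x<NR>.bin
std::string dump_name(int mr, int nr) {
    return "jit_kernel_" + std::to_string(mr) + "x" + std::to_string(nr) + ".bin";
}

// Writes the raw code bytes to a file in the current working directory.
// Returns true if every byte was written and the file closed cleanly.
bool dump_kernel(const unsigned char* code, std::size_t size, int mr, int nr) {
    std::ofstream out(dump_name(mr, nr), std::ios::binary | std::ios::trunc);
    if (!out) return false;
    out.write(reinterpret_cast<const char*>(code),
              static_cast<std::streamsize>(size));
    out.close();
    return !out.fail();
}
```

Because such a file has no symbol table or section headers, the disassembly starts at offset 0 and shows every byte as code.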
