Eltwise Guide

Abhiram S edited this page Mar 9, 2026 · 3 revisions

Eltwise Operations Guide

AOCL-DLP provides standalone element-wise (eltwise) operations that transform a matrix without performing a GEMM. They differ from GEMM post-ops, which run inside the GEMM call itself: use eltwise ops when you already have a computed matrix and want to apply activations, type conversions, or other element-wise transforms to it.

When to Use Eltwise vs GEMM Post-Ops

| Scenario | Use |
| --- | --- |
| Apply activation after GEMM (same call) | GEMM with dlp_metadata_t post-ops; see Post-Ops Guide |
| Apply activation to a matrix that was not produced by GEMM | Standalone eltwise ops (this page) |
| Convert matrix between data types with fused operations | Standalone eltwise ops (this page) |

Function Signature

All eltwise functions share the same parameter pattern:

aocl_gemm_eltwise_ops_<input_type>o<output_type>(
    const char      order,     // 'R' = row-major, 'C' = column-major
    const char      transa,    // transpose option for input
    const char      transb,    // transpose option for output
    const md_t      m,         // number of rows
    const md_t      n,         // number of columns
    const <in_t>*   a,         // input matrix
    const md_t      lda,       // leading dimension of input
    <out_t>*        b,         // output matrix
    const md_t      ldb,       // leading dimension of output
    dlp_metadata_t* metadata   // post-operations to apply
);

The metadata parameter controls which operations are applied. Configure it exactly as described in the Post-Ops Guide: the same dlp_metadata_t struct, seq_vector, and post-op types apply here.
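The order, lda, and ldb parameters in the signature above determine how element (i, j) maps to a position in the flat input and output buffers. The sketch below assumes the usual BLAS-style convention, in which the leading dimension is the stride between consecutive rows for 'R' and between consecutive columns for 'C'. The helper name elem_offset is hypothetical and is not part of AOCL-DLP.

```c
#include <stddef.h>

/* Hypothetical helper (not an AOCL-DLP function): linear offset of
 * element (i, j) in a flat buffer, assuming BLAS-style layouts.
 *   'R' (row-major):    rows are contiguous, ld is the row stride
 *   'C' (column-major): columns are contiguous, ld is the column stride
 */
size_t elem_offset(char order, size_t i, size_t j, size_t ld)
{
    return (order == 'R') ? i * ld + j : j * ld + i;
}
```

Under this assumption, a densely packed m x n row-major matrix has a leading dimension of n, which matches the values passed in the examples on this page.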

Supported Type Combinations

| Input | Output | Function |
| --- | --- | --- |
| bfloat16 | float | aocl_gemm_eltwise_ops_bf16of32 |
| bfloat16 | bfloat16 | aocl_gemm_eltwise_ops_bf16obf16 |
| float | float | aocl_gemm_eltwise_ops_f32of32 |
| float | bfloat16 | aocl_gemm_eltwise_ops_f32obf16 |
| float | int32_t | aocl_gemm_eltwise_ops_f32os32 |
| float | int8_t | aocl_gemm_eltwise_ops_f32os8 |
| float | uint8_t | aocl_gemm_eltwise_ops_f32ou8 |
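To give a feel for what a float to bfloat16 conversion involves, here is a minimal sketch of narrowing in plain C. A bfloat16 value is the upper 16 bits of an IEEE-754 float. This page does not state which rounding mode the library uses, so the round-to-nearest-even behavior shown here is an assumption, and f32_to_bf16_rne is a hypothetical name rather than a library function.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical reference (not an AOCL-DLP function): narrow a float to
 * bfloat16 bits, rounding to nearest with ties to even. The rounding
 * mode is an assumption; the library may behave differently. */
uint16_t f32_to_bf16_rne(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);      /* reinterpret the float's bits */

    /* NaN: keep the sign and exponent, and force a quiet-NaN mantissa bit. */
    if ((bits & 0x7FFFFFFFu) > 0x7F800000u) {
        return (uint16_t)((bits >> 16) | 0x0040u);
    }

    /* Add 0x7FFF plus the lowest kept bit so that exact ties round to even. */
    uint32_t lsb = (bits >> 16) & 1u;
    bits += 0x7FFFu + lsb;
    return (uint16_t)(bits >> 16);
}
```

For instance, 1.0f narrows to the bit pattern 0x3F80, and a value exactly halfway between two bfloat16 neighbors rounds to the one whose lowest mantissa bit is zero.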

Example: Apply GELU to a Float Matrix

#include <aocl_dlp.h>

// Input matrix (M x N, row-major); M and N are compile-time constants defined elsewhere
float input[M * N]  = { /* ... */ };
float output[M * N] = {0};

// Configure GELU post-op
dlp_post_op_eltwise gelu_op = {
    .sf   = NULL,
    .algo = {
        .alpha     = NULL,
        .beta      = NULL,
        .algo_type = GELU_TANH,
        .stor_type = DLP_F32
    }
};
DLP_POST_OP_TYPE seq[] = { ELTWISE };

dlp_metadata_t meta = {0};
meta.seq_length  = 1;
meta.seq_vector  = seq;
meta.eltwise     = &gelu_op;
meta.num_eltwise = 1;

aocl_gemm_eltwise_ops_f32of32(
    'R', 'N', 'N', M, N,
    input, N,      // lda = N for a densely packed row-major matrix
    output, N,     // ldb = N
    &meta);
// output[i * N + j] = GELU(input[i * N + j])

Example: Convert BF16 to F32 with RELU

bfloat16 bf16_data[M * N] = { /* ... */ };
float    f32_output[M * N] = {0};

// RELU post-op
dlp_post_op_eltwise relu_op = {
    .sf   = NULL,
    .algo = { .alpha = NULL, .beta = NULL, .algo_type = RELU, .stor_type = DLP_F32 }
};
DLP_POST_OP_TYPE seq[] = { ELTWISE };

dlp_metadata_t meta = {0};
meta.seq_length  = 1;
meta.seq_vector  = seq;
meta.eltwise     = &relu_op;
meta.num_eltwise = 1;

aocl_gemm_eltwise_ops_bf16of32(
    'R', 'N', 'N', M, N,
    bf16_data, N,   // lda = N for a densely packed row-major matrix
    f32_output, N,  // ldb = N
    &meta);
// f32_output[i * N + j] = RELU( bf16_to_f32(bf16_data[i * N + j]) )
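The fused conversion and activation above can be mirrored by a plain C loop, which is useful as a reference when validating results. This is a minimal sketch, not the library's implementation. It assumes row-major storage with no transposes and represents bfloat16 values as raw uint16_t bit patterns (the library's own bfloat16 typedef may differ). The names bf16_bits_to_f32 and ref_bf16_relu_f32 are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Widen bfloat16 bits to float: the 16 bits become the upper half of
 * the float's bit pattern, and the lower half is zero. This is exact. */
float bf16_bits_to_f32(uint16_t h)
{
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Hypothetical reference: b[i][j] = max(0, widen(a[i][j])) for an
 * m x n row-major matrix with leading dimensions lda and ldb. */
void ref_bf16_relu_f32(size_t m, size_t n,
                       const uint16_t *a, size_t lda,
                       float *b, size_t ldb)
{
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            float v = bf16_bits_to_f32(a[i * lda + j]);
            b[i * ldb + j] = (v > 0.0f) ? v : 0.0f;
        }
    }
}
```

Because widening bfloat16 to float is exact and RELU only selects between the value and zero, the reference output should match a correct fused result exactly under these assumptions.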
