Add QuickGELU (lookup-table based) #561

Open
@PallHaraldsson

Description

Motivation and description

See here:
https://github.com/ggerganov/ggml/pull/254/files

I think we may need QuickGELU for compatibility, if it's not the same as GELU, i.e. more than just an optimization.

It's probably just an optimization, since it's an approximation, but then why have both definitions there?
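
For reference, the exact GELU is x * Φ(x) (Φ being the standard normal CDF), while QuickGELU replaces Φ(x) with the logistic sigmoid σ(1.702x); that is why both definitions exist side by side. A small Julia comparison (the names and the SpecialFunctions dependency are mine, just for illustration):

using SpecialFunctions  # for erf

# Exact GELU: x * Φ(x), with Φ the standard normal CDF
gelu_exact(x) = x * (1 + erf(x / sqrt(2f0))) / 2
# QuickGELU: x * σ(1.702x), i.e. Φ(x) approximated by a sigmoid
gelu_quick(x) = x * (1 / (1 + exp(-1.702f0 * x)))

gelu_exact(1f0)  # ≈ 0.8413
gelu_quick(1f0)  # ≈ 0.8458, close but not identical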

https://zeta.apac.ai/en/latest/zeta/nn/modules/quickgeluactivation/

The QuickGELUActivation class is a part of the Neural Network(NN) module that applies a Gaussian Error Linear Unit (GELU) approximation. [..] The approximate version of GELU used in this class is fast although somewhat less accurate than the standard GELU activation. [..]
"""Applies GELU approximation that is fast but somewhat inaccurate. See: https://github.com/hendrycks/GELUs"""

ggml-org/ggml#253

I'm implementing CLIP in GGML, and it turns out that we need the Quick GELU activation instead of GELU.

Also used with:
https://github.com/facebookresearch/MetaCLIP

MetaCLIP is trained w/ face blurred images.
@inproceedings{xu2023metaclip,
title={Demystifying CLIP Data}

They have two 128 KB lookup tables for Float16, one for GELU and one for Quick GELU (but no table for ggml_gelu_quick_f32).

I thought lookup tables had gone out of favor (on CPUs and GPUs) because recomputing is usually faster, but apparently not; the table is most likely faster, at least in this case. I really don't think they would do this unless it actually helped (I believe ggml is among the most optimized and widely used libraries), at least on CPUs. So maybe consider tables for other activation functions too?
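
For concreteness, a minimal Julia sketch of a Float16 table in the ggml style (one entry per possible Float16 bit pattern, 2^16 entries, 128 KB when stored as Float16); the names are hypothetical, not ggml's or a proposed API:

gelu_quick(x) = x * (1 / (1 + exp(-1.702f0 * x)))

# One entry per Float16 bit pattern: 2^16 entries * 2 bytes = 128 KB
const GELU_QUICK_TABLE_F16 = Float16[
    gelu_quick(Float32(reinterpret(Float16, UInt16(i)))) for i in 0:(2^16 - 1)
]

# Lookup: reinterpret the argument's bits as an index into the table
gelu_quick_lut(x::Float16) = @inbounds GELU_QUICK_TABLE_F16[Int(reinterpret(UInt16, x)) + 1]

Whether this actually beats the direct computation on typical CPUs would need a benchmark.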

I'm not sure, but lookup tables probably do not make sense on GPUs, since memory latency is less of a concern there and massive threading hides it. I think the table code in ggml may only apply to CPUs. Can anyone confirm whether it also applies to GPUs?

Would it make sense to have a table for 8-bit floats too? And maybe to use that, or some other small table, for Float16 combined with some extra computation?

I think I could implement this (in the same way as there), i.e. just the activations (so a starting point, not all of their uses).

I also see there: "initialize GELU, Quick GELU, SILU and EXP F32 tables". I didn't think FP32 tables(!?) were used, or one for EXP, and I also see the unrelated GGML_OP_SILU_BACK and GGML_OP_ALIBI.

And FYI, the 2016 GELU paper was updated in 2023 for some reason:

https://arxiv.org/abs/1606.08415
[v1] Mon, 27 Jun 2016 19:20:40 UTC (435 KB) [..]
[v3] Sun, 11 Nov 2018 07:40:32 UTC (3,013 KB) [..]
[v5] Tue, 6 Jun 2023 01:53:32 UTC (3,016 KB)

Possible Implementation

inline static float ggml_gelu_quick_f32(float x) {
    return x*(1.0f/(1.0f+expf(GELU_QUICK_COEF*x)));
}

which (with GELU_QUICK_COEF defined as -1.702f in ggml) in Julia is:

@inline gelu_quick(x) = x * (one(x) / (one(x) + exp(-1.702f0 * x)))
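
For comparison, NNlib's existing gelu (as far as I can tell, the tanh-based approximation) could be broadcast next to the definition above to see how far the two differ; this is just a usage sketch, and gelu_quick is the name suggested above, not an existing export:

using NNlib

x = randn(Float32, 5)
gelu_quick.(x)                                   # sigmoid-based QuickGELU from above
NNlib.gelu.(x)                                   # tanh-based GELU already in NNlib
maximum(abs, gelu_quick.(x) .- NNlib.gelu.(x))   # small but nonzero difference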
