Member:
It looks like there is a whole body of literature on fast softmax approximations. I only read through https://arxiv.org/abs/2111.10770v1, but it has a nice list of prior art. Also of interest may be existing CPU-optimized softmax implementations like oneDNN.
Member (Author):
Hadn't looked, but I'm not surprised there's a literature by now! IIRC we dropped the NVidia one as it was slower than NNlib's. The immediate goal, though, is to make this part small compared to the matmul & permutations. Going by my timings here, this gets us from roughly 50% down to 10%.
This defines a `fast_softmax` which uses a low-accuracy `fast_exp`. It's about 5x faster on CPU.

On a GPU, the low-accuracy `exp` isn't faster at all. For small arrays, `fast_softmax` is faster, because it skips the `all(isfinite, max_)` check & thus avoids synchronisation. Thus FluxML/NNlibCUDA.jl#63 should get all the benefit.

The alternative on CPU is to make an `Array` specialisation using LoopVectorization. That's not as quick as this `fast_exp` (about 2x slower for me), but it gives several more digits of precision. This `fast_exp` is roughly Float16 precision; do we want that?
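For a sense of what a low-accuracy `fast_exp` can look like, here is a minimal sketch (not the code in this PR) using the well-known Schraudolph-style bit trick: it writes the Float32 bit pattern of 2^(x/log 2) directly, trading several digits of accuracy for speed, roughly the Float16-level precision mentioned above. The constants and the softmax wrapper below are illustrative assumptions, not taken from this PR.

```julia
# Illustrative sketch -- NOT this PR's exact code -- of a Schraudolph-style
# low-accuracy exp and a softmax built on it. Accuracy is only a few decimal
# digits, roughly Float16-level.

# exp(x) = 2^(x / log(2)); build that power of two straight into the Float32 bits:
# x * 2^23/log(2) lands in the exponent field, 127 * 2^23 adds the exponent bias
# (here folded into one constant together with a small error-reducing correction).
@inline function fast_exp(x::Float32)
    t = muladd(x, 12102203.0f0, 1064866816.0f0)   # ≈ x * 2^23/log(2) + bias - correction
    t = clamp(t, 0.0f0, 2139095040.0f0)           # keep a valid, non-negative bit pattern
    return reinterpret(Float32, unsafe_trunc(Int32, t))
end

# Softmax along `dims`, subtracting the max first so every argument to fast_exp is ≤ 0.
function fast_softmax(x::AbstractArray{Float32}; dims = 1)
    m = maximum(x; dims = dims)
    y = fast_exp.(x .- m)
    return y ./ sum(y; dims = dims)
end
```

The clamp is there so very negative inputs map to 0 and overflow maps to Inf, rather than reinterpreting a garbage bit pattern; since the softmax subtracts the column maximum first, only the lower end matters in practice.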