L2Sqr NEON, unrolled loop, prefetching #86

xbasel · 2025-04-05T01:32:41Z

DRAFT - not ready

L2Sqr SIMD version for ARM Neon + unrolled loop + prefetching

Graviton 3:

  Scalar time:     0.771 sec
  NEON time:       0.125 sec
  Speedup:          6.19x

Please note that the scalar and SIMD results differ by a small relative error, most likely because of floating-point rounding and summation order: floating-point addition is not associative.
This was not compiled with -ffast-math (I'm not sure whether or how that affects the result).
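To illustrate the summation-order point, here is a minimal sketch in plain C (not the code in this PR; both function names are hypothetical). It computes the same squared L2 distance twice: once strictly left to right, and once with four independent partial sums, which mimics the order in which a 4-lane SIMD accumulator adds its terms.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference: accumulate strictly left to right,
 * the way a plain scalar loop does. */
float l2sqr_sequential(const float *a, const float *b, size_t n) {
    assert(a != NULL && b != NULL);
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) {
        float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

/* Hypothetical variant: same math, but with four independent partial
 * sums that are combined only at the end, like a 4-lane accumulator. */
float l2sqr_four_lanes(const float *a, const float *b, size_t n) {
    assert(a != NULL && b != NULL);
    float lane[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        for (size_t k = 0; k < 4; k++) {
            float d = a[i + k] - b[i + k];
            lane[k] += d * d;
        }
    }
    /* Horizontal reduction of the four lanes. */
    float sum = (lane[0] + lane[1]) + (lane[2] + lane[3]);
    /* Tail elements that did not fill a group of four. */
    for (; i < n; i++) {
        float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}
```

For example, with a = {10000, 1, 1, ..., 1} (16 elements) and b all zeros, and assuming IEEE-754 single precision with round-to-nearest, the sequential version returns 100000000 while the four-lane version returns 100000008; the exact value is 100000015. Neither is wrong, they just round at different points.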

Please note that there is already a SIMD implementation in https://github.com/valkey-io/valkey-search/blob/main/third_party/simsimd/include/simsimd/spatial.h, but I believe this implementation will outperform it because it unrolls the loop and prefetches memory.
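For readers who want a picture of what "unrolled loop + prefetching" can look like, here is a rough sketch. It is not the code from this PR: the function name, the 4x unroll factor, and the prefetch distance are all assumptions chosen for illustration, and the distance is untuned. The NEON path is compiled only on AArch64; elsewhere the function falls back to a plain scalar loop so the sketch still builds.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__ARM_NEON) && defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Hypothetical prefetch distance in floats; an untuned guess. */
#define L2SQR_PREFETCH_AHEAD 64

/* Hypothetical name; a sketch, not the function added by this PR. */
float l2sqr_unrolled_sketch(const float *a, const float *b, size_t n) {
    assert(a != NULL && b != NULL);
    size_t i = 0;
    float sum = 0.0f;
#if defined(__ARM_NEON) && defined(__aarch64__)
    /* Four independent accumulators hide the latency of the FMA chain. */
    float32x4_t acc0 = vdupq_n_f32(0.0f), acc1 = vdupq_n_f32(0.0f);
    float32x4_t acc2 = vdupq_n_f32(0.0f), acc3 = vdupq_n_f32(0.0f);
    for (; i + 16 <= n; i += 16) {
        /* Hint the cache about data a few iterations ahead,
         * only while the hinted address is still inside the arrays. */
        if (i + L2SQR_PREFETCH_AHEAD < n) {
            __builtin_prefetch(a + i + L2SQR_PREFETCH_AHEAD, 0, 3);
            __builtin_prefetch(b + i + L2SQR_PREFETCH_AHEAD, 0, 3);
        }
        float32x4_t d0 = vsubq_f32(vld1q_f32(a + i),      vld1q_f32(b + i));
        float32x4_t d1 = vsubq_f32(vld1q_f32(a + i + 4),  vld1q_f32(b + i + 4));
        float32x4_t d2 = vsubq_f32(vld1q_f32(a + i + 8),  vld1q_f32(b + i + 8));
        float32x4_t d3 = vsubq_f32(vld1q_f32(a + i + 12), vld1q_f32(b + i + 12));
        acc0 = vfmaq_f32(acc0, d0, d0); /* acc0 += d0 * d0 */
        acc1 = vfmaq_f32(acc1, d1, d1);
        acc2 = vfmaq_f32(acc2, d2, d2);
        acc3 = vfmaq_f32(acc3, d3, d3);
    }
    /* Combine the accumulators, then sum across the four lanes. */
    sum = vaddvq_f32(vaddq_f32(vaddq_f32(acc0, acc1), vaddq_f32(acc2, acc3)));
#endif
    /* Scalar tail (and the whole loop on non-AArch64 targets). */
    for (; i < n; i++) {
        float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}
```

Whether the software prefetch helps at all depends on the hardware prefetcher and the access pattern, so it is worth benchmarking with and without it.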

=====
Side note: the scalar implementation is already PARTIALLY vectorized, as I can see in the generated assembly:

ldr     q16, [x0, x2]
ldr     q5, [x1, x2]
fsub    v5.4s, v16.4s, v5.4s
fmul    v5.4s, v5.4s, v5.4s

However, the summation is not vectorized (I see a scalar fadd).

Benchmark Results (1M elements): Scalar time: 0.771 sec NEON time: 0.125 sec Speedup: 6.19x

yairgott · 2025-05-05T21:39:19Z

Can you provide benchmark numbers from VectorDBBench that show the benefit of this change?

We have been using a forked version of VectorDBBench. Please use memorydb as the client, and Performance768D10M or Performance768D1M for --case-type.

L2Sqr NEON, unrolled loop, + prefetching

8aa1219


xbasel changed the title ~~L2Sqr NEON, unrolled loop, + prefetching~~ L2Sqr NEON, unrolled loop, prefetching Apr 5, 2025

xbasel marked this pull request as draft April 5, 2025 01:32
