Description
For ISA dispatching, we explicitly instantiate the distance implementations (L2Impl, IPImpl, CosineSimilarityImpl). These templates take the parameters N, Ea, Eb, and AVX_AVAILABILITY, corresponding to dimensionality, element type A, element type B, and the AVX availability flag, respectively, so many combinations of explicit template values are required. Crucially, the combinations must also cover N, that is, fixed- and dynamic-dimensionality support.
The distance implementations are not the best place to create the AVX-specific instantiations, because only the actual compute ops benefit from ISA-specific optimization.
The current implementation creates a lot of code
- at the end of the distance headers (cosine.h, euclidean.h, inner_product.h) to declare the symbols as extern; and
- in multi-arch/avx2.cpp and multi-arch/avx512.cpp to produce the instantiations;

all of which makes heavy use of preprocessor macros and is therefore hard to understand and debug.
Completion of #183 should allow us to lower the explicit instantiations to the various compute ops (IPFloatOp, IPVNNIOp, L2FloatOp, etc.), for which we already use fixed values of N, eliminating the combinatorial complexity in the instantiations.
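
To make the idea concrete, here is a minimal sketch of what lowering the instantiations to the ops could look like. It is not the SVS code: `Isa`, `Dynamic`, `L2FloatOpSketch`, and `l2_squared_sketch` are hypothetical names, the op body is plain scalar code standing in for intrinsics, and everything is shown in one file for brevity. The assumption is that only the op (with a fixed width) is declared `extern` and explicitly instantiated per ISA, while the distance wrapper stays an ordinary header template over N.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical ISA tag standing in for the AVX_AVAILABILITY flag.
enum class Isa { Generic, Avx2, Avx512 };

// Hypothetical marker for dynamic dimensionality.
constexpr std::size_t Dynamic = 0;

// Hypothetical compute op with a fixed block width. Only this template is
// explicitly instantiated per ISA.
template <std::size_t Width, Isa I> struct L2FloatOpSketch {
    // Sum of squared differences over `count` elements, count <= Width.
    static float block(const float* a, const float* b, std::size_t count);
};

template <std::size_t Width, Isa I>
float L2FloatOpSketch<Width, I>::block(
    const float* a, const float* b, std::size_t count
) {
    assert(count <= Width);
    float sum = 0.0f;
    // Scalar stand-in; a real op would use ISA-specific intrinsics here.
    for (std::size_t i = 0; i < count; ++i) {
        const float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// Would live in the op header: one extern declaration per (width, ISA) pair,
// independent of N, Ea, and Eb.
extern template struct L2FloatOpSketch<8, Isa::Avx2>;
extern template struct L2FloatOpSketch<16, Isa::Avx512>;

// Would live in an AVX2-specific translation unit.
template struct L2FloatOpSketch<8, Isa::Avx2>;
// Would live in an AVX-512-specific translation unit.
template struct L2FloatOpSketch<16, Isa::Avx512>;

// Distance wrapper: generic over N, never explicitly instantiated. It only
// forwards fixed-width blocks to the op.
template <std::size_t N, std::size_t Width, Isa I>
float l2_squared_sketch(
    const float* a, const float* b, std::size_t runtime_dim = N
) {
    const std::size_t dim = (N == Dynamic) ? runtime_dim : N;
    float total = 0.0f;
    for (std::size_t i = 0; i < dim; i += Width) {
        const std::size_t count = (dim - i < Width) ? dim - i : Width;
        total += L2FloatOpSketch<Width, I>::block(a + i, b + i, count);
    }
    return total;
}
```

In this sketch the number of explicit instantiations grows with the number of ops and ISAs only, not with the number of (N, Ea, Eb) combinations. Whether the real ops can be separated this cleanly is exactly what depends on #183.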
The task is to determine whether such a simplification is possible after completion of #183 and, if so, to implement it.
AS A maintainer of SVS
I WANT TO lower the instantiation of architecture-specific functions to the actual compute operations
SO THAT I can reduce maintenance and future development cost, as well as have a more optimal solution in general.