|
/** |
|
* Calculates the L2 squared distance between a float query vector and a binary document vector using ADC (Asymmetric Distance Computation). |
|
* This method implements a specialized version of L2 distance calculation where one vector is in binary format (compressed) |
|
* and the other is in float format (uncompressed). |
|
* |
|
* TODO: For now this will be very inefficient. We can in the future optimize through the following: |
|
* Use FloatVector.SPECIES_PREFERRED for SIMD processing in chunks with reduceLanes(), |
|
* Configure build.gradle with --add-modules jdk.incubator.vector --enable-preview flags into the VectorUtils class. |
|
* |
|
* @param queryVector The uncompressed query vector in float format |
|
* @param inputVector The compressed document vector in binary format, where each bit represents a dimension |
|
* @return The L2 squared distance between the two vectors. Lower values indicate closer vectors. |
|
* @throws IllegalArgumentException if queryVector length is not compatible with inputVector length (queryVector.length != inputVector.length * 8) |
|
*/ |
|
public static float l2SquaredADC(float[] queryVector, byte[] inputVector) { |
|
// we cannot defer to VectorUtil as it does not support ADC. |
|
// TODO: add batching logic similar to C++ to improve performance. |
|
float score = 0; |
|
|
|
for (int i = 0; i < queryVector.length; ++i) { |
|
int byteIndex = i / 8; |
|
int bitOffset = 7 - (i % 8); |
|
int bitValue = (inputVector[byteIndex] >> bitOffset) & 1; |
|
|
|
// Calculate squared difference |
|
float diff = bitValue - queryVector[i]; |
|
score += diff * diff; |
|
} |
|
return score; |
|
} |
|
|
|
/** |
|
* Calculates the inner product similarity between a float query vector and a binary document vector using ADC |
|
* (Asymmetric Distance Computation). This method is useful for similarity searches where one vector is compressed |
|
* in binary format and the other remains in float format. |
|
* |
|
* The inner product is calculated by summing the products of corresponding elements, where the binary vector's |
|
* elements are interpreted as 0 or 1. |
|
* |
|
* TODO: For now this will be very inefficient. We can in the future optimize through the following: |
|
* Use FloatVector.SPECIES_PREFERRED for SIMD processing in chunks with reduceLanes(), |
|
* Configure build.gradle with --add-modules jdk.incubator.vector --enable-preview flags into the VectorUtils class. |
|
* |
|
* @param queryVector The uncompressed query vector in float format |
|
* @param inputVector The compressed document vector in binary format, where each bit represents a dimension |
|
* @return The inner product similarity score between the two vectors. Higher values indicate more similar vectors. |
|
* @throws IllegalArgumentException if queryVector length is not compatible with inputVector length (queryVector.length != inputVector.length * 8) |
|
*/ |
|
public static float innerProductADC(float[] queryVector, byte[] inputVector) { |
|
float score = 0; |
|
|
|
for (int i = 0; i < queryVector.length; ++i) { |
|
// Extract the bit for this dimension |
|
int byteIndex = i / 8; |
|
int bitOffset = 7 - (i % 8); |
|
int bitValue = (inputVector[byteIndex] >> bitOffset) & 1; |
|
|
|
// Calculate product and accumulate |
|
score += bitValue * queryVector[i]; |
|
} |
|
return score; |
|
} |
Description
While reading through the code of KNN what I found was ADC feature is not using SIMD instruction set for doing distance computation. Creating this gh issue to track the SIMD implementation for ADC. I just see bunch of TODOs in the code
Code Ref which needs a change:
k-NN/src/main/java/org/opensearch/knn/plugin/script/KNNScoringUtil.java
Lines 114 to 175 in 0d05a10