ES|QL dense vector field type support #126456
Merged: carlosdelest merged 56 commits into elastic:main from carlosdelest:feature/esql_dense_vector_support on May 27, 2025.

Commits (56; the diff below shows changes from 46 commits)
6439422 Register dense vector field type (carlosdelest)
c7e48a0 Add first version of BlockDocValuesReader for dense_vector (carlosdelest)
9e40cd4 Fixed BlockDocValuesReader to work with all dense_vector types (carlosdelest)
03f6a92 Merge remote-tracking branch 'origin/main' into feature/esql_dense_ve… (carlosdelest)
d84c8dd Small test fixes (carlosdelest)
9ff5ed8 Fix casting (carlosdelest)
2754697 Improve testing to add random indexing, and fix similarity (carlosdelest)
d196be0 Add index = false support (carlosdelest)
975b4db Spotless (carlosdelest)
ae9fa5f Synthetic source testing (carlosdelest)
9896adf Add CSV tests and necessary infra for dense_vector field type (carlosdelest)
dfa420e Make CSV test loader to use numbers when there are multivalued numeri… (carlosdelest)
d983495 Fixed test mapping to avoid parsing errors when numbers are used for … (carlosdelest)
5707e25 Avoid non-float dense vector field types for now (carlosdelest)
a8c8a6a [CI] Auto commit changes from spotless (elasticsearchmachine)
d827c6d Fix test error when checking block loaders (carlosdelest)
d8e139d Fix value generation for dense vectors (carlosdelest)
0539aff Merge remote-tracking branch 'carlosdelest/feature/esql_dense_vector_… (carlosdelest)
7bbe7ee Fix testing for dense vectors and block loaders (carlosdelest)
7d1f8b7 Remove dense_vector from CaseTests (carlosdelest)
8870fef Provide support in tests for creating constant blocks from collection… (carlosdelest)
ad84f13 Fix test error (carlosdelest)
2619fe6 Fix unsupported types test (carlosdelest)
b23d5a9 Support synthetic source with non indexed vectors (carlosdelest)
6bb8e62 Remove unneeded code (carlosdelest)
8013e47 Add support for dense_vector in PositionToXContent and EsqlQueryRespo… (carlosdelest)
e0e34f4 Fix unsupported types test (carlosdelest)
c30f5d9 Fix unsupported types test (carlosdelest)
a2fbb13 Fix block loader builder (carlosdelest)
4833ce5 Ensure ordering when creating double blocks (carlosdelest)
e8878a0 Changed randomDouble() for randomFloat() to generate test data (carlosdelest)
93d45fc Additional CSV tests (carlosdelest)
a6b0f6c Fix class casting (carlosdelest)
cd462b8 Use Float as block type for dense_vector (carlosdelest)
0675a42 Fix tests (carlosdelest)
7f95d6a [CI] Auto commit changes from spotless (elasticsearchmachine)
2cff4e7 Fix test (carlosdelest)
efc3fb6 Merge remote-tracking branch 'carlosdelest/feature/esql_dense_vector_… (carlosdelest)
f8741ca Floats are not ordered for dense_vector values (carlosdelest)
c951ee7 Use dense vector specific methods for creating builders, and check di… (carlosdelest)
ba0a6b9 Avoid doing checks on dense vectors as we can't cast to BlockBuilder (carlosdelest)
f5889b2 Merge remote-tracking branch 'origin/main' into feature/esql_dense_ve… (carlosdelest)
4b2126e Add CSV tests with unordered data, and fix CSV data retrieval for tha… (carlosdelest)
afe79f7 Change source readers name() method (carlosdelest)
8049a58 Merge remote-tracking branch 'origin/main' into feature/esql_dense_ve… (carlosdelest)
217ce84 [CI] Auto commit changes from spotless (elasticsearchmachine)
74fe676 Merge remote-tracking branch 'origin/main' into feature/esql_dense_ve… (carlosdelest)
af965a1 Spotless (carlosdelest)
53ec29c Merge remote-tracking branch 'carlosdelest/feature/esql_dense_vector_… (carlosdelest)
60da0fe Fix tests (carlosdelest)
06cbea9 Fix test suggested cast for joins (carlosdelest)
8f07e08 Merge branch 'main' into feature/esql_dense_vector_support (carlosdelest)
53bcbc7 Merge branch 'main' into feature/esql_dense_vector_support (carlosdelest)
fe82007 [CI] Auto commit changes from spotless (elasticsearchmachine)
a63ca2c Fix LookupJoinTypesIT (carlosdelest)
c5d6ddc Merge remote-tracking branch 'carlosdelest/feature/esql_dense_vector_… (carlosdelest)
@@ -11,13 +11,16 @@

```java

import org.apache.lucene.index.BinaryDocValues;
import org.apache.lucene.index.DocValues;
import org.apache.lucene.index.FloatVectorValues;
import org.apache.lucene.index.KnnVectorValues;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.NumericDocValues;
import org.apache.lucene.index.SortedDocValues;
import org.apache.lucene.index.SortedNumericDocValues;
import org.apache.lucene.index.SortedSetDocValues;
import org.apache.lucene.util.BytesRef;
import org.elasticsearch.common.io.stream.ByteArrayStreamInput;
import org.elasticsearch.index.IndexVersion;
import org.elasticsearch.index.mapper.BlockLoader.BlockFactory;
import org.elasticsearch.index.mapper.BlockLoader.BooleanBuilder;
import org.elasticsearch.index.mapper.BlockLoader.Builder;
```

@@ -26,6 +29,7 @@

```java
import org.elasticsearch.index.mapper.BlockLoader.DoubleBuilder;
import org.elasticsearch.index.mapper.BlockLoader.IntBuilder;
import org.elasticsearch.index.mapper.BlockLoader.LongBuilder;
import org.elasticsearch.index.mapper.vectors.VectorEncoderDecoder;
import org.elasticsearch.search.fetch.StoredFieldsSpec;

import java.io.IOException;
```
@@ -504,6 +508,87 @@ public String toString() {

```java
        }
    }

    public static class DenseVectorBlockLoader extends DocValuesBlockLoader {
        private final String fieldName;
        private final int dimensions;

        public DenseVectorBlockLoader(String fieldName, int dimensions) {
            this.fieldName = fieldName;
            this.dimensions = dimensions;
        }

        @Override
        public Builder builder(BlockFactory factory, int expectedCount) {
            return factory.denseVectors(expectedCount, dimensions);
        }

        @Override
        public AllReader reader(LeafReaderContext context) throws IOException {
            FloatVectorValues floatVectorValues = context.reader().getFloatVectorValues(fieldName);
            if (floatVectorValues != null) {
                return new DenseVectorValuesBlockReader(floatVectorValues, dimensions);
            }
            return new ConstantNullsReader();
        }
    }

    private static class DenseVectorValuesBlockReader extends BlockDocValuesReader {
        private final FloatVectorValues floatVectorValues;
        private final KnnVectorValues.DocIndexIterator iterator;
        private final int dimensions;

        DenseVectorValuesBlockReader(FloatVectorValues floatVectorValues, int dimensions) {
            this.floatVectorValues = floatVectorValues;
            iterator = floatVectorValues.iterator();
            this.dimensions = dimensions;
        }

        @Override
        public BlockLoader.Block read(BlockFactory factory, Docs docs) throws IOException {
            // Doubles from doc values ensures that the values are in order
            try (BlockLoader.FloatBuilder builder = factory.denseVectors(docs.count(), dimensions)) {
                for (int i = 0; i < docs.count(); i++) {
                    int doc = docs.get(i);
                    if (doc < iterator.docID()) {
                        throw new IllegalStateException("docs within same block must be in order");
                    }
                    read(doc, builder);
                }
                return builder.build();
            }
        }

        @Override
        public void read(int docId, BlockLoader.StoredFields storedFields, Builder builder) throws IOException {
            read(docId, (BlockLoader.FloatBuilder) builder);
        }

        private void read(int doc, BlockLoader.FloatBuilder builder) throws IOException {
            if (iterator.advance(doc) == doc) {
                builder.beginPositionEntry();
                float[] floats = floatVectorValues.vectorValue(iterator.index());
                assert floats.length == dimensions
                    : "unexpected dimensions for vector value; expected " + dimensions + " but got " + floats.length;
                for (float aFloat : floats) {
                    builder.appendFloat(aFloat);
                }
                builder.endPositionEntry();
            } else {
                builder.appendNull();
            }
        }

        @Override
        public int docId() {
            return iterator.docID();
        }

        @Override
        public String toString() {
            return "BlockDocValuesReader.FloatVectorValuesBlockReader";
        }
    }

    public static class BytesRefsFromOrdsBlockLoader extends DocValuesBlockLoader {
        private final String fieldName;

```
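The reader in this hunk emits each vector as one multi-valued position: it opens a position entry, appends every component, closes the entry, and appends a null for documents without a vector. The following is a minimal, self-contained sketch of that pattern. `ToyFloatBlockBuilder` and `DenseVectorBlockSketch` are hypothetical stand-ins written for illustration, not the Elasticsearch `BlockLoader.FloatBuilder` API, and the toy builder is stricter than a real builder may be.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a multi-valued float block builder (not the Elasticsearch class). */
final class ToyFloatBlockBuilder {
    private final List<float[]> positions = new ArrayList<>();
    private List<Float> open; // values of the position being built, or null when none is open

    void beginPositionEntry() {
        if (open != null) {
            throw new IllegalStateException("position entry already open");
        }
        open = new ArrayList<>();
    }

    void appendFloat(float value) {
        if (open == null) {
            throw new IllegalStateException("no open position entry");
        }
        open.add(value);
    }

    void endPositionEntry() {
        if (open == null) {
            throw new IllegalStateException("no open position entry");
        }
        float[] values = new float[open.size()];
        for (int i = 0; i < values.length; i++) {
            values[i] = open.get(i);
        }
        positions.add(values);
        open = null;
    }

    void appendNull() {
        if (open != null) {
            throw new IllegalStateException("cannot append null inside a position entry");
        }
        positions.add(null);
    }

    List<float[]> build() {
        return positions;
    }
}

final class DenseVectorBlockSketch {
    /** vectorsByDoc[doc] == null means the document has no vector. */
    static List<float[]> readBlock(float[][] vectorsByDoc, int[] docs) {
        ToyFloatBlockBuilder builder = new ToyFloatBlockBuilder();
        for (int doc : docs) {
            float[] vector = vectorsByDoc[doc];
            if (vector == null) {
                builder.appendNull();
                continue;
            }
            builder.beginPositionEntry();
            for (float component : vector) {
                builder.appendFloat(component); // components keep their stored order
            }
            builder.endPositionEntry();
        }
        return builder.build();
    }

    public static void main(String[] args) {
        float[][] vectors = { { 1f, 2f }, null, { 3f, 4f } };
        List<float[]> block = readBlock(vectors, new int[] { 0, 1, 2 });
        System.out.println("positions: " + block.size());
    }
}
```

Each position in the resulting block corresponds to one requested document, so a missing vector still occupies a position (as a null) and later positions stay aligned with their documents.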
@@ -752,6 +837,94 @@ public String toString() {

```java
        }
    }

    public static class DenseVectorFromBinaryBlockLoader extends DocValuesBlockLoader {

        private final String fieldName;
        private final int dims;
        private final IndexVersion indexVersion;

        public DenseVectorFromBinaryBlockLoader(String fieldName, int dims, IndexVersion indexVersion) {
            this.fieldName = fieldName;
            this.dims = dims;
            this.indexVersion = indexVersion;
        }

        @Override
        public Builder builder(BlockFactory factory, int expectedCount) {
            return factory.denseVectors(expectedCount, dims);
        }

        @Override
        public AllReader reader(LeafReaderContext context) throws IOException {
            BinaryDocValues docValues = context.reader().getBinaryDocValues(fieldName);
            if (docValues == null) {
                return new ConstantNullsReader();
            }
            return new DenseVectorFromBinary(docValues, dims, indexVersion);
        }
    }

    private static class DenseVectorFromBinary extends BlockDocValuesReader {
        private final BinaryDocValues docValues;
        private final IndexVersion indexVersion;
        private final int dimensions;
        private final float[] scratch;

        private int docID = -1;

        DenseVectorFromBinary(BinaryDocValues docValues, int dims, IndexVersion indexVersion) {
            this.docValues = docValues;
            this.scratch = new float[dims];
            this.indexVersion = indexVersion;
            this.dimensions = dims;
        }

        @Override
        public BlockLoader.Block read(BlockFactory factory, Docs docs) throws IOException {
            try (BlockLoader.FloatBuilder builder = factory.denseVectors(docs.count(), dimensions)) {
                for (int i = 0; i < docs.count(); i++) {
                    int doc = docs.get(i);
                    if (doc < docID) {
                        throw new IllegalStateException("docs within same block must be in order");
                    }
                    read(doc, builder);
                }
                return builder.build();
            }
        }

        @Override
        public void read(int docId, BlockLoader.StoredFields storedFields, Builder builder) throws IOException {
            read(docId, (BlockLoader.FloatBuilder) builder);
        }

        private void read(int doc, BlockLoader.FloatBuilder builder) throws IOException {
            this.docID = doc;
            if (false == docValues.advanceExact(doc)) {
                builder.appendNull();
                return;
            }
            BytesRef bytesRef = docValues.binaryValue();
            assert bytesRef.length > 0;
            VectorEncoderDecoder.decodeDenseVector(indexVersion, bytesRef, scratch);

            builder.beginPositionEntry();
            for (float value : scratch) {
                builder.appendFloat(value);
            }
            builder.endPositionEntry();
        }

        @Override
        public int docId() {
            return docID;
        }

        @Override
        public String toString() {
            return "DenseVectorFromBinary.Bytes";
        }
    }

    public static class BooleansBlockLoader extends DocValuesBlockLoader {
        private final String fieldName;

```

Review comment (on `DenseVectorFromBinaryBlockLoader`): This one reads from binary doc values.
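The binary reader decodes each document's bytes into a reusable `float[] scratch` before appending the components. The sketch below illustrates that decode-into-scratch step under a loud assumption: the bytes are taken to be nothing more than packed 4-byte floats in a caller-supplied byte order. The real `VectorEncoderDecoder.decodeDenseVector` takes an `IndexVersion`, and its actual on-disk layout is not shown in this diff, so `BinaryVectorDecodeSketch` and its `encode`/`decode` helpers are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical sketch: assumes vectors are stored as packed 4-byte floats, nothing else. */
final class BinaryVectorDecodeSketch {

    /** Packs the vector into bytes using the given byte order (illustration only). */
    static byte[] encode(float[] vector, ByteOrder order) {
        ByteBuffer buffer = ByteBuffer.allocate(vector.length * Float.BYTES).order(order);
        for (float component : vector) {
            buffer.putFloat(component);
        }
        return buffer.array();
    }

    /** Fills scratch with scratch.length floats read from bytes, starting at offset. */
    static void decode(byte[] bytes, int offset, ByteOrder order, float[] scratch) {
        ByteBuffer buffer = ByteBuffer.wrap(bytes, offset, scratch.length * Float.BYTES).order(order);
        for (int i = 0; i < scratch.length; i++) {
            scratch[i] = buffer.getFloat();
        }
    }

    public static void main(String[] args) {
        float[] scratch = new float[3]; // allocated once, reused for every document
        byte[] first = encode(new float[] { 1.5f, -2.0f, 0.25f }, ByteOrder.BIG_ENDIAN);
        byte[] second = encode(new float[] { 4f, 5f, 6f }, ByteOrder.BIG_ENDIAN);

        decode(first, 0, ByteOrder.BIG_ENDIAN, scratch);
        System.out.println(java.util.Arrays.toString(scratch));

        // The second decode overwrites scratch, so values must be copied out
        // (here: appended to a builder) before the next document is read.
        decode(second, 0, ByteOrder.BIG_ENDIAN, scratch);
        System.out.println(java.util.Arrays.toString(scratch));
    }
}
```

Reusing one scratch array avoids a per-document allocation; it is safe in the reader above because each component is appended to the builder before the next document is decoded.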
Review comment: Added a BlockLoader for dense vectors that uses `FloatVectorValues` to retrieve indexed vector data.
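Reading indexed vectors goes through a forward-only iterator: the reader advances to the requested document, treats "landed exactly on it" as "has a vector", and then fetches the vector by the iterator's ordinal rather than by doc ID. The sketch below models that access pattern with toy classes. `ToyVectorIterator` and `ForwardOnlyLookupSketch` are hypothetical and do not reproduce Lucene's `KnnVectorValues.DocIndexIterator` contract exactly. One deliberate design choice, which is an assumption of this sketch and not taken from the diff: it tracks the last requested doc separately from the iterator position, because the iterator may already sit past a document that has no vector.

```java
/** Toy forward-only iterator over doc IDs that have a vector (hypothetical, not Lucene's class). */
final class ToyVectorIterator {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[] docsWithVector; // ascending doc IDs
    private int ordinal = -1;           // index into docsWithVector; -1 before the first advance

    ToyVectorIterator(int[] docsWithVector) {
        this.docsWithVector = docsWithVector;
    }

    int docID() {
        if (ordinal < 0) {
            return -1;
        }
        return ordinal < docsWithVector.length ? docsWithVector[ordinal] : NO_MORE_DOCS;
    }

    /** Ordinal of the current vector, used to address vector storage. */
    int index() {
        return ordinal;
    }

    /** Moves to the first doc >= target; never moves backwards. */
    int advance(int target) {
        if (ordinal < 0) {
            ordinal = 0;
        }
        while (ordinal < docsWithVector.length && docsWithVector[ordinal] < target) {
            ordinal++;
        }
        return docID();
    }
}

final class ForwardOnlyLookupSketch {
    private final ToyVectorIterator iterator;
    private final float[][] vectorsByOrdinal; // vectorsByOrdinal[i] belongs to docsWithVector[i]
    private int lastDoc = -1;

    ForwardOnlyLookupSketch(int[] docsWithVector, float[][] vectorsByOrdinal) {
        this.iterator = new ToyVectorIterator(docsWithVector);
        this.vectorsByOrdinal = vectorsByOrdinal;
    }

    /** Returns the vector for doc, or null when the doc has none. Docs must be requested in order. */
    float[] lookup(int doc) {
        if (doc < lastDoc) {
            throw new IllegalStateException("docs within same block must be in order");
        }
        lastDoc = doc;
        int current = iterator.docID();
        if (current < doc) {
            current = iterator.advance(doc);
        }
        return current == doc ? vectorsByOrdinal[iterator.index()] : null;
    }

    public static void main(String[] args) {
        ForwardOnlyLookupSketch sketch = new ForwardOnlyLookupSketch(
            new int[] { 2, 5 },
            new float[][] { { 1f }, { 2f } }
        );
        for (int doc = 0; doc <= 6; doc++) {
            float[] vector = sketch.lookup(doc);
            System.out.println(doc + " -> " + (vector == null ? "null" : java.util.Arrays.toString(vector)));
        }
    }
}
```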