
ES|QL dense vector field type support #126456

Merged
56 commits
6439422
Register dense vector field type
carlosdelest Apr 7, 2025
c7e48a0
Add first version of BlockDocValuesReader for dense_vector
carlosdelest Apr 7, 2025
9e40cd4
Fixed BlockDocValuesReader to work with all dense_vector types
carlosdelest Apr 7, 2025
03f6a92
Merge remote-tracking branch 'origin/main' into feature/esql_dense_ve…
carlosdelest Apr 7, 2025
d84c8dd
Small test fixes
carlosdelest Apr 7, 2025
9ff5ed8
Fix casting
carlosdelest Apr 7, 2025
2754697
Improve testing to add random indexing, and fix similarity
carlosdelest Apr 7, 2025
d196be0
Add index = false support
carlosdelest Apr 8, 2025
975b4db
Spotless
carlosdelest Apr 8, 2025
ae9fa5f
Synthetic source testing
carlosdelest Apr 8, 2025
9896adf
Add CSV tests and necessary infra for dense_vector field type
carlosdelest Apr 8, 2025
dfa420e
Make CSV test loader to use numbers when there are multivalued numeri…
carlosdelest Apr 8, 2025
d983495
Fixed test mapping to avoid parsing errors when numbers are used for …
carlosdelest Apr 8, 2025
5707e25
Avoid non-float dense vector field types for now
carlosdelest Apr 8, 2025
a8c8a6a
[CI] Auto commit changes from spotless
elasticsearchmachine Apr 8, 2025
d827c6d
Fix test error when checking block loaders
carlosdelest Apr 8, 2025
d8e139d
Fix value generation for dense vectors
carlosdelest Apr 8, 2025
0539aff
Merge remote-tracking branch 'carlosdelest/feature/esql_dense_vector_…
carlosdelest Apr 8, 2025
7bbe7ee
Fix testing for dense vectors and block loaders
carlosdelest Apr 8, 2025
7d1f8b7
Remove dense_vector from CaseTests
carlosdelest Apr 8, 2025
8870fef
Provide support in tests for creating constant blocks from collection…
carlosdelest Apr 8, 2025
ad84f13
Fix test error
carlosdelest Apr 9, 2025
2619fe6
Fix unsupported types test
carlosdelest Apr 9, 2025
b23d5a9
Support synthetic source with non indexed vectors
carlosdelest Apr 9, 2025
6bb8e62
Remove unneeded code
carlosdelest Apr 9, 2025
8013e47
Add support for dense_vector in PositionToXContent and EsqlQueryRespo…
carlosdelest Apr 9, 2025
e0e34f4
Fix unsupported types test
carlosdelest Apr 9, 2025
c30f5d9
Fix unsupported types test
carlosdelest Apr 10, 2025
a2fbb13
Fix block loader builder
carlosdelest Apr 10, 2025
4833ce5
Ensure ordering when creating double blocks
carlosdelest Apr 10, 2025
e8878a0
Changed randomDouble() for randomFloat() to generate test data
carlosdelest Apr 10, 2025
93d45fc
Additional CSV tests
carlosdelest Apr 11, 2025
a6b0f6c
Fix class casting
carlosdelest Apr 11, 2025
cd462b8
Use Float as block type for dense_vector
carlosdelest Apr 15, 2025
0675a42
Fix tests
carlosdelest Apr 15, 2025
7f95d6a
[CI] Auto commit changes from spotless
elasticsearchmachine Apr 15, 2025
2cff4e7
Fix test
carlosdelest Apr 15, 2025
efc3fb6
Merge remote-tracking branch 'carlosdelest/feature/esql_dense_vector_…
carlosdelest Apr 15, 2025
f8741ca
Floats are not ordered for dense_vector values
carlosdelest Apr 21, 2025
c951ee7
Use dense vector specific methods for creating builders, and check di…
carlosdelest Apr 21, 2025
ba0a6b9
Avoid doing checks on dense vectors as we can't cast to BlockBuilder
carlosdelest Apr 21, 2025
f5889b2
Merge remote-tracking branch 'origin/main' into feature/esql_dense_ve…
carlosdelest Apr 21, 2025
4b2126e
Add CSV tests with unordered data, and fix CSV data retrieval for tha…
carlosdelest Apr 30, 2025
afe79f7
Change source readers name() method
carlosdelest Apr 30, 2025
8049a58
Merge remote-tracking branch 'origin/main' into feature/esql_dense_ve…
carlosdelest Apr 30, 2025
217ce84
[CI] Auto commit changes from spotless
elasticsearchmachine Apr 30, 2025
74fe676
Merge remote-tracking branch 'origin/main' into feature/esql_dense_ve…
carlosdelest May 23, 2025
af965a1
Spotless
carlosdelest May 23, 2025
53ec29c
Merge remote-tracking branch 'carlosdelest/feature/esql_dense_vector_…
carlosdelest May 23, 2025
60da0fe
Fix tests
carlosdelest May 23, 2025
06cbea9
Fix test suggested cast for joins
carlosdelest May 26, 2025
8f07e08
Merge branch 'main' into feature/esql_dense_vector_support
carlosdelest May 26, 2025
53bcbc7
Merge branch 'main' into feature/esql_dense_vector_support
carlosdelest May 26, 2025
fe82007
[CI] Auto commit changes from spotless
elasticsearchmachine May 26, 2025
a63ca2c
Fix LookupJoinTypesIT
carlosdelest May 26, 2025
c5d6ddc
Merge remote-tracking branch 'carlosdelest/feature/esql_dense_vector_…
carlosdelest May 26, 2025
@@ -11,13 +11,16 @@

import org.apache.lucene.index.BinaryDocValues;
import org.apache.lucene.index.DocValues;
import org.apache.lucene.index.FloatVectorValues;
import org.apache.lucene.index.KnnVectorValues;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.NumericDocValues;
import org.apache.lucene.index.SortedDocValues;
import org.apache.lucene.index.SortedNumericDocValues;
import org.apache.lucene.index.SortedSetDocValues;
import org.apache.lucene.util.BytesRef;
import org.elasticsearch.common.io.stream.ByteArrayStreamInput;
import org.elasticsearch.index.IndexVersion;
import org.elasticsearch.index.mapper.BlockLoader.BlockFactory;
import org.elasticsearch.index.mapper.BlockLoader.BooleanBuilder;
import org.elasticsearch.index.mapper.BlockLoader.Builder;
@@ -26,6 +29,7 @@
import org.elasticsearch.index.mapper.BlockLoader.DoubleBuilder;
import org.elasticsearch.index.mapper.BlockLoader.IntBuilder;
import org.elasticsearch.index.mapper.BlockLoader.LongBuilder;
import org.elasticsearch.index.mapper.vectors.VectorEncoderDecoder;
import org.elasticsearch.search.fetch.StoredFieldsSpec;

import java.io.IOException;
@@ -504,6 +508,87 @@ public String toString() {
}
}

public static class DenseVectorBlockLoader extends DocValuesBlockLoader {
Member Author (review comment): Added a BlockLoader for dense vectors that uses FloatVectorValues to retrieve indexed vector data.

private final String fieldName;
private final int dimensions;

public DenseVectorBlockLoader(String fieldName, int dimensions) {
this.fieldName = fieldName;
this.dimensions = dimensions;
}

@Override
public Builder builder(BlockFactory factory, int expectedCount) {
return factory.denseVectors(expectedCount, dimensions);
}

@Override
public AllReader reader(LeafReaderContext context) throws IOException {
FloatVectorValues floatVectorValues = context.reader().getFloatVectorValues(fieldName);
if (floatVectorValues != null) {
return new DenseVectorValuesBlockReader(floatVectorValues, dimensions);
}
return new ConstantNullsReader();
}
}

private static class DenseVectorValuesBlockReader extends BlockDocValuesReader {
private final FloatVectorValues floatVectorValues;
private final KnnVectorValues.DocIndexIterator iterator;
private final int dimensions;

DenseVectorValuesBlockReader(FloatVectorValues floatVectorValues, int dimensions) {
this.floatVectorValues = floatVectorValues;
iterator = floatVectorValues.iterator();
this.dimensions = dimensions;
}

@Override
public BlockLoader.Block read(BlockFactory factory, Docs docs) throws IOException {
// Docs must arrive in increasing order because the vector iterator only moves forward
try (BlockLoader.FloatBuilder builder = factory.denseVectors(docs.count(), dimensions)) {
for (int i = 0; i < docs.count(); i++) {
int doc = docs.get(i);
if (doc < iterator.docID()) {
throw new IllegalStateException("docs within same block must be in order");
}
read(doc, builder);
}
return builder.build();
}
}

@Override
public void read(int docId, BlockLoader.StoredFields storedFields, Builder builder) throws IOException {
read(docId, (BlockLoader.FloatBuilder) builder);
}

private void read(int doc, BlockLoader.FloatBuilder builder) throws IOException {
if (iterator.advance(doc) == doc) {
builder.beginPositionEntry();
float[] floats = floatVectorValues.vectorValue(iterator.index());
assert floats.length == dimensions
: "unexpected dimensions for vector value; expected " + dimensions + " but got " + floats.length;
for (float aFloat : floats) {
builder.appendFloat(aFloat);
}
builder.endPositionEntry();
} else {
builder.appendNull();
}
}

@Override
public int docId() {
return iterator.docID();
}

@Override
public String toString() {
return "BlockDocValuesReader.FloatVectorValuesBlockReader";
}
}
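
The reader above relies on a forward-only iterator over the documents that have a vector: each requested doc either matches the iterator after advancing, or produces a null position. A minimal, self-contained sketch of that pattern is given here. `SparseVectorCursor` and `DenseVectorReadSketch` are hypothetical stand-ins written for this illustration, not Lucene or Elasticsearch classes, and the sketch tracks the last requested doc itself instead of comparing against the cursor position.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical forward-only cursor over docs that have a vector (not a Lucene class). */
final class SparseVectorCursor {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[] docIds;      // sorted doc ids that have a vector
    private final float[][] vectors; // vectors[i] belongs to docIds[i]
    private int pos = -1;            // -1 means "not positioned yet"

    SparseVectorCursor(int[] docIds, float[][] vectors) {
        this.docIds = docIds;
        this.vectors = vectors;
    }

    /** Current doc id, -1 before the first advance, NO_MORE_DOCS when exhausted. */
    int docID() {
        if (pos < 0) {
            return -1;
        }
        return pos < docIds.length ? docIds[pos] : NO_MORE_DOCS;
    }

    /** Moves forward to the first doc id >= target and returns it. */
    int advance(int target) {
        if (pos < 0) {
            pos = 0;
        }
        while (pos < docIds.length && docIds[pos] < target) {
            pos++;
        }
        return docID();
    }

    /** Vector of the current doc; only valid while positioned on a real doc. */
    float[] vectorValue() {
        return vectors[pos];
    }
}

final class DenseVectorReadSketch {
    /** Returns one entry per requested doc: a copy of its vector, or null if it has none. */
    static List<float[]> read(SparseVectorCursor cursor, int[] docs) {
        List<float[]> out = new ArrayList<>(docs.length);
        int lastDoc = -1;
        for (int doc : docs) {
            if (doc < lastDoc) {
                throw new IllegalStateException("docs within same block must be in order");
            }
            lastDoc = doc;
            // Only advance when the cursor is behind the requested doc.
            int current = cursor.docID() < doc ? cursor.advance(doc) : cursor.docID();
            out.add(current == doc ? cursor.vectorValue().clone() : null);
        }
        return out;
    }
}
```

With vectors on docs 1 and 4 and a request for docs 0 through 4, the sketch returns vectors at indexes 1 and 4 and nulls at the other three.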

public static class BytesRefsFromOrdsBlockLoader extends DocValuesBlockLoader {
private final String fieldName;

@@ -752,6 +837,94 @@ public String toString() {
}
}

public static class DenseVectorFromBinaryBlockLoader extends DocValuesBlockLoader {
Member Author (review comment): This one reads from binary doc values.

private final String fieldName;
private final int dims;
private final IndexVersion indexVersion;

public DenseVectorFromBinaryBlockLoader(String fieldName, int dims, IndexVersion indexVersion) {
this.fieldName = fieldName;
this.dims = dims;
this.indexVersion = indexVersion;
}

@Override
public Builder builder(BlockFactory factory, int expectedCount) {
return factory.denseVectors(expectedCount, dims);
}

@Override
public AllReader reader(LeafReaderContext context) throws IOException {
BinaryDocValues docValues = context.reader().getBinaryDocValues(fieldName);
if (docValues == null) {
return new ConstantNullsReader();
}
return new DenseVectorFromBinary(docValues, dims, indexVersion);
}
}

private static class DenseVectorFromBinary extends BlockDocValuesReader {
private final BinaryDocValues docValues;
private final IndexVersion indexVersion;
private final int dimensions;
private final float[] scratch;

private int docID = -1;

DenseVectorFromBinary(BinaryDocValues docValues, int dims, IndexVersion indexVersion) {
this.docValues = docValues;
this.scratch = new float[dims];
this.indexVersion = indexVersion;
this.dimensions = dims;
}

@Override
public BlockLoader.Block read(BlockFactory factory, Docs docs) throws IOException {
try (BlockLoader.FloatBuilder builder = factory.denseVectors(docs.count(), dimensions)) {
for (int i = 0; i < docs.count(); i++) {
int doc = docs.get(i);
if (doc < docID) {
throw new IllegalStateException("docs within same block must be in order");
}
read(doc, builder);
}
return builder.build();
}
}

@Override
public void read(int docId, BlockLoader.StoredFields storedFields, Builder builder) throws IOException {
read(docId, (BlockLoader.FloatBuilder) builder);
}

private void read(int doc, BlockLoader.FloatBuilder builder) throws IOException {
this.docID = doc;
if (false == docValues.advanceExact(doc)) {
builder.appendNull();
return;
}
BytesRef bytesRef = docValues.binaryValue();
assert bytesRef.length > 0;
VectorEncoderDecoder.decodeDenseVector(indexVersion, bytesRef, scratch);

builder.beginPositionEntry();
for (float value : scratch) {
builder.appendFloat(value);
}
builder.endPositionEntry();
}

@Override
public int docId() {
return docID;
}

@Override
public String toString() {
return "DenseVectorFromBinary.Bytes";
}
}
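
`DenseVectorFromBinary` allocates one `scratch` array up front, decodes each document's bytes into it, and copies the values into the builder before the next read overwrites them. The sketch below illustrates that decode-into-scratch step under a simplifying assumption: floats stored back to back, 4 bytes each, in a caller-chosen byte order, with any trailing bytes ignored. The actual layout handled by `VectorEncoderDecoder.decodeDenseVector` is not shown in this diff and may differ by index version; `BinaryVectorDecodeSketch` is a hypothetical class.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class BinaryVectorDecodeSketch {
    /** Encodes floats back to back, 4 bytes each, in the given byte order (assumed layout). */
    static byte[] encode(float[] vector, ByteOrder order) {
        ByteBuffer buffer = ByteBuffer.allocate(vector.length * Float.BYTES).order(order);
        for (float v : vector) {
            buffer.putFloat(v);
        }
        return buffer.array();
    }

    /**
     * Decodes scratch.length floats starting at offset into the caller-owned scratch array.
     * The scratch array is reused across calls, so callers must copy values out before
     * decoding the next document.
     */
    static void decode(byte[] bytes, int offset, ByteOrder order, float[] scratch) {
        int needed = scratch.length * Float.BYTES;
        if (bytes.length - offset < needed) {
            throw new IllegalArgumentException("expected at least " + needed + " bytes");
        }
        ByteBuffer buffer = ByteBuffer.wrap(bytes, offset, needed).order(order);
        for (int i = 0; i < scratch.length; i++) {
            scratch[i] = buffer.getFloat();
        }
    }
}
```

Reusing the scratch array avoids allocating a new `float[]` per document, which is the same trade-off the reader above makes.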

public static class BooleansBlockLoader extends DocValuesBlockLoader {
private final String fieldName;

@@ -373,6 +373,11 @@ interface BlockFactory {
*/
DoubleBuilder doubles(int expectedCount);

/**
* Build a builder to load dense vectors without any loading constraints.
*/
FloatBuilder denseVectors(int expectedVectorsCount, int dimensions);

/**
* Build a builder to load ints as loaded from doc values.
* Doc values load ints in sorted order.
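
`denseVectors(expectedVectorsCount, dimensions)` returns a `FloatBuilder`, and the doc-values readers in this change wrap each vector in `beginPositionEntry()` / `endPositionEntry()` so that one position holds one whole vector. The following sketch shows one way such a builder could store vectors in a flat array with per-position ranges. `FlatVectorBuilderSketch` is hypothetical and simplified: it requires an open entry for every append, which is an assumption of the sketch and not a statement about the real `FloatBuilder`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class FlatVectorBuilderSketch {
    private float[] values;
    private int count = 0;
    // One {start, end} range into values per position; null marks a null position.
    private final List<int[]> ranges = new ArrayList<>();
    private int openStart = -1; // start offset of the open entry, -1 when none is open

    FlatVectorBuilderSketch(int expectedVectorsCount, int dimensions) {
        // Pre-size for the expected number of floats; grows on demand.
        values = new float[Math.max(1, expectedVectorsCount * dimensions)];
    }

    void beginPositionEntry() {
        if (openStart != -1) {
            throw new IllegalStateException("entry already open");
        }
        openStart = count;
    }

    void appendFloat(float value) {
        if (openStart == -1) {
            throw new IllegalStateException("no open entry");
        }
        if (count == values.length) {
            values = Arrays.copyOf(values, values.length * 2);
        }
        values[count++] = value;
    }

    void endPositionEntry() {
        if (openStart == -1) {
            throw new IllegalStateException("no open entry");
        }
        ranges.add(new int[] {openStart, count});
        openStart = -1;
    }

    void appendNull() {
        if (openStart != -1) {
            throw new IllegalStateException("cannot append null inside an entry");
        }
        ranges.add(null);
    }

    int positionCount() {
        return ranges.size();
    }

    /** Returns a copy of the vector at the position, or null for a null position. */
    float[] vectorAt(int position) {
        int[] range = ranges.get(position);
        return range == null ? null : Arrays.copyOfRange(values, range[0], range[1]);
    }
}
```

Keeping all floats in one array and addressing vectors by range keeps positions cheap to append, at the cost of a copy when a single vector is read back.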
@@ -303,6 +303,49 @@ public String toString() {
}
}

/**
* Load dense vectors as {@code float}s from {@code _source}.
*/
public static class DenseVectorBlockLoader extends SourceBlockLoader {
private final int dimensions;

public DenseVectorBlockLoader(ValueFetcher fetcher, LeafIteratorLookup lookup, int dimensions) {
super(fetcher, lookup);
this.dimensions = dimensions;
}

@Override
public Builder builder(BlockFactory factory, int expectedCount) {
return factory.denseVectors(expectedCount, dimensions);
}

@Override
public RowStrideReader rowStrideReader(LeafReaderContext context, DocIdSetIterator iter) {
return new DenseVectors(fetcher, iter);
}

@Override
protected String name() {
return "DenseVectors";
}
}

private static class DenseVectors extends BlockSourceReader {
DenseVectors(ValueFetcher fetcher, DocIdSetIterator iter) {
super(fetcher, iter);
}

@Override
protected void append(BlockLoader.Builder builder, Object v) {
((BlockLoader.FloatBuilder) builder).appendFloat(((Number) v).floatValue());
}

@Override
public String toString() {
return "BlockSourceReader.DenseVectors";
}
}
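
`DenseVectors.append` casts each parsed `_source` value to `Number` and calls `floatValue()`, so integers and doubles in the source both end up as floats. A small sketch of that conversion applied to a whole vector is shown here. `SourceVectorSketch` and its dimension check are hypothetical additions for illustration and are not part of the change.

```java
import java.util.List;

final class SourceVectorSketch {
    /** Converts parsed numeric values (Integer, Long, Double, ...) to a float vector. */
    static float[] toFloats(List<?> parsed, int dimensions) {
        if (parsed.size() != dimensions) {
            throw new IllegalArgumentException(
                "expected " + dimensions + " values but got " + parsed.size()
            );
        }
        float[] out = new float[dimensions];
        for (int i = 0; i < dimensions; i++) {
            Object value = parsed.get(i);
            if (!(value instanceof Number)) {
                throw new IllegalArgumentException("not a number at index " + i + ": " + value);
            }
            // Same narrowing the source reader applies to each value.
            out[i] = ((Number) value).floatValue();
        }
        return out;
    }
}
```

Going through `Number` means the caller does not need to know whether the JSON parser produced an `Integer`, a `Long`, or a `Double` for a given element.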

/**
* Load {@code int}s from {@code _source}.
*/