ES|QL dense vector field type support #126456

Merged

Conversation

carlosdelest
Member

@carlosdelest carlosdelest commented Apr 8, 2025

Adds support for the dense_vector field type. This is the first step toward enabling kNN queries and making dense_vector a first-class citizen in ES|QL.

This allows an index with a dense_vector mapping like the following:

{
  "properties": {
    "id": {
      "type": "long"
    },
    "vector": {
      "type": "dense_vector",
      "similarity": "l2_norm"
    }
  }
}

to be retrieved via ES|QL:

FROM dense_vector
| KEEP id, vector
| SORT id

The query returns:

id   | vector
0    | [1.0, 2.0, 3.0]
1    | [4.0, 5.0, 6.0]
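
For readers who want to try the query from a client, here is a minimal sketch in Java that builds (but does not send) the HTTP request. It assumes an unsecured development cluster at http://localhost:9200 and uses the ES|QL `_query` REST endpoint. The class and method names are illustrative and not part of this PR. Since the dense_vector type is behind a feature flag, the query may be rejected on builds where the flag is off.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class EsqlDenseVectorRequest {
    // Assumption: a local, unsecured development cluster.
    static final String BASE_URL = "http://localhost:9200";

    /** Wraps an ES|QL query string in the JSON body sent to the _query endpoint. */
    public static String requestBody(String esql) {
        // Escape backslashes and double quotes so the query stays a valid JSON string.
        String escaped = esql.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\"query\": \"" + escaped + "\"}";
    }

    /** Builds (but does not send) the HTTP request for the query. */
    public static HttpRequest buildRequest(String esql) {
        return HttpRequest.newBuilder(URI.create(BASE_URL + "/_query"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(requestBody(esql)))
            .build();
    }

    public static void main(String[] args) {
        // The query from the PR description, written on one line.
        String esql = "FROM dense_vector | KEEP id, vector | SORT id";
        HttpRequest request = buildRequest(esql);
        System.out.println(request.method() + " " + request.uri());
        System.out.println(requestBody(esql));
    }
}
```

Sending the request would additionally need a `java.net.http.HttpClient` and, on a secured cluster, credentials; both are left out to keep the sketch small.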

For now, only the float element type is supported. Similar work will follow to support the byte and bit element types, but I wanted this implementation reviewed first to make sure it is in line with what we need.

Both indexed and non-indexed fields are supported, as is synthetic source.

Support for CSV tests has been added. The CSV tests are simple for now; we can expand them, and add support for further operations on dense_vector fields, in subsequent PRs. An integration test has been added that exercises the different index options and document storage structures extensively.

The dense_vector field type is behind a feature flag, as it will require follow-up work.

@carlosdelest carlosdelest added the labels Team:Analytics, :Analytics/ES|QL, Team:Search Relevance, :Search Relevance/Search Apr 8, 2025
@@ -504,6 +506,80 @@ public String toString() {
}
}

public static class DenseVectorBlockLoader extends DocValuesBlockLoader {
Member Author

Added a BlockLoader for dense vectors that uses FloatVectorValues to retrieve indexed vector data.
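
As a rough illustration of the idea (not the PR's actual code), the sketch below turns each document's float vector into one multi-valued position of doubles, with null for documents that have no vector. `DoubleBlockBuilder` and `appendVector` are hypothetical, simplified stand-ins written for this example. The real loader reads its values from Lucene's FloatVectorValues and writes into the ES|QL block builders.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DenseVectorBlockSketch {
    /** Hypothetical stand-in for a multi-valued double block builder. */
    public static final class DoubleBlockBuilder {
        private final List<double[]> positions = new ArrayList<>();
        private List<Double> current;

        public void beginPositionEntry() {
            current = new ArrayList<>();
        }

        public void appendDouble(double value) {
            current.add(value);
        }

        public void endPositionEntry() {
            positions.add(current.stream().mapToDouble(Double::doubleValue).toArray());
            current = null;
        }

        public void appendNull() {
            positions.add(null);
        }

        public List<double[]> build() {
            return positions;
        }
    }

    /** Appends one document's vector, or a null position if the document has none. */
    public static void appendVector(DoubleBlockBuilder builder, float[] vector) {
        if (vector == null) {
            builder.appendNull();
            return;
        }
        builder.beginPositionEntry();
        for (float component : vector) {
            builder.appendDouble(component); // float widens to double without loss
        }
        builder.endPositionEntry();
    }

    public static void main(String[] args) {
        DoubleBlockBuilder builder = new DoubleBlockBuilder();
        appendVector(builder, new float[] { 1.0f, 2.0f, 3.0f });
        appendVector(builder, null); // a document without a vector
        appendVector(builder, new float[] { 4.0f, 5.0f, 6.0f });
        for (double[] position : builder.build()) {
            System.out.println(Arrays.toString(position));
        }
    }
}
```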

@Override
public BlockLoader blockLoader(MappedFieldType.BlockLoaderContext blContext) {
if (elementType != ElementType.FLOAT) {
throw new UnsupportedOperationException("Only float dense vectors are supported for now");
Member Author

We can work on this next by creating dedicated BlockLoaders for the byte and bit element types.

@@ -145,6 +146,10 @@ private static void assertMetadata(
// Type.asType translates all bytes references into keywords
continue;
}
if (blockType == Type.DOUBLE && expectedType == DENSE_VECTOR) {
// DENSE_VECTOR is internally represented as a double block
Member Author

This could change once we support byte and bit element types, at which point we could create the appropriate blocks for them.

@@ -63,18 +63,7 @@
"type" : "keyword"
},
"salary_change": {
"type": "float",
Member Author

Changing how CSV data is loaded made this field a problem: indexing float data into an integer field raised parsing exceptions. I didn't see a convenient way around it, so I decided to remove the field, as it is not being tested.

Member

It would be great if we could keep this unchanged; it is a good example of nested fields.

Member Author

It is just changed for the mapping-default-incompatible mapping, which was created to test some incompatible field mappings that did not include subfields. I'll try to give this another shot but it will require changes to the CSV loader or the dataset 😢

Member

> It is just changed for the mapping-default-incompatible mapping, which was created to test some incompatible field mappings that did not include subfields. I'll try to give this another shot but it will require changes to the CSV loader or the dataset 😢

Is it easier if we make another copy of employees's schema and data for dense_vector related tests?

Member Author

> Is it easier if we make another copy of employees's schema and data for dense_vector related tests?

The problem is that changing how the CSV tests load data affected this dataset. Before this change, multivalued fields were uploaded as arrays of strings. We can't do that for dense_vector fields, because DenseVectorFieldMapper does not support that format.

It seemed like too much work to change the actual dataset when that particular field is not actually used in the tests.

I'm open to other solutions here!
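
To make the two payload shapes concrete, here is a small sketch that renders a bracketed CSV cell either as a JSON array of strings (the shape described as the previous behaviour) or as a JSON array of numbers. `MultiValueCsvSketch` and its methods are hypothetical helpers written for this illustration. They are not the code in CsvTestsDataLoader, and they assume a non-empty, well-formed cell such as `[1.0, 2.0, 3.0]`.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

public class MultiValueCsvSketch {
    /** Splits a bracketed CSV cell such as "[1.0, 2.0, 3.0]" into trimmed items. */
    public static String[] items(String cell) {
        String trimmed = cell.trim();
        String inner = trimmed.substring(1, trimmed.length() - 1); // drop '[' and ']'
        return Arrays.stream(inner.split(",")).map(String::trim).toArray(String[]::new);
    }

    /** Renders every item as a JSON string. */
    public static String asStringArray(String cell) {
        return Arrays.stream(items(cell))
            .map(item -> "\"" + item + "\"")
            .collect(Collectors.joining(", ", "[", "]"));
    }

    /** Renders every item as a JSON number, failing fast on non-numeric input. */
    public static String asNumberArray(String cell) {
        return Arrays.stream(items(cell))
            .map(item -> Float.toString(Float.parseFloat(item)))
            .collect(Collectors.joining(", ", "[", "]"));
    }

    public static void main(String[] args) {
        String cell = "[1.0, 2.0, 3.0]";
        System.out.println(asStringArray(cell));
        System.out.println(asNumberArray(cell));
    }
}
```

The numeric form also explains the parsing exceptions mentioned earlier in this thread: once values are sent as numbers rather than strings, a float value aimed at an integer-mapped field is no longer accepted.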

Contributor

@fang-xing-esql are these fields used in other tests somehow?
I see this was actually added by @carlosdelest a while back in #117555.
The employees_incompatible index that is set with this mapping is only used in match function/operator tests.
We don't modify any of those tests here, so this looks like a safe change to me.

@carlosdelest
Member Author

Closing this PR, as the final approach will involve multiple field types for the different dense_vector element types.

…ctor_support

# Conflicts:
#	x-pack/plugin/build.gradle
#	x-pack/plugin/esql/compute/src/main/java/org/elasticsearch/compute/data/BlockUtils.java
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/PlannerUtils.java
…support' into feature/esql_dense_vector_support
@carlosdelest
Member Author

Reopening this, as we've decided to go for a single dense_vector field type. We can reevaluate the need for multiple dense_vector element types later.

@carlosdelest carlosdelest reopened this May 23, 2025
@carlosdelest
Member Author

@ioanatia @ChrisHegarty @fang-xing-esql I intend to merge this PR on Monday to provide initial support for dense_vector float element types. Please comment if you have any issue with that.

The next steps are to work on kNN queries and similarity functions, and eventually to support the byte and bit element types.

@carlosdelest carlosdelest added the labels auto-backport, v8.19.0 May 26, 2025
@carlosdelest
Member Author

@elasticmachine run docs-build

@carlosdelest carlosdelest merged commit b759161 into elastic:main May 27, 2025
16 of 18 checks passed
@elasticsearchmachine
Collaborator

💔 Backport failed

Status Branch Result
8.19 Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running `backport --upstream elastic/elasticsearch --pr 126456`

carlosdelest added a commit to carlosdelest/elasticsearch that referenced this pull request May 27, 2025
(cherry picked from commit b759161)

# Conflicts:
#	server/src/test/java/org/elasticsearch/index/mapper/vectors/DenseVectorFieldTypeTests.java
#	x-pack/plugin/esql-core/src/main/java/org/elasticsearch/xpack/esql/core/type/DataType.java
#	x-pack/plugin/esql/qa/server/single-node/src/javaRestTest/java/org/elasticsearch/xpack/esql/qa/single_node/RestEsqlIT.java
#	x-pack/plugin/esql/qa/testFixtures/src/main/java/org/elasticsearch/xpack/esql/CsvTestsDataLoader.java
#	x-pack/plugin/esql/qa/testFixtures/src/main/java/org/elasticsearch/xpack/esql/EsqlTestUtils.java
#	x-pack/plugin/esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/action/LookupJoinTypesIT.java
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/scalar/nulls/Coalesce.java
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/logical/join/Join.java
@carlosdelest
Member Author

💚 All backports created successfully

Status Branch Result
8.19

Questions ?

Please refer to the Backport tool documentation

elasticsearchmachine added a commit that referenced this pull request May 27, 2025
* ES|QL dense vector field type support (#126456)

(cherry picked from commit b759161)

# Conflicts:
#	server/src/test/java/org/elasticsearch/index/mapper/vectors/DenseVectorFieldTypeTests.java
#	x-pack/plugin/esql-core/src/main/java/org/elasticsearch/xpack/esql/core/type/DataType.java
#	x-pack/plugin/esql/qa/server/single-node/src/javaRestTest/java/org/elasticsearch/xpack/esql/qa/single_node/RestEsqlIT.java
#	x-pack/plugin/esql/qa/testFixtures/src/main/java/org/elasticsearch/xpack/esql/CsvTestsDataLoader.java
#	x-pack/plugin/esql/qa/testFixtures/src/main/java/org/elasticsearch/xpack/esql/EsqlTestUtils.java
#	x-pack/plugin/esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/action/LookupJoinTypesIT.java
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/scalar/nulls/Coalesce.java
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/logical/join/Join.java

* [CI] Auto commit changes from spotless

* Fix switch expression

---------

Co-authored-by: elasticsearchmachine <[email protected]>
fang-xing-esql added a commit to fang-xing-esql/Elasticsearch that referenced this pull request Jun 3, 2025
elasticsearchmachine pushed a commit that referenced this pull request Jun 3, 2025
…es (#126150) (#128776)

* Integration tests for LOOKUP JOIN over wider range of data types (#126150)

This test suite tests the lookup join functionality in ESQL with various data types.

For each pair of types being tested, it builds a main index called "index" containing a single document with as many fields as types being tested on the left of the pair, and then creates that many other lookup indexes, each with a single document containing exactly two fields: the field to join on, and a field to return.

The assertion is that for valid combinations, the return result should exist, and for invalid combinations an exception should be thrown. If no exception is thrown, and no result is returned, our validation rules are not aligned with the internal behaviour (ie. a bug).
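
The three-way assertion described above can be sketched as follows. This is a simplified illustration under assumed names (`LookupJoinCheckSketch`, `Outcome`, `run`, `verify`), not the code of the actual test suite: a join either returns a result, throws, or does neither, and the last case is always reported as a mismatch between validation and runtime behaviour.

```java
import java.util.Optional;
import java.util.function.Supplier;

public class LookupJoinCheckSketch {
    public enum Outcome { RESULT, EXCEPTION, NOTHING }

    /** Runs the (hypothetical) join and classifies what happened. */
    public static Outcome run(Supplier<Optional<String>> join) {
        try {
            return join.get().isPresent() ? Outcome.RESULT : Outcome.NOTHING;
        } catch (RuntimeException e) {
            return Outcome.EXCEPTION;
        }
    }

    /** Returns null when the outcome matches expectations, otherwise a failure message. */
    public static String verify(boolean supportedPair, Outcome outcome) {
        if (outcome == Outcome.NOTHING) {
            // Neither a result nor an exception: validation and runtime disagree.
            return "no result and no exception";
        }
        if (supportedPair && outcome == Outcome.EXCEPTION) {
            return "supported type pair threw an exception";
        }
        if (supportedPair == false && outcome == Outcome.RESULT) {
            return "unsupported type pair returned a result";
        }
        return null;
    }

    public static void main(String[] args) {
        Outcome ok = run(() -> Optional.of("joined value"));
        Outcome silent = run(Optional::empty);
        System.out.println(verify(true, ok));
        System.out.println(verify(true, silent));
    }
}
```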

Since the `LOOKUP JOIN` command requires the match field name to be the same between the main index and the lookup index, we will have field names that correctly represent the type of the field in the main index, but not the type of the field in the lookup index. This can be confusing, but it is important to remember that the field names are not the same as the types.

(cherry picked from commit afc53a3)

* fix compiling error

* add missing part of the backport of #126456

* add missing part of the backport of #126456

---------

Co-authored-by: Craig Taverner <[email protected]>
Labels
:Analytics/ES|QL (AKA ESQL)
auto-backport (Automatically create backport pull requests when merged)
backport pending
>non-issue
:Search Relevance/Search (Catch all for Search Relevance)
Team:Analytics (Meta label for analytical engine team (ESQL/Aggs/Geo))
Team:Search Relevance (Meta label for the Search Relevance team in Elasticsearch)
v8.19.0
v9.1.0

6 participants