Adding base64 indexing for vector values #137072

benwtrent · 2025-10-23T21:23:10Z

This adds support for indexing vectors as base64-encoded strings. Parsing floating-point arrays in JSON is...not cheap, so encoding the raw bytes in a string instead improves indexing throughput.
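
To illustrate the payload side, here is a minimal sketch that compares a JSON float array with the base64 form. It assumes big-endian float32, the byte order the PR eventually settled on. Each float contributes 4 bytes, and padded base64 emits 4 characters per 3 input bytes, so a `dims`-dimensional vector becomes `4 * ceil(4 * dims / 3)` characters. The helper names are illustrative only.

```python
import base64
import json
import math

import numpy as np


def json_array_payload(vec: np.ndarray) -> str:
    # Serialize the vector as a plain JSON array of floats.
    return json.dumps([float(x) for x in vec])


def base64_payload(vec: np.ndarray) -> str:
    # Big-endian float32 bytes, then standard (padded) base64.
    raw = vec.astype(">f4").tobytes()
    return base64.b64encode(raw).decode("ascii")


def expected_base64_len(dims: int) -> int:
    # 4 bytes per float32; base64 emits 4 chars per 3-byte group, padded.
    return 4 * math.ceil(4 * dims / 3)


rng = np.random.default_rng(42)
vec = rng.random(768, dtype=np.float32)
print(len(json_array_payload(vec)), len(base64_payload(vec)))
```

Size is only part of the story. The PR's stated motivation is the cost of parsing float arrays, which this sketch does not measure.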

Example Python for transforming a file (thank you, Copilot...):

import base64
import numpy as np

def base64_vector(dims):
    vec = np.random.rand(dims).astype(np.float32)
    # convert from the host's native byte order to big-endian float32
    # (astype(">f4") is correct on any host; byteswap() assumes little-endian)
    byte_array = vec.astype(">f4").tobytes()
    # encode as base64
    return base64.b64encode(byte_array).decode("utf-8")
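
Decoding mirrors the encoding. The following is a minimal round-trip sketch. It assumes the string carries big-endian float32 values, as in the snippet above. `encode_base64_vector` and `decode_base64_vector` are hypothetical helper names, not Elasticsearch APIs.

```python
import base64

import numpy as np


def encode_base64_vector(vec: np.ndarray) -> str:
    # Big-endian float32 bytes, independent of the host's native byte order.
    return base64.b64encode(vec.astype(">f4").tobytes()).decode("utf-8")


def decode_base64_vector(encoded: str, dims: int) -> np.ndarray:
    raw = base64.b64decode(encoded, validate=True)
    # Each float32 occupies exactly 4 bytes.
    if len(raw) != dims * 4:
        raise ValueError(f"expected {dims * 4} bytes, got {len(raw)}")
    # Interpret as big-endian float32, then convert to native order.
    return np.frombuffer(raw, dtype=">f4").astype(np.float32)


vec = np.random.rand(8).astype(np.float32)
decoded = decode_base64_vector(encode_base64_vector(vec), dims=8)
assert np.array_equal(vec, decoded)
```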

I benchmarked locally with the random_vector track, indexing into a flat index. Here are the highlights:

|                                                        Metric |                Task |    Baseline |    Contender |       Diff |   Unit |   Diff % |
|--------------------------------------------------------------:|--------------------:|------------:|-------------:|-----------:|-------:|---------:|
|                    Cumulative indexing time of primary shards |                     |    2.74863  |     0.393017 |   -2.35562 |    min |  -85.70% |
|                                                Min Throughput |     random-indexing |  967.886    |  1515.79     |  547.903   | docs/s |  +56.61% |
|                                               Mean Throughput |     random-indexing | 1518.27     | 10175        | 8656.71    | docs/s | +570.17% |
|                                             Median Throughput |     random-indexing | 1531.12     | 10592.2      | 9061.07    | docs/s | +591.80% |
|                                                Max Throughput |     random-indexing | 1538.79     | 11064.7      | 9525.94    | docs/s | +619.05% |
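
The Diff % column matches (Contender - Baseline) / Baseline × 100, presumably computed from unrounded values. The quick check below runs against the rounded figures in the table, so expect agreement to within a few hundredths of a percent.

```python
def pct_diff(baseline: float, contender: float) -> float:
    # Relative change of the contender against the baseline, in percent.
    return (contender - baseline) / baseline * 100.0


# (metric, baseline, contender, reported Diff %) copied from the table above.
rows = [
    ("Cumulative indexing time (min)", 2.74863, 0.393017, -85.70),
    ("Min throughput (docs/s)", 967.886, 1515.79, 56.61),
    ("Mean throughput (docs/s)", 1518.27, 10175, 570.17),
    ("Median throughput (docs/s)", 1531.12, 10592.2, 591.80),
    ("Max throughput (docs/s)", 1538.79, 11064.7, 619.05),
]

for name, base, cont, reported in rows:
    computed = pct_diff(base, cont)
    print(f"{name}: computed {computed:+.2f}%, reported {reported:+.2f}%")
```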

elasticsearchmachine · 2025-10-23T21:23:34Z

Pinging @elastic/es-search-relevance (Team:Search Relevance)

elasticsearchmachine · 2025-10-23T21:23:35Z

Hi @benwtrent, I've created a changelog YAML for you.

iverase · 2025-10-24T16:29:25Z

I understand that Lucene's API is little-endian, and that's why little-endian was chosen to represent the float array. On the other hand, Elasticsearch APIs are big-endian (think of BigArrays), so I would be more inclined to use big-endian here, since this is an Elasticsearch API. In addition, we could read those bytes directly into BigArrays.
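
The same float32 value has byte-reversed layouts in big- and little-endian order, so producer and consumer must agree on one. The sketch below demonstrates this using only `struct` and NumPy. For reference, Java's `java.nio.ByteBuffer` also defaults to big-endian.

```python
import struct

import numpy as np

value = 1.0
big = struct.pack(">f", value)     # big-endian float32
little = struct.pack("<f", value)  # little-endian float32

# The two layouts are byte-reversed versions of each other.
assert big == bytes([0x3F, 0x80, 0x00, 0x00])
assert little == big[::-1]

# NumPy's ">f4" dtype yields the same big-endian bytes on any host, whereas
# byteswap() only produces big-endian output when the host is little-endian.
arr = np.array([value, -2.0], dtype=np.float32)
assert arr.astype(">f4").tobytes() == struct.pack(">2f", value, -2.0)

# Reading big-endian bytes as little-endian silently produces a different number.
misread = struct.unpack("<f", big)[0]
print(misread)
```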

benwtrent · 2025-10-28T13:13:05Z

Ah, I need to rework value fetching here: users could provide a mix of arrays and base64 strings across the same set of docs, and value fetching should work for all of them.
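
The sketch below shows one way to normalize mixed source values so that every fetched vector comes back as a float array. It assumes base64 strings hold big-endian float32 values. `fetch_vector_values` is a hypothetical name, and this is not the Java `fetchValues` implementation in `DenseVectorFieldMapper`. Values that are neither arrays nor decodable strings are collected separately as ignored values.

```python
import base64
import struct
from typing import Any


def decode_base64_floats(s: str) -> list[float]:
    # binascii.Error (raised on bad input) is a subclass of ValueError.
    raw = base64.b64decode(s, validate=True)
    if len(raw) % 4 != 0:
        raise ValueError("byte length is not a multiple of 4")
    # Big-endian float32 values, 4 bytes each.
    return list(struct.unpack(f">{len(raw) // 4}f", raw))


def fetch_vector_values(source_values: list[Any]) -> tuple[list[list[float]], list[Any]]:
    values: list[list[float]] = []
    ignored: list[Any] = []
    for v in source_values:
        if isinstance(v, list):
            # Already a numeric array: normalize entries to floats.
            values.append([float(x) for x in v])
        elif isinstance(v, str):
            try:
                values.append(decode_base64_floats(v))
            except ValueError:
                ignored.append(v)
        else:
            ignored.append(v)
    return values, ignored
```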

benwtrent · 2025-10-30T21:23:24Z

server/src/main/java/org/elasticsearch/index/mapper/vectors/DenseVectorFieldMapper.java

-                protected Object parseSourceValue(Object value) {
-                    if (value.equals("")) {
-                        return null;
+                public List<Object> fetchValues(Source source, int doc, List<Object> ignoredValues) {


@carlosdelest what do you think of this?

I think it's OK, since we need to retrieve both strings and numeric arrays from source, something I did not handle in the previous iteration. Makes sense to me.

@carlosdelest are there any tests where ESQL indexes docs and then fetches the values? If so, I would like to add some base64 values for vectors and ensure that, when fetched, they are always transformed into arrays (as that is all ESQL supports for now).

@benwtrent sure! Check DenseVectorFieldTypeIT: docs are indexed there and then retrieved in other methods. It would be a good idea to randomly index vectors as base64 strings, hex strings, or arrays 👍
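
The randomized indexing suggested here can be sketched in Python. A seeded RNG picks either an array or a base64 string per document, so failures are reproducible. The sketch then checks that every representation decodes to the same float32 values. The helper names are hypothetical, and the actual test is Java (`DenseVectorFieldTypeIT`). Hex strings, which apply to byte vectors, are left out of this float-only sketch.

```python
import base64
import random
import struct


def to_base64(vec: list[float]) -> str:
    # Big-endian float32 bytes, base64-encoded.
    return base64.b64encode(struct.pack(f">{len(vec)}f", *vec)).decode("ascii")


def random_representation(vec: list[float], rng: random.Random) -> object:
    # Randomly index either the raw JSON-style array or its base64 form.
    return list(vec) if rng.random() < 0.5 else to_base64(vec)


def as_float32_list(value: object) -> list[float]:
    if isinstance(value, str):
        raw = base64.b64decode(value, validate=True)
        return list(struct.unpack(f">{len(raw) // 4}f", raw))
    # Round array entries through float32 so both forms compare equal.
    return [struct.unpack(">f", struct.pack(">f", x))[0] for x in value]


rng = random.Random(1234)
vectors = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(10)]
docs = [random_representation(v, rng) for v in vectors]
for original, indexed in zip(vectors, docs):
    assert as_float32_list(indexed) == as_float32_list(original)
```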

server/src/main/java/org/elasticsearch/index/mapper/vectors/DenseVectorFieldMapper.java

thecoop · 2025-10-31T11:17:08Z

server/src/main/java/org/elasticsearch/index/mapper/vectors/DenseVectorFieldMapper.java

-                case VALUE_STRING -> parseHexEncodedVector(context, dimChecker, similarity);
+                case VALUE_STRING -> {
+                    String s = context.parser().text();
+                    if (s.length() == dims * 2 && isMaybeHexString(s)) {


Is it worth checking each character here? Maybe just try to parse it as hex straight away if it's the right length?
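
The alternative suggested here is to parse directly when the length matches and rely on the failure path instead of pre-scanning every character. In Python, that might look like the sketch below. It assumes byte vectors with two hex characters per dimension (as in the `dims * 2` check above) and signed bytes. `try_parse_hex_vector` is a hypothetical name.

```python
from typing import List, Optional


def try_parse_hex_vector(s: str, dims: int) -> Optional[List[int]]:
    # Only attempt hex when the length matches: two hex chars per byte dimension.
    if len(s) != dims * 2:
        return None
    try:
        raw = bytes.fromhex(s)
    except ValueError:
        # Not valid hex; the caller can fall back to another decoding.
        return None
    # Interpret each byte as a signed 8-bit value (-128..127).
    return [b - 256 if b > 127 else b for b in raw]
```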

@thecoop 🤔 hmm, that is likely good enough, especially since I already wrap it in a try{}catch(...). The check is left over from an earlier iteration where I wasn't retrying on failure.

server/src/main/java/org/elasticsearch/index/mapper/vectors/DenseVectorFieldMapper.java

carlosdelest

LGTM 👍

Adding tests to ES|QL would be great to ensure the source value fetcher works as intended.

carlosdelest

LGTM 👍

A question on VALUE_EMBEDDED_OBJECT and how it is tested

server/src/main/java/org/elasticsearch/index/mapper/vectors/DenseVectorFieldMapper.java

carlosdelest · 2025-11-03T07:26:39Z

...n/esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/DenseVectorFieldTypeIT.java

        @Name("similarity") DenseVectorFieldMapper.VectorSimilarity similarity,
        @Name("index") boolean index,
-        @Name("synthetic") boolean synthetic
+        @Name("synthetic") VectorSourceOptions sourceOptions


Nit

Suggested change:

-        @Name("synthetic") VectorSourceOptions sourceOptions
+        @Name("sourceOptions") VectorSourceOptions sourceOptions

carlosdelest · 2025-11-03T07:28:09Z

...n/esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/DenseVectorFieldTypeIT.java

                }
-                docs[i] = prepareIndex("test").setId("" + i).setSource("id", String.valueOf(i), "vector", vector);
+                Object vectorToIndex;
+                if (randomBoolean()) {


Nice! Thanks for adding this

thecoop · 2025-11-03T09:30:44Z

server/src/main/java/org/elasticsearch/index/mapper/vectors/DenseVectorFieldMapper.java

+                                case String s -> values.add(s);
+                                default -> ignoredValues.add(sourceValue);
+                            }
+                        } catch (Exception e) {


I'm not sure what this catch is catching. The try block just adds to a collection, which can't really fail, right?

thecoop · 2025-11-03T09:34:49Z

server/src/main/java/org/elasticsearch/index/mapper/vectors/DenseVectorFieldMapper.java

-                    yield decodedVector.length;
+                    String v = context.parser().text();
+                    // Base64 is always divisible by 4, so if it's not, assume hex
+                    if (v.length() % 4 != 0 || isMaybeHexString(v)) {


Do you want to keep the isMaybeHexString check here, or just try to parse it directly below?
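
The routing in this hunk can be approximated in Python. Padded base64 always has a length that is a multiple of 4, so any other length (or a string that looks like hex) is tried as hex first. Anything else is decoded as base64, with a retry when hex parsing fails. This is a sketch of the idea under discussion, not the Java code, and the function names are hypothetical. Some strings (for example `"abcd"`) are valid in both encodings, which is why the hex-looking check decides the ambiguous case. Unpadded base64 is not handled.

```python
import base64
import binascii
import string

HEX_DIGITS = frozenset(string.hexdigits)


def is_maybe_hex(s: str) -> bool:
    # Cheap pre-check: even length and every character is a hex digit.
    return len(s) % 2 == 0 and all(c in HEX_DIGITS for c in s)


def decode_vector_string(s: str) -> tuple[str, bytes]:
    # Padded base64 length is always a multiple of 4, so anything else cannot be base64.
    if len(s) % 4 != 0 or is_maybe_hex(s):
        try:
            return "hex", bytes.fromhex(s)
        except ValueError:
            pass  # fall through and retry as base64
    try:
        return "base64", base64.b64decode(s, validate=True)
    except binascii.Error as e:
        raise ValueError(f"neither hex nor base64: {s!r}") from e
```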

...n/esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/DenseVectorFieldTypeIT.java

iverase

LGTM. Thank you Ben!

thecoop · 2025-11-03T15:28:53Z

server/src/main/java/org/elasticsearch/index/mapper/vectors/DenseVectorFieldMapper.java

+                                    values.add(NumberFieldMapper.NumberType.FLOAT.parse(o, false));
+                                }
+                            } else if (sourceValue instanceof String s) {
+                                if ((element.elementType() == BYTE_ELEMENT.elementType()


Suggestion: element.elementType() == ElementType.BYTE || element.elementType() == ElementType.BIT

thecoop

A nit; otherwise LGTM.

server/src/main/java/org/elasticsearch/index/mapper/vectors/DenseVectorFieldMapper.java

Adding base64 indexing for vector values

8cc11a8

benwtrent added >enhancement :Search Relevance/Vectors Vector search v9.3.0 labels Oct 23, 2025

elasticsearchmachine added the Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch label Oct 23, 2025

Update docs/changelog/137072.yaml

c2c5d27

benwtrent added 6 commits October 27, 2025 10:22

Merge remote-tracking branch 'upstream/main' into add-base64-encoded-float32-support

ba4793d

Switching to BIG ENDIAN, adding more tests

5e9baed

iter

bf3de81

iter

2c86500

iter

ddd52eb

Merge remote-tracking branch 'upstream/main' into add-base64-encoded-float32-support

24a9d8e

benwtrent and others added 3 commits October 28, 2025 13:49

iter

c62be9a

[CI] Auto commit changes from spotless

abfefb5

fixing formatting

25bbdce

benwtrent commented Oct 30, 2025

benwtrent added 2 commits October 31, 2025 07:08

iter

44862b4

Merge remote-tracking branch 'upstream/main' into add-base64-encoded-float32-support

49c117b

benwtrent requested a review from carlosdelest October 31, 2025 11:10

thecoop reviewed Oct 31, 2025 (six comment threads on server/src/main/java/org/elasticsearch/index/mapper/vectors/DenseVectorFieldMapper.java, all resolved; five marked outdated)

carlosdelest approved these changes Oct 31, 2025

benwtrent added 2 commits October 31, 2025 13:37

Adding further tests and support

24d33ce

Merge remote-tracking branch 'upstream/main' into add-base64-encoded-float32-support

0113501

benwtrent requested review from carlosdelest and thecoop October 31, 2025 17:38

Merge branch 'main' into add-base64-encoded-float32-support

44dde19

carlosdelest approved these changes Nov 3, 2025

thecoop reviewed Nov 3, 2025

iverase reviewed Nov 3, 2025 (two comment threads on ...n/esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/DenseVectorFieldTypeIT.java, both resolved and marked outdated)

benwtrent added 3 commits November 3, 2025 10:05

Merge remote-tracking branch 'upstream/main' into add-base64-encoded-float32-support

38cac73

addressing PR comments

3652c36

Merge branch 'add-base64-encoded-float32-support' of github.com:benwtrent/elasticsearch into add-base64-encoded-float32-support

fcd9a28

benwtrent requested review from iverase and thecoop November 3, 2025 15:20

iverase approved these changes Nov 3, 2025

thecoop reviewed Nov 3, 2025

thecoop approved these changes Nov 3, 2025

thecoop reviewed Nov 3, 2025

server/src/main/java/org/elasticsearch/index/mapper/vectors/DenseVectorFieldMapper.java (outdated, resolved)

iter

c77d674

benwtrent added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Nov 3, 2025

Merge branch 'main' into add-base64-encoded-float32-support

fe782c4

5 participants
