Update sparse_vector field mapping to include default setting for token pruning #126739

Open
wants to merge 74 commits into
base: main
e02cd3a
Initial checkin - needs tests
markjhoy Apr 12, 2025
e24ab76
Missing s in IndexVersions
markjhoy Apr 12, 2025
f39b78a
add changelog and docs for index_options
markjhoy Apr 15, 2025
eeebfd8
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy Apr 15, 2025
51aab0c
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy Apr 21, 2025
983ddf1
correct index version
markjhoy Apr 21, 2025
9545a0c
update tests
markjhoy Apr 21, 2025
19fe72d
Complete tests for SparseVectorFieldMapper
markjhoy Apr 22, 2025
58f9909
[CI] Auto commit changes from spotless
elasticsearchmachine Apr 22, 2025
5f8e7b9
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy Apr 25, 2025
d7d27ba
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy Apr 25, 2025
d342656
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy Apr 25, 2025
96096ba
fix docs
markjhoy Apr 25, 2025
eed88c6
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy Apr 25, 2025
6a6052a
fix lint
markjhoy Apr 25, 2025
f977ea8
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy Apr 28, 2025
f38a6f1
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy Apr 28, 2025
436183b
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy Apr 28, 2025
24438e3
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy Apr 28, 2025
501099d
cleanups + refactoring; fix tests; refine docs;
markjhoy Apr 28, 2025
21323e4
[CI] Auto commit changes from spotless
elasticsearchmachine Apr 28, 2025
a282e27
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy Apr 29, 2025
9d5df84
delete changelog - let it autocreate
markjhoy Apr 29, 2025
846fcff
cleanups
markjhoy Apr 29, 2025
2b3299e
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy Apr 29, 2025
832fe45
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy Apr 29, 2025
e593f17
don't explicitly set the config if not present
markjhoy Apr 29, 2025
3086a4b
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy Apr 29, 2025
7a24703
use default prune config if prune=true and not set
markjhoy Apr 29, 2025
7ceb12a
fix test
markjhoy Apr 29, 2025
f9d44e5
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy Apr 29, 2025
af006d4
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy Apr 29, 2025
5dd4728
really fix test :/
markjhoy Apr 29, 2025
f3b4a98
[CI] Auto commit changes from spotless
elasticsearchmachine Apr 29, 2025
7ddb77a
rename index to test3 / previous not cleaned up
markjhoy Apr 30, 2025
02868b1
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy Apr 30, 2025
3625a37
fix lint
markjhoy Apr 30, 2025
92db1c6
[CI] Auto commit changes from spotless
elasticsearchmachine Apr 30, 2025
a022b5c
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy May 1, 2025
6a7f46c
fix the current yaml tests/ensure cleanup
markjhoy May 1, 2025
e95033c
light cleanups
markjhoy May 1, 2025
99c3700
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy May 1, 2025
bdfc9b8
clean docs
markjhoy May 1, 2025
20bcf20
refactor/move TokenPruningConfig into server
markjhoy May 1, 2025
1f0718d
[CI] Auto commit changes from spotless
elasticsearchmachine May 1, 2025
e30a141
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy May 1, 2025
404e645
cleanup existing Yaml tests via teardown
markjhoy May 1, 2025
bdfcf5e
fix lint
markjhoy May 1, 2025
f27dfb8
add node feature and simple yml test
markjhoy May 1, 2025
283b563
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy May 1, 2025
65c5147
remove checked-in test - moving it / add elsewhere
markjhoy May 2, 2025
fc78b0f
cleanups; start of yamlRestTests
markjhoy May 2, 2025
15c5eb3
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy May 2, 2025
0b17c16
ensure query index version; add test index options
markjhoy May 5, 2025
4b46300
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy May 5, 2025
08d51c9
[CI] Auto commit changes from spotless
elasticsearchmachine May 5, 2025
ae34841
add yaml tests for ml multi/remote clusters
markjhoy May 5, 2025
74b19ca
fix yaml test
markjhoy May 5, 2025
a341322
finally fix yaml tests?
markjhoy May 5, 2025
a47b915
update docs
markjhoy May 5, 2025
4e681bd
add 8.x tx version; fix yaml tests; optimizations
markjhoy May 5, 2025
6e50539
[CI] Auto commit changes from spotless
elasticsearchmachine May 5, 2025
fcf682f
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy May 6, 2025
095bb28
explicitly set # of shards for indices for test
markjhoy May 6, 2025
0c8c095
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy May 6, 2025
5bb6561
fix docs; add backport 8.x index version
markjhoy May 6, 2025
e4d547a
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy May 6, 2025
6c5e253
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy May 6, 2025
29dae8a
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy May 6, 2025
31f9e6d
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy May 6, 2025
0b1c1d2
cleanups/optimizations
markjhoy May 6, 2025
b48deea
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy May 7, 2025
7f60eca
Update docs/changelog/126739.yaml
markjhoy May 7, 2025
d8f3c63
refactor equals for IndexOptions
markjhoy May 7, 2025
6 changes: 6 additions & 0 deletions docs/changelog/126739.yaml
@@ -0,0 +1,6 @@
pr: 126739
summary: Update `sparse_vector` field mapping to include default setting for token
pruning
area: Relevance
Contributor

Mapping may be a better area for this PR

type: enhancement
issues: []
44 changes: 44 additions & 0 deletions docs/reference/elasticsearch/mapping-reference/sparse-vector.md
Member

Just a reminder, that we'll have to open a PR for 8.19 to update the appropriate asciidoc files as well 👍

@@ -24,6 +24,28 @@ PUT my-index
}
```

Also, with optional `index_options` for pruning:

```console
PUT my-index
{
"mappings": {
"properties": {
"text.tokens": {
"type": "sparse_vector",
Member

I wonder if we should have two examples here - one "simple" example that creates the sparse_vector field with defaults, and one that adds the index options? It's just that we show this, and only explain later on that it's optional, so people may assume it's required? WDYT?

Member

Did we decide not to have multiple examples here?

Contributor Author

I had another example there... not sure what happened... maybe something got reverted...

"index_options": {
"prune": true,
"pruning_config": {
"tokens_freq_ratio_threshold": 5,
"tokens_weight_threshold": 0.4
}
}
}
}
}
}
```

See [semantic search with ELSER](docs-content://solutions/search/semantic-search/semantic-search-elser-ingest-pipelines.md) for a complete example on adding documents to a `sparse_vector` mapped field using ELSER.

## Parameters for `sparse_vector` fields [sparse-vectors-params]
@@ -36,6 +58,28 @@ The following parameters are accepted by `sparse_vector` fields:
* Exclude the field from [_source](/reference/elasticsearch/rest-apis/retrieve-selected-fields.md#source-filtering).
* Use [synthetic `_source`](/reference/elasticsearch/mapping-reference/mapping-source-field.md#synthetic-source).

index_options
: (Optional, object) Index options for the `sparse_vector` field that control whether tokens are pruned and how token pruning is configured. If pruning options are not set in your `sparse_vector` query, Elasticsearch uses the default options configured for the field, if any.

Parameters for `index_options` are:

`prune`
: (Optional, boolean) [preview] Whether to perform pruning, omitting the non-significant tokens from the query to improve query performance. If `prune` is `true` but `pruning_config` is not specified, pruning occurs with the default values. Default: `true`.

`pruning_config`
: (Optional, object) [preview] Pruning configuration. When pruning is enabled, non-significant tokens are omitted from the query to improve query performance. This is only used if `prune` is set to `true`; if `prune` is `true` but `pruning_config` is not specified, default values are used. If `pruning_config` is specified while `prune` is `false` or not set, an exception is thrown.

Parameters for `pruning_config` include:

`tokens_freq_ratio_threshold`
: (Optional, integer) [preview] Tokens whose frequency is more than `tokens_freq_ratio_threshold` times the average frequency of all tokens in the specified field are considered outliers and pruned. This value must be between 1 and 100. Default: `5`.

`tokens_weight_threshold`
: (Optional, float) [preview] Tokens whose weight is less than `tokens_weight_threshold` are considered insignificant and pruned. This value must be between 0 and 1. Default: `0.4`.

::::{note}
The default values for `tokens_freq_ratio_threshold` and `tokens_weight_threshold` were chosen because they produced the best results in tests using ELSERv2.
::::
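
As a rough illustration of how the defaults, ranges, and overrides described above fit together, the following Java sketch resolves an effective pruning configuration. The names `PruningSettingsSketch`, `PruningConfig`, `IndexOptions`, and `effectiveConfig` are hypothetical and are not Elasticsearch classes. The precedence between query-level and field-level settings is an assumption drawn from the parameter descriptions, not from the implementation.

```java
import java.util.Optional;

// Hypothetical sketch; not part of Elasticsearch.
final class PruningSettingsSketch {
    // Defaults as stated in the parameter descriptions.
    static final int DEFAULT_TOKENS_FREQ_RATIO_THRESHOLD = 5;
    static final float DEFAULT_TOKENS_WEIGHT_THRESHOLD = 0.4f;

    record PruningConfig(int tokensFreqRatioThreshold, float tokensWeightThreshold) {
        PruningConfig {
            // Ranges as documented: 1..100 and 0..1.
            if (tokensFreqRatioThreshold < 1 || tokensFreqRatioThreshold > 100) {
                throw new IllegalArgumentException("[tokens_freq_ratio_threshold] must be between 1 and 100");
            }
            if (tokensWeightThreshold < 0f || tokensWeightThreshold > 1f) {
                throw new IllegalArgumentException("[tokens_weight_threshold] must be between 0 and 1");
            }
        }

        static PruningConfig defaults() {
            return new PruningConfig(DEFAULT_TOKENS_FREQ_RATIO_THRESHOLD, DEFAULT_TOKENS_WEIGHT_THRESHOLD);
        }
    }

    // Field-level index_options; either part may be null when not configured.
    record IndexOptions(Boolean prune, PruningConfig pruningConfig) {}

    // Returns the configuration to prune with, or empty when pruning is disabled.
    // Assumption: query-level settings win; field-level settings apply only when
    // the query sets nothing; otherwise the documented defaults apply.
    static Optional<PruningConfig> effectiveConfig(Boolean queryPrune, PruningConfig queryConfig, IndexOptions fieldOptions) {
        Boolean prune = queryPrune;
        PruningConfig config = queryConfig;
        if (queryPrune == null && queryConfig == null && fieldOptions != null) {
            prune = fieldOptions.prune();
            config = fieldOptions.pruningConfig();
        }
        if (prune == null) {
            prune = true; // documented default for `prune`
        }
        if (!prune) {
            return Optional.empty();
        }
        return Optional.of(config != null ? config : PruningConfig.defaults());
    }

    public static void main(String[] args) {
        System.out.println(effectiveConfig(null, null, null));
        System.out.println(effectiveConfig(null, null, new IndexOptions(false, null)));
    }
}
```

Validating in the record's compact constructor means an out-of-range value fails at construction time, which mirrors the idea of rejecting a bad mapping up front rather than at query time.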


## Multi-value sparse vectors [index-multi-value-sparse-vectors]
@@ -1,3 +1,15 @@

---
teardown:
# ensure indices are cleaned up after each test
# mainly for the sparse vector tests
- do:
indices.delete:
index: ["test1", "test2"]
ignore: 404
Comment on lines +6 to +9
Contributor

Apologies if this has been covered already, but why do we need to explicitly delete the indices? I thought this was handled automatically between YAML test suites.

Contributor Author

It should - but there's a number of tests that try to create indices with the same names, and we've found that it can be flaky sometimes so, best to ensure their removal with each individual test.

- do:
indices.refresh: { }

---
"cluster stats test":
- do:
@@ -358,6 +370,7 @@
- requires:
cluster_features: [ "gte_v8.15.0" ]
reason: "sparse vector stats added in 8.15"

- do:
indices.create:
index: test1
2 changes: 2 additions & 0 deletions server/src/main/java/org/elasticsearch/TransportVersions.java
@@ -173,6 +173,7 @@ static TransportVersion def(int id) {
public static final TransportVersion INTRODUCE_FAILURES_DEFAULT_RETENTION_BACKPORT_8_19 = def(8_841_0_26);
public static final TransportVersion RESCORE_VECTOR_ALLOW_ZERO_BACKPORT_8_19 = def(8_841_0_27);
public static final TransportVersion INFERENCE_ADD_TIMEOUT_PUT_ENDPOINT_8_19 = def(8_841_0_28);
public static final TransportVersion SPARSE_VECTOR_FIELD_PRUNING_OPTIONS_8_19 = def(8_841_0_29);
public static final TransportVersion V_9_0_0 = def(9_000_0_09);
public static final TransportVersion INITIAL_ELASTICSEARCH_9_0_1 = def(9_000_0_10);
public static final TransportVersion INITIAL_ELASTICSEARCH_9_0_2 = def(9_000_0_11);
@@ -250,6 +251,7 @@ static TransportVersion def(int id) {
public static final TransportVersion FILE_SETTINGS_HEALTH_INFO = def(9_072_0_00);
public static final TransportVersion FIELD_CAPS_ADD_CLUSTER_ALIAS = def(9_073_0_00);
public static final TransportVersion INFERENCE_ADD_TIMEOUT_PUT_ENDPOINT = def(9_074_00_0);
public static final TransportVersion SPARSE_VECTOR_FIELD_PRUNING_OPTIONS = def(9_075_0_00);

/*
* STOP! READ THIS FIRST! No, really,
@@ -141,6 +141,7 @@ private static Version parseUnchecked(String version) {
public static final IndexVersion RESCORE_PARAMS_ALLOW_ZERO_TO_QUANTIZED_VECTORS_BACKPORT_8_X = def(8_529_0_00, Version.LUCENE_9_12_1);
public static final IndexVersion DEFAULT_OVERSAMPLE_VALUE_FOR_BBQ_BACKPORT_8_X = def(8_530_0_00, Version.LUCENE_9_12_1);
public static final IndexVersion SEMANTIC_TEXT_DEFAULTS_TO_BBQ_BACKPORT_8_X = def(8_531_0_00, Version.LUCENE_9_12_1);
public static final IndexVersion SPARSE_VECTOR_PRUNING_INDEX_OPTIONS_SUPPORT_BACKPORT_8_X = def(8_532_0_00, Version.LUCENE_9_12_1);
public static final IndexVersion UPGRADE_TO_LUCENE_10_0_0 = def(9_000_0_00, Version.LUCENE_10_0_0);
public static final IndexVersion LOGSDB_DEFAULT_IGNORE_DYNAMIC_BEYOND_LIMIT = def(9_001_0_00, Version.LUCENE_10_0_0);
public static final IndexVersion TIME_BASED_K_ORDERED_DOC_ID = def(9_002_0_00, Version.LUCENE_10_0_0);
@@ -168,6 +169,7 @@ private static Version parseUnchecked(String version) {
public static final IndexVersion DEFAULT_OVERSAMPLE_VALUE_FOR_BBQ = def(9_024_0_00, Version.LUCENE_10_2_1);
public static final IndexVersion SEMANTIC_TEXT_DEFAULTS_TO_BBQ = def(9_025_0_00, Version.LUCENE_10_2_1);
public static final IndexVersion DEFAULT_TO_ACORN_HNSW_FILTER_HEURISTIC = def(9_026_0_00, Version.LUCENE_10_2_1);
public static final IndexVersion SPARSE_VECTOR_PRUNING_INDEX_OPTIONS_SUPPORT = def(9_027_0_00, Version.LUCENE_10_2_1);
/*
* STOP! READ THIS FIRST! No, really,
* ____ _____ ___ ____ _ ____ _____ _ ____ _____ _ _ ___ ____ _____ ___ ____ ____ _____ _
@@ -22,6 +22,9 @@
import org.apache.lucene.util.BytesRef;
import org.elasticsearch.common.logging.DeprecationCategory;
import org.elasticsearch.common.lucene.Lucene;
import org.elasticsearch.common.xcontent.support.XContentMapValues;
import org.elasticsearch.core.Nullable;
import org.elasticsearch.features.NodeFeature;
import org.elasticsearch.index.IndexVersion;
import org.elasticsearch.index.IndexVersions;
import org.elasticsearch.index.analysis.NamedAnalyzer;
@@ -31,13 +34,16 @@
import org.elasticsearch.index.mapper.FieldMapper;
import org.elasticsearch.index.mapper.MappedFieldType;
import org.elasticsearch.index.mapper.MapperBuilderContext;
import org.elasticsearch.index.mapper.MapperParsingException;
import org.elasticsearch.index.mapper.MappingParserContext;
import org.elasticsearch.index.mapper.SourceLoader;
import org.elasticsearch.index.mapper.SourceValueFetcher;
import org.elasticsearch.index.mapper.TextSearchInfo;
import org.elasticsearch.index.mapper.ValueFetcher;
import org.elasticsearch.index.query.SearchExecutionContext;
import org.elasticsearch.search.fetch.StoredFieldsSpec;
import org.elasticsearch.search.lookup.Source;
import org.elasticsearch.xcontent.ToXContent;
import org.elasticsearch.xcontent.XContentBuilder;
import org.elasticsearch.xcontent.XContentParser.Token;

@@ -46,6 +52,7 @@
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Stream;

import static org.elasticsearch.index.query.AbstractQueryBuilder.DEFAULT_BOOST;
@@ -57,6 +64,7 @@
public class SparseVectorFieldMapper extends FieldMapper {

public static final String CONTENT_TYPE = "sparse_vector";
public static final String SPARSE_VECTOR_INDEX_OPTIONS = "index_options";

static final String ERROR_MESSAGE_7X = "[sparse_vector] field type in old 7.x indices is allowed to "
+ "contain [sparse_vector] fields, but they cannot be indexed or searched.";
@@ -65,6 +73,12 @@ public class SparseVectorFieldMapper extends FieldMapper {

static final IndexVersion NEW_SPARSE_VECTOR_INDEX_VERSION = IndexVersions.NEW_SPARSE_VECTOR;
static final IndexVersion SPARSE_VECTOR_IN_FIELD_NAMES_INDEX_VERSION = IndexVersions.SPARSE_VECTOR_IN_FIELD_NAMES_SUPPORT;
public static final IndexVersion SPARSE_VECTOR_PRUNING_INDEX_OPTIONS_VERSION =
IndexVersions.SPARSE_VECTOR_PRUNING_INDEX_OPTIONS_SUPPORT;
Comment on lines +76 to +77
Contributor

Nit: can we make this package private like the other index versions?

Contributor Author

It's used in the SparseVectorQueryBuilder to ensure we use the same index version... we could add the same variable there, but, I think this would be more consistent.
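
To make the index-version concern in this thread concrete, here is a minimal Java sketch of a version gate. It uses plain `int` ids copied from the `IndexVersions` changes in this PR instead of the real `IndexVersion` type. The class `IndexVersionGateSketch`, the method `supportsPruningIndexOptions`, and the rule itself (supported on the 8.x backport line or on 9.x from the new version onward) are assumptions for illustration, not the check that `SparseVectorQueryBuilder` actually performs.

```java
// Hypothetical sketch; ids are taken from the IndexVersions diff in this PR.
final class IndexVersionGateSketch {
    static final int SPARSE_VECTOR_PRUNING_INDEX_OPTIONS_SUPPORT_BACKPORT_8_X = 8_532_0_00;
    static final int UPGRADE_TO_LUCENE_10_0_0 = 9_000_0_00;
    static final int SPARSE_VECTOR_PRUNING_INDEX_OPTIONS_SUPPORT = 9_027_0_00;

    // Assumed rule: an index supports pruning index options if it was created
    // on the 8.x line at or after the backport version, or on the 9.x line at
    // or after the main-branch version.
    static boolean supportsPruningIndexOptions(int indexVersionId) {
        boolean onBackportLine = indexVersionId >= SPARSE_VECTOR_PRUNING_INDEX_OPTIONS_SUPPORT_BACKPORT_8_X
            && indexVersionId < UPGRADE_TO_LUCENE_10_0_0;
        boolean onMainLine = indexVersionId >= SPARSE_VECTOR_PRUNING_INDEX_OPTIONS_SUPPORT;
        return onBackportLine || onMainLine;
    }

    public static void main(String[] args) {
        System.out.println(supportsPruningIndexOptions(8_531_0_00));
        System.out.println(supportsPruningIndexOptions(9_027_0_00));
    }
}
```

Keeping the constant in one place and referencing it from both the mapper and the query builder, as the author suggests, avoids the two call sites drifting apart.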


private final SparseVectorFieldMapper.IndexOptions indexOptions;
Contributor

There's no need to store indexOptions in the mapper, we're already storing it in the field type


public static final NodeFeature SPARSE_VECTOR_INDEX_OPTIONS_FEATURE = new NodeFeature("sparse_vector_index_options_supported");
Contributor

Tiniest of nits here, but we normally pseudo-namespace node features using `.`. In this case, I think `sparse_vector.index_options_supported` would work well, where `sparse_vector.` is the pseudo-namespace.


private static SparseVectorFieldMapper toType(FieldMapper in) {
return (SparseVectorFieldMapper) in;
@@ -73,9 +87,23 @@ private static SparseVectorFieldMapper toType(FieldMapper in) {
public static class Builder extends FieldMapper.Builder {
private final Parameter<Boolean> stored = Parameter.storeParam(m -> toType(m).fieldType().isStored(), false);
private final Parameter<Map<String, String>> meta = Parameter.metaParam();
private final Parameter<IndexOptions> indexOptions;

public Builder(String name) {
super(name);
this.indexOptions = new Parameter<>(
SPARSE_VECTOR_INDEX_OPTIONS,
true,
() -> null,
(n, c, o) -> parseIndexOptions(c, o),
m -> toType(m).fieldType().indexOptions,
(b, n, v) -> {
if (v != null) {
b.field(n, v);
}
},
Comment on lines +100 to +104
Contributor

Nit: I think we can simplify this to XContentBuilder::field

Objects::toString
);
}

public Builder setStored(boolean value) {
@@ -85,19 +113,40 @@ public Builder setStored(boolean value) {

@Override
protected Parameter<?>[] getParameters() {
return new Parameter<?>[] { stored, meta };
return new Parameter<?>[] { stored, meta, indexOptions };
}

@Override
public SparseVectorFieldMapper build(MapperBuilderContext context) {
return new SparseVectorFieldMapper(
leafName(),
new SparseVectorFieldType(context.buildFullName(leafName()), stored.getValue(), meta.getValue()),
new SparseVectorFieldType(context.buildFullName(leafName()), stored.getValue(), meta.getValue(), indexOptions.getValue()),
builderParams(this, context)
);
}
}

public IndexOptions getIndexOptions() {
return this.indexOptions;
Contributor

You can get the index options by calling fieldType().getIndexOptions() instead

}

private static SparseVectorFieldMapper.IndexOptions parseIndexOptions(MappingParserContext context, Object propNode) {
if (propNode == null) {
return null;
}

Map<String, Object> indexOptionsMap = XContentMapValues.nodeMapValue(propNode, SPARSE_VECTOR_INDEX_OPTIONS);

Boolean prune = IndexOptions.parseIndexOptionsPruneValue(indexOptionsMap);
TokenPruningConfig pruningConfig = IndexOptions.parseIndexOptionsPruningConfig(prune, indexOptionsMap);

if (prune == null && pruningConfig == null) {
return null;
}

return new SparseVectorFieldMapper.IndexOptions(prune, pruningConfig);
}

public static final TypeParser PARSER = new TypeParser((n, c) -> {
if (c.indexVersionCreated().before(PREVIOUS_SPARSE_VECTOR_INDEX_VERSION)) {
deprecationLogger.warn(DeprecationCategory.MAPPINGS, "sparse_vector", ERROR_MESSAGE_7X);
@@ -109,9 +158,24 @@ public SparseVectorFieldMapper build(MapperBuilderContext context) {
}, notInMultiFields(CONTENT_TYPE));

public static final class SparseVectorFieldType extends MappedFieldType {
private final IndexOptions indexOptions;

public SparseVectorFieldType(String name, boolean isStored, Map<String, String> meta) {
this(name, isStored, meta, null);
}

public SparseVectorFieldType(
String name,
boolean isStored,
Map<String, String> meta,
@Nullable SparseVectorFieldMapper.IndexOptions indexOptions
) {
super(name, true, isStored, false, TextSearchInfo.SIMPLE_MATCH_ONLY, meta);
this.indexOptions = indexOptions;
}

public IndexOptions getIndexOptions() {
return indexOptions;
}

@Override
@@ -159,6 +223,7 @@ private static String indexedValueForSearch(Object value) {

private SparseVectorFieldMapper(String simpleName, MappedFieldType mappedFieldType, BuilderParams builderParams) {
super(simpleName, mappedFieldType, builderParams);
this.indexOptions = ((SparseVectorFieldType) mappedFieldType).getIndexOptions();
}

@Override
@@ -364,4 +429,86 @@ public void reset() {
}
}

public static class IndexOptions implements ToXContent {
public static final String PRUNE_FIELD_NAME = "prune";
public static final String PRUNING_CONFIG_FIELD_NAME = "pruning_config";

final Boolean prune;
final TokenPruningConfig pruningConfig;

IndexOptions(@Nullable Boolean prune, @Nullable TokenPruningConfig pruningConfig) {
this.prune = prune;
this.pruningConfig = pruningConfig;
}

public Boolean getPrune() {
return prune;
}

public TokenPruningConfig getPruningConfig() {
return pruningConfig;
}

@Override
public final boolean equals(Object other) {
if (other == this) {
return true;
}

if (other == null || getClass() != other.getClass()) {
return false;
}

IndexOptions otherAsIndexOptions = (IndexOptions) other;
return Objects.equals(prune, otherAsIndexOptions.prune) && Objects.equals(pruningConfig, otherAsIndexOptions.pruningConfig);
}

@Override
public final int hashCode() {
return Objects.hash(prune, pruningConfig);
}

@Override
public XContentBuilder toXContent(XContentBuilder builder, Params params) throws IOException {
builder.startObject();

if (prune != null) {
builder.field(PRUNE_FIELD_NAME, prune);
}
if (pruningConfig != null) {
builder.field(PRUNING_CONFIG_FIELD_NAME, pruningConfig);
}

builder.endObject();
return builder;
}

public static Boolean parseIndexOptionsPruneValue(Map<String, Object> indexOptionsMap) {
Object shouldPrune = indexOptionsMap.remove(IndexOptions.PRUNE_FIELD_NAME);
Contributor

In general I'm not a fan of mutating the passed-in map. We accept it elsewhere because it's baked in at this point, but IMO we shouldn't do it for new implementations.

if (shouldPrune == null) {
return null;
}

if (shouldPrune instanceof Boolean boolValue) {
return boolValue;
}

throw new MapperParsingException("[index_options] field [prune] should be true or false");
}

public static TokenPruningConfig parseIndexOptionsPruningConfig(Boolean prune, Map<String, Object> indexOptionsMap) {
Object pruningConfiguration = indexOptionsMap.remove(IndexOptions.PRUNING_CONFIG_FIELD_NAME);
Contributor

same here

if (pruningConfiguration == null) {
return null;
}

if (prune == null || prune == false) {
throw new MapperParsingException("[index_options] field [pruning_config] should only be set if [prune] is set to true");
Contributor

Nit: better to use variables to build the string ( i.e PRUNE_FIELD_NAME, PRUNING_CONFIG_FIELD_NAME)

}

Map<String, Object> pruningConfigurationMap = XContentMapValues.nodeMapValue(pruningConfiguration, PRUNING_CONFIG_FIELD_NAME);

return TokenPruningConfig.parseFromMap(pruningConfigurationMap);
}
}
}
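
The review comments on `parseIndexOptionsPruneValue` and `parseIndexOptionsPruningConfig` ask for parsing that does not mutate the passed-in map and for error messages built from the field-name constants. A minimal, self-contained sketch of that shape follows. `IndexOptionsParsingSketch` and `ParsedIndexOptions` are hypothetical names, `IllegalArgumentException` stands in for `MapperParsingException`, and the pruning config is kept as a plain map rather than a `TokenPruningConfig`. Whether the surrounding mapping code relies on consumed keys being removed is not visible in this diff, so treat this as one possible approach only.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; not the PR's implementation.
final class IndexOptionsParsingSketch {
    static final String PRUNE_FIELD_NAME = "prune";
    static final String PRUNING_CONFIG_FIELD_NAME = "pruning_config";

    record ParsedIndexOptions(Boolean prune, Map<String, Object> pruningConfig) {}

    static ParsedIndexOptions parse(Map<String, Object> indexOptionsMap) {
        if (indexOptionsMap == null) {
            return null;
        }

        // Read with get() rather than remove(), so the caller's map is untouched.
        Boolean prune = null;
        Object pruneNode = indexOptionsMap.get(PRUNE_FIELD_NAME);
        if (pruneNode != null) {
            if (pruneNode instanceof Boolean boolValue) {
                prune = boolValue;
            } else {
                throw new IllegalArgumentException("[index_options] field [" + PRUNE_FIELD_NAME + "] should be true or false");
            }
        }

        Map<String, Object> pruningConfig = null;
        Object configNode = indexOptionsMap.get(PRUNING_CONFIG_FIELD_NAME);
        if (configNode != null) {
            // Same rule as the PR: a pruning config requires prune == true.
            if (prune == null || !prune) {
                throw new IllegalArgumentException(
                    "[index_options] field [" + PRUNING_CONFIG_FIELD_NAME + "] should only be set if ["
                        + PRUNE_FIELD_NAME + "] is set to true"
                );
            }
            if (!(configNode instanceof Map<?, ?> rawConfig)) {
                throw new IllegalArgumentException("[" + PRUNING_CONFIG_FIELD_NAME + "] must be an object");
            }
            // Defensive copy so later changes to the input do not leak in.
            Map<String, Object> copy = new LinkedHashMap<>();
            for (Map.Entry<?, ?> entry : rawConfig.entrySet()) {
                copy.put(String.valueOf(entry.getKey()), entry.getValue());
            }
            pruningConfig = Collections.unmodifiableMap(copy);
        }

        // As in the PR, nothing configured means no index options at all.
        if (prune == null && pruningConfig == null) {
            return null;
        }
        return new ParsedIndexOptions(prune, pruningConfig);
    }
}
```

Because the input is only read, the same map can be parsed repeatedly or inspected afterwards for unknown keys without surprises.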