Update sparse_vector field mapping to include default setting for token pruning #126739

Open
wants to merge 129 commits into base: main
Changes from 74 commits
e02cd3a
Initial checkin - needs tests
markjhoy Apr 12, 2025
e24ab76
Missing s in IndexVersions
markjhoy Apr 12, 2025
f39b78a
add changelog and docs for index_options
markjhoy Apr 15, 2025
eeebfd8
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy Apr 15, 2025
51aab0c
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy Apr 21, 2025
983ddf1
correct index version
markjhoy Apr 21, 2025
9545a0c
update tests
markjhoy Apr 21, 2025
19fe72d
Complete tests for SparseVectorFieldMapper
markjhoy Apr 22, 2025
58f9909
[CI] Auto commit changes from spotless
elasticsearchmachine Apr 22, 2025
5f8e7b9
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy Apr 25, 2025
d7d27ba
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy Apr 25, 2025
d342656
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy Apr 25, 2025
96096ba
fix docs
markjhoy Apr 25, 2025
eed88c6
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy Apr 25, 2025
6a6052a
fix lint
markjhoy Apr 25, 2025
f977ea8
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy Apr 28, 2025
f38a6f1
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy Apr 28, 2025
436183b
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy Apr 28, 2025
24438e3
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy Apr 28, 2025
501099d
cleanups + refactoring; fix tests; refine docs;
markjhoy Apr 28, 2025
21323e4
[CI] Auto commit changes from spotless
elasticsearchmachine Apr 28, 2025
a282e27
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy Apr 29, 2025
9d5df84
delete changelog - let it autocreate
markjhoy Apr 29, 2025
846fcff
cleanups
markjhoy Apr 29, 2025
2b3299e
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy Apr 29, 2025
832fe45
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy Apr 29, 2025
e593f17
don't explicitly set the config if not present
markjhoy Apr 29, 2025
3086a4b
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy Apr 29, 2025
7a24703
use default prune config if prune=true and not set
markjhoy Apr 29, 2025
7ceb12a
fix test
markjhoy Apr 29, 2025
f9d44e5
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy Apr 29, 2025
af006d4
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy Apr 29, 2025
5dd4728
really fix test :/
markjhoy Apr 29, 2025
f3b4a98
[CI] Auto commit changes from spotless
elasticsearchmachine Apr 29, 2025
7ddb77a
rename index to test3 / previous not cleaned up
markjhoy Apr 30, 2025
02868b1
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy Apr 30, 2025
3625a37
fix lint
markjhoy Apr 30, 2025
92db1c6
[CI] Auto commit changes from spotless
elasticsearchmachine Apr 30, 2025
a022b5c
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy May 1, 2025
6a7f46c
fix the current yaml tests/ensure cleanup
markjhoy May 1, 2025
e95033c
light cleanups
markjhoy May 1, 2025
99c3700
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy May 1, 2025
bdfc9b8
clean docs
markjhoy May 1, 2025
20bcf20
refactor/move TokenPruningConfig into server
markjhoy May 1, 2025
1f0718d
[CI] Auto commit changes from spotless
elasticsearchmachine May 1, 2025
e30a141
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy May 1, 2025
404e645
cleanup existing Yaml tests via teardown
markjhoy May 1, 2025
bdfcf5e
fix lint
markjhoy May 1, 2025
f27dfb8
add node feature and simple yml test
markjhoy May 1, 2025
283b563
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy May 1, 2025
65c5147
remove checked-in test - moving it / add elsewhere
markjhoy May 2, 2025
fc78b0f
cleanups; start of yamlRestTests
markjhoy May 2, 2025
15c5eb3
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy May 2, 2025
0b17c16
ensure query index version; add test index options
markjhoy May 5, 2025
4b46300
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy May 5, 2025
08d51c9
[CI] Auto commit changes from spotless
elasticsearchmachine May 5, 2025
ae34841
add yaml tests for ml multi/remote clusters
markjhoy May 5, 2025
74b19ca
fix yaml test
markjhoy May 5, 2025
a341322
finally fix yaml tests?
markjhoy May 5, 2025
a47b915
update docs
markjhoy May 5, 2025
4e681bd
add 8.x tx version; fix yaml tests; optimizations
markjhoy May 5, 2025
6e50539
[CI] Auto commit changes from spotless
elasticsearchmachine May 5, 2025
fcf682f
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy May 6, 2025
095bb28
explicitly set # of shards for indices for test
markjhoy May 6, 2025
0c8c095
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy May 6, 2025
5bb6561
fix docs; add backport 8.x index version
markjhoy May 6, 2025
e4d547a
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy May 6, 2025
6c5e253
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy May 6, 2025
29dae8a
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy May 6, 2025
31f9e6d
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy May 6, 2025
0b1c1d2
cleanups/optimizations
markjhoy May 6, 2025
b48deea
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy May 7, 2025
7f60eca
Update docs/changelog/126739.yaml
markjhoy May 7, 2025
d8f3c63
refactor equals for IndexOptions
markjhoy May 7, 2025
be78331
some cleanups; refactoring
markjhoy May 21, 2025
d76323d
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy May 21, 2025
a0cc202
[CI] Auto commit changes from spotless
elasticsearchmachine May 21, 2025
223a794
fix yaml tests
markjhoy May 21, 2025
dd4a218
fix missing }
markjhoy May 21, 2025
d60e2df
[CI] Auto commit changes from spotless
elasticsearchmachine May 21, 2025
a3bb904
more cleanups; need to complete tests
markjhoy May 22, 2025
4a23c9c
[CI] Auto commit changes from spotless
elasticsearchmachine May 22, 2025
e2e65db
additional tests and refactoring
markjhoy May 22, 2025
cf7a302
[CI] Auto commit changes from spotless
elasticsearchmachine May 22, 2025
7a12676
fix lint
markjhoy May 22, 2025
b43b19e
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy May 22, 2025
28b5b74
[CI] Auto commit changes from spotless
elasticsearchmachine May 22, 2025
515cf9e
additional cleanups and refactoring
markjhoy May 24, 2025
1a692fc
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy May 24, 2025
4320315
fix tests
markjhoy May 24, 2025
1cd6f5e
[CI] Auto commit changes from spotless
elasticsearchmachine May 24, 2025
c5b9def
refactor and cleanups
markjhoy May 26, 2025
fd07f1c
additional cleanups for clarity
markjhoy May 26, 2025
42bc77d
[CI] Auto commit changes from spotless
elasticsearchmachine May 26, 2025
8f6672f
YAML test cleanups
markjhoy May 27, 2025
a6bdc90
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy May 27, 2025
f499269
fix semantic highligter rtest
markjhoy May 27, 2025
5e9fee1
fix semantictexthighliter tests
markjhoy May 27, 2025
6209919
(correctly) fix SemanticTextHighlighterTests tests
markjhoy May 27, 2025
5fefadb
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy May 27, 2025
a4bb87c
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy May 28, 2025
1192271
no pruning if we do not have an indexreader
markjhoy May 28, 2025
d7f1351
check for searcher existence instead of IndxReader
markjhoy May 28, 2025
94e21df
update docs with applies_to
markjhoy May 28, 2025
212c850
failing simple test
markjhoy May 30, 2025
c8fcd94
[CI] Auto commit changes from spotless
elasticsearchmachine May 30, 2025
fa1737d
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy May 30, 2025
7d2f361
move test to core package
markjhoy May 30, 2025
0f9bd65
move test to _actual_ proper place
markjhoy May 30, 2025
2afeb2b
[CI] Auto commit changes from spotless
elasticsearchmachine May 30, 2025
60a6b3e
add integration tests for index_options / defaults
markjhoy Jun 2, 2025
9fd5e72
cleanup IT
markjhoy Jun 2, 2025
fe2e267
[CI] Auto commit changes from spotless
elasticsearchmachine Jun 2, 2025
9953513
cleanups; add query pruning override random test
markjhoy Jun 2, 2025
925173c
[CI] Auto commit changes from spotless
elasticsearchmachine Jun 2, 2025
d7b064b
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy Jun 2, 2025
04a597e
check for supported index version index_options
markjhoy Jun 2, 2025
bbcd309
[CI] Auto commit changes from spotless
elasticsearchmachine Jun 2, 2025
8ba9aef
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy Jun 3, 2025
c4c65b9
cleanups and refactoring
markjhoy Jun 4, 2025
9b7327b
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy Jun 4, 2025
88fc1f4
[CI] Auto commit changes from spotless
elasticsearchmachine Jun 4, 2025
c0704df
clean SparseVectorQueryBuilderTests
markjhoy Jun 4, 2025
368f8be
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy Jun 4, 2025
a43d2e3
fix failing test
markjhoy Jun 4, 2025
c755cc0
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy Jun 4, 2025
7a41e0f
clean fix integration / rest tests
markjhoy Jun 4, 2025
c02a647
fix yaml default pruning tests
markjhoy Jun 5, 2025
811ca1a
Merge branch 'main' into markjhoy/default_token_pruning_sparse_vector
markjhoy Jun 5, 2025
6 changes: 6 additions & 0 deletions docs/changelog/126739.yaml
@@ -0,0 +1,6 @@
pr: 126739
summary: Update `sparse_vector` field mapping to include default setting for token
pruning
area: Relevance
type: enhancement
issues: []
44 changes: 44 additions & 0 deletions docs/reference/elasticsearch/mapping-reference/sparse-vector.md
Member


Just a reminder, that we'll have to open a PR for 8.19 to update the appropriate asciidoc files as well 👍

@@ -24,6 +24,28 @@ PUT my-index
}
```

You can also set field-level defaults for token pruning with the optional `index_options` parameter:
Member


Maybe add some clarification here, RE: why you might want to override token pruning?


```console
PUT my-index
{
"mappings": {
"properties": {
"text.tokens": {
"type": "sparse_vector",
"index_options": {
"prune": true,
"pruning_config": {
"tokens_freq_ratio_threshold": 5,
"tokens_weight_threshold": 0.4
}
}
}
}
}
}
```

See [semantic search with ELSER](docs-content://solutions/search/semantic-search/semantic-search-elser-ingest-pipelines.md) for a complete example on adding documents to a `sparse_vector` mapped field using ELSER.

## Parameters for `sparse_vector` fields [sparse-vectors-params]
@@ -36,6 +58,28 @@ The following parameters are accepted by `sparse_vector` fields:
* Exclude the field from [_source](/reference/elasticsearch/rest-apis/retrieve-selected-fields.md#source-filtering).
* Use [synthetic `_source`](/reference/elasticsearch/mapping-reference/mapping-source-field.md#synthetic-source).

index_options
: (Optional, object) Index options for your `sparse_vector` field that determine whether tokens are pruned and how token pruning is configured. If pruning options are not set in your `sparse_vector` query, Elasticsearch uses the default options configured for the field, if any.

Parameters for `index_options` are:

`prune`
: (Optional, boolean) [preview] Whether to perform pruning, omitting the non-significant tokens from the query to improve query performance. If `prune` is `true` but `pruning_config` is not specified, pruning occurs with the default values. Default: `true`.

`pruning_config`
: (Optional, object) [preview] Pruning configuration that controls which non-significant tokens are omitted from the query to improve query performance. It is only used if `prune` is set to `true`. If `prune` is set to `true` but `pruning_config` is not specified, default values are used. If `pruning_config` is specified while `prune` is `false` or not set, an exception occurs.

Parameters for `pruning_config` include:

`tokens_freq_ratio_threshold`
: (Optional, integer) [preview] Tokens whose frequency is more than `tokens_freq_ratio_threshold` times the average frequency of all tokens in the specified field are considered outliers and pruned. This value must be between 1 and 100. Default: `5`.

`tokens_weight_threshold`
: (Optional, float) [preview] Tokens whose weight is less than `tokens_weight_threshold` are considered insignificant and pruned. This value must be between 0 and 1. Default: `0.4`.

::::{note}
The default values for `tokens_freq_ratio_threshold` and `tokens_weight_threshold` were chosen because they produced the best results in tests using ELSERv2.
::::
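
As a rough illustration of the two thresholds described above, the following self-contained Java sketch evaluates each criterion on its own. The class and method names (`PruningThresholdSketch`, `validate`, `isFrequencyOutlier`, `isLowWeight`) are hypothetical and are not part of Elasticsearch. The sketch assumes the documented ranges are inclusive, and it deliberately keeps the two criteria separate because this page does not state how Elasticsearch combines them.

```java
/** Hypothetical sketch of the two documented pruning criteria; not Elasticsearch code. */
final class PruningThresholdSketch {
    // Defaults as documented for pruning_config.
    static final double DEFAULT_TOKENS_FREQ_RATIO_THRESHOLD = 5;
    static final double DEFAULT_TOKENS_WEIGHT_THRESHOLD = 0.4;

    /** Rejects thresholds outside the documented ranges (assumed inclusive). */
    static void validate(double freqRatioThreshold, double weightThreshold) {
        if (freqRatioThreshold < 1 || freqRatioThreshold > 100) {
            throw new IllegalArgumentException("tokens_freq_ratio_threshold must be between 1 and 100");
        }
        if (weightThreshold < 0 || weightThreshold > 1) {
            throw new IllegalArgumentException("tokens_weight_threshold must be between 0 and 1");
        }
    }

    /** True when the token is more than freqRatioThreshold times as frequent as the average token. */
    static boolean isFrequencyOutlier(double tokenFreq, double averageFreq, double freqRatioThreshold) {
        return tokenFreq > freqRatioThreshold * averageFreq;
    }

    /** True when the token weight falls below the weight threshold. */
    static boolean isLowWeight(double tokenWeight, double weightThreshold) {
        return tokenWeight < weightThreshold;
    }

    public static void main(String[] args) {
        validate(DEFAULT_TOKENS_FREQ_RATIO_THRESHOLD, DEFAULT_TOKENS_WEIGHT_THRESHOLD);
        // 60 > 5 * 10, so the token counts as a frequency outlier.
        System.out.println(isFrequencyOutlier(60, 10, DEFAULT_TOKENS_FREQ_RATIO_THRESHOLD)); // true
        // 0.3 < 0.4, so the token counts as low weight.
        System.out.println(isLowWeight(0.3, DEFAULT_TOKENS_WEIGHT_THRESHOLD)); // true
    }
}
```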


## Multi-value sparse vectors [index-multi-value-sparse-vectors]
@@ -1,3 +1,15 @@

---
teardown:
# ensure indices are cleaned up after each test
# mainly for the sparse vector tests
- do:
indices.delete:
index: ["test1", "test2"]
ignore: 404
- do:
indices.refresh: { }

---
"cluster stats test":
- do:
@@ -358,6 +370,7 @@
- requires:
cluster_features: [ "gte_v8.15.0" ]
reason: "sparse vector stats added in 8.15"

- do:
indices.create:
index: test1
2 changes: 2 additions & 0 deletions server/src/main/java/org/elasticsearch/TransportVersions.java
@@ -173,6 +173,7 @@ static TransportVersion def(int id) {
public static final TransportVersion INTRODUCE_FAILURES_DEFAULT_RETENTION_BACKPORT_8_19 = def(8_841_0_26);
public static final TransportVersion RESCORE_VECTOR_ALLOW_ZERO_BACKPORT_8_19 = def(8_841_0_27);
public static final TransportVersion INFERENCE_ADD_TIMEOUT_PUT_ENDPOINT_8_19 = def(8_841_0_28);
public static final TransportVersion SPARSE_VECTOR_FIELD_PRUNING_OPTIONS_8_19 = def(8_841_0_29);
public static final TransportVersion V_9_0_0 = def(9_000_0_09);
public static final TransportVersion INITIAL_ELASTICSEARCH_9_0_1 = def(9_000_0_10);
public static final TransportVersion INITIAL_ELASTICSEARCH_9_0_2 = def(9_000_0_11);
@@ -250,6 +251,7 @@ static TransportVersion def(int id) {
public static final TransportVersion FILE_SETTINGS_HEALTH_INFO = def(9_072_0_00);
public static final TransportVersion FIELD_CAPS_ADD_CLUSTER_ALIAS = def(9_073_0_00);
public static final TransportVersion INFERENCE_ADD_TIMEOUT_PUT_ENDPOINT = def(9_074_00_0);
public static final TransportVersion SPARSE_VECTOR_FIELD_PRUNING_OPTIONS = def(9_075_0_00);

/*
* STOP! READ THIS FIRST! No, really,
@@ -141,6 +141,7 @@ private static Version parseUnchecked(String version) {
public static final IndexVersion RESCORE_PARAMS_ALLOW_ZERO_TO_QUANTIZED_VECTORS_BACKPORT_8_X = def(8_529_0_00, Version.LUCENE_9_12_1);
public static final IndexVersion DEFAULT_OVERSAMPLE_VALUE_FOR_BBQ_BACKPORT_8_X = def(8_530_0_00, Version.LUCENE_9_12_1);
public static final IndexVersion SEMANTIC_TEXT_DEFAULTS_TO_BBQ_BACKPORT_8_X = def(8_531_0_00, Version.LUCENE_9_12_1);
public static final IndexVersion SPARSE_VECTOR_PRUNING_INDEX_OPTIONS_SUPPORT_BACKPORT_8_X = def(8_532_0_00, Version.LUCENE_9_12_1);
public static final IndexVersion UPGRADE_TO_LUCENE_10_0_0 = def(9_000_0_00, Version.LUCENE_10_0_0);
public static final IndexVersion LOGSDB_DEFAULT_IGNORE_DYNAMIC_BEYOND_LIMIT = def(9_001_0_00, Version.LUCENE_10_0_0);
public static final IndexVersion TIME_BASED_K_ORDERED_DOC_ID = def(9_002_0_00, Version.LUCENE_10_0_0);
@@ -168,6 +169,7 @@ private static Version parseUnchecked(String version) {
public static final IndexVersion DEFAULT_OVERSAMPLE_VALUE_FOR_BBQ = def(9_024_0_00, Version.LUCENE_10_2_1);
public static final IndexVersion SEMANTIC_TEXT_DEFAULTS_TO_BBQ = def(9_025_0_00, Version.LUCENE_10_2_1);
public static final IndexVersion DEFAULT_TO_ACORN_HNSW_FILTER_HEURISTIC = def(9_026_0_00, Version.LUCENE_10_2_1);
public static final IndexVersion SPARSE_VECTOR_PRUNING_INDEX_OPTIONS_SUPPORT = def(9_027_0_00, Version.LUCENE_10_2_1);
/*
* STOP! READ THIS FIRST! No, really,
* ____ _____ ___ ____ _ ____ _____ _ ____ _____ _ _ ___ ____ _____ ___ ____ ____ _____ _
@@ -22,6 +22,9 @@
import org.apache.lucene.util.BytesRef;
import org.elasticsearch.common.logging.DeprecationCategory;
import org.elasticsearch.common.lucene.Lucene;
import org.elasticsearch.common.xcontent.support.XContentMapValues;
import org.elasticsearch.core.Nullable;
import org.elasticsearch.features.NodeFeature;
import org.elasticsearch.index.IndexVersion;
import org.elasticsearch.index.IndexVersions;
import org.elasticsearch.index.analysis.NamedAnalyzer;
@@ -31,13 +34,16 @@
import org.elasticsearch.index.mapper.FieldMapper;
import org.elasticsearch.index.mapper.MappedFieldType;
import org.elasticsearch.index.mapper.MapperBuilderContext;
import org.elasticsearch.index.mapper.MapperParsingException;
import org.elasticsearch.index.mapper.MappingParserContext;
import org.elasticsearch.index.mapper.SourceLoader;
import org.elasticsearch.index.mapper.SourceValueFetcher;
import org.elasticsearch.index.mapper.TextSearchInfo;
import org.elasticsearch.index.mapper.ValueFetcher;
import org.elasticsearch.index.query.SearchExecutionContext;
import org.elasticsearch.search.fetch.StoredFieldsSpec;
import org.elasticsearch.search.lookup.Source;
import org.elasticsearch.xcontent.ToXContent;
import org.elasticsearch.xcontent.XContentBuilder;
import org.elasticsearch.xcontent.XContentParser.Token;

@@ -46,6 +52,7 @@
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Stream;

import static org.elasticsearch.index.query.AbstractQueryBuilder.DEFAULT_BOOST;
@@ -57,6 +64,7 @@
public class SparseVectorFieldMapper extends FieldMapper {

public static final String CONTENT_TYPE = "sparse_vector";
public static final String SPARSE_VECTOR_INDEX_OPTIONS = "index_options";

static final String ERROR_MESSAGE_7X = "[sparse_vector] field type in old 7.x indices is allowed to "
+ "contain [sparse_vector] fields, but they cannot be indexed or searched.";
@@ -65,6 +73,12 @@ public class SparseVectorFieldMapper extends FieldMapper {

static final IndexVersion NEW_SPARSE_VECTOR_INDEX_VERSION = IndexVersions.NEW_SPARSE_VECTOR;
static final IndexVersion SPARSE_VECTOR_IN_FIELD_NAMES_INDEX_VERSION = IndexVersions.SPARSE_VECTOR_IN_FIELD_NAMES_SUPPORT;
public static final IndexVersion SPARSE_VECTOR_PRUNING_INDEX_OPTIONS_VERSION =
IndexVersions.SPARSE_VECTOR_PRUNING_INDEX_OPTIONS_SUPPORT;
Contributor


Nit: can we make this package private like the other index versions?

Contributor Author


It's used in the SparseVectorQueryBuilder to ensure we use the same index version... we could add the same variable there, but, I think this would be more consistent.


private final SparseVectorFieldMapper.IndexOptions indexOptions;

public static final NodeFeature SPARSE_VECTOR_INDEX_OPTIONS_FEATURE = new NodeFeature("sparse_vector_index_options_supported");

private static SparseVectorFieldMapper toType(FieldMapper in) {
return (SparseVectorFieldMapper) in;
@@ -73,9 +87,23 @@ private static SparseVectorFieldMapper toType(FieldMapper in) {
public static class Builder extends FieldMapper.Builder {
private final Parameter<Boolean> stored = Parameter.storeParam(m -> toType(m).fieldType().isStored(), false);
private final Parameter<Map<String, String>> meta = Parameter.metaParam();
private final Parameter<IndexOptions> indexOptions;

public Builder(String name) {
super(name);
this.indexOptions = new Parameter<>(
SPARSE_VECTOR_INDEX_OPTIONS,
true,
() -> null,
(n, c, o) -> parseIndexOptions(c, o),
m -> toType(m).fieldType().indexOptions,
(b, n, v) -> {
if (v != null) {
b.field(n, v);
}
},
Objects::toString
);
}

public Builder setStored(boolean value) {
@@ -85,19 +113,40 @@ public Builder setStored(boolean value) {

@Override
protected Parameter<?>[] getParameters() {
- return new Parameter<?>[] { stored, meta };
+ return new Parameter<?>[] { stored, meta, indexOptions };
}

@Override
public SparseVectorFieldMapper build(MapperBuilderContext context) {
return new SparseVectorFieldMapper(
leafName(),
- new SparseVectorFieldType(context.buildFullName(leafName()), stored.getValue(), meta.getValue()),
+ new SparseVectorFieldType(context.buildFullName(leafName()), stored.getValue(), meta.getValue(), indexOptions.getValue()),
builderParams(this, context)
);
}
}

public IndexOptions getIndexOptions() {
return this.indexOptions;
}

private static SparseVectorFieldMapper.IndexOptions parseIndexOptions(MappingParserContext context, Object propNode) {
if (propNode == null) {
return null;
}

Map<String, Object> indexOptionsMap = XContentMapValues.nodeMapValue(propNode, SPARSE_VECTOR_INDEX_OPTIONS);

Boolean prune = IndexOptions.parseIndexOptionsPruneValue(indexOptionsMap);
TokenPruningConfig pruningConfig = IndexOptions.parseIndexOptionsPruningConfig(prune, indexOptionsMap);

if (prune == null && pruningConfig == null) {
return null;
}

return new SparseVectorFieldMapper.IndexOptions(prune, pruningConfig);
}

public static final TypeParser PARSER = new TypeParser((n, c) -> {
if (c.indexVersionCreated().before(PREVIOUS_SPARSE_VECTOR_INDEX_VERSION)) {
deprecationLogger.warn(DeprecationCategory.MAPPINGS, "sparse_vector", ERROR_MESSAGE_7X);
@@ -109,9 +158,24 @@ public SparseVectorFieldMapper build(MapperBuilderContext context) {
}, notInMultiFields(CONTENT_TYPE));

public static final class SparseVectorFieldType extends MappedFieldType {
private final IndexOptions indexOptions;

public SparseVectorFieldType(String name, boolean isStored, Map<String, String> meta) {
this(name, isStored, meta, null);
}

public SparseVectorFieldType(
String name,
boolean isStored,
Map<String, String> meta,
@Nullable SparseVectorFieldMapper.IndexOptions indexOptions
) {
super(name, true, isStored, false, TextSearchInfo.SIMPLE_MATCH_ONLY, meta);
this.indexOptions = indexOptions;
}

public IndexOptions getIndexOptions() {
return indexOptions;
}

@Override
@@ -159,6 +223,7 @@ private static String indexedValueForSearch(Object value) {

private SparseVectorFieldMapper(String simpleName, MappedFieldType mappedFieldType, BuilderParams builderParams) {
super(simpleName, mappedFieldType, builderParams);
this.indexOptions = ((SparseVectorFieldType) mappedFieldType).getIndexOptions();
}

@Override
@@ -364,4 +429,86 @@ public void reset() {
}
}

public static class IndexOptions implements ToXContent {
public static final String PRUNE_FIELD_NAME = "prune";
public static final String PRUNING_CONFIG_FIELD_NAME = "pruning_config";

final Boolean prune;
final TokenPruningConfig pruningConfig;

IndexOptions(@Nullable Boolean prune, @Nullable TokenPruningConfig pruningConfig) {
this.prune = prune;
this.pruningConfig = pruningConfig;
}

public Boolean getPrune() {
return prune;
}

public TokenPruningConfig getPruningConfig() {
return pruningConfig;
}

@Override
public final boolean equals(Object other) {
if (other == this) {
return true;
}

if (other == null || getClass() != other.getClass()) {
return false;
}

IndexOptions otherAsIndexOptions = (IndexOptions) other;
return Objects.equals(prune, otherAsIndexOptions.prune) && Objects.equals(pruningConfig, otherAsIndexOptions.pruningConfig);
}

@Override
public final int hashCode() {
return Objects.hash(prune, pruningConfig);
}

@Override
public XContentBuilder toXContent(XContentBuilder builder, Params params) throws IOException {
builder.startObject();
Member


Question here - if both prune and pruning_config are null is it OK to return the empty object?


if (prune != null) {
builder.field(PRUNE_FIELD_NAME, prune);
}
if (pruningConfig != null) {
builder.field(PRUNING_CONFIG_FIELD_NAME, pruningConfig);
}

builder.endObject();
return builder;
}
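
To make the reviewer's question about the empty object concrete, here is a small stand-alone sketch of the same "omit null fields" serialization, using a plain `Map` in place of `XContentBuilder`. The class `IndexOptionsSerializationSketch` and its `toMap` method are hypothetical. Note that `parseIndexOptions` in this diff returns `null` when both values are absent, so the empty result would only arise for an `IndexOptions` constructed directly with two nulls.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for the toXContent logic above; not Elasticsearch code. */
final class IndexOptionsSerializationSketch {
    /** Writes only the fields that are set, in the same order as toXContent. */
    static Map<String, Object> toMap(Boolean prune, Map<String, Object> pruningConfig) {
        Map<String, Object> out = new LinkedHashMap<>();
        if (prune != null) {
            out.put("prune", prune);
        }
        if (pruningConfig != null) {
            out.put("pruning_config", pruningConfig);
        }
        return out;
    }

    public static void main(String[] args) {
        // Both values unset: the result is an empty object.
        System.out.println(toMap(null, null)); // {}
        // Only prune set: pruning_config is omitted entirely.
        System.out.println(toMap(true, null)); // {prune=true}
    }
}
```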

public static Boolean parseIndexOptionsPruneValue(Map<String, Object> indexOptionsMap) {
Object shouldPrune = indexOptionsMap.remove(IndexOptions.PRUNE_FIELD_NAME);
if (shouldPrune == null) {
return null;
}

if (shouldPrune instanceof Boolean boolValue) {
return boolValue;
}

throw new MapperParsingException("[index_options] field [prune] should be true or false");
}

public static TokenPruningConfig parseIndexOptionsPruningConfig(Boolean prune, Map<String, Object> indexOptionsMap) {
Object pruningConfiguration = indexOptionsMap.remove(IndexOptions.PRUNING_CONFIG_FIELD_NAME);
if (pruningConfiguration == null) {
return null;
}

if (prune == null || prune == false) {
throw new MapperParsingException("[index_options] field [pruning_config] should only be set if [prune] is set to true");
}

Map<String, Object> pruningConfigurationMap = XContentMapValues.nodeMapValue(pruningConfiguration, PRUNING_CONFIG_FIELD_NAME);

return TokenPruningConfig.parseFromMap(pruningConfigurationMap);
}
}
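
The validation rule enforced by the two static parse methods above can be exercised in isolation. The following sketch re-creates it with plain `java.util.Map` values and `IllegalArgumentException` in place of `XContentMapValues` and `MapperParsingException`. The class name `IndexOptionsParsingSketch` is hypothetical, and the sketch returns the raw pruning configuration map rather than a `TokenPruningConfig`, whose parsing is not shown in this diff.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical re-creation of the index_options validation rule; not Elasticsearch code. */
final class IndexOptionsParsingSketch {
    /** Returns the prune flag, or null when absent; rejects non-boolean values. */
    static Boolean parsePrune(Map<String, Object> indexOptions) {
        Object value = indexOptions.remove("prune");
        if (value == null) {
            return null;
        }
        if (value instanceof Boolean) {
            return (Boolean) value;
        }
        throw new IllegalArgumentException("[index_options] field [prune] should be true or false");
    }

    /** Returns the raw pruning_config map, or null when absent; requires prune to be true. */
    static Map<String, Object> parsePruningConfig(Boolean prune, Map<String, Object> indexOptions) {
        Object config = indexOptions.remove("pruning_config");
        if (config == null) {
            return null;
        }
        if (prune == null || !prune) {
            throw new IllegalArgumentException(
                "[index_options] field [pruning_config] should only be set if [prune] is set to true"
            );
        }
        if (!(config instanceof Map)) {
            throw new IllegalArgumentException("[pruning_config] should be an object");
        }
        // Copy into a typed map so callers do not depend on the input instance.
        Map<String, Object> copy = new HashMap<>();
        for (Map.Entry<?, ?> entry : ((Map<?, ?>) config).entrySet()) {
            copy.put(String.valueOf(entry.getKey()), entry.getValue());
        }
        return copy;
    }

    public static void main(String[] args) {
        Map<String, Object> options = new HashMap<>();
        options.put("prune", true);
        options.put("pruning_config", Map.of("tokens_freq_ratio_threshold", 5, "tokens_weight_threshold", 0.4));
        Boolean prune = parsePrune(options);
        Map<String, Object> config = parsePruningConfig(prune, options);
        System.out.println(prune + " with " + config.size() + " pruning settings");
    }
}
```

Passing a `pruning_config` while `prune` is `false` or missing raises the exception, which mirrors the check in `parseIndexOptionsPruningConfig`.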
}