Adding support to exclude semantic_text subfields #127664

Open: wants to merge 21 commits into base: main
Diff shown: changes from 3 commits

Commits (21):
1a3bb97
Adding support to exclude semantic_text subfields
Samiul-TheSoccerFan May 2, 2025
522730e
Update docs/changelog/127664.yaml
Samiul-TheSoccerFan May 2, 2025
db64ad2
Updating changelog file
Samiul-TheSoccerFan May 2, 2025
e333e78
remove duplicate test from yaml file
Samiul-TheSoccerFan May 2, 2025
06caf66
Adding support to exclude semantic_text subfields from mapper builders
Samiul-TheSoccerFan May 6, 2025
5572275
Adding support for generic field types
Samiul-TheSoccerFan May 8, 2025
785e8d6
refactoring to use builder and setting exclude value from semantic_te…
Samiul-TheSoccerFan May 8, 2025
9e65cd0
update in semantic_text mapper and fetcher to incorporate the support…
Samiul-TheSoccerFan May 8, 2025
56fb75c
Fix code style issue
Samiul-TheSoccerFan May 8, 2025
232879b
adding node feature for yaml tests
Samiul-TheSoccerFan May 8, 2025
f2e5dae
Merge branch 'main' into exclude-subfields-for-field-caps-api
Samiul-TheSoccerFan May 8, 2025
f75d236
Adding more restrictive checks on yaml tests and few refactoring
Samiul-TheSoccerFan May 8, 2025
eb9b56b
Merge branch 'main' into exclude-subfields-for-field-caps-api
Samiul-TheSoccerFan May 8, 2025
1e53918
Merge branch 'main' into exclude-subfields-for-field-caps-api
elasticmachine May 9, 2025
811adcf
Returns metadata fields from metadata mappers
Samiul-TheSoccerFan May 22, 2025
25f8bca
returns all source fields for fieldcaps
Samiul-TheSoccerFan May 22, 2025
b549efc
gather all fields and iterate to process for fieldcaps api
Samiul-TheSoccerFan May 22, 2025
1b25b4a
revert back all changes from MappedFieldtype and subclasses
Samiul-TheSoccerFan May 22, 2025
969e9bf
revert back exclude logic from semantic_text mapper
Samiul-TheSoccerFan May 22, 2025
7da5bd6
fix lint issues
Samiul-TheSoccerFan May 22, 2025
a64c65f
fix lint issues
Samiul-TheSoccerFan May 22, 2025

docs/changelog/127664.yaml (5 additions, 0 deletions)
@@ -0,0 +1,5 @@
pr: 127664
summary: Exclude `semantic_text` subfields from field capabilities API
area: "Mapping"
type: enhancement
issues: []
@@ -14,8 +14,11 @@
import org.elasticsearch.core.Nullable;
import org.elasticsearch.index.IndexService;
import org.elasticsearch.index.engine.Engine;
import org.elasticsearch.index.mapper.KeywordFieldMapper;
import org.elasticsearch.index.mapper.MappedFieldType;
import org.elasticsearch.index.mapper.RuntimeField;
import org.elasticsearch.index.mapper.vectors.DenseVectorFieldMapper;
import org.elasticsearch.index.mapper.vectors.SparseVectorFieldMapper;
import org.elasticsearch.index.query.MatchAllQueryBuilder;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.SearchExecutionContext;
@@ -149,6 +152,18 @@ private FieldCapabilitiesIndexResponse doFetch(
return new FieldCapabilitiesIndexResponse(shardId.getIndexName(), indexMappingHash, responseMap, true, indexMode);
}

/**
* Returns true if the field should be excluded from the field capabilities response.
* This excludes internal fields that are not useful to users: offset_source fields and
* the sparse vector, dense vector, and keyword subfields under ".inference.chunks"
* that semantic_text creates.
*/
private static boolean shouldExcludeField(MappedFieldType ft) {
return ft.typeName().equals("offset_source")
|| ((ft instanceof SparseVectorFieldMapper.SparseVectorFieldType
|| ft instanceof DenseVectorFieldMapper.DenseVectorFieldType
|| ft instanceof KeywordFieldMapper.KeywordFieldType) && ft.name().contains(".inference.chunks"));
}

static Map<String, IndexFieldCapabilities> retrieveFieldCaps(
SearchExecutionContext context,
Predicate<String> fieldNameFilter,
@@ -173,7 +188,8 @@ static Map<String, IndexFieldCapabilities> retrieveFieldCaps(
MappedFieldType ft = entry.getValue();
if ((includeEmptyFields || ft.fieldHasValue(fieldInfos))
&& (fieldPredicate.test(ft.name()) || context.isMetadataField(ft.name()))
- && (filter == null || filter.test(ft))) {
+ && (filter == null || filter.test(ft))
+ && shouldExcludeField(ft) == false) {
IndexFieldCapabilities fieldCap = new IndexFieldCapabilities(
field,
ft.familyTypeName(),
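To make the exclusion rule easier to follow outside the Elasticsearch codebase, here is a simplified, self-contained model of `shouldExcludeField` and of the extra condition added to `retrieveFieldCaps`. It is a sketch only: the `Field` record stands in for `MappedFieldType`, and the type-name strings `sparse_vector`, `dense_vector` and `keyword` are assumed stand-ins for the `instanceof` checks in the PR. None of these names are the real Elasticsearch API, and the field paths are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class FieldCapsExclusionSketch {

    // Hypothetical stand-in for MappedFieldType: only a full field path and a type name.
    record Field(String name, String typeName) {}

    // Assumed type names standing in for the SparseVectorFieldType,
    // DenseVectorFieldType and KeywordFieldType instanceof checks.
    static final Set<String> CHUNK_TYPES = Set.of("sparse_vector", "dense_vector", "keyword");

    // Same shape as shouldExcludeField: offset_source is always dropped; the chunk
    // types are dropped only when the path contains ".inference.chunks".
    static boolean shouldExcludeField(Field ft) {
        return ft.typeName().equals("offset_source")
            || (CHUNK_TYPES.contains(ft.typeName()) && ft.name().contains(".inference.chunks"));
    }

    // Same idea as the new condition in retrieveFieldCaps: skip excluded fields
    // while building the name -> type response map (insertion order preserved).
    static Map<String, String> retrieveFieldCaps(List<Field> fields) {
        Map<String, String> response = new LinkedHashMap<>();
        for (Field ft : fields) {
            if (shouldExcludeField(ft) == false) {
                response.put(ft.name(), ft.typeName());
            }
        }
        return response;
    }

    public static void main(String[] args) {
        // Hypothetical field paths, for illustration only.
        List<Field> fields = List.of(
            new Field("sparse_field", "semantic_text"),
            new Field("sparse_field.inference.chunks.embeddings", "sparse_vector"),
            new Field("sparse_field.inference.chunks.offset", "offset_source"),
            new Field("title", "keyword"));
        // Prints {sparse_field=semantic_text, title=keyword}
        System.out.println(retrieveFieldCaps(fields));
    }
}
```

Because `contains` is a substring test, the rule applies wherever `.inference.chunks` occurs in a path, while an ordinary top-level `keyword` field such as `title` is kept.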
@@ -148,6 +148,26 @@ setup:
  - not_exists: fields.dense_field
  - match: { fields.sparse_field.text.searchable: true }

---
"Field caps exclude chunks and embedding fields":
  - requires:
      cluster_features: "gte_v8.16.0"
      reason: field_caps support for semantic_text added in 8.16.0

  - do:
      field_caps:
        include_empty_fields: true
        index: test-index
        fields: "*"

  - match: { indices: [ "test-index" ] }
  - exists: fields.sparse_field
  - exists: fields.dense_field
  - not_exists: fields.sparse_field.chunks.embeddings
  - not_exists: fields.sparse_field.chunks.embeddings.offsets
  - not_exists: fields.dense_field.chunks.embeddings
  - not_exists: fields.dense_field.chunks.embeddings.offsets

---
"Indexes dense vector document":
  # Checks mapping is not updated until first doc arrives
@@ -359,3 +379,23 @@ setup:
index: test-always-include-inference-id-index

- exists: test-always-include-inference-id-index.mappings.properties.semantic_field.inference_id

---
"Field caps exclude chunks and embedding fields":
  - requires:
      cluster_features: "gte_v8.16.0"
      reason: field_caps support for semantic_text added in 8.16.0
Contributor Author:
Do we need to define a new cluster feature? As I understand it, these fields are not expected in the field_caps API response, so excluding them should not affect the API or Discover. We have also covered backward compatibility in another YAML file.

Member:
I think it would be good to create a test feature for these tests.


  - do:
      field_caps:
        include_empty_fields: true
        index: test-index
        fields: "*"

  - match: { indices: [ "test-index" ] }
  - exists: fields.sparse_field
  - exists: fields.dense_field
  - not_exists: fields.sparse_field.chunks.embeddings
  - not_exists: fields.sparse_field.chunks.offset
  - not_exists: fields.dense_field.chunks.embeddings
  - not_exists: fields.dense_field.chunks.offset
Contributor:
Shouldn't this be *.inference.chunks.*?

Also, can we check that field caps excludes inference and inference.chunks here too?
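The reviewer's question can be checked mechanically. The sketch below tests which paths match `*.inference.chunks.*`, assuming a simple glob in which `*` matches any run of characters and everything else is literal (a simplification, not necessarily how Elasticsearch resolves field patterns). The class and method names are hypothetical.

```java
import java.util.List;
import java.util.regex.Pattern;

final class ChunkPathCheck {

    // Converts a simple glob into a regex: '*' matches any run of characters,
    // everything else is matched literally via Pattern.quote.
    static Pattern globToRegex(String glob) {
        String[] parts = glob.split("\\*", -1); // -1 keeps leading/trailing empty parts
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(parts[i]));
        }
        return Pattern.compile(regex.toString());
    }

    // True if the whole field name matches the glob.
    static boolean matches(String glob, String fieldName) {
        return globToRegex(glob).matcher(fieldName).matches();
    }

    public static void main(String[] args) {
        String pattern = "*.inference.chunks.*";
        List<String> paths = List.of(
            "sparse_field.chunks.embeddings",           // path asserted in this test
            "dense_field.chunks.offset",                // path asserted in this test
            "sparse_field.inference.chunks.embeddings"  // path with the .inference segment
        );
        for (String path : paths) {
            System.out.println(path + " -> " + matches(pattern, path));
        }
    }
}
```

Running it prints `false` for the two paths asserted in this test and `true` for the `.inference` path. This shows that the `not_exists` checks above do not target the subfields that `shouldExcludeField` removes, since that method looks for `.inference.chunks`.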

@@ -307,3 +307,22 @@ setup:
another_field:
type: keyword

---
"Field caps exclude chunks embedding and text fields":
  - requires:
      cluster_features: "gte_v8.16.0"
      reason: field_caps support for semantic_text added in 8.16.0

  - do:
      field_caps:
        include_empty_fields: true
        index: test-index
        fields: "*"

  - match: { indices: [ "test-index" ] }
  - exists: fields.sparse_field
  - exists: fields.dense_field
  - not_exists: fields.sparse_field.inference.chunks.embeddings
  - not_exists: fields.sparse_field.inference.chunks.text
  - not_exists: fields.dense_field.inference.chunks.embeddings
  - not_exists: fields.dense_field.inference.chunks.text
Contributor:
Can we check that field caps excludes inference and inference.chunks here too?
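One way to express the extra check the reviewer asks for is to assert that nothing at or below `<field>.inference` appears in the response, rather than listing leaf paths one by one. The helper below is hypothetical: it is not part of the YAML test framework or of this PR, and it works on a plain set of returned field names. The sample response is invented for illustration.

```java
import java.util.Set;
import java.util.TreeSet;

final class InferenceLeakCheck {

    // Returns every returned field name that is the semantic_text field's "inference"
    // object or something nested beneath it. An empty result means nothing leaked.
    static Set<String> leakedInferenceFields(Set<String> returnedFields, String semanticField) {
        String inferenceRoot = semanticField + ".inference";
        Set<String> leaked = new TreeSet<>(); // sorted for stable output
        for (String name : returnedFields) {
            if (name.equals(inferenceRoot) || name.startsWith(inferenceRoot + ".")) {
                leaked.add(name);
            }
        }
        return leaked;
    }

    public static void main(String[] args) {
        // Hypothetical response in which the leaf subfields were filtered out
        // but the parent object entries are still present.
        Set<String> returned = Set.of(
            "sparse_field",
            "sparse_field.inference",
            "sparse_field.inference.chunks",
            "dense_field");
        // Prints [sparse_field.inference, sparse_field.inference.chunks]
        System.out.println(leakedInferenceFields(returned, "sparse_field"));
    }
}
```

Matching on `inferenceRoot + "."` rather than a bare prefix avoids flagging an unrelated sibling such as a hypothetical `sparse_field.inference_extra`.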
