-
Notifications
You must be signed in to change notification settings - Fork 25.2k
Adding support to exclude semantic_text subfields #127664
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Adding support to exclude semantic_text subfields #127664
Conversation
Hi @Samiul-TheSoccerFan, I've created a changelog YAML for you. |
...e/src/yamlRestTest/resources/rest-api-spec/test/inference/10_semantic_text_field_mapping.yml
Outdated
Show resolved
Hide resolved
- requires: | ||
cluster_features: "gte_v8.16.0" | ||
reason: field_caps support for semantic_text added in 8.16.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to define a new cluster feature? As per my understanding, these fields are not expected from field_caps
API so excluding these should not have an impact on the API level or discover. We have also covered backward compatibility through other yaml
file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be good to create a test feature for these tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed with @Mikep86 's comments in Slack, but good start!
- requires: | ||
cluster_features: "gte_v8.16.0" | ||
reason: field_caps support for semantic_text added in 8.16.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be good to create a test feature for these tests.
server/src/main/java/org/elasticsearch/action/fieldcaps/FieldCapabilitiesFetcher.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Much better! We need to tweak the approach a bit but this is getting closer.
server/src/main/java/org/elasticsearch/index/mapper/MappedFieldType.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/mapper/MappedFieldType.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/mapper/vectors/SparseVectorFieldMapper.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/mapper/vectors/DenseVectorFieldMapper.java
Outdated
Show resolved
Hide resolved
...nference/src/main/java/org/elasticsearch/xpack/inference/mapper/OffsetSourceFieldMapper.java
Outdated
Show resolved
Hide resolved
Pinging @elastic/es-search-relevance (Team:Search Relevance) |
Pinging @elastic/search-eng (Team:SearchOrg) |
Pinging @elastic/search-relevance (Team:Search - Relevance) |
Pinging @elastic/es-search-foundations (Team:Search Foundations) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall, just some nits and tests to clean up
@@ -448,10 +455,12 @@ public KeywordFieldMapper build(MapperBuilderContext context) { | |||
indexCreatedVersion, | |||
IndexVersions.SYNTHETIC_SOURCE_STORE_ARRAYS_NATIVELY_KEYWORD | |||
); | |||
|
|||
KeywordFieldType keywordFieldType = buildFieldType(context, fieldtype); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: we can call buildFieldType(context, fieldtype)
in-line when creating KeywordFieldMapper
, the command stays short and readable
/** | ||
* @return true if fieldType is subfields of semantic_text type | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This documentation should be more generic. Semantic text is only one case where we want to exclude a field from field caps, there could be others in the future.
/** | |
* @return true if fieldType is subfields of semantic_text type | |
*/ | |
/** | |
* @return true if the field should be excluded from field caps | |
*/ |
@@ -83,18 +84,25 @@ public Builder setStored(boolean value) { | |||
return this; | |||
} | |||
|
|||
public Builder setExcludeFromFieldCaps(boolean value) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Can we change this to excludeFromFieldCaps
to match the other builders?
@@ -134,6 +134,9 @@ public class SemanticTextFieldMapper extends FieldMapper implements InferenceFie | |||
public static final NodeFeature SEMANTIC_TEXT_SKIP_INFERENCE_FIELDS = new NodeFeature("semantic_text.skip_inference_fields"); | |||
public static final NodeFeature SEMANTIC_TEXT_BIT_VECTOR_SUPPORT = new NodeFeature("semantic_text.bit_vector_support"); | |||
public static final NodeFeature SEMANTIC_TEXT_SUPPORT_CHUNKING_CONFIG = new NodeFeature("semantic_text.support_chunking_config"); | |||
public static final NodeFeature SEMANTIC_TEXT_SUB_FIELDS_EXCLUDE_FROM_FIELD_CAPS = new NodeFeature( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: SEMANTIC_TEXT_EXCLUDE_SUB_FIELDS_FROM_FIELD_CAPS
matches the node feature name better
- not_exists: fields.sparse_field.chunks.embeddings | ||
- not_exists: fields.sparse_field.chunks.offset | ||
- not_exists: fields.dense_field.chunks.embeddings | ||
- not_exists: fields.dense_field.chunks.offset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be *.inference.chunks.*
?
Also, can we check that field caps excludes inference
and inference.chunks
here too?
"Field caps exclude chunks and embedding fields": | ||
- requires: | ||
cluster_features: "semantic_text.exclude_sub_fields_from_field_caps" | ||
reason: field caps api exclude semantic_text subfields from 9.1.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
8.19.0 & 9.1.0
"Field caps exclude chunks embedding and text fields": | ||
- requires: | ||
cluster_features: "semantic_text.exclude_sub_fields_from_field_caps" | ||
reason: field caps api exclude semantic_text subfields from 9.1.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
8.19.0 & 9.1.0
- not_exists: fields.sparse_field.inference.chunks.embeddings | ||
- not_exists: fields.sparse_field.inference.chunks.text | ||
- not_exists: fields.dense_field.inference.chunks.embeddings | ||
- not_exists: fields.dense_field.inference.chunks.text |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we check that field caps excludes inference
and inference.chunks
here too?
var chunkTextField = new KeywordFieldMapper.Builder(TEXT_FIELD, indexVersionCreated).indexed(false) | ||
.docValues(false) | ||
.excludeFromFieldCaps(true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jimczi Do we need index version checks around setting this new flag on the mapping?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice improvements! No additional feedback other than what Mike already flagged.
@elasticmachine update branch |
Thanks for working on this. Could we be explicit in the description of this PR around the motivations behind this change? There is no opt-in or opt-out, hence we are just declaring that these fields are not useful in the field caps output at all times? I am guessing that the issue is around Kibana showing these in Discover? Have we considered alternatives? |
The subfields we are excluding are Hiding these fields from field caps codifies that these are implementation details at the API level, creating a consistent experience in this respect for Kibana and API users. Are there alternatives you're aware of that would achieve the same result? Edit: Another thing to point out here is that these subfields are not visible in the mappings returned to the user (on purpose), further anchoring them as implementation details |
I talked with @jimczi and we have an alternative implementation that does not involve adding a flag to |
Update the
fieldCaps
API to excludesemantic_text
subfields in both legacy and new formats.Legacy format:
setup:
Query:
Response before update (Skimmed):
Response after update (Skimmed):
new format:
setup:
Query:
Response before update (Skimmed):
Response after update (Skimmed):