-
Notifications
You must be signed in to change notification settings - Fork 25.2k
Experiment using ignored source for fields with no doc values or stored fields. #114886
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Draft
martijnvg
wants to merge
1
commit into
elastic:main
Choose a base branch
from
martijnvg:block_loaders_synthetic_source_poc
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Draft
Experiment using ignored source for fields with no doc values or stored fields. #114886
martijnvg
wants to merge
1
commit into
elastic:main
from
martijnvg:block_loaders_synthetic_source_poc
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…ed fields. A POC that tries to use ignored source as fall back if synthetic source is enabled and a field is neither stored or has docv values enabled. This looks to be easier than fully supporting synthetic source in block loaders (pushing down source loader at this level). And is also more efficient, since we will not load doc values / stored fields we don't need.
martijnvg
added a commit
to martijnvg/elasticsearch
that referenced
this pull request
Oct 16, 2024
…ock loaders. Currently, in compute engine when loading source if source mode is synthetic, the synthetic source loader is already used. But the ignored_source field isn't always marked as a required source field, causing the source to potentially miss a lot of fields. This change includes `_ignored_source` field as a required stored field. Long term in case of synthetic source we should only load ignored source in case a field has no doc values or stored field. Like is being explored in elastic#114886
martijnvg
added a commit
that referenced
this pull request
Oct 18, 2024
…ReaderOperator via BlockSourceReader. (#114903) Currently, in compute engine when loading source if source mode is synthetic, the synthetic source loader is already used. But the ignored_source field isn't always marked as a required source field, causing the source to potentially miss a lot of fields. This change includes _ignored_source field as a required stored field and allowing keyword fields without doc values or stored fields to be used in case of synthetic source. Relying on synthetic source to get the values (because a field doesn't have stored fields / doc values) is slow. In case of synthetic source we already keep ignored field/values in a special place, named ignored source. Long term in case of synthetic source we should only load ignored source in case a field has no doc values or stored field. Like is being explored in #114886 Thereby avoiding synthesizing the complete _source in order to get only one field.
martijnvg
added a commit
to martijnvg/elasticsearch
that referenced
this pull request
Oct 18, 2024
…ReaderOperator via BlockSourceReader. (elastic#114903) Currently, in compute engine when loading source if source mode is synthetic, the synthetic source loader is already used. But the ignored_source field isn't always marked as a required source field, causing the source to potentially miss a lot of fields. This change includes _ignored_source field as a required stored field and allowing keyword fields without doc values or stored fields to be used in case of synthetic source. Relying on synthetic source to get the values (because a field doesn't have stored fields / doc values) is slow. In case of synthetic source we already keep ignored field/values in a special place, named ignored source. Long term in case of synthetic source we should only load ignored source in case a field has no doc values or stored field. Like is being explored in elastic#114886 Thereby avoiding synthesizing the complete _source in order to get only one field.
elasticsearchmachine
pushed a commit
that referenced
this pull request
Oct 18, 2024
…ReaderOperator via BlockSourceReader. (#114903) (#115064) Currently, in compute engine when loading source if source mode is synthetic, the synthetic source loader is already used. But the ignored_source field isn't always marked as a required source field, causing the source to potentially miss a lot of fields. This change includes _ignored_source field as a required stored field and allowing keyword fields without doc values or stored fields to be used in case of synthetic source. Relying on synthetic source to get the values (because a field doesn't have stored fields / doc values) is slow. In case of synthetic source we already keep ignored field/values in a special place, named ignored source. Long term in case of synthetic source we should only load ignored source in case a field has no doc values or stored field. Like is being explored in #114886 Thereby avoiding synthesizing the complete _source in order to get only one field.
lkts
pushed a commit
to lkts/elasticsearch
that referenced
this pull request
Oct 18, 2024
…ReaderOperator via BlockSourceReader. (elastic#114903) Currently, in compute engine when loading source if source mode is synthetic, the synthetic source loader is already used. But the ignored_source field isn't always marked as a required source field, causing the source to potentially miss a lot of fields. This change includes _ignored_source field as a required stored field and allowing keyword fields without doc values or stored fields to be used in case of synthetic source. Relying on synthetic source to get the values (because a field doesn't have stored fields / doc values) is slow. In case of synthetic source we already keep ignored field/values in a special place, named ignored source. Long term in case of synthetic source we should only load ignored source in case a field has no doc values or stored field. Like is being explored in elastic#114886 Thereby avoiding synthesizing the complete _source in order to get only one field.
3 tasks
georgewallace
pushed a commit
to georgewallace/elasticsearch
that referenced
this pull request
Oct 25, 2024
…ReaderOperator via BlockSourceReader. (elastic#114903) Currently, in compute engine when loading source if source mode is synthetic, the synthetic source loader is already used. But the ignored_source field isn't always marked as a required source field, causing the source to potentially miss a lot of fields. This change includes _ignored_source field as a required stored field and allowing keyword fields without doc values or stored fields to be used in case of synthetic source. Relying on synthetic source to get the values (because a field doesn't have stored fields / doc values) is slow. In case of synthetic source we already keep ignored field/values in a special place, named ignored source. Long term in case of synthetic source we should only load ignored source in case a field has no doc values or stored field. Like is being explored in elastic#114886 Thereby avoiding synthesizing the complete _source in order to get only one field.
jfreden
pushed a commit
to jfreden/elasticsearch
that referenced
this pull request
Nov 4, 2024
…ReaderOperator via BlockSourceReader. (elastic#114903) Currently, in compute engine when loading source if source mode is synthetic, the synthetic source loader is already used. But the ignored_source field isn't always marked as a required source field, causing the source to potentially miss a lot of fields. This change includes _ignored_source field as a required stored field and allowing keyword fields without doc values or stored fields to be used in case of synthetic source. Relying on synthetic source to get the values (because a field doesn't have stored fields / doc values) is slow. In case of synthetic source we already keep ignored field/values in a special place, named ignored source. Long term in case of synthetic source we should only load ignored source in case a field has no doc values or stored field. Like is being explored in elastic#114886 Thereby avoiding synthesizing the complete _source in order to get only one field.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
:Analytics/Compute Engine
Analytics in ES|QL
:StorageEngine/Mapping
The storage related side of mappings
v9.1.0
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
A POC that tries to use ignored source as fall back if synthetic source is enabled and a field is neither stored or has docv values enabled.
This should be more efficient compared to using synthetic source in block loaders, since we will not potentially read many doc values values / stored fields twice to synthesize the source.