Experiment using ignored source for fields with no doc values or stored fields. #114886

Draft
wants to merge 1 commit into
base: main
from

martijnvg
Member

@martijnvg martijnvg commented Oct 16, 2024

A POC that tries to use ignored source as a fallback when synthetic source is enabled and a field is neither stored nor has doc values enabled.

This should be more efficient than using synthetic source in block loaders, since we avoid potentially reading many doc values / stored fields twice to synthesize the source.
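As a rough illustration of the fallback described above, here is a minimal sketch of the decision. The class, enum, and method names are hypothetical and are not Elasticsearch APIs. The ordering (doc values first, then stored fields) and the use of `_source` when synthetic source is disabled are assumptions made for the example.

```java
public class LoaderChoiceSketch {

    // Hypothetical labels for where a field's values would be read from.
    // These are illustrative names, not Elasticsearch classes.
    enum ValueSource { DOC_VALUES, STORED_FIELD, IGNORED_SOURCE, SOURCE }

    // Picks a value source for one field.
    // Assumption: doc values are preferred over stored fields when both exist.
    static ValueSource choose(boolean hasDocValues, boolean isStored, boolean syntheticSource) {
        if (hasDocValues) {
            return ValueSource.DOC_VALUES;
        }
        if (isStored) {
            return ValueSource.STORED_FIELD;
        }
        // Neither doc values nor a stored field: with synthetic source enabled,
        // fall back to ignored source instead of synthesizing the whole _source.
        // Assumption: without synthetic source, the value is read from _source.
        return syntheticSource ? ValueSource.IGNORED_SOURCE : ValueSource.SOURCE;
    }

    public static void main(String[] args) {
        System.out.println(choose(true, false, true));
        System.out.println(choose(false, true, true));
        System.out.println(choose(false, false, true));
        System.out.println(choose(false, false, false));
    }
}
```

Only the third case in `main` (no doc values, not stored, synthetic source enabled) takes the ignored-source path this POC experiments with.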

…ed fields.

A POC that tries to use ignored source as a fallback when synthetic source is enabled and a field is neither stored nor has doc values enabled.

This looks to be easier than fully supporting synthetic source in block loaders (pushing down the source loader to this level). It is also more efficient, since we will not load doc values / stored fields we don't need.
@martijnvg martijnvg added :Analytics/Compute Engine Analytics in ES|QL :StorageEngine/Mapping The storage related side of mappings labels Oct 16, 2024
martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request Oct 16, 2024
…ock loaders.

Currently, in the compute engine, when source is loaded and the source mode is synthetic, the synthetic source loader is already used. But the ignored_source field isn't always marked as a required source field, so the loaded source can be missing many fields.

This change includes the `_ignored_source` field as a required stored field.

Long term, with synthetic source we should load ignored source only when a field has neither doc values nor a stored field, as is being explored in elastic#114886.
martijnvg added a commit that referenced this pull request Oct 18, 2024
…ReaderOperator via BlockSourceReader. (#114903)

Currently, in the compute engine, when source is loaded and the source mode is synthetic, the synthetic source loader is already used. But the ignored_source field isn't always marked as a required source field, so the loaded source can be missing many fields.

This change includes the _ignored_source field as a required stored field and allows keyword fields without doc values or stored fields to be used with synthetic source.

Relying on synthetic source to get the values (because a field has no stored fields / doc values) is slow. With synthetic source we already keep ignored fields/values in a special place, named ignored source. Long term, with synthetic source we should load ignored source only when a field has neither doc values nor a stored field, as is being explored in #114886, thereby avoiding synthesizing the complete _source in order to get only one field.
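To make the cost argument concrete, here is a toy model, not Elasticsearch internals. A document is represented as three plain maps (doc values, stored fields, ignored source), and each entry touched counts as one read. All names are hypothetical, and the read counter is only a stand-in for real I/O cost.

```java
import java.util.HashMap;
import java.util.Map;

public class IgnoredSourceCostSketch {

    // Copies every entry of 'part' into 'target', counting one read per entry.
    static void copyCounting(Map<String, Object> part, Map<String, Object> target, int[] reads) {
        for (Map.Entry<String, Object> e : part.entrySet()) {
            target.put(e.getKey(), e.getValue());
            reads[0]++;
        }
    }

    // Toy "synthesize the complete _source, then pick one field" path.
    static Object viaSyntheticSource(Map<String, Object> docValues,
                                     Map<String, Object> storedFields,
                                     Map<String, Object> ignoredSource,
                                     String field, int[] reads) {
        Map<String, Object> source = new HashMap<>();
        copyCounting(docValues, source, reads);
        copyCounting(storedFields, source, reads);
        copyCounting(ignoredSource, source, reads);
        return source.get(field);
    }

    // Toy "read only the ignored source entry for this field" path.
    static Object viaIgnoredSource(Map<String, Object> ignoredSource, String field, int[] reads) {
        reads[0]++;
        return ignoredSource.get(field);
    }

    public static void main(String[] args) {
        Map<String, Object> docValues = Map.of("host", "a", "status", 200);
        Map<String, Object> storedFields = Map.of("message", "hello");
        Map<String, Object> ignoredSource = Map.of("tag", "x");

        int[] fullReads = new int[1];
        Object full = viaSyntheticSource(docValues, storedFields, ignoredSource, "tag", fullReads);
        int[] directReads = new int[1];
        Object direct = viaIgnoredSource(ignoredSource, "tag", directReads);

        System.out.println(full + " after " + fullReads[0] + " reads");
        System.out.println(direct + " after " + directReads[0] + " reads");
    }
}
```

Both paths return the same value, but in this model the first touches every field of the document while the second touches one entry, which is the saving the long-term plan is after.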
martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request Oct 18, 2024
…ReaderOperator via BlockSourceReader. (elastic#114903)

elasticsearchmachine pushed a commit that referenced this pull request Oct 18, 2024
…ReaderOperator via BlockSourceReader. (#114903) (#115064)

lkts pushed a commit to lkts/elasticsearch that referenced this pull request Oct 18, 2024
…ReaderOperator via BlockSourceReader. (elastic#114903)

georgewallace pushed a commit to georgewallace/elasticsearch that referenced this pull request Oct 25, 2024
…ReaderOperator via BlockSourceReader. (elastic#114903)

jfreden pushed a commit to jfreden/elasticsearch that referenced this pull request Nov 4, 2024
…ReaderOperator via BlockSourceReader. (elastic#114903)

Labels
:Analytics/Compute Engine Analytics in ES|QL :StorageEngine/Mapping The storage related side of mappings v9.1.0

2 participants