Skip to content

Enable a doc values sparse index on the timestamp field #121673

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

salvatore-campagna
Copy link
Contributor

@salvatore-campagna salvatore-campagna commented Feb 4, 2025

This PR introduces support for a sparse doc values index for the @timestamp field in DateFieldMapper when specific conditions are met:

  • The index mode is set to LOGSDB.
  • The field name is @timestamp and mapped as a date field.
  • The field is included in the primary sort configuration.
  • The field has doc values and indexing is not disabled explicitly (not index: false).

When all the conditions above hold true, we:

  • We use the corresponding Sorted Set doc values format with indexing enabled (DocValuesSkipIndexType.RANGE is used in Lucene as the only available sparse index at the moment).
  • Disable indexing of the @timestamp field, dropping the inverted index in favor of the sparse doc values index.

Some queries might experience slower performance as a result of using a doc values sparse index instead of an inverted index.

Disabling the inverted index on the @timestamp field while enabling the sparse doc values index is expected to:

  • Reduce the storage footprint depending on the size of the inverted index relative to the sparse index.
  • Improve indexing throughput by reducing the amount of data written during segment flushes.

@salvatore-campagna salvatore-campagna self-assigned this Feb 4, 2025
@salvatore-campagna salvatore-campagna added :StorageEngine/Logs You know, for Logs auto-backport Automatically create backport pull requests when merged labels Feb 4, 2025
final IndexSortConfig indexSortConfig,
final boolean hasDocValues
) {
if (index.isConfigured()) {
Copy link
Contributor Author

@salvatore-campagna salvatore-campagna Feb 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not work in case index: true explicitly because isSet is true but getValue() == getDefaultValue(). As in the other PR I think the implementation of isConfigured is not correct.

Copy link
Contributor Author

@salvatore-campagna salvatore-campagna Feb 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case of @timestamp for isConfigured we have two possibilities:

  1. isConfigured() is true which means the user explicitly set the index value to true or false. In such case no matter the value of index the sparse doc values index should be disabled, and we would use the inverted index.
  2. isConfigured() is false: this means the index parameter is not set by default which means, for LogsDB, under certain conditions, we can use the sparse index instead of the inverted index.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should then just check for isSet() ?

Copy link
Contributor Author

@salvatore-campagna salvatore-campagna Feb 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we use isSet than the other test fails...for instance testFieldTypeWithSkipDocValues_LogsDBMode

@@ -170,6 +185,9 @@ public void doValidate(MappingLookup lookup) {
configuredSettings.remove("meta");
configuredSettings.remove("format");
configuredSettings.remove("locale");
if (isIndexParamExplicitOverrideAllowed()) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is why I had to add the indexSettings

&& IndexMode.LOGSDB.equals(indexMode)
&& hasDocValues
&& indexSortConfig != null
&& indexSortConfig.hasPrimarySortOnField(DataStreamTimestampFieldMapper.DEFAULT_PATH)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should also allow when timestamp is secondary sort?

Copy link
Contributor Author

@salvatore-campagna salvatore-campagna Feb 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I left it like that for the moment because that was not really an issue with the tests I have at the moment. Anyway I will replace that method with something like isSortedOnTimestamp.

final IndexSortConfig indexSortConfig,
final boolean hasDocValues
) {
if (index.isConfigured()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should then just check for isSet() ?

@felixbarny
Copy link
Member

felixbarny commented Feb 5, 2025

  • The index mode is set to LOGSDB.

Can we extend this so that this is also enabled for TSDB?

  • The field name is @timestamp and mapped as a date field.

Can we extend this to also work for when @timestamp is mapped to date_nanos?

@martijnvg
Copy link
Member

Can we extends this so that this is also enabled for TSDB?
Can we extend this to also work for when @timestamp is mapped to date_nanos?

I think we can do this. The scope of this PR is enable sparse index in favor for indexed data structures, so that we can analyze the results in elastic/logs nightly benchmark. Note that everything is behind a feature flag. We can do this in a follow ups after we analyzed the nightly benchmark results. For tsdb, we like have a separate effort of checking how using sparse index effects querying (also on dimension fields).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-backport Automatically create backport pull requests when merged :StorageEngine/Logs You know, for Logs v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants