Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SOLR-17310: Configurable LeafSorter to customize segment search order #2477

Closed
wants to merge 11 commits into from

Conversation

weiwang19
Copy link
Contributor

@weiwang19 weiwang19 commented May 24, 2024

https://issues.apache.org/jira/browse/SOLR-17310

Description

Lucene's IndexWriterConfig provides the option to sort leaf readers when a custom LeafSorter is provided. The functionality is currently not directly exposed in Solr. There are cases where we would like to customize the segment visit order, for example, visit the recently updated segments first when early termination is applied.

Solution

The SegmentTimeLeafSorter sorts the LeafReaders by time stamp, in ascending or descending order. It can be enabled by adding the segmentSort config in solrconfig.xml. Without the config, no sorting is applied by default.

Tests

Added unit test and verified all tests pass.

Checklist

Please review the following and check all that apply:

  • I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • I have created a Jira issue and added the issue ID to my pull request title.
  • I have given Solr maintainers access to contribute to my PR branch. (optional but recommended)
  • I have developed this patch against the main branch.
  • I have run ./gradlew check.
  • I have added tests for my changes.
  • I have added documentation for the Reference Guide

@weiwang19 weiwang19 changed the title SOLR-17310: Configurable LeafSorter for customized segment traverse order SOLR-17310: Configurable LeafSorter to customized segment search order May 24, 2024
@weiwang19 weiwang19 changed the title SOLR-17310: Configurable LeafSorter to customized segment search order SOLR-17310: Configurable LeafSorter to customize segment search order May 24, 2024
Copy link
Contributor

@dsmiley dsmiley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor comments added

@github-actions github-actions bot added the documentation Improvements or additions to documentation label May 30, 2024
@weiwang19 weiwang19 requested a review from dsmiley June 10, 2024 18:24
@dsmiley dsmiley requested a review from cpoerschke June 10, 2024 21:25
@dsmiley
Copy link
Contributor

dsmiley commented Jun 10, 2024

Tagging @cpoerschke for review; I figure you're a better reviewer here. I haven't used this aspect of Lucene before.

r -> {
try {
return Long.parseLong(
((SegmentReader) r).getSegmentInfo().info.getDiagnostics().get(TIME_FIELD));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use of the diagnostics here seems very specialised and potentially fragile. Leaf sorting is "between segment sorting" and we also have index sorting i.e. "within segment sorting" -- I wonder if there might be enough commonality to generalise. Will add more detailed scribbles on the https://issues.apache.org/jira/browse/SOLR-17310 ticket itself.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have been checking the available sources but currently the timestamp is only in the segment info diagnostics.

Copy link

This PR has had no activity for 60 days and is now labeled as stale. Any new activity or converting it to draft will remove the stale label. To attract more reviewers, please tag people who might be familiar with the code area and/or notify the [email protected] mailing list. Thank you for your contribution!

@github-actions github-actions bot added the stale PR not updated in 60 days label Aug 23, 2024
@weiwang19 weiwang19 requested a review from cpoerschke October 2, 2024 21:45
@github-actions github-actions bot removed the stale PR not updated in 60 days label Oct 4, 2024
Copy link

github-actions bot commented Dec 3, 2024

This PR has had no activity for 60 days and is now labeled as stale. Any new activity will remove the stale label. To attract more reviewers, please tag people who might be familiar with the code area and/or notify the [email protected] mailing list. To exempt this PR from being marked as stale, make it a draft PR or add the label "exempt-stale". If left unattended, this PR will be closed after another 60 days of inactivity. Thank you for your contribution!

@github-actions github-actions bot added the stale PR not updated in 60 days label Dec 3, 2024
Copy link

github-actions bot commented Feb 2, 2025

This PR is now closed due to 60 days of inactivity after being marked as stale. Re-opening this PR is still possible, in which case it will be marked as active again.

@github-actions github-actions bot added the closed-stale Closed after being stale for 60 days label Feb 2, 2025
@github-actions github-actions bot closed this Feb 2, 2025
@weiwang19
Copy link
Contributor Author

keep it open

@epugh
Copy link
Contributor

epugh commented Feb 2, 2025

Seems like this ticket and #313 both have stalled out... Thoughts on a thread on the dev mailing list to see if we think moving forward with this or the #313 approach gets us moving?

@weiwang19
Copy link
Contributor Author

weiwang19 commented Feb 3, 2025 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cat:index closed-stale Closed after being stale for 60 days documentation Improvements or additions to documentation stale PR not updated in 60 days tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants