SOLR-17310: Configurable LeafSorter to customize segment search order #2477

Closed
wants to merge 11 commits into from
25 changes: 25 additions & 0 deletions solr/core/src/java/org/apache/solr/update/LeafSorterSupplier.java
@@ -0,0 +1,25 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package org.apache.solr.update;

import java.util.Comparator;
import org.apache.lucene.index.LeafReader;

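/**
 * Supplies the comparator used to order the leaf readers (segments) of a reader opened from the
 * IndexWriter. Implementations may return null to keep the default leaf order.
 */
public interface LeafSorterSupplier {
  Comparator<LeafReader> getLeafSorter();
}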
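A minimal sketch of how the `LeafSorterSupplier` contract is assumed to be consumed: a supplier returns a `Comparator`, or `null` to keep the default leaf order (as `SegmentTimeLeafSorterSupplier` does for `NONE`). To run without Lucene it uses plain JDK types; `Leaf`, `SorterSupplier`, `LargestFirstSupplier` and `orderLeaves` are hypothetical names, and the "largest segments first" policy is only an illustration of plugging in a different order, not something this PR provides.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class LeafSorterSupplierSketch {
  // Hypothetical stand-in for a Lucene LeafReader: a segment name and its document count.
  record Leaf(String name, int maxDoc) {}

  // Same shape as LeafSorterSupplier, made generic so the sketch runs without Lucene.
  interface SorterSupplier<T> {
    /** Returns a comparator, or null to keep the reader's default leaf order. */
    Comparator<T> getLeafSorter();
  }

  // Hypothetical alternative policy: visit the largest segments first.
  static final class LargestFirstSupplier implements SorterSupplier<Leaf> {
    @Override
    public Comparator<Leaf> getLeafSorter() {
      return Comparator.comparingInt(Leaf::maxDoc).reversed();
    }
  }

  // Sorts a copy of the leaves only when the supplier returns a comparator.
  static <T> List<T> orderLeaves(List<T> leaves, SorterSupplier<T> supplier) {
    Comparator<T> sorter = supplier.getLeafSorter();
    List<T> ordered = new ArrayList<>(leaves);
    if (sorter != null) {
      ordered.sort(sorter);
    }
    return ordered;
  }

  public static void main(String[] args) {
    List<Leaf> leaves = List.of(new Leaf("_0", 10), new Leaf("_1", 500), new Leaf("_2", 42));
    SorterSupplier<Leaf> none = () -> null; // behaves like SegmentSort.NONE
    System.out.println(orderLeaves(leaves, new LargestFirstSupplier()));
    System.out.println(orderLeaves(leaves, none));
  }
}
```

Treating `null` as "no sorter" keeps the default behaviour unchanged unless an option is explicitly configured, which matches the `leafSorter != null` check in `SolrIndexConfig`.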
27 changes: 27 additions & 0 deletions solr/core/src/java/org/apache/solr/update/SegmentSort.java
@@ -0,0 +1,27 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package org.apache.solr.update;

public enum SegmentSort {
  /** No segment sort */
  NONE,
  /** Sort leaf readers by segment creation time, in ascending order (oldest first) */
  TIME_ASC,
  /** Sort leaf readers by segment creation time, in descending order (newest first) */
  TIME_DESC
}
@@ -0,0 +1,78 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package org.apache.solr.update;

import java.lang.invoke.MethodHandles;
import java.util.Comparator;
import org.apache.lucene.index.LeafReader;
import org.apache.lucene.index.SegmentReader;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

final class SegmentTimeLeafSorterSupplier implements LeafSorterSupplier {
  private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
  private static final String TIME_FIELD = "timestamp";
  private static final SegmentSort DEFAULT_SORT_OPTIONS = SegmentSort.NONE;

  private final SegmentSort sortOptions;
  private Comparator<LeafReader> leafSorter;

  public SegmentTimeLeafSorterSupplier() {
    this(DEFAULT_SORT_OPTIONS);
  }

  public SegmentTimeLeafSorterSupplier(SegmentSort sortOptions) {
    this.sortOptions = sortOptions;
  }

  @Override
  public Comparator<LeafReader> getLeafSorter() {
    if (leafSorter == null) {
      if (SegmentSort.NONE == sortOptions) {
        return null;
      }
      boolean ascSort = SegmentSort.TIME_ASC == sortOptions;
      // Chosen so that segments without a readable timestamp sort last in either direction.
      long missingValue = ascSort ? Long.MAX_VALUE : Long.MIN_VALUE;
      leafSorter =
          Comparator.comparingLong(
              r -> {
                try {
                  return Long.parseLong(
                      ((SegmentReader) r).getSegmentInfo().info.getDiagnostics().get(TIME_FIELD));

Contributor:
Use of the diagnostics here seems very specialised and potentially fragile. Leaf sorting is "between segment sorting" and we also have index sorting, i.e. "within segment sorting". I wonder if there might be enough commonality to generalise. Will add more detailed scribbles on the https://issues.apache.org/jira/browse/SOLR-17310 ticket itself.

Contributor Author:
I have been checking the available sources, but currently the timestamp is only available in the segment info diagnostics.

                } catch (Exception e) {
                  log.error("Error reading segment timestamp from diagnostics", e);
                  return missingValue;
                }
              });
      // Cache the comparator with its final direction so repeated calls return the same order.
      if (!ascSort) {
        leafSorter = leafSorter.reversed();
      }
    }
    return leafSorter;
  }

  public SegmentSort getSortOptions() {
    return sortOptions;
  }

  @Override
  public String toString() {
    return "SegmentTimeLeafSorter{" + sortOptions + '}';
  }
}
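The comparator logic above can be exercised without Lucene. The sketch below is a simplified model, not the PR's class: `Segment` is a hypothetical stand-in for `SegmentReader` that carries only a name and a diagnostics map, and because there is no cast it catches only `NumberFormatException` (which `Long.parseLong` also throws for a missing, null value). It caches the comparator with its direction already applied, so every call to `getLeafSorter()` returns the same ordering.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class SegmentTimeSortSketch {
  enum SegmentSort { NONE, TIME_ASC, TIME_DESC }

  // Hypothetical stand-in for a SegmentReader: a segment name plus its diagnostics map.
  record Segment(String name, Map<String, String> diagnostics) {}

  static final String TIME_FIELD = "timestamp";

  private final SegmentSort sortOptions;
  // Cached with its final direction already applied.
  private Comparator<Segment> leafSorter;

  SegmentTimeSortSketch(SegmentSort sortOptions) {
    this.sortOptions = sortOptions;
  }

  /** Returns null for NONE, otherwise a comparator on the "timestamp" diagnostic. */
  Comparator<Segment> getLeafSorter() {
    if (sortOptions == SegmentSort.NONE) {
      return null;
    }
    if (leafSorter == null) {
      boolean ascSort = sortOptions == SegmentSort.TIME_ASC;
      // Segments without a parsable timestamp sort last in either direction.
      long missingValue = ascSort ? Long.MAX_VALUE : Long.MIN_VALUE;
      Comparator<Segment> byTime =
          Comparator.comparingLong(
              s -> {
                try {
                  return Long.parseLong(s.diagnostics().get(TIME_FIELD));
                } catch (NumberFormatException e) { // also thrown for a null value
                  return missingValue;
                }
              });
      leafSorter = ascSort ? byTime : byTime.reversed();
    }
    return leafSorter;
  }

  // Returns segment names in the order the configured comparator would visit them.
  static List<String> order(SegmentSort sort, List<Segment> segments) {
    List<Segment> ordered = new ArrayList<>(segments);
    Comparator<Segment> sorter = new SegmentTimeSortSketch(sort).getLeafSorter();
    if (sorter != null) {
      ordered.sort(sorter);
    }
    List<String> names = new ArrayList<>();
    for (Segment s : ordered) {
      names.add(s.name());
    }
    return names;
  }

  public static void main(String[] args) {
    List<Segment> segments =
        List.of(
            new Segment("_0", Map.of(TIME_FIELD, "1000")),
            new Segment("_1", Map.of()), // no timestamp diagnostic
            new Segment("_2", Map.of(TIME_FIELD, "3000")),
            new Segment("_3", Map.of(TIME_FIELD, "2000")));
    System.out.println(order(SegmentSort.TIME_ASC, segments)); // [_0, _3, _2, _1]
    System.out.println(order(SegmentSort.TIME_DESC, segments)); // [_2, _3, _0, _1]
    System.out.println(order(SegmentSort.NONE, segments)); // [_0, _1, _2, _3]
  }
}
```

Picking `Long.MIN_VALUE` as the missing value before calling `reversed()` is what keeps segments without a timestamp at the end for `TIME_DESC` as well as `TIME_ASC`.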
24 changes: 24 additions & 0 deletions solr/core/src/java/org/apache/solr/update/SolrIndexConfig.java
@@ -21,12 +21,14 @@
import java.io.IOException;
import java.lang.invoke.MethodHandles;
import java.util.Collections;
import java.util.Comparator;
import java.util.Map;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.DelegatingAnalyzerWrapper;
import org.apache.lucene.index.ConcurrentMergeScheduler;
import org.apache.lucene.index.IndexWriter.IndexReaderWarmer;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.LeafReader;
import org.apache.lucene.index.MergePolicy;
import org.apache.lucene.index.MergeScheduler;
import org.apache.lucene.search.Sort;
@@ -69,6 +71,8 @@ public class SolrIndexConfig implements MapSerializable {
public final double ramBufferSizeMB;
public final int ramPerThreadHardLimitMB;

public String segmentSort;

/**
* When using a custom merge policy that allows triggering synchronous merges on commit (see
* {@link MergePolicy#findFullFlushMerges(org.apache.lucene.index.MergeTrigger,
@@ -106,6 +110,7 @@ private SolrIndexConfig() {
mergePolicyFactoryInfo = null;
mergeSchedulerInfo = null;
mergedSegmentWarmerInfo = null;
segmentSort = null;
// enable coarse-grained metrics by default
metricsInfo = new PluginInfo("metrics", Collections.emptyMap(), null, null);
}
@@ -152,6 +157,7 @@ public SolrIndexConfig(ConfigNode cfg, SolrIndexConfig def) {
maxBufferedDocs = get("maxBufferedDocs").intVal(def.maxBufferedDocs);
ramBufferSizeMB = get("ramBufferSizeMB").doubleVal(def.ramBufferSizeMB);
maxCommitMergeWaitMillis = get("maxCommitMergeWaitTime").intVal(def.maxCommitMergeWaitMillis);
segmentSort = get("segmentSort").txt(def.segmentSort);

// how do we validate the value??
ramPerThreadHardLimitMB = get("ramPerThreadHardLimitMB").intVal(def.ramPerThreadHardLimitMB);
@@ -208,6 +214,9 @@ public Map<String, Object> toMap(Map<String, Object> map) {
map.put("writeLockTimeout", writeLockTimeout);
map.put("lockType", lockType);
map.put("infoStreamEnabled", infoStream != InfoStream.NO_OUTPUT);
if (segmentSort != null) {
map.put("segmentSort", segmentSort);
}
if (mergeSchedulerInfo != null) {
map.put("mergeScheduler", mergeSchedulerInfo);
}
@@ -285,6 +294,21 @@ public IndexWriterConfig toIndexWriterConfig(SolrCore core) throws IOException {
iwc.setMergedSegmentWarmer(warmer);
}

    if (segmentSort != null) {
      SegmentSort sortEnum;
      try {
        sortEnum = SegmentSort.valueOf(segmentSort);
      } catch (IllegalArgumentException e) {
        throw new IllegalArgumentException("Invalid segmentSort option: " + segmentSort, e);
      }
      LeafSorterSupplier sorter = new SegmentTimeLeafSorterSupplier(sortEnum);
      Comparator<LeafReader> leafSorter = sorter.getLeafSorter();
      // A null comparator (SegmentSort.NONE) leaves the default leaf order unchanged.
      if (leafSorter != null) {
        iwc.setLeafSorter(leafSorter);
        if (log.isDebugEnabled()) {
          log.debug("Segment sort enabled: {}", sorter);
        }
      }
    }
return iwc;
}
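The option parsing above can be isolated into a small, testable helper. This is a hypothetical sketch (`SegmentSortConfigSketch` and `parseSegmentSort` are not part of the PR): it narrows the `try` to the `Enum.valueOf` call so that unrelated `IllegalArgumentException`s are not misreported as a bad config value, and it keeps the original exception as the cause. Trimming surrounding whitespace is an added assumption; the PR passes the raw value straight to `valueOf`, which is case-sensitive.

```java
public class SegmentSortConfigSketch {
  enum SegmentSort { NONE, TIME_ASC, TIME_DESC }

  /**
   * Converts the raw segmentSort text into an enum value. Returns null when the option is not
   * configured; invalid values are rethrown with the original exception as the cause.
   */
  static SegmentSort parseSegmentSort(String raw) {
    if (raw == null) {
      return null; // not configured: leave the IndexWriterConfig untouched
    }
    try {
      // Enum.valueOf is case-sensitive; trimming whitespace is an assumption of this sketch.
      return SegmentSort.valueOf(raw.trim());
    } catch (IllegalArgumentException e) {
      throw new IllegalArgumentException("Invalid segmentSort option: " + raw, e);
    }
  }

  public static void main(String[] args) {
    System.out.println(parseSegmentSort("TIME_DESC")); // TIME_DESC
    System.out.println(parseSegmentSort(null)); // null
    try {
      parseSegmentSort("time_desc");
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage()); // Invalid segmentSort option: time_desc
    }
  }
}
```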

@@ -0,0 +1,29 @@
<?xml version="1.0" ?>

<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

<config>
<luceneMatchVersion>${tests.luceneMatchVersion:LATEST}</luceneMatchVersion>
<directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.RAMDirectoryFactory}"/>
<schemaFactory class="ClassicIndexSchemaFactory"/>

<indexConfig>
<segmentSort>TIME_DESC</segmentSort>
</indexConfig>

</config>
32 changes: 32 additions & 0 deletions solr/core/src/test/org/apache/solr/update/SolrIndexConfigTest.java
@@ -55,6 +55,7 @@ public class SolrIndexConfigTest extends SolrTestCaseJ4 {
"solrconfig-concurrentmergescheduler.xml";
private static final String solrConfigFileNameSortingMergePolicyFactory =
"solrconfig-sortingmergepolicyfactory.xml";
private static final String solrConfigFileNameSegmentSort = "solrconfig-segmentsort.xml";
private static final String schemaFileName = "schema.xml";

private static boolean compoundMergePolicySort = false;
@@ -266,4 +267,35 @@ public void testMaxCommitMergeWaitTime() throws Exception {
assertEquals(
10, sc.indexConfig.toIndexWriterConfig(h.getCore()).getMaxFullFlushMergeWaitMillis());
}

  /** Without a segmentSort setting, no leaf sorter should be configured. */
  public void testNoneSegmentSort() throws Exception {
    SolrConfig solrConfig =
        new SolrConfig(instanceDir, solrConfigFileNameSortingMergePolicyFactory);
    SolrIndexConfig solrIndexConfig = new SolrIndexConfig(solrConfig, null);
    assertNotNull(solrIndexConfig);
    assertNull(solrIndexConfig.segmentSort);
    IndexWriterConfig iwc = solrIndexConfig.toIndexWriterConfig(h.getCore());
    assertNull(iwc.getLeafSorter());
  }

  /** segmentSort=TIME_DESC should install a leaf sorter ordered by segment timestamp. */
  public void testSegmentSort() throws Exception {
    SolrConfig solrConfig = new SolrConfig(instanceDir, solrConfigFileNameSegmentSort);
    SolrIndexConfig solrIndexConfig = new SolrIndexConfig(solrConfig, null);
    assertNotNull(solrIndexConfig);
    assertEquals("TIME_DESC", solrIndexConfig.segmentSort);
    IndexWriterConfig iwc = solrIndexConfig.toIndexWriterConfig(h.getCore());
    assertNotNull(iwc.getLeafSorter());
    SegmentTimeLeafSorterSupplier expected =
        new SegmentTimeLeafSorterSupplier(SegmentSort.valueOf(solrIndexConfig.segmentSort));
    assertEquals(SegmentSort.TIME_DESC, expected.getSortOptions());
    assertEquals(expected.getLeafSorter().getClass(), iwc.getLeafSorter().getClass());
  }
}
@@ -247,6 +247,25 @@ This is not required for near real-time search, but will reduce search latency o
<mergedSegmentWarmer class="org.apache.lucene.index.SimpleMergedSegmentWarmer"/>
----

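=== segmentSort

The `segmentSort` setting configures a comparator used to order Lucene leaf readers (one per segment).
When a `DirectoryReader` is opened from the index writer, its leaf readers are sorted with this comparator, which determines the order in which segments are searched.

By default, no comparator is enabled. The currently supported sort options are:

* `NONE`: Does not sort the leaf readers (the default).
* `TIME_ASC`: Sorts the leaf readers by the segment creation timestamp recorded in the segment's diagnostics, in ascending order (oldest first).
* `TIME_DESC`: Sorts the leaf readers by the segment creation timestamp recorded in the segment's diagnostics, in descending order (newest first).

Segments whose timestamp cannot be read are placed last in either order.

.Example:
[source,xml]
----
<segmentSort>TIME_DESC</segmentSort>
----

The above setting searches the most recently created segments first, so matches in newer segments are visited before matches in older ones.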

== Compound File Segments

Each Lucene segment is typically comprised of a dozen or so files.