Suggester breaks on large corpus for Solr 7.4 #1972

Open
@cdmo

We recently began experimenting with BL7, Solr 7.4, Rails 5.2, and a catalog of 7 million items. In early testing we've noticed that the suggester feature errors out. The logging screen in the Solr web UI looks like this:

ERROR true
x:blacklight-core
SuggestComponent
Exception in building suggester index for: mySuggester

Looking into the logs, the full trace looks like this:

2018-09-13 19:40:12.318 INFO  (searcherExecutor-10-thread-1-processing-x:blacklight-core) [   x:blacklight-core] o.a.s.h.c.SpellCheckComponent Index is not optimized therefore skipping building spell check index for: default
2018-09-13 19:40:12.318 INFO  (searcherExecutor-10-thread-1-processing-x:blacklight-core) [   x:blacklight-core] o.a.s.h.c.SpellCheckComponent Index is not optimized therefore skipping building spell check index for: author
2018-09-13 19:40:12.318 INFO  (searcherExecutor-10-thread-1-processing-x:blacklight-core) [   x:blacklight-core] o.a.s.h.c.SpellCheckComponent Index is not optimized therefore skipping building spell check index for: subject
2018-09-13 19:40:12.318 INFO  (searcherExecutor-10-thread-1-processing-x:blacklight-core) [   x:blacklight-core] o.a.s.h.c.SpellCheckComponent Index is not optimized therefore skipping building spell check index for: title
2018-09-13 19:40:12.318 INFO  (searcherExecutor-10-thread-1-processing-x:blacklight-core) [   x:blacklight-core] o.a.s.h.c.SuggestComponent buildOnCommit: mySuggester
2018-09-13 19:40:12.318 INFO  (searcherExecutor-10-thread-1-processing-x:blacklight-core) [   x:blacklight-core] o.a.s.s.s.SolrSuggester SolrSuggester.build(mySuggester)
2018-09-13 19:40:12.818 ERROR (searcherExecutor-10-thread-1-processing-x:blacklight-core) [   x:blacklight-core] o.a.s.h.c.SuggestComponent Exception in building suggester index for: mySuggester
java.lang.IllegalArgumentException: input automaton is too large: 1001
        at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1298) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
        at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1306) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
        at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1306) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
        at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1306) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
        at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1306) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
        at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1306) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
        at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1306) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
        at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1306) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
        at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1306) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
        at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1306) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
        at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1306) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
        at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1306) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
        at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1306) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
        at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1306) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
        at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1306) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
        at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1306) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
        at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1306) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
        at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1306) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
        at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1306) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
        at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1306) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
        at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1306) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]

(this goes on for about 1,000 lines)

        at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1306) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
        at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1306) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
        at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1306) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
        at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1306) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
        at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1306) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
        at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1306) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
        at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1306) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
        at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1306) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
        at org.apache.lucene.util.automaton.Operations.topoSortStates(Operations.java:1275) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
        at org.apache.lucene.search.suggest.analyzing.AnalyzingSuggester.replaceSep(AnalyzingSuggester.java:292) ~[lucene-suggest-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:52:17]
        at org.apache.lucene.search.suggest.analyzing.AnalyzingSuggester.toAutomaton(AnalyzingSuggester.java:854) ~[lucene-suggest-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:52:17]
        at org.apache.lucene.search.suggest.analyzing.AnalyzingSuggester.build(AnalyzingSuggester.java:430) ~[lucene-suggest-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:52:17]
        at org.apache.lucene.search.suggest.Lookup.build(Lookup.java:190) ~[lucene-suggest-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:52:17]
        at org.apache.solr.spelling.suggest.SolrSuggester.build(SolrSuggester.java:181) ~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:13]
        at org.apache.solr.handler.component.SuggestComponent$SuggesterListener.buildSuggesterIndex(SuggestComponent.java:534) ~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:13]
        at org.apache.solr.handler.component.SuggestComponent$SuggesterListener.newSearcher(SuggestComponent.java:521) ~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:13]
        at org.apache.solr.core.SolrCore.lambda$getSearcher$18(SolrCore.java:2322) ~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:13]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_181]
        at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209) ~[solr-solrj-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:14]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
2018-09-13 19:40:12.824 INFO  (searcherExecutor-10-thread-1-processing-x:blacklight-core) [   x:blacklight-core] o.a.s.c.SolrCore [blacklight-core] Registered new searcher Searcher@b5ea2d0[blacklight-core] main{ExitableDirectoryReader(UninvertingDirectoryReader(Uninverting(_gy(7.4.0):C2138244/151:delGen=3) Uninverting(_k0(7.4.0):C2025787/58:delGen=2) Uninverting(_if(7.4.0):c125682/2:delGen=1) Uninverting(_mk(7.4.0):C2086012/4:delGen=2) Uninverting(_k8(7.4.0):C10433/38:delGen=2) Uninverting(_l1(7.4.0):C8823/2:delGen=2) Uninverting(_lj(7.4.0):C16528/1:delGen=1) Uninverting(_lk(7.4.0):C17041/1:delGen=1) Uninverting(_lx(7.4.0):C16192) Uninverting(_m0(7.4.0):C17514) Uninverting(_mw(7.4.0):c248023) Uninverting(_mc(7.4.0):C18352/1:delGen=1) Uninverting(_n7(7.4.0):c255323) Uninverting(_mx(7.4.0):C16296/3:delGen=1) Uninverting(_mu(7.4.0):C17043/2:delGen=1) Uninverting(_n0(7.4.0):C20258/2:delGen=1) Uninverting(_n5(7.4.0):C16103/1:delGen=1) Uninverting(_n6(7.4.0):C13497) Uninverting(_n3(7.4.0):C13603) Uninverting(_n8(7.4.0):C4865) Uninverting(_o8(7.4.0):c575/50:delGen=14) Uninverting(_nv(7.4.0):C9761/51:delGen=21) Uninverting(_nw(7.4.0):C5308/51:delGen=18) Uninverting(_p2(7.4.0):c1175/2:delGen=2) Uninverting(_p0(7.4.0):C1364/4:delGen=1) Uninverting(_p3(7.4.0):C97/2:delGen=1) Uninverting(_p4(7.4.0):C51/1:delGen=1) Uninverting(_p5(7.4.0):C98)))}
2018-09-13 19:40:12.825 INFO  (qtp817348612-15) [   x:blacklight-core] o.a.s.u.p.LogUpdateProcessorFactory [blacklight-core]  webapp=/solr path=/update/json params={commit=true}{commit=} 0 572

So it appears that a recursive function (topoSortStatesRecurse) is only permitted to recurse 1,000 levels deep before it throws an exception. This limit was, apparently, introduced in Lucene 7.0. Here's an email thread that discusses it:

http://lucene.472066.n3.nabble.com/solr-5-2-gt-7-2-suggester-failure-td4383551.html

Here's the relevant commit (as pointed out in the email thread):

apache/lucene-solr@7dde798
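
To illustrate the failure mode, here is a minimal Python sketch of a depth-limited recursive topological sort. It is not Lucene's code: the names `topo_sort_states`, `chain`, and `MAX_RECURSION_LEVEL` are hypothetical, and the cap of 1,000 is an assumption inferred from the "input automaton is too large: 1001" message in the trace above.

```python
import sys

# Python's own default recursion limit is about 1000, so raise it to let
# the sketch's explicit check fire first.
sys.setrecursionlimit(10_000)

# Assumed cap, inferred from the "too large: 1001" message in the trace.
MAX_RECURSION_LEVEL = 1000


def topo_sort_states(transitions, start=0):
    """Return the states reachable from `start` in topological order.

    `transitions` maps a state to the list of states it points to.
    """
    visited = set()
    order = []

    def recurse(state, level):
        # The depth check: one level per state along the current path.
        if level > MAX_RECURSION_LEVEL:
            raise ValueError(f"input automaton is too large: {level}")
        visited.add(state)
        for nxt in transitions.get(state, []):
            if nxt not in visited:
                recurse(nxt, level + 1)
        order.append(state)  # post-order: appended after all successors

    recurse(start, 0)
    order.reverse()
    return order


def chain(n_states):
    """Build a linear automaton 0 -> 1 -> ... -> n_states - 1."""
    return {i: [i + 1] for i in range(n_states - 1)}


if __name__ == "__main__":
    print(len(topo_sort_states(chain(1001))))  # deepest level is 1000: OK
    try:
        topo_sort_states(chain(1002))  # deepest level is 1001: rejected
    except ValueError as err:
        print(err)
```

In this sketch the recursion depth follows the longest path through the automaton rather than the total number of states, so a single sufficiently long chain is enough to hit the cap. Whether that is what happens in our index is something we have not confirmed.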

I'll also mention that the suggester worked fine for us with a small corpus. It appears that once the corpus becomes too large, the suggester build fails. Also, all of our settings in schema.xml and solrconfig.xml are the defaults.
