Support queries where part of the query is using ExactSettings #4973
base: dev
demiankatz
left a comment
Thanks, @damien-git. I confess that I haven't devoted particularly deep thought to the actual processing logic (though I at least read through it)... but here's a review consisting of some nitpicky suggestions as well as some higher-level thinking about the solution.
module/VuFindSearch/src/VuFindSearch/Backend/Solr/QueryBuilder.php
     * @param QueryGroup|Query $query User query
     *
     * @return QueryGroup|Query
     */
    protected function possiblyConvertMixedExactQueryIntoAdvanced($query)
    {
        if ($query instanceof QueryGroup) {
            return $query;
        }
Would be nice to use QueryInterface for greater flexibility, and have more explicit types. Maybe something like:
     * @param QueryInterface $query User query
     *
     * @return QueryInterface
     */
    protected function possiblyConvertMixedExactQueryIntoAdvanced(QueryInterface $query): QueryInterface
    {
        if (!($query instanceof Query)) {
            return $query;
        }
Initially I was using AbstractQuery, which is used in the calling code and implements QueryInterface. But my editor (with the intelephense extension) was complaining that it doesn't define getHandler() (which is only in Query). I tried for a while to do a cast and use a different variable with the Query type specified in an inline comment. It worked but it was ugly. Then I thought of using QueryGroup|Query. The type checker is smart enough to realize that if the initial type is QueryGroup|Query and if QueryGroup objects are discarded, it only leaves the Query type, and it no longer complains about the method call. I like this solution better, because it clarifies the expected types in the function definition.
Actually, now that I think about it, I have a better option: use QueryInterface or AbstractQuery, and return the query if it's not an instance of Query. I am not sure what WorkKeysQuery is for, but it would better handle that case. Note that there are similar issues with query types elsewhere in QueryBuilder.
Agreed, that's why I suggested the !($query instanceof Query) check at the top of the method.
It may be worthwhile to invest some effort into cleaning up the other type issues you mention as a dev-12.0 PR (I'm sure problems have accumulated over time as the code has evolved), but I'll leave it to you whether or not now is the time for that. :-)
I will go with your solution, just using AbstractQuery instead of QueryInterface to be more consistent with the rest of the code in the file.
Since AbstractQuery implements QueryInterface, I think using QueryInterface would be preferable as it is a more general option. (In practice, I think it likely makes no real difference one way or the other -- but using the interface feels better from a design perspective).
That being said, if you think it's better to use AbstractQuery for now, and then follow up with a dev-12.0 PR that switches all AbstractQuery references to QueryInterface for consistency, I could live with that approach. :-)
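To make the type discussion above concrete, here is a minimal sketch of the guard-clause approach: accept the general QueryInterface, return early for anything that is not a Query, and only then call getHandler(). The classes below are bare stand-ins defined for this example, not the real VuFindSearch classes, and handlerOrNull() is a hypothetical helper rather than anything in the PR.

```php
<?php

// Stand-ins for the VuFindSearch query types (assumptions for this sketch;
// the real interface and classes declare more methods).
interface QueryInterface
{
}

abstract class AbstractQuery implements QueryInterface
{
}

class Query extends AbstractQuery
{
    public function __construct(
        private string $string,
        private ?string $handler = null
    ) {
    }

    public function getHandler(): ?string
    {
        return $this->handler;
    }
}

class QueryGroup extends AbstractQuery
{
    public function __construct(private string $operator)
    {
    }
}

/**
 * Hypothetical helper: return the handler of a plain Query, or null for
 * any other QueryInterface implementation.
 */
function handlerOrNull(QueryInterface $query): ?string
{
    // Guard clause: query groups and other implementations pass through.
    if (!($query instanceof Query)) {
        return null;
    }
    // After the guard, static analysis can narrow $query to Query,
    // so calling getHandler() no longer triggers a warning.
    return $query->getHandler();
}
```

Because the guard tests for Query rather than excluding QueryGroup, any other QueryInterface implementation is skipped as well, without the signature having to list every concrete type.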
    $tests = [
        ['"t1"', '"t1"'], // simple exact queries are not affected
        ['("t1" OR t2) AND t3', '("t1" OR t2) AND t3'], // queries with parenthesis are not supported
        ['"t1" AND title:t2', '"t1" AND title:t2'], // queries with field are not supported
        ['"t1" AND "t2"', '"t1" AND "t2"'], // queries with multiple exact parts are not supported
        ['t1 AND "t2" AND t3', 't1 AND "t2" AND t3'], // queries with an exact part in the middle are not supported
        ['"t1" t2', '((_query_:"{!edismax qf=\"b\" mm=\\\'0%\\\'}\"t1\"") AND ' .
            '(_query_:"{!edismax qf=\"a\" mm=\\\'0%\\\'}t2"))'],
        ['"t1" AND t2', '((_query_:"{!edismax qf=\"b\" mm=\\\'0%\\\'}\"t1\"") AND ' .
            '(_query_:"{!edismax qf=\"a\" mm=\\\'0%\\\'}t2"))'],
        ['"t1" OR t2', '((_query_:"{!edismax qf=\"b\" mm=\\\'0%\\\'}\"t1\"") OR ' .
            '(_query_:"{!edismax qf=\"a\" mm=\\\'0%\\\'}t2"))'],
        ['t1 AND "t2"', '((_query_:"{!edismax qf=\"a\" mm=\\\'0%\\\'}t1") AND ' .
            '(_query_:"{!edismax qf=\"b\" mm=\\\'0%\\\'}\"t2\""))'],
        ['NOT "t1" AND t2', '((*:* NOT ((_query_:"{!edismax qf=\"b\" mm=\\\'0%\\\'}\"t1\""))) AND ' .
            '(_query_:"{!edismax qf=\"a\" mm=\\\'0%\\\'}t2"))'],
        ['t1 AND NOT "t2"', '((_query_:"{!edismax qf=\"a\" mm=\\\'0%\\\'}t1 AND") AND ' .
            '(*:* NOT ((_query_:"{!edismax qf=\"b\" mm=\\\'0%\\\'}\"t2\""))))'],
        ['-"t1" t2', '((*:* NOT ((_query_:"{!edismax qf=\"b\" mm=\\\'0%\\\'}\"t1\""))) AND ' .
            '(_query_:"{!edismax qf=\"a\" mm=\\\'0%\\\'}t2"))'],
        ['"t1" AND t2 AND t3', '((_query_:"{!edismax qf=\"b\" mm=\\\'0%\\\'}\"t1\"") AND ' .
            '(_query_:"{!edismax qf=\"a\" mm=\\\'0%\\\'}t2 AND t3"))'], // would be different with dismax
    ];
Would this part be better handled as a separate test using a data provider?
It is consistent with testNormalization(). Are you thinking of something like getQuestionTests() ?
Ahh, these existing tests are very old and not using best practices. Even getQuestionTests should be refactored to use a proper dataProvider. Would it be helpful for me to open a PR to modernize the existing tests so you have a model to work from here? I can probably find time for that tomorrow if it would be useful.
I opened #4981 as a demonstration of the sort of change I had in mind.
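For readers unfamiliar with the data provider pattern discussed in this thread, the following is a rough sketch of its shape, not the content of #4981. In PHPUnit the provider would be a public static method on the test class, linked to the test method with a DataProvider attribute or a @dataProvider annotation, and PHPUnit would run the test once per named data set. To keep the sketch runnable without PHPUnit, a small loop stands in for the runner. hasExactPart(), ExactPartCases and the 'no exact part' case are illustrative assumptions and do not exist in VuFind.

```php
<?php

/**
 * Hypothetical stand-in for the code under test: reports whether a query
 * contains at least one quoted (exact) part. Not VuFind code.
 */
function hasExactPart(string $query): bool
{
    return preg_match('/"[^"]+"/', $query) === 1;
}

final class ExactPartCases
{
    /**
     * Data provider: each named entry is one argument list for the test.
     * The names show up in failure messages, which is the main gain over
     * looping through an anonymous array inside a single test method.
     */
    public static function exactPartProvider(): array
    {
        return [
            'simple exact query' => ['"t1"', true],
            'exact part first' => ['"t1" AND t2', true],
            'exact part last' => ['t1 AND "t2"', true],
            'no exact part' => ['t1 AND t2', false],
        ];
    }
}

// Stand-in for the PHPUnit runner: check each data set independently and
// record the name of every failing case instead of stopping at the first.
$failures = [];
foreach (ExactPartCases::exactPartProvider() as $name => [$query, $expected]) {
    if (hasExactPart($query) !== $expected) {
        $failures[] = $name;
    }
}
```

Each data set is reported on its own, so one failing query does not hide the results for the others.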
Co-authored-by: Demian Katz
Exact queries (surrounded with quotes) are very useful to eliminate issues with stemming. But they don't work when combined with non-exact queries. Some examples:
- translate AND "illustrate": matches records with illustrated
- "robot" AND illustrate: matches records with robotics
- "covidence" AND review*: matches records with COVID
This PR proposes a solution for these cases. The implementation idea is to transform these queries into advanced queries, which use embedded Solr queries. Currently a workaround is to build equivalent queries as advanced queries in the UI.
It does not support multiple exact strings, or queries using field names (with ':') or with parentheses. Logical operators are supported, including NOT for the exact part. The implementation could be improved in the future, but reported cases from our users did not include more complex queries.
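The limitations above can be read as an eligibility check that runs before any conversion. The sketch below is one possible way to express that check and is not the PR's implementation. isSupportedMixedExactQuery() is a hypothetical name, and the rule that the exact part must sit at the start or the end of the query is taken from the test cases quoted earlier in the review.

```php
<?php

/**
 * Hypothetical check: is this a mixed exact/non-exact query of the kind
 * described as supported (one quoted part at the start or end, no field
 * names, no parentheses)?
 */
function isSupportedMixedExactQuery(string $query): bool
{
    $query = trim($query);
    // Parentheses and field prefixes (field:term) are out of scope.
    // Simplification: a colon inside the quoted part is rejected as well.
    if (preg_match('/[():]/', $query) === 1) {
        return false;
    }
    // Exactly one quoted part is allowed.
    $count = preg_match_all('/"[^"]*"/', $query, $matches, PREG_OFFSET_CAPTURE);
    if ($count !== 1) {
        return false;
    }
    [$phrase, $offset] = $matches[0][0];
    $before = trim(substr($query, 0, $offset));
    $after = trim(substr($query, $offset + strlen($phrase)));
    // A leading NOT or "-" negates the exact part, so it still counts as
    // being at the start of the query.
    $atStart = in_array($before, ['', 'NOT', '-'], true);
    $atEnd = $after === '';
    // Supported only when the exact part is at one end and something
    // non-exact remains at the other end.
    return $atStart xor $atEnd;
}
```

A query that fails the check would simply be left unchanged, which matches the "not affected" and "not supported" cases in the test data.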