Contributor

@damien-git damien-git commented Dec 17, 2025

Exact queries (terms surrounded by quotes) are very useful for avoiding stemming issues, but they do not work as expected when combined with non-exact terms in the same query. Some examples:

  • translate AND "illustrate" : matches records with illustrated
  • "robot" AND illustrate : matches records with robotics
  • "covidence" AND review* : matches records with COVID

This PR proposes a solution for these cases. The implementation idea is to transform these queries into advanced queries, which use embedded Solr queries. Currently a workaround is to build equivalent queries as advanced queries in the UI.

This PR does not support multiple exact strings, queries using field names (with ':'), or queries with parentheses. Logical operators are supported, including NOT for the exact part. The implementation could be improved in the future, but the cases reported by our users did not include more complex queries.
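
To make the idea concrete, here is a minimal sketch of the transformation, not the PR's actual implementation. The function names (`embedSolrQuery`, `sketchConvertMixedExactQuery`) and the `$stemmedQf` / `$exactQf` parameters are hypothetical. The sketch only handles one quoted phrase at the start or end of the query, leaves NOT and `-` alone, and assumes AND when no operator is given, which matches the expected strings in the test data quoted later in this review.

```php
<?php

declare(strict_types=1);

/**
 * Wrap one query part in an embedded edismax query (hypothetical helper).
 */
function embedSolrQuery(string $part, string $qf): string
{
    $localParams = sprintf('{!edismax qf="%s" mm=\'0%%\'}', $qf);
    // Quotes must be backslash-escaped inside the outer _query_:"..." string.
    return '(_query_:"' . addcslashes($localParams . $part, '"\'') . '")';
}

/**
 * Split `"exact" [AND|OR] rest` or `rest [AND|OR] "exact"` into two embedded
 * queries; return anything outside that narrow shape unchanged.
 */
function sketchConvertMixedExactQuery(string $query, string $stemmedQf, string $exactQf): string
{
    // Out of scope for this sketch: field names, parentheses, NOT or a
    // leading "-", and anything other than exactly one quoted phrase.
    if (
        strpbrk($query, ':()') !== false
        || substr_count($query, '"') !== 2
        || preg_match('/\bNOT\b|(^|\s)-/', $query) === 1
    ) {
        return $query;
    }
    if (preg_match('/^("[^"]+")\s+(?:(AND|OR)\s+)?(\S.*)$/', $query, $m) === 1) {
        // Exact phrase first, non-exact terms after it.
        $left = embedSolrQuery($m[1], $exactQf);
        $right = embedSolrQuery($m[3], $stemmedQf);
    } elseif (preg_match('/^(.*?\S)\s+(?:(AND|OR)\s+)?("[^"]+")$/', $query, $m) === 1) {
        // Non-exact terms first, exact phrase last.
        $left = embedSolrQuery($m[1], $stemmedQf);
        $right = embedSolrQuery($m[3], $exactQf);
    } else {
        // A lone exact phrase, or an exact phrase in the middle.
        return $query;
    }
    // Assumption: a missing operator means AND.
    $operator = $m[2] !== '' ? $m[2] : 'AND';
    return "($left $operator $right)";
}
```

With `a` as the stemmed field list and `b` as the exact one, `sketchConvertMixedExactQuery('"t1" AND t2', 'a', 'b')` produces the same string as the corresponding expected value in the PR's test data.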

Member

@demiankatz demiankatz left a comment

Thanks, @damien-git. I confess that I haven't devoted particularly deep thought to the actual processing logic (though I at least read through it)... but here's a review consisting of some nitpicky suggestions as well as some higher-level thinking about the solution.

Comment on lines 221 to 229
     * @param QueryGroup|Query $query User query
     *
     * @return QueryGroup|Query
     */
    protected function possiblyConvertMixedExactQueryIntoAdvanced($query)
    {
        if ($query instanceof QueryGroup) {
            return $query;
        }

Member

Would be nice to use QueryInterface for greater flexibility, and have more explicit types. Maybe something like:

Suggested change
-     * @param QueryGroup|Query $query User query
-     *
-     * @return QueryGroup|Query
-     */
-    protected function possiblyConvertMixedExactQueryIntoAdvanced($query)
-    {
-        if ($query instanceof QueryGroup) {
-            return $query;
-        }
+     * @param QueryInterface $query User query
+     *
+     * @return QueryInterface
+     */
+    protected function possiblyConvertMixedExactQueryIntoAdvanced(QueryInterface $query): QueryInterface
+    {
+        if (!($query instanceof Query)) {
+            return $query;
+        }

Contributor Author

Initially I was using AbstractQuery, which is used in the calling code and implements QueryInterface. But my editor (with the Intelephense extension) complained that it doesn't define getHandler(), which exists only in Query. I tried for a while to cast to a different variable with the Query type specified in an inline comment. It worked, but it was ugly. Then I thought of using QueryGroup|Query. The type checker is smart enough to realize that if the initial type is QueryGroup|Query and QueryGroup objects are returned early, only the Query type remains, so it no longer complains about the method call. I liked this solution better because it clarifies the expected types in the function definition.
Actually, now that I think about it, there is a better option: use QueryInterface or AbstractQuery, and return the query unchanged if it's not an instance of Query. I am not sure what WorkKeysQuery is for, but this approach would handle that case better. Note that there are similar issues with query types elsewhere in QueryBuilder.

Member

Agreed, that's why I suggested the !($query instanceof Query) check at the top of the method.

It may be worthwhile to invest some effort into cleaning up the other type issues you mention as a dev-12.0 PR (I'm sure problems have accumulated over time as the code has evolved), but I'll leave it to you whether or not now is the time for that. :-)
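
For illustration, the guard pattern discussed in this thread can be sketched as follows. The classes below are simplified, hypothetical stand-ins for the VuFindSearch query classes (they are not the real API, and `handlerOrNull` is an invented name); the point is only that accepting the interface and returning early for anything that is not a `Query` lets a static analyser narrow the type before `getHandler()` is called.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-ins; the real classes have more members.
interface QueryInterface
{
    public function getAllTerms(): string;
}

final class Query implements QueryInterface
{
    public function __construct(private string $string, private string $handler)
    {
    }

    public function getAllTerms(): string
    {
        return $this->string;
    }

    // Only Query defines getHandler(); the interface does not.
    public function getHandler(): string
    {
        return $this->handler;
    }
}

final class QueryGroup implements QueryInterface
{
    /** @param QueryInterface[] $queries */
    public function __construct(private array $queries)
    {
    }

    public function getAllTerms(): string
    {
        $terms = array_map(
            fn (QueryInterface $q): string => $q->getAllTerms(),
            $this->queries
        );
        return implode(' ', $terms);
    }
}

function handlerOrNull(QueryInterface $query): ?string
{
    if (!($query instanceof Query)) {
        // QueryGroup or any other implementation: nothing to inspect.
        return null;
    }
    // From here on, $query is known to be a Query.
    return $query->getHandler();
}
```

Compared with a `QueryGroup|Query` union, this version keeps working if further `QueryInterface` implementations are passed in, since they simply fall through the early return.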

Contributor Author

I will go with your solution, just using AbstractQuery instead of QueryInterface to be more consistent with the rest of the code in the file.

Member

Since AbstractQuery implements QueryInterface, I think using QueryInterface would be preferable as it is a more general option. (In practice, I think it likely makes no real difference one way or the other -- but using the interface feels better from a design perspective).

Member

That being said, if you think it's better to use AbstractQuery for now, and then follow up with a dev-12.0 PR that switches all AbstractQuery references to QueryInterface for consistency, I could live with that approach. :-)

Comment on lines 320 to 342
$tests = [
    ['"t1"', '"t1"'], // simple exact queries are not affected
    ['("t1" OR t2) AND t3', '("t1" OR t2) AND t3'], // queries with parenthesis are not supported
    ['"t1" AND title:t2', '"t1" AND title:t2'], // queries with field are not supported
    ['"t1" AND "t2"', '"t1" AND "t2"'], // queries with multiple exact parts are not supported
    ['t1 AND "t2" AND t3', 't1 AND "t2" AND t3'], // queries with an exact part in the middle are not supported
    ['"t1" t2', '((_query_:"{!edismax qf=\"b\" mm=\\\'0%\\\'}\"t1\"") AND ' .
        '(_query_:"{!edismax qf=\"a\" mm=\\\'0%\\\'}t2"))'],
    ['"t1" AND t2', '((_query_:"{!edismax qf=\"b\" mm=\\\'0%\\\'}\"t1\"") AND ' .
        '(_query_:"{!edismax qf=\"a\" mm=\\\'0%\\\'}t2"))'],
    ['"t1" OR t2', '((_query_:"{!edismax qf=\"b\" mm=\\\'0%\\\'}\"t1\"") OR ' .
        '(_query_:"{!edismax qf=\"a\" mm=\\\'0%\\\'}t2"))'],
    ['t1 AND "t2"', '((_query_:"{!edismax qf=\"a\" mm=\\\'0%\\\'}t1") AND ' .
        '(_query_:"{!edismax qf=\"b\" mm=\\\'0%\\\'}\"t2\""))'],
    ['NOT "t1" AND t2', '((*:* NOT ((_query_:"{!edismax qf=\"b\" mm=\\\'0%\\\'}\"t1\""))) AND ' .
        '(_query_:"{!edismax qf=\"a\" mm=\\\'0%\\\'}t2"))'],
    ['t1 AND NOT "t2"', '((_query_:"{!edismax qf=\"a\" mm=\\\'0%\\\'}t1 AND") AND ' .
        '(*:* NOT ((_query_:"{!edismax qf=\"b\" mm=\\\'0%\\\'}\"t2\""))))'],
    ['-"t1" t2', '((*:* NOT ((_query_:"{!edismax qf=\"b\" mm=\\\'0%\\\'}\"t1\""))) AND ' .
        '(_query_:"{!edismax qf=\"a\" mm=\\\'0%\\\'}t2"))'],
    ['"t1" AND t2 AND t3', '((_query_:"{!edismax qf=\"b\" mm=\\\'0%\\\'}\"t1\"") AND ' .
        '(_query_:"{!edismax qf=\"a\" mm=\\\'0%\\\'}t2 AND t3"))'], // would be different with dismax
];

Member

Would this part be better handled as a separate test using a data provider?

Contributor Author

It is consistent with testNormalization(). Are you thinking of something like getQuestionTests()?

Member

Ahh, these existing tests are very old and not using best practices. Even getQuestionTests should be refactored to use a proper dataProvider. Would it be helpful for me to open a PR to modernize the existing tests so you have a model to work from here? I can probably find time for that tomorrow if it would be useful.

Member

I opened #4981 as a demonstration of the sort of change I had in mind.
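
For readers unfamiliar with the pattern, a data-provider-based test could look roughly like the sketch below. This is not the content of #4981. It assumes PHPUnit 10 or later (attribute syntax, static provider), and `looksConvertible()` is a hypothetical stand-in for the code under test so that the example stays self-contained; the class and method names are invented as well.

```php
<?php

declare(strict_types=1);

use PHPUnit\Framework\Attributes\DataProvider;
use PHPUnit\Framework\TestCase;

/**
 * Hypothetical stand-in for the code under test: true when a query has
 * exactly one quoted phrase, some other text, and no field names or
 * parentheses.
 */
function looksConvertible(string $query): bool
{
    return substr_count($query, '"') === 2
        && strpbrk($query, ':()') === false
        && trim((string)preg_replace('/"[^"]*"/', '', $query)) !== '';
}

final class MixedExactQueryTest extends TestCase
{
    /**
     * Each named entry becomes one test case; the name shows up in
     * failure output, so a broken case is easy to identify.
     */
    public static function queryProvider(): array
    {
        return [
            'simple exact query' => ['"t1"', false],
            'parentheses' => ['("t1" OR t2) AND t3', false],
            'field name' => ['"t1" AND title:t2', false],
            'multiple exact parts' => ['"t1" AND "t2"', false],
            'exact part then term' => ['"t1" AND t2', true],
            'term then exact part' => ['t1 AND "t2"', true],
        ];
    }

    #[DataProvider('queryProvider')]
    public function testLooksConvertible(string $query, bool $expected): void
    {
        $this->assertSame($expected, looksConvertible($query));
    }
}
```

Unlike a loop over a `$tests` array inside a single test method, each provider entry runs and reports independently, so one failing case does not hide the others.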
