Skip to content

Jcrpaquin/bug/first author desugar#230

Open
sstults wants to merge 7 commits intoadsabs:mainfrom
sstults:jcrpaquin/bug/first-author-desugar
Open

Jcrpaquin/bug/first author desugar#230
sstults wants to merge 7 commits intoadsabs:mainfrom
sstults:jcrpaquin/bug/first-author-desugar

Conversation

@sstults
Copy link

@sstults sstults commented Mar 25, 2025

I added a quick wrapper around span queries to "mask" the field:

https://lucene.apache.org/core/9_2_0/queries/org/apache/lucene/queries/spans/FieldMaskingSpanQuery.html

This will get applied when author queries are getting turned into span queries.


if (q instanceof SpanQuery) {
return wrapBoost((SpanQuery) q, boost);
return wrapBoost(maskField((SpanQuery) q), boost);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some concerns about this code living in the SpanConverter. The SubQueryProvider was at least tangentially related to the main PR task and might(?) be easier to pipe config data to. Maybe we could promote this?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's definitely cleaner. One way we could do that is send it a map of fields to mask. I do think it'll be hard to make this change later in the builder flow since we'd have to step through each child query of the result and reinstantiate each query class in turn.

BooleanQuery.class);

assertQueryEquals(req("q", "author:\"^acco*\""), "spanPosRange(SpanMultiTermQueryWrapper(author:acco*), 0, 1)", SpanPositionRangeQuery.class);
assertQueryEquals(req("q", "author:\"^acco*\""), "acco | acco*", BooleanQuery.class);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this asserting that the query should be an unfielded boolean query?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think this is probably an error. We might not be handling the combination of both ^ and * together properly. I'll look closer at it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assertQueryEquals is removing the field. q.toString() actually returns "first_author:acco first_author:acco*"

req("defType", "aqp", "aqp.constant_scoring", "author^1", "aqp.classic_scoring.modifier", "0.6", "q",
"=author:(^accomazzi kurtz)"),
"FunctionScoreQuery(first_author:foo, scored by boost(sum(float(cite_read_boost),const(0.6))))",
"FunctionScoreQuery(+ConstantScore(spanPosRange(spanOr([author:accomazzi,, SpanMultiTermQueryWrapper(author:accomazzi,*)]), 0, 1)) +ConstantScore(author:kurtz,), scored by boost(sum(float(cite_read_boost),const(0.6))))",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assertion is not correct; we would expect to see no spanPosRange here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it's a multi-term query, I think we have to convert it to SpanMultiTermQuery, otherwise we wouldn't be able to enforce the order of the terms.

@JCRPaquin JCRPaquin self-assigned this Apr 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants