OAK-11603: lucene 4.x fuzzy queries don't work in Elastic #2180

Merged
fabriziofortino merged 2 commits into apache:trunk from fabriziofortino:OAK-11603
Mar 14, 2025

Conversation

@fabriziofortino
Contributor

No description provided.

@alagodasii
Contributor

Tests in this repo fail very frequently, and they failed again here. Should we treat this as a warning, or can we ignore it?

Contributor

@nfsantos nfsantos left a comment

Could we log a warning if we find a fuzzy query with the old format? Maybe the log could be throttled, printed only once per run or only once every x times. This could help in migrating to the new fuzzy format.

I think the PR is missing some tests (or do we already have these tests?)

  • The ~ character appears in the query but not as part of a fuzzy match.
  • The query contains several fuzzy match expressions.
  • The query contains a mix of old and new style fuzzy matches.

@fabriziofortino
Contributor Author

@alagodasii CI in this repo fails mostly because it's running with low resources. If you have failures while building the repo locally, please report them.

@fabriziofortino
Contributor Author

fabriziofortino commented Mar 14, 2025

@nfsantos

> Could we log a warning if we find a fuzzy query with the old format? Maybe the log could be throttled, printed only once per run or only once every x times. This could help in migrating to the new fuzzy format.

I added a log.trace call when this conversion happens. I don't think a warning would be correct here because, so far, there is no plan to deprecate Lucene or to migrate from it to Elasticsearch or a more recent version of Lucene. Queries executed against an Elastic index have to be backward compatible with Lucene 4.x queries, but not the other way around: a query with roam~2 would work on Elastic only. We cannot change Lucene to be forward compatible with more recent versions.

> I think the PR is missing some tests (or do we already have these tests?)
>
>   • The ~ character appears in the query but not as part of a fuzzy match.
>   • The query contains several fuzzy match expressions.
>   • The query contains a mix of old and new style fuzzy matches.

I added all these cases.
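
For readers following the thread, here is a minimal sketch of the kind of conversion being discussed. It is not the code merged in this PR. The class name `LegacyFuzzyRewriter`, the method `rewrite`, the regular expression, and the similarity-to-edits formula are all assumptions made for illustration: the formula is modeled on how Lucene 4.x is understood to map a fractional similarity to an edit distance (capped at 2), and should be checked against the real implementation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, NOT the implementation from this PR.
final class LegacyFuzzyRewriter {

    // Assumption: an old-style fuzzy clause is a word immediately followed by '~'
    // and a fractional similarity such as 0.5 or .5. An integer after '~'
    // (new-style edit distance, or phrase slop after a closing quote) is left alone.
    private static final Pattern LEGACY_FUZZY = Pattern.compile("(\\w+)~(0?\\.\\d+)");

    // Assumption: the maximum supported edit distance is 2.
    private static final int MAX_EDITS = 2;

    static String rewrite(String query) {
        Matcher m = LEGACY_FUZZY.matcher(query);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String term = m.group(1);
            float similarity = Float.parseFloat(m.group(2));
            // Assumed mapping: edits = (1 - similarity) * term length, truncated and capped.
            int edits = Math.min((int) ((1d - similarity) * term.length()), MAX_EDITS);
            m.appendReplacement(out, Matcher.quoteReplacement(term + "~" + edits));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Mix of old-style and new-style fuzzy clauses.
        System.out.println(rewrite("roam~0.5 foam~1"));
    }
}
```

Under these assumptions, `rewrite("roam~0.5 foam~1")` returns `roam~2 foam~1`: the old-style clause is converted and the new-style clause is left untouched. A `~` that is not preceded by a word, or that follows a quoted phrase, does not match the pattern and passes through unchanged.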

@fabriziofortino fabriziofortino merged commit 53c1964 into apache:trunk Mar 14, 2025
1 of 3 checks passed
@fabriziofortino fabriziofortino deleted the OAK-11603 branch March 14, 2025 16:26