fix(indexing): Fix indexing full text terms to support exact match; fix isbn search term processor #888
Merged
viacheslavkol merged 7 commits into master on Feb 6, 2026
…ix isbn search term processor
- Change instance folio_word_delimiter_graph to catenate_all
- Preserve trailing asterisk in IsbnSearchTermProcessor

Closes: MSEARCH-1011
…_all` for whole `word_delimiter_graph`
psmagin
approved these changes
Feb 6, 2026
SvitlanaKovalova1
approved these changes
Feb 6, 2026

Purpose
Fix indexing of full-text terms to support exact match; fix the ISBN search term processor
Approach
Changes Checklist
Related Issues
MSEARCH-1011
Learning and Resources (if applicable)
The problem with the `047144250X` term is that it contains both digits and letters, so it is indexed as the terms `047144250` and `x` and cannot be found by a `term` query for the value `047144250x`.

The problem with asterisks is that normalization in the processor might remove them.
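To illustrate why the exact lookup misses, here is a minimal Python sketch of the splitting behaviour described above. It is a toy model, not the actual filter: `split_parts` is a hypothetical helper, and it assumes the value is lowercased and split only at letter/digit boundaries and non-alphanumeric characters.

```python
import re


def split_parts(value: str) -> list[str]:
    """Toy model (assumption): lowercase the value, then emit each run of
    ASCII letters or digits as its own term."""
    return re.findall(r"[a-z]+|[0-9]+", value.lower())


indexed_terms = split_parts("047144250X")
print(indexed_terms)

# An exact (term-style) lookup compares against whole indexed terms,
# so the unsplit value is not found among the parts.
print("047144250x" in indexed_terms)
```

Because only the parts are indexed, nothing in the index equals the full lowercased ISBN, which is the failure mode described for case 398.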
`catenate_all`, as opposed to `catenate_words`, will increase index size, since it also catenates numbers and number+word combinations, and it requires a full reindex. However, it should also fix other searches where an exact match is performed on a full-text index.

We might consider using a different type for indexing ISBNs if that doesn't break any other requirements.
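The difference between the two options can be sketched with a small Python model. This is a simplified approximation for illustration only: `analyze` is a hypothetical function, the splitting rules are assumed, and the real filter has more options and emits positions that are ignored here.

```python
import re


def analyze(value: str, catenate_words: bool = False,
            catenate_all: bool = False) -> list[str]:
    """Toy model of a word-delimiter style filter (assumption, not the
    real implementation)."""
    parts = re.findall(r"[a-z]+|[0-9]+", value.lower())
    tokens = list(parts)
    if catenate_words:
        # Join runs of adjacent alphabetic parts only, e.g. wi + fi.
        run: list[str] = []
        for part in parts + [""]:  # the empty sentinel flushes the last run
            if part.isalpha():
                run.append(part)
            else:
                if len(run) > 1:
                    tokens.append("".join(run))
                run = []
    if catenate_all and len(parts) > 1:
        # Join every part, letters and digits alike.
        tokens.append("".join(parts))
    return list(dict.fromkeys(tokens))  # de-duplicate, keep order


print(analyze("047144250X", catenate_words=True))
print(analyze("047144250X", catenate_all=True))
```

In this model `catenate_words` adds nothing for a mixed digit/letter value, while `catenate_all` adds one extra term per multi-part value, which is where the index growth comes from.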
Adding just `catenate_all` fixes all cases except `401,"isbn = ""{value}""",9781609383657*`, where we have the full term with an asterisk at the end.

Adding just the fix to the search processor, without `catenate_all`, leaves these cases still failing:

- `397,"isbn = ""{value}""",047144250X*`
- `398,"isbn == ""{value}""",047144250X`
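The processor side of the fix can be sketched roughly as follows. This is not the code of `IsbnSearchTermProcessor`: `normalize_isbn_term` is a hypothetical Python stand-in, and the normalization rules (strip non-alphanumerics, lowercase) are assumptions made for illustration.

```python
import re


def normalize_isbn_term(term: str) -> str:
    """Hypothetical normalizer: strip separators and lowercase the term,
    but keep a trailing '*' so wildcard searches still work."""
    has_wildcard = term.rstrip().endswith("*")
    # Assumed normalization: drop everything except ASCII letters and digits.
    core = re.sub(r"[^0-9A-Za-z]", "", term).lower()
    return core + "*" if has_wildcard else core


print(normalize_isbn_term("9781609383657*"))
print(normalize_isbn_term("047144250X*"))
```

Without the wildcard check, the normalization step would drop the asterisk along with the other non-alphanumeric characters, which matches the failure described for case 401.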