seg single word and fix lucene error #836
Open
SophieMay wants to merge 2 commits into infinilabs:master from
Conversation
Member
Could you please provide a test? Thanks.
Author
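Case: 螺丝批及批头, with a user-defined dictionary.

After uncommenting the code, the segmentation result is:

螺丝批及批头 0-6 CN_WORD

When the segmentation result is written into Lucene's index, Lucene validates each token's startOffset. The startOffset of 及 is 3, which is greater than the startOffset (2) of the 批 token that follows it, so Lucene throws: startOffset must be non-negative, and endOffset must be >= startOffset, and offsets must not go backwards

The fix mainly reorders the segmentation result with collection.sort:

螺丝批及批头 0-6 CN_WORD

及 is emitted as a duplicate token. This does not affect later processing and can be optimized further in a follow-up.

The same case from the issue: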
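1. When the dictionary contains no single-character entries but lexemes conflict, split out the single character from the earlier of the two intersecting lexemes (done by uncommenting the existing code).
2. Fix the Lucene error introduced by 1: startOffset must be non-negative, and endOffset must be >= startOffset, and offsets must not go backwards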
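
To make the ordering problem in item 2 concrete, here is a minimal sketch, not the code in this PR. `Token`, `sortByOffset`, `offsetsValid`, and `sampleTokens` are hypothetical names, and `Token` is a stand-in rather than IK's lexeme class. The sample list is illustrative: it only reproduces the situation described in the conversation, where 及 (startOffset 3) is emitted before 批 (startOffset 2). `offsetsValid` is a simplified mirror of the rule quoted in the error message, not Lucene's own implementation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class OffsetOrderSketch {

    /** Hypothetical stand-in for an analyzer token (assumption, not IK's class). */
    static final class Token {
        final String text;
        final int start;
        final int end;

        Token(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    /** Returns a sorted copy: ascending start offset, longer token first on ties. */
    static List<Token> sortByOffset(List<Token> tokens) {
        List<Token> sorted = new ArrayList<>(tokens);
        sorted.sort(Comparator.comparingInt((Token t) -> t.start)
                .thenComparing(Comparator.comparingInt((Token t) -> t.end).reversed()));
        return sorted;
    }

    /** Simplified version of the rule in the error message quoted above. */
    static boolean offsetsValid(List<Token> tokens) {
        int lastStart = 0;
        for (Token t : tokens) {
            // Non-negative start, end >= start, and start must not go backwards.
            if (t.start < 0 || t.end < t.start || t.start < lastStart) {
                return false;
            }
            lastStart = t.start;
        }
        return true;
    }

    /** Illustrative emission order only: 及 (start 3) comes before 批 (start 2). */
    static List<Token> sampleTokens() {
        List<Token> tokens = new ArrayList<>();
        tokens.add(new Token("螺丝批及批头", 0, 6));
        tokens.add(new Token("及", 3, 4));
        tokens.add(new Token("批", 2, 3));
        return tokens;
    }

    public static void main(String[] args) {
        List<Token> emitted = sampleTokens();
        System.out.println("emitted order valid: " + offsetsValid(emitted));
        System.out.println("sorted order valid: " + offsetsValid(sortByOffset(emitted)));
    }
}
```

Sorting a copy keeps the original emission order available for debugging. The tie-break (longer token first) is a choice made for this sketch; the PR itself only states that the result is reordered with collection.sort.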