Settings to keep pinyin from being split into multiple tokens have no effect when converting Chinese characters to pinyin #301

@idawwei

Description

Input: 测试123EDF. The pinyin should not be split into multiple tokens; the expected result is “ceshi123EDF”.

Actual behavior: the digits are split off, EDF is split, and the pinyin is split into ce, shi.

Steps to reproduce

Index settings:
PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "pinyin_analyzer": {
          "tokenizer": "my_pinyin_tokenizer"
        }
      },
      "tokenizer": {
        "my_pinyin_tokenizer": {
          "type": "pinyin",
          "keep_first_letter": false,
          "keep_separate_first_letter": false,
          "keep_full_pinyin": true,
          "limit_first_letter_length": 16,
          "lowercase": true,
          "none_chinese_pinyin_tokenize": true
        }
      }
    }
  }
}

Analyze test:
GET /my_index/_analyze
{
  "analyzer": "pinyin_analyzer",
  "text": "理财123EDF"
}
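
One variant of the settings above that may be worth comparing, offered as an untested sketch rather than a confirmed fix. It assumes the options listed in the analysis-pinyin README (`keep_joined_full_pinyin`, `keep_none_chinese`, `keep_none_chinese_in_joined_full_pinyin`) behave on 7.16.2 as described there; the index name `my_index_joined` is hypothetical. `lowercase` is set to false because the expected token keeps `EDF` in upper case, and `keep_full_pinyin` is turned off so that per-syllable tokens such as ce and shi are not requested.

```
# Assumption: not verified on 7.16.2; my_index_joined is a hypothetical index name.
PUT /my_index_joined
{
  "settings": {
    "analysis": {
      "analyzer": {
        "pinyin_analyzer": {
          "tokenizer": "my_pinyin_tokenizer"
        }
      },
      "tokenizer": {
        "my_pinyin_tokenizer": {
          "type": "pinyin",
          "keep_first_letter": false,
          "keep_separate_first_letter": false,
          "keep_full_pinyin": false,
          "keep_joined_full_pinyin": true,
          "keep_none_chinese": false,
          "keep_none_chinese_in_joined_full_pinyin": true,
          "lowercase": false
        }
      }
    }
  }
}

# Analyze request using the input from the description.
GET /my_index_joined/_analyze
{
  "analyzer": "pinyin_analyzer",
  "text": "测试123EDF"
}
```

If the response still contains separate tokens for ce, shi, 123, or EDF, that output would help narrow down which option is being ignored.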

Environment

  • Elasticsearch 7.16.2
  • analysis-pinyin 7.16.2
