Behavior change of the Nori tokenizer between ES 6 and ES 8 #128008

### Elasticsearch Version

v8.16.4

### Installed Plugins

Nori analyzer plugin: `analysis-nori-8.16.4.jar`, `lucene-analysis-nori-9.12.0.jar`

### Java Version

openjdk version "17.0.15" 2

### OS Version

Ubuntu 24

### Problem Description

Hello maintainers,

We recently upgraded Elasticsearch from v6.8 to v8.16 and installed the Nori analyzer plugin version that matches Elasticsearch v8.16.

During testing, we noticed that tokenization behaves differently between the two Elasticsearch versions.

For example, Elasticsearch v6.8 keeps the word "A7B" as a single token, `A7B`, whereas Elasticsearch v8.16 splits it into three tokens: `A`, `7`, and `B`.

For our use case, this change could be a problem.

I would therefore like to ask the following questions:

1) What is the reason or background for this change, and what are its benefits?

2) Is there a way to configure tokenization so that it behaves the same as, or similarly to, Elasticsearch v6.8?


Current (ES 8) installed plugin: `analysis-nori-8.16.4.jar`, `lucene-analysis-nori-9.12.0.jar`

Previous (ES 6) installed plugin: `analysis-nori-6.8.2.jar`, `lucene-analyzers-nori-7.7.0.jar`

### Steps to Reproduce

1) Install the Nori analyzer plugin on both Elasticsearch 6 and Elasticsearch 8.

2) Index a document containing the word "A7B" on both Elasticsearch 6 and Elasticsearch 8.
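The difference can also be checked without indexing, by running the same text through the `_analyze` API on each cluster. Below is a minimal Python sketch under stated assumptions: both clusters are reachable over plain HTTP without authentication, the tokenizer is referenced as `nori_tokenizer` (the name registered by the Nori plugin), and the helper names (`build_analyze_body`, `extract_tokens`, `analyze`, `compare`) are hypothetical. ES 8 clusters often have TLS and authentication enabled by default, in which case the request would need adjusting.

```python
import json
import urllib.request


def build_analyze_body(text, tokenizer="nori_tokenizer"):
    """Build an _analyze request body that runs the tokenizer alone (no token filters)."""
    return {"tokenizer": tokenizer, "text": text}


def extract_tokens(response):
    """Return just the token strings from a parsed _analyze response."""
    return [t["token"] for t in response.get("tokens", [])]


def analyze(base_url, text, tokenizer="nori_tokenizer", timeout=10):
    """POST `text` to <base_url>/_analyze and return the list of token strings.

    Assumption: unsecured cluster over plain HTTP; add TLS/credentials if needed.
    """
    body = json.dumps(build_analyze_body(text, tokenizer)).encode("utf-8")
    req = urllib.request.Request(
        base_url.rstrip("/") + "/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return extract_tokens(json.load(resp))


def compare(old_url, new_url, text):
    """Print the tokens each cluster produces for `text`; return True if they match."""
    old_tokens = analyze(old_url, text)
    new_tokens = analyze(new_url, text)
    print(f"ES 6.8 : {old_tokens}")
    print(f"ES 8.16: {new_tokens}")
    return old_tokens == new_tokens
```

Usage, with placeholder hosts: `compare("http://es6.example:9200", "http://es8.example:9200", "A7B")`. According to the behavior described in this issue, the ES 6.8 cluster would return `['A7B']` and the ES 8.16 cluster `['A', '7', 'B']`. Testing the tokenizer alone keeps token filters (such as lowercasing in the full `nori` analyzer) from obscuring where the split happens.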


### Logs (if relevant)

_No response_
