Word splitting is not taking CJK into account #1506

### Please complete the following tasks

- [x] I have searched the [open](https://github.com/crate-ci/typos/issues?q=is%3Aissue%20state%3Aopen%20label%3AA-dict) and [rejected](https://github.com/crate-ci/typos/issues?q=is%3Aissue%20state%3Aclosed%20label%3AA-dict) issues

### Valid word

HashiCorp

### Incorrect correction

HashCorp

### Justification

<https://www.hashicorp.com>

### Notes

When Japanese (CJK) characters are adjacent to ASCII characters without whitespace (e.g., 含むHashiCorp製品),
typos treats the entire mixed-script string as a single identifier rather than splitting at the script boundary.
As a result, an `extend-identifiers` entry such as `HashiCorp = "HashiCorp"` does not match, because the actual identifier is the longer mixed-script string.
The identifier is then split into subwords,
and `Hashi` is flagged as a typo for `Hash`.
An `extend-words` entry for HashiCorp cannot suppress this either: it would only match a subword equal to the full string HashiCorp, and by this point HashiCorp has already been split into smaller subwords such as `Hashi`.
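
To make the failure mode concrete, here is a minimal Python sketch of the behavior described above. It is a simplified model, not typos's actual tokenizer: the regexes, the function names, and the one-entry `TOY_DICTIONARY` are all assumptions made for illustration, and the two sets stand in for the `extend-identifiers` and `extend-words` config entries.

```python
import re

# Hypothetical stand-ins for the user's config and the built-in dictionary.
EXTEND_IDENTIFIERS = {"HashiCorp"}   # models extend-identifiers
EXTEND_WORDS = {"hashicorp"}         # models extend-words (case-insensitive)
TOY_DICTIONARY = {"hashi": "Hash"}   # the single correction reported in this issue


def identifiers(text):
    # \w matches CJK letters as well as ASCII, so a mixed-script run
    # without whitespace comes back as ONE identifier.
    return re.findall(r"\w+", text)


def subwords(identifier):
    # Rough case-based split: "HashiCorp" -> ["Hashi", "Corp"].
    return re.findall(r"[A-Z]?[a-z]+|[A-Z]+|[^A-Za-z]+", identifier)


def check(text):
    findings = []
    for ident in identifiers(text):
        if ident in EXTEND_IDENTIFIERS:
            continue  # whole identifier accepted, subwords never checked
        for word in subwords(ident):
            key = word.lower()
            if key in EXTEND_WORDS:
                continue
            if key in TOY_DICTIONARY:
                findings.append((word, TOY_DICTIONARY[key]))
    return findings


print(check("含むHashiCorp製品"))    # mixed-script run: Hashi is flagged
print(check("含む HashiCorp 製品"))  # with spaces: the identifier entry matches
```

In this model, neither allowlist is ever consulted with the string `HashiCorp` for the unspaced input: the identifier lookup sees the full mixed-script run, and the word lookup sees only `Hashi` and `Corp`.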

The expected behavior would be for typos to treat script boundaries (e.g., CJK to Latin) as identifier boundaries,
so that HashiCorp in 含むHashiCorp製品 is recognized as a standalone identifier and correctly matched against extend-identifiers entries.
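
The expected splitting could look roughly like the sketch below. This is one possible heuristic, not a proposal for how typos should implement it: `is_cjk` and `identifiers_split_by_script` are hypothetical names, and classifying characters by Unicode name prefix is a rough approximation of a real script lookup (it misses, for example, halfwidth katakana).

```python
import re
import unicodedata
from itertools import groupby

# Rough heuristic: treat a character as CJK if its Unicode name starts
# with one of these prefixes. A real implementation would more likely
# use the Unicode Script property.
CJK_NAME_PREFIXES = ("CJK", "HIRAGANA", "KATAKANA", "HANGUL")


def is_cjk(ch):
    return unicodedata.name(ch, "").startswith(CJK_NAME_PREFIXES)


def identifiers_split_by_script(text):
    result = []
    for run in re.findall(r"\w+", text):
        # Break each run wherever it switches between CJK and non-CJK.
        for _, group in groupby(run, key=is_cjk):
            result.append("".join(group))
    return result


print(identifiers_split_by_script("含むHashiCorp製品"))
```

With this split, `HashiCorp` comes out as its own identifier, so an exact-match lookup against an `extend-identifiers` entry would succeed. Accented Latin text such as `café_latte` stays in one piece because only the CJK/non-CJK transition is treated as a boundary.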
