Repository: ltex-ls-plus
Version: 18.6.1
OS: Windows 10/11 (without "Beta: Use Unicode UTF-8 for worldwide language support" system setting — requires admin rights to enable)
Java: bundled JDK 21.0.8+9
Description
On Windows systems where the system codepage is not UTF-8 (for example Cp1252 / Windows-1252), ltex-ls-plus incorrectly splits words at non-ASCII characters. Characters that are multi-byte in UTF-8, such as ä, ö, ü, Ö, ß, are not recognized as part of a word.
Example:
Österreich is treated as two tokens: Ö and sterreich
größte is treated as two tokens: gr and ßte
This means spell checking and grammar checking effectively do not work for any language using these characters (e.g. German, Austrian German).
Root Cause
The bundled JDK reports the following encoding settings on affected systems:
file.encoding = UTF-8 ✅ (correctly set)
native.encoding = Cp1252 ❌
stdout.encoding = Cp1252 ❌
stderr.encoding = Cp1252 ❌
sun.jnu.encoding = Cp1252 ❌
Setting JAVA_TOOL_OPTIONS to override encodings partially helps:
-Dfile.encoding=UTF-8 -Dstdout.encoding=UTF-8 -Dstderr.encoding=UTF-8 -Dnative.encoding=UTF-8
After applying these flags:
file.encoding = UTF-8 ✅
stdout.encoding = UTF-8 ✅
stderr.encoding = UTF-8 ✅
native.encoding = Cp1252 ❌ (cannot be overridden via JAVA_TOOL_OPTIONS)
sun.jnu.encoding = Cp1252 ❌ (cannot be overridden via JAVA_TOOL_OPTIONS)
native.encoding and sun.jnu.encoding remain Cp1252 and cannot be overridden without system-level changes that require administrator rights.
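The property values listed in this section can be read back with a small program like the following. This is only a minimal sketch for reproducing the listing: the class name EncodingReport and the method collect are hypothetical and not part of ltex-ls-plus, and the program has to be run with the same bundled JDK (and the same JAVA_TOOL_OPTIONS, if any) to see the values the language server sees.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of ltex-ls-plus.
public class EncodingReport {
    // Property names taken from the listing in this report.
    static final String[] KEYS = {
        "file.encoding",
        "native.encoding",
        "stdout.encoding",
        "stderr.encoding",
        "sun.jnu.encoding"
    };

    // Reads each property from the running JVM; "(unset)" marks a
    // property that this JVM does not define.
    static Map<String, String> collect() {
        Map<String, String> values = new LinkedHashMap<>();
        for (String key : KEYS) {
            values.put(key, System.getProperty(key, "(unset)"));
        }
        return values;
    }

    public static void main(String[] args) {
        collect().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

Saved as EncodingReport.java, it can be started without a separate compile step via "java EncodingReport.java", once with and once without JAVA_TOOL_OPTIONS set, to compare the two listings.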
Steps to Reproduce
- Use Windows without the "Beta: Use Unicode UTF-8" system codepage setting (requires admin)
- Install ltex-ls-plus via Mason (Neovim)
- Open a .tex or .md file containing German umlauts
- Observe that words containing ä, ö, ü, ß are split at the umlaut
Expected Behavior
Words like Österreich, größte, Übung should be recognized as single tokens and spell/grammar checked correctly.
Workaround
None available without administrator rights. Users with admin rights can enable:
Control Panel → Region → Administrative → Change system locale → Beta: Use Unicode UTF-8
and restart. This sets the system codepage to UTF-8 and should resolve the issue. (I could not test this, as I am working on a computer without admin rights.)
Suggested Fix
The ltex-ls-plus launcher script (.bat on Windows) could explicitly pass -Dsun.jnu.encoding=UTF-8 as a JVM argument directly in the launch command rather than relying on JAVA_TOOL_OPTIONS. This would bypass the system codepage restriction:
"%JAVA_EXEC%" -Dsun.jnu.encoding=UTF-8 -Dfile.encoding=UTF-8 ... -jar ltex-ls.jar
This would fix the issue for all Windows users regardless of their system locale settings or admin rights.