Use byteidx()/utf16idx() for correct UTF-16 position conversion #1649
Open
mattn wants to merge 5 commits into
LSP uses UTF-16 code unit offsets for character positions, but the current implementation uses strcharpart()/strchars(), which count Unicode codepoints. This is incorrect for characters outside the BMP (e.g. emoji), which require surrogate pairs in UTF-16. When byteidx() with the utf16 flag and utf16idx() are available (Vim 9.0.1485+), use them for correct UTF-16 offset handling. Fall back to the existing codepoint-based conversion on older Vim and Neovim.
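To make the mismatch concrete, here is a minimal, language-neutral sketch in Python (not the Vim script from this PR). The function names are hypothetical and exist only for illustration. It shows how a codepoint count and a UTF-16 code unit count diverge once a line contains a character outside the BMP, and one possible way to map an LSP offset back to a codepoint index.

```python
def codepoint_len(text: str) -> int:
    """Number of Unicode codepoints (what a codepoint-based count sees)."""
    return len(text)


def utf16_len(text: str) -> int:
    """Number of UTF-16 code units (what LSP positions are measured in)."""
    return len(text.encode("utf-16-le")) // 2


def utf16_offset_to_codepoint_index(line: str, utf16_offset: int) -> int:
    """Map an LSP character offset (UTF-16 code units) to a codepoint index.

    Hypothetical helper: an offset inside a surrogate pair is moved to the
    codepoint after it, and an offset past the end is clamped to len(line).
    """
    units = 0
    for index, ch in enumerate(line):
        if units >= utf16_offset:
            return index
        # Codepoints above U+FFFF need a surrogate pair, i.e. two code units.
        units += 2 if ord(ch) > 0xFFFF else 1
    return len(line)


line = "a\U0001F600b"  # 'a', an emoji outside the BMP, 'b'
print(codepoint_len(line), utf16_len(line))
print(utf16_offset_to_codepoint_index(line, 3))
```

For this line, a server reports the `b` at character offset 3. Treating that 3 as a codepoint index points past the end of the line, which is the kind of error the PR description refers to.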
Avoid calling exists() on every lsp#utils#to_char() invocation.
Define separate s:to_col/s:to_char/lsp#utils#to_char functions at
script load time based on exists('*utf16idx'), eliminating per-call
branching overhead. Also extract common line-fetching logic into
s:_get_line() helper.
Benchmark results
Short strings are slightly faster, but long strings are slower due to
The primary benefit of this PR is correctness: proper UTF-16 code unit handling for characters outside the BMP (emoji with surrogate pairs), which
- Replace v:none (Vim-only) with empty list [] in s:_get_line() to fix E121 on Neovim v0.4/v0.5
- Handle utf16idx() when byte index falls in the middle of a multi-byte character by rounding up to the next character index
- Handle byte index past end of string to avoid utf16idx() returning -1
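The two edge cases in the last two items can be modelled with a short Python sketch. This is an illustration of the described behaviour, not the Vim script in this PR and not a description of how Vim's utf16idx() behaves internally. The function name is hypothetical, and clamping a past-the-end byte index to the line length is an assumption about what "avoid returning -1" means.

```python
def byte_index_to_utf16(line: str, byte_index: int) -> int:
    """Convert a UTF-8 byte index into a UTF-16 code unit offset.

    Hypothetical model: a byte index that falls inside a multi-byte
    character is rounded up to the next character, and a byte index past
    the end of the line is clamped to the line's UTF-16 length (assumed).
    """
    units = 0      # UTF-16 code units consumed so far
    consumed = 0   # UTF-8 bytes consumed so far
    for ch in line:
        if consumed >= byte_index:
            break
        consumed += len(ch.encode("utf-8"))
        # Codepoints above U+FFFF take two UTF-16 code units.
        units += 2 if ord(ch) > 0xFFFF else 1
    return units


line = "a\U0001F600b"  # UTF-8 bytes: 1 + 4 + 1
for idx in (0, 1, 2, 5, 6, 100):
    print(idx, byte_index_to_utf16(line, idx))
```

Byte index 2 lies inside the four-byte emoji, so the result is the offset just after it. Byte index 100 is past the end, so the result is the full UTF-16 length instead of an error value.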
Summary
- Use byteidx(str, idx, v:true) and utf16idx(str, byteidx) for LSP position conversion when available (Vim 9.0.1485+)
- strcharpart()/strchars() count Unicode codepoints, which is incorrect for characters outside the BMP (e.g. emoji with surrogate pairs)

Changed files
- autoload/lsp/utils/position.vim: s:to_col() and s:to_char() now use UTF-16 aware builtins
- autoload/lsp/utils.vim: lsp#utils#to_char() likewise

Test plan