[Detail Bug] LSP diagnostic end position uses byte length instead of UTF-16 code units #292

@detail-app

Detail Bug Report

https://app.detail.dev/org_4c5b920c-9e04-4530-a4d2-953927859978/bugs/bug_6721c4a4-fb7e-4135-99ae-16b7c6d61794

Summary

  • Context: The check_text_inner function generates LSP diagnostic messages for detected typos, including the text range (start and end positions) where each typo appears.
  • Bug: The end position of diagnostic ranges is calculated using byte length instead of UTF-16 code unit count.
  • Actual vs. expected: When a typo contains multi-byte UTF-8 characters (e.g., "café"), the end position is calculated as start_position + byte_length, but LSP requires UTF-16 code units, so it should be start_position + utf16_length.
  • Impact: Diagnostic ranges are incorrect for typos containing non-ASCII characters, causing misaligned highlighting and incorrect code action ranges in editors.

Code with bug

// crates/typos-lsp/src/lsp.rs:449-455
crate::typos::check_str(buffer, tokenizer, dict, ignore)
    .map(|(typo, line_num, line_pos)| {
        Diagnostic {
            range: Range::new(
                Position::new(line_num as u32, line_pos as u32),
                Position::new(line_num as u32, (line_pos + typo.typo.len()) as u32), // <-- BUG 🔴 using byte length instead of UTF-16 length
            ),

Logical proof

  • LSP positions are measured in UTF-16 code units by default, and this server advertises UTF-16 position encoding.
  • line_pos is computed as a UTF-16 code unit count:
    let line_pos = before_typo.chars().map(char::len_utf16).sum();
  • typo.typo is a str/Cow<str>; str::len() returns byte length, not UTF-16 units.
  • The code adds UTF-16 line_pos to UTF-8 byte length (typo.typo.len()), mixing units, which is incorrect whenever the typo includes non-ASCII characters.
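  • Example: "café" (with a precomposed é, U+00E9) has byte length 5 but UTF-16 length 4. If the typo starts at UTF-16 position 10, the buggy end is 15 instead of 14, a one-unit overshoot in the diagnostic range. Characters outside the Basic Multilingual Plane widen the gap: an emoji such as U+1F600 is 4 bytes in UTF-8 but only 2 UTF-16 code units.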
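The unit mismatch described above can be reproduced in isolation. The following is a minimal sketch, not code from typos-lsp: the helper names utf8_len, utf16_len, and end_positions are hypothetical and exist only to put the buggy and the expected end position side by side.

```rust
/// Length of `s` in UTF-8 bytes; this is what `str::len` returns.
fn utf8_len(s: &str) -> usize {
    s.len()
}

/// Length of `s` in UTF-16 code units, the unit used for LSP positions
/// when the UTF-16 position encoding is in effect.
fn utf16_len(s: &str) -> usize {
    s.chars().map(char::len_utf16).sum()
}

/// Hypothetical helper for illustration only: given a start column that is
/// already in UTF-16 code units, returns `(buggy_end, correct_end)`.
fn end_positions(start_utf16: usize, typo: &str) -> (usize, usize) {
    (
        start_utf16 + utf8_len(typo),  // mixes UTF-16 start with UTF-8 length
        start_utf16 + utf16_len(typo), // both terms in UTF-16 code units
    )
}
```

Under these assumptions, end_positions(10, "caf\u{e9}") yields (15, 14), matching the example in the list, while a pure-ASCII typo such as "cafe" yields the same value for both, which is why the bug goes unnoticed on ASCII input.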

Recommended fix

Replace the byte-length addition with a UTF-16 code unit count:

Position::new(
    line_num as u32,
    (line_pos + typo.typo.chars().map(|c| c.len_utf16()).sum::<usize>()) as u32  // <-- FIX 🟢 count UTF-16 code units
)
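As a rough sketch of how the fix could be factored out and checked, the end-column computation can live in a small function. Both function names below are hypothetical and not part of typos-lsp; the second variant uses the standard library's str::encode_utf16, which should give the same count as summing char::len_utf16.

```rust
/// Hypothetical helper (not part of typos-lsp): end column of a typo in
/// UTF-16 code units, given a start column already in UTF-16 code units.
fn end_column_utf16(line_pos: usize, typo: &str) -> u32 {
    let typo_units: usize = typo.chars().map(char::len_utf16).sum();
    (line_pos + typo_units) as u32
}

/// Equivalent formulation: count the code units produced by the standard
/// library's UTF-16 encoder instead of summing per-char lengths.
fn end_column_via_encode_utf16(line_pos: usize, typo: &str) -> u32 {
    (line_pos + typo.encode_utf16().count()) as u32
}
```

Either form keeps both operands of the addition in the same unit. Extracting the computation this way would also make it easy to cover with a unit test that uses non-ASCII input, which the inline expression does not encourage.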
