ed-henrique

Fixes #14551.

Deduplicates definition locations before rendering. Locations are keyed by their URI and range; when two of them collide, the one with the highest offset encoding is kept (chosen because OffsetEncoding::Utf16 is the highest variant in the enum, and also the default).

@ed-henrique
Author

I used an auxiliary order Vec to preserve the locations' original order after deduplicating them through the HashMap.

Comment on lines 95 to 97
if location.offset_encoding > existing.offset_encoding {
*existing = location;
}
Member

This handling of offset encoding here is not correct. Offset encoding controls how the character offset in a lsp::Position (within lsp::Range, within Location) should be interpreted: either as a byte offset (UTF-8), UTF-16 code unit offset or character offset (UTF-32). We can't know how a Location using one offset encoding compares to a Location using another offset encoding until we read the file's contents.

Instead, let's not attempt to deduplicate locations that use different offset encodings, i.e. only deduplicate when the full Location is equal.
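A minimal sketch of what deduplicating on the full Location could look like, using only the standard library. The `Location` and `OffsetEncoding` types below are simplified, hypothetical stand-ins (not the actual types from the codebase), and `dedup_locations` is an illustrative name. The first occurrence of each location is kept and the original order is preserved:

```rust
use std::collections::HashSet;

// Hypothetical stand-in for the real offset-encoding enum.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum OffsetEncoding {
    Utf8,
    Utf32,
    Utf16,
}

// Hypothetical, simplified location: the real type wraps an lsp::Range.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct Location {
    uri: String,
    // (start line, start character, end line, end character)
    range: (u32, u32, u32, u32),
    offset_encoding: OffsetEncoding,
}

/// Removes exact duplicates while keeping the first occurrence of each
/// location in its original position. Two locations that differ only in
/// `offset_encoding` are NOT considered equal, so both are kept.
fn dedup_locations(locations: Vec<Location>) -> Vec<Location> {
    let mut seen = HashSet::new();
    locations
        .into_iter()
        // `insert` returns false when the value was already present.
        .filter(|location| seen.insert(location.clone()))
        .collect()
}
```

With this approach, two entries that share a URI and range but report different offset encodings both survive, because nothing can be said about their equality without reading the file.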

Author

If we deduplicate only when the full Location is equal, this does not solve the problem in #14551, since the OffsetEncoding is the only difference between the Locations returned by the different LSPs. If I remove the current behavior, the user won't notice any difference between before and after the fix, because the entries will still be duplicated in the picker.

Also, when actually following the definitions while testing, I couldn't find any difference in the resulting cursor or buffer position between the two encodings.

How do you think we should handle that? Are there two known LSPs that provide different results when following Locations where the OffsetEncoding is different?

Author

Also, considering this, I used the highest OffsetEncoding as the default, but we could just as well use the first one that appears, if no difference is actually detectable.

Member

You wouldn't notice the difference on ASCII text. For example, if a line contains a character like 🏴 (U+1F3F4), then the offset just after the '🏴' would be 4 in UTF-8 (bytes), 2 in UTF-16 (code units) and 1 in UTF-32 (code points). So if the contents of a line are not all ASCII, you can't know whether two lsp::Ranges with different offset encodings are equal.
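To make those numbers concrete, here is a small standard-library sketch. The helper names `offsets_after` and `utf16_to_byte_offset` are hypothetical and only for illustration; they are not functions from the codebase. The first measures the same position in all three encodings, and the second shows that converting an offset requires the line's actual contents:

```rust
/// Returns the offset just past the first occurrence of `ch` in `line`,
/// measured as (UTF-8 bytes, UTF-16 code units, UTF-32 code points).
fn offsets_after(line: &str, ch: char) -> Option<(usize, usize, usize)> {
    let start = line.find(ch)?;
    let prefix = &line[..start + ch.len_utf8()];
    Some((
        prefix.len(),                  // UTF-8: bytes
        prefix.encode_utf16().count(), // UTF-16: code units
        prefix.chars().count(),        // UTF-32: code points
    ))
}

/// Converts a UTF-16 code-unit offset into a byte offset within `line`.
/// Returns None if the offset is past the end of the line or falls in
/// the middle of a surrogate pair.
fn utf16_to_byte_offset(line: &str, utf16_offset: usize) -> Option<usize> {
    let mut units = 0;
    for (byte_index, ch) in line.char_indices() {
        if units == utf16_offset {
            return Some(byte_index);
        }
        if units > utf16_offset {
            return None; // offset points inside a surrogate pair
        }
        units += ch.len_utf16();
    }
    // The offset may also point exactly at the end of the line.
    (units == utf16_offset).then_some(line.len())
}
```

For the line "🏴x", `offsets_after` yields 4, 2 and 1 for the position after the flag, while on an all-ASCII line the three numbers coincide, which is why the difference never showed up in testing on ASCII text.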

Author

@ed-henrique Oct 16, 2025

I see. So the duplicates are expected behavior, then? If the whole Location should be taken into account, I can use an IndexSet<Location> and simplify the function even further.

Is this PR still useful, or should the original issue be closed with the conclusion that this is expected behavior?

Development

Successfully merging this pull request may close these issues.

Goto Definition shows the same file twice when I have two LSP setup