Clarify UTF encoding between C# strings and Godot Strings #10920

Open. Athenr wants to merge 3 commits into master.

@Athenr Athenr commented May 2, 2025

Fix #7682

Updated c_sharp_differences.rst to include the difference between the UTF encoding for C# strings and Godot Strings.

Athenr added 2 commits May 2, 2025 16:14
…coding

Clarified that C# System.String uses UTF-16 encoding while Godot String uses UTF-32.
…dot-String-UTF-encoding

Update c_sharp_differences.rst with C# string and Godot String UTF encoding
@skyace65 skyace65 added the enhancement, topic:dotnet, and area:manual labels May 3, 2025
@skyace65 skyace65 requested a review from a team May 3, 2025 01:13
Revising for grammatical fixes in changes.

Co-authored-by: A Thousand Ships <[email protected]>
Member

@raulsntos raulsntos left a comment

Thanks for contributing to the C# documentation. I think we had in mind something more extensive that clarifies why this can be a problem.

Normally, the encoding is not a problem since we convert between C# strings and Godot strings automatically, so ideally users wouldn't even need to think about it. So mentioning the encoding difference would not be important if that's all there is to say.

We wanted to add a note about this in the documentation because of the problems that it may cause in some APIs. The example given in the discussion from #7612 was TextServer::string_get_word_breaks. This API breaks the text into words and returns an array of character indices, but these indices will be wrong for C# strings in some cases.

For example, for the string "ℌ𝔢𝔩𝔩𝔬 𝔚𝔬𝔯𝔩𝔡" the returned array would be [0, 5, 6, 11]. But those indices don't match positions in the C# string, because a single character may take more than one UTF-16 code unit. In C# the indices should be [0, 9, 10, 20], or you should work with System.Text.Rune instead.
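
To make the index mismatch concrete, here is a minimal sketch of how code point indices (such as the ones in the example above) could be mapped to UTF-16 offsets on the C# side. The `WordBreakIndices` class and its `CodePointToUtf16` method are hypothetical names made up for illustration; they are not part of the Godot API. The sketch assumes the incoming indices count Unicode code points and uses only `string.EnumerateRunes()` and `Rune.Utf16SequenceLength` from .NET.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper for illustration only; not part of Godot's C# API.
public static class WordBreakIndices
{
    // Maps indices that count Unicode code points (how a UTF-32 string
    // indexes its characters) to UTF-16 code unit offsets that are valid
    // for indexing or slicing a C# string.
    public static int[] CodePointToUtf16(string text, IReadOnlyList<int> codePointIndices)
    {
        // offsets[i] is the UTF-16 offset where the i-th code point starts.
        // A final entry equal to text.Length is appended so that an index
        // pointing at the end of the string can be mapped as well.
        var offsets = new List<int>();
        int utf16Offset = 0;
        foreach (Rune rune in text.EnumerateRunes())
        {
            offsets.Add(utf16Offset);
            // 1 for characters in the Basic Multilingual Plane,
            // 2 for characters encoded as a surrogate pair.
            utf16Offset += rune.Utf16SequenceLength;
        }
        offsets.Add(utf16Offset);

        var result = new int[codePointIndices.Count];
        for (int i = 0; i < result.Length; i++)
        {
            result[i] = offsets[codePointIndices[i]];
        }
        return result;
    }
}
```

Calling `WordBreakIndices.CodePointToUtf16("ℌ𝔢𝔩𝔩𝔬 𝔚𝔬𝔯𝔩𝔡", new[] { 0, 5, 6, 11 })` yields `[0, 9, 10, 20]`: "ℌ" fits in one UTF-16 code unit, while each of the other letters needs a surrogate pair. The converted offsets can then be passed to `Substring` safely. For plain ASCII text the mapping is the identity, which is why the problem usually goes unnoticed.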

Labels: area:manual, cherrypick:4.4, enhancement, topic:dotnet
Development

Successfully merging this pull request may close these issues.

Document that C# strings are UTF16 while Godot Strings are UTF32
5 participants