Prototype: unicode string support #1517
cvrunmin wants to merge 12 commits into cc-tweaked:mc-1.19.x
tbh I do not believe this has any chance of being merged into CC:T. Best of luck, but I feel like a lot of features like this have been rejected before because they would conflict with the mod's "feel".
I really hope that this can find some kind of compromise, because having Unicode support for a terminal is kinda something you'd expect. And it would make stuff so much easier.
Personally, I'm fully in support of such an addition. This is obviously an enormous change, but I believe the current status quo of only having Latin + a few extra chars creates a pointless barrier to entry for people from different cultures and is not the way forward, even if this takes a while to polish. I think the potential to slightly "break the feel", as dev1955 pointed out, is worth it to allow people unfamiliar with English to use the mod.
I don't think the ROM even supports multiple languages, so it sounds kinda pointless to make it more accessible for newbies. I'm not against this change, as I use older versions anyway, but this still feels too drastic.
I guess I worded this a bit poorly. I didn't mean that newbies would be able to set the ROM to their language or code in it, but that non-technical users could potentially interact with CC programs others made in their native tongue, e.g. a shop or a dashboard.
O_o That's a lot of new functions to add and import...
Thank you for looking into this. I realise this is a bit of a pain, but I think it probably makes sense to do this work in two stages/two PRs:
Quick response to point 1:
As for the cloning hell in point 2, it is because we cannot distinguish when we expect a Latin-1 string and when we expect a UTF-8 string. I believe using that could ruin the subsequent calls if someone forgets to restore state. For the duplicated event issue, would it be better if we send two params for
FWIW, my approach for events in my test was to use the same event name, but adding a second parameter with the
I didn't do a good job at describing every change I made in my test back then, but you could take a look at the ROM patches for inspiration. I still think this is the most elegant solution, even if it ends up adding a bunch of extra Unicode options to functions.
Also, I don't really like the idea of making people interact with the UTF-8 representations of strings directly. It's really easy to slip up and end up putting it into a normal string function, which would destroy the codepoints and/or not function correctly. It's good that there's a usermode library to help, but IMO we shouldn't be exposing the raw encoding data to users unless they specifically ask for it (e.g. an
merge separated unicode modules/functions back to their normal variant.
How's this going? It would actually be a great feature. I'm following the progress.
This aims to address issue #860 about reading and writing Unicode characters in terminals.
This pull request mostly adapts the first route in the discussion ("separate versions of methods for unicode"). (Edit: no longer valid since the commit on 12 July.)

Additions

`utflib` API
This API provides a `UTFString` "class" that wraps a UTF-8-encoded byte string and acts like a normal string. The functions provided by the standard `string` library, except `string.dump`, `string.pack`, `string.packsize` and `string.unpack`, are also provided by `UTFString`. Users can use this to adapt Unicode strings into their old systems painlessly. To get a Latin-1 string from a `UTFString`, use `UTFString:toLatin()`; otherwise `tostring` returns the backing byte string.

Besides `UTFString`, the module also exports the following functions:
- `fromLatin(str)`: treats the string as entirely Latin-1 and converts it to UTF-8. This function is provided because `UTFString(str)` assumes the string is already UTF-8-encoded, and only treats invalid byte subsequences as Latin-1 and converts them.
- `isUTFString(v)`: returns true if `v` is a `UTFString`.
- `wrapStr(str)`: wraps a Lua string so that a normal string can be compared with a Unicode string.
- `isStringWrapper(v)`: returns true if `v` is a string wrapper from `wrapStr(str)`.

`shell.unicode`, `edit.unicode`, `lua.unicode` settings
New settings allow the `shell`, `edit` and `lua` programs to receive and print Unicode strings. These settings do not affect other programs, especially user-defined programs.

Changes

`TermMethods.write` and `TermMethods.blit` functions
They now accept `UTFString` properly and no longer write "table: 0x??????" on screen.

`edit` program when Unicode text is present.

`char` and `paste` events
They now also send a UTF-8-encoded string as the second parameter.

`read` function
It now accepts `_bReadUnicode` as the 5th argument, indicating whether it should take a `UTFString` when `true`, or a normal string otherwise.

Roadmap
Edit
2023-07-12: merge separated unicode modules/functions back to their normal variant to reduce code duplication hell.
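
As an illustration of the Latin-1 to UTF-8 conversion that `fromLatin(str)` is described as performing, here is a minimal sketch in plain Lua. The helper names `latin1ToUtf8` and `utf8ToLatin1` are hypothetical and are not taken from the PR; the actual `utflib` implementation may differ, in particular in how it treats code points above U+00FF.

```lua
-- Hypothetical helpers for illustration; not the PR's utflib code.

-- Encode every Latin-1 byte as UTF-8. Bytes below 0x80 are unchanged;
-- bytes 0x80-0xFF become a two-byte sequence (lead byte 0xC2 or 0xC3).
local function latin1ToUtf8(s)
  return (s:gsub("[\128-\255]", function(c)
    local b = c:byte()
    return string.char(192 + math.floor(b / 64), 128 + b % 64)
  end))
end

-- Decode the two-byte sequences that map back into Latin-1 (U+0080-U+00FF).
-- Assumption: any other sequence is left untouched in this sketch.
local function utf8ToLatin1(s)
  return (s:gsub("([\194\195])([\128-\191])", function(lead, trail)
    return string.char((lead:byte() - 192) * 64 + (trail:byte() - 128))
  end))
end
```

For example, `latin1ToUtf8("caf\233")` yields the bytes 63 61 66 C3 A9, which is the UTF-8 encoding of "café", and `utf8ToLatin1` reverses it.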
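
To make the idea of a `UTFString` wrapper more concrete, the following sketch wraps a UTF-8 byte string in a table with a metatable. It is an illustration under assumptions, not the PR's code: `newUTFStr`, the `bytes` field and the `len` method are hypothetical names. It also hints at one likely reason a helper such as `wrapStr(str)` is useful: Lua only consults the `__eq` metamethod when both operands are tables (or both userdata), so a wrapper table never compares equal to a plain string.

```lua
-- Hypothetical wrapper for illustration; not the PR's UTFString.
local UTFStr = {}
UTFStr.__index = UTFStr

local function newUTFStr(bytes)
  return setmetatable({ bytes = bytes }, UTFStr)
end

-- Count code points by counting every byte that is not a UTF-8
-- continuation byte (0x80-0xBF). Assumes the input is valid UTF-8.
function UTFStr:len()
  local _, n = self.bytes:gsub("[^\128-\191]", "")
  return n
end

-- tostring() returns the backing byte string.
UTFStr.__tostring = function(self) return self.bytes end
-- Only invoked when both operands are wrapper tables.
UTFStr.__eq = function(a, b) return a.bytes == b.bytes end

local s = newUTFStr("caf\195\169")    -- "café" as UTF-8 bytes
print(s:len(), #s.bytes)              -- 4 code points, 5 bytes
print(s == newUTFStr("caf\195\169"))  -- true: __eq compares the bytes
print(s == "caf\195\169")             -- false: table vs. string, __eq is not called
```

A byte-oriented call such as `s.bytes:sub(1, 4)` would cut the two-byte "é" in half, which is the kind of slip the conversation warns about when raw UTF-8 strings reach normal string functions.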
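
The `char` and `paste` change can be consumed in a backward-compatible way. The sketch below assumes, as the description states, that the UTF-8-encoded string arrives as the second event parameter. The helper name `pickText` and the fallback to the first parameter when the second is absent are assumptions made for this sketch only.

```lua
-- Hypothetical helper: choose which payload of a char/paste event to use.
-- Returns nil for any other event.
local function pickText(wantUnicode, event, latin, utf8)
  if event ~= "char" and event ~= "paste" then
    return nil
  end
  -- Prefer the UTF-8 payload only when it was requested and actually sent.
  if wantUnicode and utf8 ~= nil then
    return utf8
  end
  return latin
end

-- Inside ComputerCraft this might be driven by the event loop, e.g.:
--   local text = pickText(true, os.pullEvent())
```

Following the description of `read`, passing `true` in the fifth position, as in `read(nil, nil, nil, nil, true)`, would request Unicode input; the first four arguments are the existing `replaceChar`, `history`, `completeFn` and `default` parameters.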