feat(std.zon): add escape_unicode options to zon.serializer #23596
Currently `std.zon.stringify.serialize` always escapes Unicode characters, whereas `std.json.stringify` does not escape them by default. This adds an `escape_unicode` option matching the JSON serializer; it defaults to true (the current behaviour) to keep things backward compatible.

```zig
const std = @import("std");

test "std.zon.stringify.serialize escape_unicode = false" {
    var buf = std.ArrayList(u8).init(std.testing.allocator);
    defer buf.deinit();

    try std.zon.stringify.serialize(
        .{ .char = 'অ' },
        .{ .escape_unicode = false },
        buf.writer(),
    );
    try std.testing.expectEqualStrings(".{ .char = \"অ\" }", buf.items);
    buf.clearRetainingCapacity();
}
```
Thanks for the PR! I'll take a look at this and the other Unicode related issue today. In particular, I want to look into whether or not it's necessary to maintain backwards compatibility with the current behavior. [EDIT] Sorry for the delay. I haven't forgotten about this and will get to it soon!
Yeah, I feel like it doesn't need to be backward compatible and should not escape Unicode by default, since that is the usual behavior in most serializers and zon.serializer has not been widely adopted yet.
Apologies for the delay on this! Looking it over, there was no good reason for me to escape everything by default. Adding However there's one important case that needs to be addressed before this can be merged. Unless I'm missing something, the implementation here now doesn't escape You can see how
Linking the issue you filed #23535 here since it's related to this PR in that it's an example of a character that can't really be printed the way you'd expect right now. We probably want to figure out how to address this as well.
…de when needed

Previously `⚡` -> `'\xe2\x9a\xa1'` (note that the hex escape is single-quoted, which is not valid Zig/ZON syntax).
Now `⚡` -> `"\xe2\x9a\xa1"`.
`127` -> `'\x7f'` (a single-quoted hex escape is still emitted when possible).

Updated the code to escape items that need to be escaped, following what stringEscape does. I'm wondering, though, whether it is okay to have this almost-duplicated string escape logic here or just update
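For readers following the escaping discussion, here is a minimal sketch of the kind of byte-level decision involved. It is not the PR's implementation and not `std.zig.stringEscape`: `escapeInto` is a hypothetical helper, and as a simplification it emits `\xNN` for every control byte rather than the shorter `\n`, `\r`, `\t` forms. It only illustrates that quotes, backslashes, and control bytes are escaped regardless, while bytes at or above `0x80` are escaped only when `escape_unicode` is set.

```zig
const std = @import("std");

/// Hypothetical helper for illustration only; not part of std.zon or std.zig.
/// Writes an escaped copy of `input` into `out` and returns the written slice.
fn escapeInto(out: []u8, input: []const u8, escape_unicode: bool) ![]const u8 {
    const hex = "0123456789abcdef";
    var n: usize = 0;
    for (input) |byte| {
        // Control bytes are always hex-escaped; non-ASCII bytes only on request.
        const needs_hex = byte < 0x20 or byte == 0x7f or (escape_unicode and byte >= 0x80);
        if (byte == '"' or byte == '\\') {
            if (n + 2 > out.len) return error.NoSpaceLeft;
            out[n] = '\\';
            out[n + 1] = byte;
            n += 2;
        } else if (needs_hex) {
            if (n + 4 > out.len) return error.NoSpaceLeft;
            out[n] = '\\';
            out[n + 1] = 'x';
            out[n + 2] = hex[byte >> 4];
            out[n + 3] = hex[byte & 0x0f];
            n += 4;
        } else {
            if (n + 1 > out.len) return error.NoSpaceLeft;
            out[n] = byte;
            n += 1;
        }
    }
    return out[0..n];
}

test "escape_unicode toggles hex escaping of non-ASCII bytes" {
    var buf: [64]u8 = undefined;
    try std.testing.expectEqualStrings("\\xe2\\x9a\\xa1", try escapeInto(&buf, "⚡", true));
    try std.testing.expectEqualStrings("⚡", try escapeInto(&buf, "⚡", false));
    try std.testing.expectEqualStrings("\\x7f", try escapeInto(&buf, "\x7f", false));
}
```

The escaped bytes for `⚡` match the `\xe2\x9a\xa1` sequence quoted in the commit message, since U+26A1 encodes to those three bytes in UTF-8.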
Currently std.zon.stringify.serialize always escapes Unicode characters, while std.json.stringify by default does not. This change adds an escape_unicode option that matches the JSON serializer's behavior. To maintain backward compatibility, the default value is true, preserving the current behavior of escaping Unicode.
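As a rough illustration of why a default of true keeps existing callers unchanged, the sketch below uses a hypothetical `SerializeOptions` struct (the name and the `whitespace` default are assumptions, not the actual std.zon definition). A call site that never names `escape_unicode` still gets the escaping behavior, and only callers that opt out see raw Unicode.

```zig
const std = @import("std");

// Hypothetical options struct for illustration; not the actual std.zon definition.
const SerializeOptions = struct {
    whitespace: bool = true,
    // Defaulting to true means omitting the field preserves the old behavior.
    escape_unicode: bool = true,
};

test "call sites that omit escape_unicode keep escaping" {
    const old_call: SerializeOptions = .{ .whitespace = true };
    const new_call: SerializeOptions = .{ .escape_unicode = false, .whitespace = true };
    try std.testing.expect(old_call.escape_unicode);
    try std.testing.expect(!new_call.escape_unicode);
}
```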
Change
Before
std.zon.stringify.serialize(buff, .{ .whitespace = true }, writer)
Output:
After
std.zon.stringify.serialize(buff, .{ .escape_unicode = false, .whitespace = true }, writer)
Output:
Test
Use Case
I was trying to store Unicode data in a ZON file, which I previously did in JSON. When converting from JSON to ZON using the JSON parser and ZON serializer, the Unicode characters were always escaped. This made the ZON file hard to read, which defeats its purpose as a human-readable format.