Description
Background and motivation
The OData team is adopting Utf8JsonWriter to improve its JSON serialization performance. Currently it uses a custom-built JsonWriter. To minimize breaking changes and friction for our users, we would like the new writer's serialized output to match the existing output. One incompatibility that has emerged is how the two writers handle string escaping:
The OData writer by default escapes control chars (< 0x20), non-ASCII chars (> 0x7F) and characters like ", \, \n, \b, \f, \r, \t.
None of the built-in JavaScriptEncoder implementations has matching escaping rules.
JavaScriptEncoder.Default escapes all the characters the OData writer escapes, but it also escapes HTML-sensitive characters like < and >, which OData does not. It also escapes the double quote as \u0022, whereas OData escapes it with a backslash: \".
JavaScriptEncoder.UnsafeRelaxedJsonEscaping does not escape HTML-sensitive characters, but it also does not escape non-ASCII characters (> 0x7F).
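To illustrate the difference, here is a minimal sketch that runs both built-in encoders over the same string. The sample input is made up for this example, and the outputs in the comments are what I would expect from the behaviour described above.

```csharp
using System;
using System.Text.Encodings.Web;

// Hypothetical sample input: contains '<', double quotes and the non-ASCII 'é'.
string input = "A < B \"é\"";

// Default: escapes HTML-sensitive and non-ASCII characters, and writes '"' as \u0022.
string strict = JavaScriptEncoder.Default.Encode(input);

// Relaxed: leaves '<' and 'é' alone, and writes '"' as \".
string relaxed = JavaScriptEncoder.UnsafeRelaxedJsonEscaping.Encode(input);

Console.WriteLine(strict);   // A \u003C B \u0022\u00E9\u0022
Console.WriteLine(relaxed);  // A < B \"é\"
```

Neither result matches what the OData writer produces: it would leave < as-is, escape the quote as \" and still escape é.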
I tried creating a custom TextEncoderSettings object that explicitly allows the characters I do not want escaped, such as <, and passed it to JavaScriptEncoder.Create(settings). But the HTML-sensitive characters were still escaped. It seems that JavaScriptEncoder.Create is hardwired to forbid HTML-sensitive characters even when the user explicitly allows them.
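A minimal repro of what I tried looks roughly like this. The input string is just an example; the expected output in the comment reflects the behaviour I observed.

```csharp
using System;
using System.Text.Encodings.Web;
using System.Text.Unicode;

// Allow Basic Latin, then explicitly allow the HTML-sensitive '<'.
var settings = new TextEncoderSettings(UnicodeRanges.BasicLatin);
settings.AllowCharacter('<');

JavaScriptEncoder encoder = JavaScriptEncoder.Create(settings);

// '<' is still escaped even though it was explicitly allowed.
Console.WriteLine(encoder.Encode("A is < B")); // A is \u003C B
```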
JavaScriptEncoder.Create calls the constructor of the internal DefaultJavaScriptEncoder(TextEncoderSettings settings, bool allowMinimalJsonEscaping). This creates an OptimizedInboxTextEncoder with the option to forbid HTML characters depending on whether allowMinimalJsonEscaping is set to true or false.
allowMinimalJsonEscaping is set to false when creating an encoder with custom settings, and there does not seem to be any way for the user to enable it.
It would be great if the user had the option to set allowMinimalJsonEscaping to true when calling JavaScriptEncoder.Create, or had some alternative that allows bypassing the HTML escaping.
API Proposal
namespace System.Text.Encodings.Web;

public abstract class JavaScriptEncoder : TextEncoder
{
    // Proposed overload: lets the caller opt out of the forced HTML escaping.
    public static JavaScriptEncoder Create(TextEncoderSettings settings, bool allowMinimalJsonEscaping);
}
API Usage
// Allow the characters that should be written unescaped
var settings = new TextEncoderSettings();
settings.AllowRange(UnicodeRanges.BasicLatin);
settings.AllowCharacter(...);
// Proposed overload: opt in to minimal JSON escaping
var encoder = JavaScriptEncoder.Create(settings, allowMinimalJsonEscaping: true);
var writer = new Utf8JsonWriter(output, new JsonWriterOptions { Encoder = encoder });
writer.WriteStringValue("A is < B"); // writes "A is < B" instead of "A is \u003C B"
Alternative Designs
Alternatively, the behaviour of JavaScriptEncoder.Create(TextEncoderSettings) could be changed so that it does not forbid HTML-sensitive characters. But this would be a breaking change.
Risks
Allowing HTML-sensitive characters presents the same risks as using the existing JavaScriptEncoder.UnsafeRelaxedJsonEscaping. Those risks are outlined in these docs. Our use case is sending a JSON response with the Content-Type header set to application/json; charset=utf-8.
AspNetCore also uses JavaScriptEncoder.UnsafeRelaxedJsonEscaping for JSON serialization by default.
The new API would essentially be JavaScriptEncoder.UnsafeRelaxedJsonEscaping with a bit more control to escape additional characters.
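For reference, this is roughly what we get today when plugging the existing relaxed encoder into Utf8JsonWriter. It is a minimal sketch with a made-up input string and an in-memory buffer standing in for the real response stream.

```csharp
using System;
using System.Buffers;
using System.Text;
using System.Text.Encodings.Web;
using System.Text.Json;

// In-memory buffer used here only for illustration.
var buffer = new ArrayBufferWriter<byte>();
var options = new JsonWriterOptions { Encoder = JavaScriptEncoder.UnsafeRelaxedJsonEscaping };

using (var writer = new Utf8JsonWriter(buffer, options))
{
    writer.WriteStringValue("A is < B, café");
} // disposing the writer flushes the pending bytes

string json = Encoding.UTF8.GetString(buffer.WrittenSpan);
Console.WriteLine(json); // "A is < B, café"
```

The < is left alone, which is what we want, but the non-ASCII é is also left unescaped. That is the part the proposed overload would let us control through TextEncoderSettings.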