[API Proposal]: Allow custom JavaScriptEncoder that allows HTML-sensitive characters #70419

Open
@habbes

Background and motivation

The OData team is adopting Utf8JsonWriter to improve its JSON serialization performance. Currently it uses a custom-built JsonWriter. To minimize breaking changes and friction for our users, we would like the new writer's serialized output to be compatible with the existing output. One incompatibility that has emerged is how the two writers handle string escaping:

The OData writer by default escapes control chars (< 0x20), non-ASCII chars (> 0x7F) and characters like ", \, \n, \b, \f, \r, \t.

None of the built-in JavaScriptEncoder implementations has matching escaping rules.

JavaScriptEncoder.Default escapes all the characters the OData writer escapes, but it also escapes HTML-sensitive characters like < and >, which OData does not. It also escapes the double quote as \u0022, whereas OData escapes it with a backslash: \".

The JavaScriptEncoder.UnsafeRelaxedJsonEscaping does not escape HTML-sensitive characters, but it also does not escape non-ASCII characters (> 0x7f).
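The difference between the two built-in encoders can be checked with a small helper. This is only an illustrative sketch: the class and method names (BuiltInEncoderComparison, EncodeBoth) are hypothetical and not part of any library; only JavaScriptEncoder.Default, JavaScriptEncoder.UnsafeRelaxedJsonEscaping and TextEncoder.Encode(string) come from System.Text.Encodings.Web.

```csharp
using System.Text.Encodings.Web;

// Hypothetical helper, for illustration only.
public static class BuiltInEncoderComparison
{
    // Encodes the same input with both built-in encoders so the
    // two results can be compared side by side.
    public static (string Strict, string Relaxed) EncodeBoth(string value)
    {
        string strict = JavaScriptEncoder.Default.Encode(value);
        string relaxed = JavaScriptEncoder.UnsafeRelaxedJsonEscaping.Encode(value);
        return (strict, relaxed);
    }
}
```

Calling EncodeBoth with inputs such as "A is < B", a string containing a double quote, or a string containing a non-ASCII character like é should show the mismatch described above: the default encoder escapes all three cases, while the relaxed encoder leaves < and é untouched.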

I tried to create a custom TextEncoderSettings object to explicitly allow the characters that I do not want to be escaped. I explicitly allowed characters like < and passed the settings to JavaScriptEncoder.Create(settings), but the HTML characters were still escaped. It seems that JavaScriptEncoder.Create is hardwired to forbid HTML characters even when the user explicitly allows them.
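A minimal repro of that attempt might look like the following sketch. The class and method names (CustomSettingsRepro, EncodeWithAllowedAngleBrackets) are hypothetical; the calls to TextEncoderSettings.AllowRange, TextEncoderSettings.AllowCharacters and JavaScriptEncoder.Create(TextEncoderSettings) are the existing public API.

```csharp
using System.Text.Encodings.Web;
using System.Text.Unicode;

// Hypothetical repro helper, for illustration only.
public static class CustomSettingsRepro
{
    public static string EncodeWithAllowedAngleBrackets(string value)
    {
        var settings = new TextEncoderSettings();
        settings.AllowRange(UnicodeRanges.BasicLatin);

        // Explicitly allow the HTML-sensitive characters we want left alone.
        settings.AllowCharacters('<', '>');

        // As reported in this issue, Create still forbids HTML-sensitive
        // characters, so the explicit allowance above has no effect on them.
        JavaScriptEncoder encoder = JavaScriptEncoder.Create(settings);
        return encoder.Encode(value);
    }
}
```

With the behaviour described in this issue, EncodeWithAllowedAngleBrackets("A is < B") returns a string in which < is still escaped rather than written literally.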

JavaScriptEncoder.Create calls the constructor of the internal DefaultJavaScriptEncoder(TextEncoderSettings settings, bool allowMinimalJsonEscaping). This creates an OptimizedInboxTextEncoder with the option to forbid HTML characters depending on whether allowMinimalJsonEscaping is set to true or false.
allowMinimalJsonEscaping is set to false when an encoder is created from custom settings, and there does not seem to be any way for the user to enable it.

It would be great if the user had the option to set allowMinimalJsonEscaping to true when calling JavaScriptEncoder.Create, or any alternative that allows bypassing the HTML escaping.

API Proposal

namespace System.Text.Encodings.Web;

public abstract class JavaScriptEncoder : TextEncoder
{
    public static JavaScriptEncoder Create(TextEncoderSettings settings, bool allowMinimalJsonEscaping);
}

API Usage

// Allow Basic Latin plus any additional characters that should not be escaped
var settings = new TextEncoderSettings();
settings.AllowRange(UnicodeRanges.BasicLatin);
settings.AllowCharacter(...);

// Proposed overload: true opts in to minimal JSON escaping (HTML-sensitive characters are not forbidden)
var encoder = JavaScriptEncoder.Create(settings, true);

var writer = new Utf8JsonWriter(output, new JsonWriterOptions { Encoder = encoder });

writer.WriteStringValue("A is < B"); // writes "A is < B" instead of "A is \u003C B"

Alternative Designs

Alternatively, the behaviour of JavaScriptEncoder.Create(TextEncoderSettings) could be changed so that it does not forbid HTML-sensitive characters that the user has explicitly allowed. But this would be a breaking change.

Risks

Allowing HTML-sensitive characters presents the same risks as using the existing JavaScriptEncoder.UnsafeRelaxedJsonEscaping. Those risks are outlined in these docs. Our use case is sending a JSON response with the application/json; charset=utf-8 header set.

AspNetCore also uses JavaScriptEncoder.UnsafeRelaxedJsonEscaping for JSON serialization by default.

The new API would essentially be JavaScriptEncoder.UnsafeRelaxedJsonEscaping with a bit more control to escape additional characters.
