
[API Proposal]: Consider adding a format-agnostic serializer. #83652

Open
@Foxtrek64

Background and motivation

The System.Text.Parsing namespace will hold language-agnostic interfaces and classes used for parsing text-based languages such as Json, Xml, Yaml, and others. This could potentially be used by Roslyn as well, should they find anything useful, but that's not a priority of this library.

Additionally, this library will not be functional on its own. It should merely provide the framework on which to build language-specific parsing libraries, for example System.Text.Json or a future System.Text.Xml. Changes to System.Text.Json to use this library will be a project in itself and thus will be placed in its own proposal.

The S.T.Parsing library should begin its life by abstracting components of System.Text.Json into agnostic forms, wherever possible. This will give us a good head start, but additional features can be added to this library as we find a need for them, even if such features are not applicable to every possible language.

API Proposal

First, a few more generic things that won't follow the standard suggestion format.

  1. Migrate PooledByteBufferWriter.cs to System.Text.Parsing.PooledByteBufferWriter. Probably change visibility to protected, but I'm not too familiar with where this is used.
  2. Migrate HexConverter.cs to System.Text.Parsing.HexConverter. Probably change visibility to public, else protected.
  3. Migrate JsonNamingPolicy.cs to System.Text.Parsing.NamingPolicy. JsonNamingPolicy should become a static class behaving like a type-safe enum, and specific policies like SnakeCaseNamingPolicy should instead implement NamingPolicy.

We should consider creating a bunch of generic Naming Policies. For instance, a SnakeCaseNamingPolicy could be applied to json as "some_key": 42 and xml as <some_key>42</some_key>, so there is reuse to be had here.
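To make the reuse concrete, here is a minimal sketch of one policy feeding two formats. Everything in it is an assumption for illustration: the `NamingPolicy` base class is modeled on `JsonNamingPolicy.ConvertName`, the snake_case conversion is deliberately simplified, and `NamingPolicyDemo` is a hypothetical helper, not part of the proposal.

```csharp
using System;
using System.Text;

// Hypothetical shape for the proposed System.Text.Parsing.NamingPolicy,
// modeled on JsonNamingPolicy.ConvertName. Not an existing API.
public abstract class NamingPolicy
{
    public abstract string ConvertName(string name);
}

// Simplified snake_case conversion: inserts '_' before each uppercase letter
// that is not at the start. Acronyms and digits get no special handling.
public sealed class SnakeCaseNamingPolicy : NamingPolicy
{
    public override string ConvertName(string name)
    {
        var sb = new StringBuilder(name.Length + 4);
        for (int i = 0; i < name.Length; i++)
        {
            char c = name[i];
            if (char.IsUpper(c))
            {
                if (i > 0) sb.Append('_');
                sb.Append(char.ToLowerInvariant(c));
            }
            else
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}

// Hypothetical helpers standing in for format-specific writers.
public static class NamingPolicyDemo
{
    public static string ToJsonMember(NamingPolicy policy, string name, int value)
        => $"\"{policy.ConvertName(name)}\": {value}";

    public static string ToXmlElement(NamingPolicy policy, string name, int value)
    {
        string key = policy.ConvertName(name);
        return $"<{key}>{value}</{key}>";
    }
}
```

With a property named `SomeKey` and the value 42, the two helpers produce `"some_key": 42` and `<some_key>42</some_key>` from the same policy instance.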

Create System.Text.Parsing.Document.ITextDocument (or IDocument, whichever sounds better):

// I have only included public interfaces from JsonDocument here, but it may be prudent to expose some internal methods as well.
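public interface ITextDocument<TRoot, TextWriter> : IDisposable
    where TRoot : IElement<TextWriter>
    where TextWriter : ITextWriter
{
    // Mirrors JsonDocument.RootElement; without it, TRoot is never used.
    TRoot RootElement { get; }

    void WriteTo(TextWriter writer);
}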

Create System.Text.Parsing.Document.IElement

// Should these returns be generic somehow?
public interface IElement<TextWriter>
    where TextWriter : ITextWriter
{
    // Potentially some sort of generic ValueKind.

    // All elements should at least be readable. Consumer can make this writable if they choose.
    IElement<TextWriter> this[int index] { get; }

    int GetArrayLength();

    IElement<TextWriter> GetProperty(string propertyName);
    IElement<TextWriter> GetProperty(ReadOnlySpan<char> propertyName);
    IElement<TextWriter> GetProperty(ReadOnlySpan<byte> utf8PropertyName);

    bool TryGetProperty<TElement>(string propertyName, out TElement value) where TElement : IElement<TextWriter>;
    bool TryGetProperty<TElement>(ReadOnlySpan<char> propertyName, out TElement value) where TElement : IElement<TextWriter>;
    bool TryGetProperty<TElement>(ReadOnlySpan<byte> utf8PropertyName, out TElement value) where TElement : IElement<TextWriter>;

    // Currently, JsonElement has separate GetInt32(), GetByte(), etc., following this pattern:
    //     X GetX();
    //     bool TryGetX(out X value);

    // I propose we skip those and use generics instead. A consumer who wants the separate methods,
    // e.g. for backwards compatibility, can still add them. With the new `INumber<TSelf>` interface
    // covering numeric types, a generic implementation shouldn't face any roadblocks. If one does
    // come up, the consumer can explicitly implement these members and have them throw, then
    // provide separate GetX and TryGetX methods.
    TValue GetValue<TValue>();
    bool TryGetValue<TValue>(out TValue value);

    string GetRawText();

    // Provided for backwards-compatibility
    bool ValueEquals(string? text);
    bool ValueEquals(ReadOnlySpan<byte> utf8Text);

    // Perform type comparison using default equality comparer.
    bool ValueEquals<TValue>(TValue other);

    void WriteTo(TextWriter writer);

    ArrayEnumerator EnumerateArray();

    ObjectEnumerator EnumerateObject();

    // Interface members cannot be declared `override`; implementations override object.ToString().
    string ToString();

    IElement<TextWriter> Clone();
}
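As a rough sketch of how the generic `GetValue<TValue>` / `TryGetValue<TValue>` pair could replace the per-type getters, the example below assumes a constraint the proposal does not state: `TValue : IParsable<TValue>` (available since .NET 7, and implemented by the built-in numeric types). `RawTextElement` is a hypothetical stand-in that only stores raw text; it is not the proposed `IElement`.

```csharp
using System;
using System.Diagnostics.CodeAnalysis;
using System.Globalization;

// Hypothetical element that only stores its raw text; not part of the proposal.
public readonly struct RawTextElement
{
    private readonly string _rawText;

    public RawTextElement(string rawText) => _rawText = rawText;

    // Assumption: constraining TValue to IParsable<TValue> lets one method
    // stand in for GetInt32(), GetDouble(), GetGuid(), and so on.
    public TValue GetValue<TValue>() where TValue : IParsable<TValue>
        => TValue.Parse(_rawText, CultureInfo.InvariantCulture);

    // Returns false instead of throwing when the raw text is not a valid TValue.
    public bool TryGetValue<TValue>([MaybeNullWhen(false)] out TValue value)
        where TValue : IParsable<TValue>
        => TValue.TryParse(_rawText, CultureInfo.InvariantCulture, out value);
}
```

For example, `new RawTextElement("42").GetValue<int>()` parses an `int`, while `TryGetValue<int>` on the text `abc` returns `false`. A real implementation would likely parse from UTF-8 bytes rather than a string, and types without a parsable form would still need the explicit-implementation fallback described in the comment above.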

Create System.Text.Parsing.Document.IProperty

public interface IProperty<TextWriter>
    where TextWriter : ITextWriter
{
    string Name { get; }

    bool NameEquals(string? text);
    bool NameEquals(ReadOnlySpan<byte> utf8Text);
    bool NameEquals(ReadOnlySpan<char> text);

    void WriteTo(TextWriter writer);

    string ToString();
}

Create System.Text.Parsing.Reader.IReader
Might also be a good candidate for an abstract class to implement a lot of the internal and private functionality.

public interface IReader
{
    // Interface members cannot declare private setters; implementations set these internally.
    ReadOnlySpan<byte> ValueSpan { get; }

    long BytesConsumed { get; }

    long TokenStartIndex { get; }

    int CurrentDepth { get; }

    // Some sort of generic TokenType

    bool HasValueSequence { get; }
    bool ValueIsEscaped { get; }
    bool IsFinalBlock { get; }

    ReadOnlySequence<byte> ValueSequence { get; }

    SequencePosition Position { get; }

    ReaderState CurrentState { get; }

    bool Read();

    void Skip();
    bool TrySkip();

    bool ValueTextEquals(ReadOnlySpan<byte> utf8Text);
    bool ValueTextEquals(string? text);
    bool ValueTextEquals(ReadOnlySpan<char> text);
}
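To illustrate what a shared reader abstraction buys, the sketch below writes one helper against an interface and never against a format. All names are hypothetical: `IDepthReader` is a two-member subset of the `IReader` above, and `BracketReader` is a toy that treats each character as a token, not a real JSON reader.

```csharp
using System;

// Hypothetical two-member subset of the proposed IReader, for illustration only.
public interface IDepthReader
{
    int CurrentDepth { get; }
    bool Read();
}

// Toy reader: each character is one token; '[' and '{' open a level,
// ']' and '}' close one. It performs no validation.
public sealed class BracketReader : IDepthReader
{
    private readonly string _text;
    private int _position;

    public BracketReader(string text) => _text = text;

    public int CurrentDepth { get; private set; }

    public bool Read()
    {
        if (_position >= _text.Length) return false;
        char c = _text[_position++];
        if (c is '[' or '{') CurrentDepth++;
        else if (c is ']' or '}') CurrentDepth--;
        return true;
    }
}

public static class ReaderHelpers
{
    // Format-agnostic: depends only on the interface, so any reader
    // implementing it (JSON, XML, YAML, ...) could be passed in.
    public static int MaxDepth(IDepthReader reader)
    {
        int max = 0;
        while (reader.Read())
        {
            max = Math.Max(max, reader.CurrentDepth);
        }
        return max;
    }
}
```

Calling `ReaderHelpers.MaxDepth(new BracketReader("[[1],{}]"))` walks the whole input through the interface alone. Note that the toy's depth bookkeeping is its own and is not meant to match `Utf8JsonReader.CurrentDepth` token for token.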

Create System.Text.Parsing.Writer.IWriter
Might also be a good candidate for an abstract base class.
I'm sure I could go on, but the list would get repetitive. We should discuss which other types are candidates for migration or generalization.

API Usage

This library contains base classes and interfaces and should only be used by those wanting to implement their own parsing/serialization library. No samples are provided here because it's just standard implementation.

Alternative Designs

No response

Risks

This proposal by itself should add no inherent risks as it is non-breaking. We are simply creating a new library and new types under its namespace.

Labels: api-suggestion (early API idea and discussion, it is NOT ready for implementation), area-System.Text.Json, design-discussion (ongoing discussion about design without consensus)
