Utf8.TryWrite applies alignment by counting bytes instead of characters

### Description

When using `Utf8.TryWrite` to write an interpolated string as UTF8 bytes, passing an alignment value with any of the formatted values does not always result in the same amount of padding as when using `string.Format` or default string interpolation. If the formatted value contains any non-ASCII characters, less padding is added.

### Reproduction Steps

```cs
using System;
using System.Text;
using System.Text.Unicode;

string[] examples = new[]
{
    "\u0108",       // Ĉ 1 char, 2 bytes UTF8
    "\u20ac",       // € 1 char, 3 bytes UTF8
    "\ud83d\ude00", // 😀 2 chars, 4 bytes UTF8
};

foreach (string s in examples)
{
    Console.WriteLine($"utf16: [{s,4}]");
}
foreach (string s in examples)
{
    Span<byte> span = new byte[8];
    Utf8.TryWrite(span, $"[{s,4}]", out int written);
    Console.WriteLine("utf8:  " + Encoding.UTF8.GetString(span.Slice(0, written)));
}
```

### Expected behavior

Formatting a value with an alignment in `Utf8.TryWrite` should produce the same amount of padding in the UTF8 output as `string.Format` and default string interpolation produce in regular .NET (UTF16) strings.

For the code snippet above, it should produce:
```
utf16: [   Ĉ]
utf16: [   €]
utf16: [  😀]
utf8:  [   Ĉ]
utf8:  [   €]
utf8:  [  😀]
```

### Actual behavior

When the formatted value includes any characters that require more than 1 byte in UTF8 encoding, the alignment is incorrect and produces less padding in `Utf8.TryWrite`. 

For the code snippet above, it produces:
```
utf16: [   Ĉ]
utf16: [   €]
utf16: [  😀]
utf8:  [  Ĉ]
utf8:  [ €]
utf8:  [😀]
```

### Regression?

This has been the behavior since the `Utf8.TryWrite` API was introduced in .NET 8, and it is also reproducible in .NET 9.

### Known Workarounds

If the correct padding is really needed, default string interpolation or formatting can be used to format the value as UTF16 in a string or a `Span<char>`, and then that UTF16 can be encoded into the UTF8 output `Span<byte>` using `Encoding.UTF8.GetBytes`. 

This loses the nice ergonomics of formatting directly into the UTF8 buffer, and either allocates (if making a string) or requires more buffer management to get a `Span<char>`.
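A minimal sketch of that workaround for a single value, assuming a small fixed-size scratch buffer is acceptable (the buffer size of 16 and the variable names are illustrative only). It formats into a `Span<char>` first, so the alignment is applied per `char`, and then encodes the result to UTF8:

```cs
using System;
using System.Text;

string s = "\u20ac"; // € 1 char, 3 bytes UTF8

// Format as UTF16 first, so the alignment counts chars rather than bytes.
Span<char> chars = stackalloc char[16]; // illustrative size, must fit the formatted text
if (!chars.TryWrite($"[{s,4}]", out int charsWritten))
{
    throw new InvalidOperationException("Scratch buffer too small.");
}

// Then encode the formatted UTF16 text into the UTF8 output buffer.
Span<byte> utf8 = new byte[Encoding.UTF8.GetMaxByteCount(charsWritten)];
int bytesWritten = Encoding.UTF8.GetBytes(chars.Slice(0, charsWritten), utf8);

string roundTripped = Encoding.UTF8.GetString(utf8.Slice(0, bytesWritten));
Console.WriteLine("utf8:  " + roundTripped);
```

The extra `Span<char>` and the second pass over the text are exactly the buffer management overhead mentioned above.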

### Configuration

Tested in .NET 8 & .NET 9 Preview
I'm using Windows x64, but I'm pretty sure this is not platform/arch dependent.

### Other information

Before I begin -- I am interested in trying to fix this and I'm happy to open a PR for it. It would be my first time contributing however, so I understand if you feel someone else should handle fixing it instead.

The issue is that the amount of required padding is being determined by counting how many _bytes_ were written, even though we're working with UTF8 where many characters take more than one byte. Here's the culprit:
https://github.com/dotnet/runtime/blob/e133fe4f5311c0397f8cc153bada693c48eb7a9f/src/libraries/System.Private.CoreLib/src/System/Text/Unicode/Utf8.cs#L778

The simple (and maybe too naive) approach to fix this would be to use `Encoding.UTF8.GetCharCount` on the slice that was written, to measure how many `char`s the formatted text ended up writing. But this private method is called by multiple overloads of `AppendFormatted`, and for some of them we already know how many `char`s we wrote. For example, if the value being formatted is a `ReadOnlySpan<char>` or `string`, we know how many `char`s it had. Or if it was `ISpanFormattable`, we already formatted it into our own `Span<char>` buffer before writing and know how many `char`s there are.

So I think a better solution might be to find a way to have the overloads pass an optional `int charsWritten` if they know how many there were. If not, the alignment handling should call `Encoding.UTF8.GetCharCount` on the bytes we wrote so far to calculate how many `char`s it ended up being.
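To illustrate the difference between the two measurements, here is a rough standalone sketch. It is not the actual `Utf8.cs` code: `PaddingByBytes` and `PaddingByChars` are hypothetical helpers written only for this comparison. It computes the padding for each example value when counting bytes versus counting `char`s via `Encoding.UTF8.GetCharCount`:

```cs
using System;
using System.Text;

// Hypothetical helper: padding when the written length is measured in bytes.
static int PaddingByBytes(ReadOnlySpan<byte> written, int alignment) =>
    Math.Max(0, alignment - written.Length);

// Hypothetical helper: padding when the written length is measured in UTF16 chars.
static int PaddingByChars(ReadOnlySpan<byte> written, int alignment) =>
    Math.Max(0, alignment - Encoding.UTF8.GetCharCount(written));

string[] examples = { "\u0108", "\u20ac", "\ud83d\ude00" };
foreach (string s in examples)
{
    byte[] utf8 = Encoding.UTF8.GetBytes(s);
    Console.WriteLine(
        $"bytes={utf8.Length} byBytes={PaddingByBytes(utf8, 4)} byChars={PaddingByChars(utf8, 4)}");
}
```

For the three values this gives 2, 1, and 0 spaces when counting bytes, matching the actual output above, and 3, 3, and 2 spaces when counting `char`s, matching the expected output.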
