Skip to content

fix: preserve BOM for UTF-8 encoded files#31

Open
michaelhthomas wants to merge 1 commit into
dprint:mainfrom
michaelhthomas:fix/bom
Open

fix: preserve BOM for UTF-8 encoded files#31
michaelhthomas wants to merge 1 commit into
dprint:mainfrom
michaelhthomas:fix/bom

Conversation

@michaelhthomas

Copy link
Copy Markdown

Issue

When formatting a file with a UTF-8 BOM, the plugin stripped it. This was causing a large number of diffs in my organization's repository, as it seems Visual Studio defaults to outputting files in UTF-8 with a BOM (for no apparent reason). This PR attempts to address this and preserve the same encoding as the input.

The root cause was in how SourceText was serialized back to bytes: SourceText.From(stream, encoding: null) correctly detected the BOM and set sourceText.Encoding to a BOM-emitting UTF8Encoding, but the formatters discarded that information by calling Encoding.GetBytes(string), which never writes a preamble regardless of the encoding's configuration (see the remarks section here for more information).

Fix

There are a few changes here, not all of which are strictly necessary:

  1. I decided to move where text encoding is performed, from the individual formatters and into CodeFormatters. So ICodeFormatter.FormatText now returns SourceText instead of byte[]. If there's any particular opposition to this, I'm happy to do it differently.

  2. Added a new extension SourceTextExtensions.GetBytes() which serializes a SourceText to bytes using StreamWriter(stream, sourceText.Encoding). Unlike Encoding.GetBytes(string), StreamWriter writes the encoding's preamble (the BOM) at the start of the stream when the encoding specifies one.

  3. CodeFormatters.FormatCode now calls result.GetBytes() on the returned SourceText to get the final byte output.

The result is that files with a UTF-8 BOM format have the BOM preserved, while files without one are unaffected.

@michaelhthomas

Copy link
Copy Markdown
Author

I'm also willing to make this behavior configurable, #22 seems to indicate that it might be desired behavior in some cases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant