windows text file corruption #470

@mcandre

I was trying to read through an msiexec text log today to understand a broken MSI installer.

However, when I open this UTF-16 LE file in VSCode, the contents get scrambled a few seconds later: NULL markers appear between the individual characters, and the encoding shown in the status bar changes from UTF-16 LE to UTF-8.

I have my .editorconfig generally set up to prefer UTF-8 for the majority of files, including generic *.txt files.

If I recall correctly, for content that happens to fall within pure ASCII anyway, UTF-16 LE is effectively the same bytes as UTF-8 with an extra null byte after each character.
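To illustrate the symptom, here is a minimal Python sketch of what reading UTF-16 LE bytes under a UTF-8 label would produce. The function name `misread_as_utf8` and the sample string are made up for illustration, and the sketch leaves out any byte order mark the real log may start with.

```python
def misread_as_utf8(text: str) -> str:
    """Encode ASCII text as UTF-16 LE, then decode the same bytes as UTF-8.

    Hypothetical helper for illustration only. It assumes the text is pure
    ASCII and that no byte order mark is present.
    """
    raw = text.encode("utf-16-le")  # each ASCII char becomes <byte> 0x00
    return raw.decode("utf-8")      # a NUL is a valid UTF-8 character


sample = "MSI log"
scrambled = misread_as_utf8(sample)

# Every original character is now followed by a NUL character.
print(repr(scrambled))

# Nothing is lost on disk: decoding the bytes with the right label recovers the text.
print(sample.encode("utf-16-le").decode("utf-16-le"))
```

The bytes are the same in both cases. Only the label used to interpret them differs, which would match the scattered NULL markers in the editor.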

Something is causing the encoding label to change, without actually reencoding the editor content.

This happens in WSL. Perhaps that explains the few-second delay.

In any case, please don't alter the label without reencoding the editor buffer. Better yet, don't alter the encoding label. Just show what VSCode detects from disk.

And don't try to load a file using encodings from .editorconfig. Just let VSCode load the file.

I understand why EditorConfig might adjust indentation, line termination, final EOL.

I understand why EditorConfig might adjust the encoding applied to a file on save, based on its file path (file extension).

But an encoding discrepancy would be more safely handled by emitting some kind of warning, rather than by overriding anything.
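As a rough sketch of the warn-instead-of-override behavior, the check could look something like the following Python. It is not the extension's actual code. The helper names `detect_bom_charset` and `charset_warning` are hypothetical, and it assumes the file starts with a byte order mark. Files without one would need some other heuristic.

```python
import codecs
from typing import Optional

# BOM prefixes mapped to EditorConfig charset names.
# Limitation: UTF-32 LE also starts with FF FE and is not distinguished here.
BOMS = [
    (codecs.BOM_UTF8, "utf-8-bom"),
    (codecs.BOM_UTF16_LE, "utf-16le"),
    (codecs.BOM_UTF16_BE, "utf-16be"),
]


def detect_bom_charset(head: bytes) -> Optional[str]:
    """Return the charset implied by a leading BOM, or None if there is none."""
    for bom, name in BOMS:
        if head.startswith(bom):
            return name
    return None


def charset_warning(head: bytes, configured: str) -> Optional[str]:
    """Return a warning message on mismatch instead of overriding anything."""
    detected = detect_bom_charset(head)
    if detected is None or detected == configured:
        return None
    return (
        f"file looks like {detected} but .editorconfig says {configured}; "
        "keeping the detected encoding"
    )


# First bytes of a UTF-16 LE file with a BOM, checked against charset = utf-8.
print(charset_warning(b"\xff\xfeM\x00S\x00I\x00", "utf-8"))
```

In this sketch the on-disk detection always wins for loading, and the configured charset would only matter at save time.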