fix(peek): detect XML encoding without BOM#45399

Open
yeelam-gordon wants to merge 1 commit into main from issue/30515


@yeelam-gordon yeelam-gordon commented Feb 5, 2026

Summary of the Pull Request

Fixes Peek displaying garbled characters when previewing XML files that specify encoding in their declaration but lack a BOM (Byte Order Mark). The XML encoding declaration is now parsed and used for correct text rendering.

PR Checklist

Detailed Description of the Pull Request / Additional comments

Problem

XML files without a BOM but with an encoding declaration (e.g., <?xml version="1.0" encoding="UTF-16"?>) displayed as garbled text in Peek because the default encoding was used instead of the declared one.

Solution

Added XmlEncodingDetector.cs in src/modules/peek/Peek.FilePreviewer/Previewers/Helpers/ that:

  • Reads first 1KB of file to find XML declaration
  • Parses encoding attribute using regex
  • Returns appropriate System.Text.Encoding for the declared encoding
  • Falls back to UTF-8 if no declaration found
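
The steps above can be sketched roughly as follows. This is an illustrative Python sketch, not the C# code in XmlEncodingDetector.cs (which is not shown in this conversation); the names `detect_xml_encoding`, `PREFIX_BYTES`, and `DECL_RE` are hypothetical. It also assumes that a BOM-less UTF-16 file can be recognised by the NUL bytes interleaved in its first characters, since a byte-level regex cannot match a declaration that is itself stored as UTF-16.

```python
import codecs
import re

# Hypothetical names for illustration; not taken from the PR's C# source.
PREFIX_BYTES = 1024  # only the first 1 KB is inspected

# Matches an XML declaration at the start of the data and captures the
# value of its encoding attribute (single- or double-quoted).
DECL_RE = re.compile(
    rb'\s*<\?xml\s[^>]*?\bencoding\s*=\s*(["\'])([A-Za-z][A-Za-z0-9._-]*)\1'
)


def detect_xml_encoding(data: bytes, default: str = "utf-8") -> str:
    """Return a codec name for BOM-less XML data, or `default` if unsure."""
    prefix = data[:PREFIX_BYTES]

    # Assumption: "<?" stored as UTF-16 shows up as interleaved NUL bytes,
    # which also tells us the byte order.
    if prefix.startswith(b"<\x00?\x00"):
        return "utf-16-le"
    if prefix.startswith(b"\x00<\x00?"):
        return "utf-16-be"

    match = DECL_RE.match(prefix)
    if match is None:
        return default  # no declaration found

    try:
        # Normalise the declared name; unknown names raise LookupError.
        return codecs.lookup(match.group(2).decode("ascii")).name
    except LookupError:
        return default
```

A caller would then decode the whole file with the returned name, for example `data.decode(detect_xml_encoding(data))`.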

Validation Steps Performed

  1. Created XML file with UTF-16 encoding declaration (no BOM)
  2. Opened Peek preview
  3. Verified text displays correctly without garbled characters
  4. Tested with UTF-8, ISO-8859-1, and other encodings

Fixes #30515

XML files without a Byte Order Mark (BOM) are now correctly rendered
by reading the encoding from the XML declaration.

Changes:
- Added XmlEncodingDetector helper
- Checks for BOM first, then XML declaration
- Supports UTF-8, UTF-16, and other common encodings
- Falls back to UTF-8 if detection fails
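
The "BOM first, then XML declaration" ordering from the change list might look like the sketch below. It is written in Python for illustration only and is not the PR's C# code; `sniff_bom` and `choose_encoding` are hypothetical names, and the `declared` argument stands in for whatever the declaration parser returned.

```python
import codecs
from typing import Optional

# UTF-32 BOMs are listed before UTF-16 because the UTF-32 LE BOM
# (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE).
_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def sniff_bom(data: bytes) -> Optional[str]:
    """Return a BOM-aware codec name if the data starts with a BOM."""
    for bom, codec in _BOMS:
        if data.startswith(bom):
            return codec
    return None


def choose_encoding(data: bytes, declared: Optional[str] = None,
                    default: str = "utf-8") -> str:
    """Prefer the BOM, then the declared encoding, then the default."""
    return sniff_bom(data) or declared or default
```

The codec names returned for BOM matches (`utf-8-sig`, `utf-16`, `utf-32`) consume the BOM while decoding, so it does not leak into the previewed text.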

Copilot AI left a comment


Pull request overview

This PR attempts to fix issue #30515 where XML files without a BOM (Byte Order Mark) are not rendered correctly in the Peek preview window, showing replacement characters instead of proper syntax highlighting. The issue occurs because the current encoding detection mechanism fails to properly identify the encoding of XML files that lack a BOM but declare their encoding in the XML declaration (e.g., <?xml version="1.0" encoding="UTF-8"?>).

Changes:

  • Adds a new XmlEncodingDetector helper class to detect encoding from XML file declarations when BOM is absent


@yeelam-gordon yeelam-gordon left a comment


Automated PR Review - Findings (Severity >= Medium)

This PR adds XML encoding detection for Peek. Several medium-severity issues were identified that should be addressed before merging.

| Issue | Severity | Line(s) |
| --- | --- | --- |
| Duplicate functionality - existing CharsetDetector | Medium | All |
| Code not integrated (dead code) | Medium | All |
| Regex ReDoS risk on untrusted input | Medium | 17-18 |
| Single ReadLine may miss split declarations | Medium | 51 |
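
To illustrate the last two findings, here is a small Python sketch (not the PR's C# code; `DECL_RE` and `declared_encoding` are hypothetical names). It assumes one possible mitigation: match against a capped prefix of the text rather than a single line, and use bounded quantifiers so the matcher has a fixed upper limit on how far it scans.

```python
import re
from typing import Optional

# Bounded quantifiers ({0,256}, {0,16}, {1,40}) limit the work done on
# hostile input; the negated class [^>] also matches newlines, so a
# declaration split across lines is still found.
DECL_RE = re.compile(
    r'<\?xml\s[^>]{0,256}?\bencoding\s{0,16}=\s{0,16}["\']([\w.\-]{1,40})["\']'
)


def declared_encoding(text: str, limit: int = 1024) -> Optional[str]:
    """Return the declared encoding name found in the first `limit` chars."""
    match = DECL_RE.search(text[:limit])
    return match.group(1) if match else None


# A declaration whose attributes are split across two lines.
split_decl = '<?xml version="1.0"\n      encoding="UTF-16"?>\n<root/>'
first_line_only = split_decl.splitlines()[0]
```

Here `declared_encoding(first_line_only)` finds nothing because the encoding attribute sits on the second line, while `declared_encoding(split_decl)` finds the declared name.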


Development

Successfully merging this pull request may close these issues.

Preview window doesn't render XML file correctly without BOM
