
Fix UTF-8 header corruption when fallback charset differs #10078

Open
OctopusET wants to merge 1 commit into roundcube:master from OctopusET:fix-utf8-header

@OctopusET

Problem

Korean email subjects are displayed as garbled text (í¬ìФí¬ì˜¬ instead of 포스포올) when both of the following hold:

  • The sending client (e.g., Outlook) puts raw UTF-8 in the headers instead of MIME (RFC 2047) encoded-words.
  • The message body declares a different charset (e.g., charset=ISO-8859-1).

Root Cause

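In rcube_mime::decode_mime_string(), headers that are not MIME-encoded are converted to UTF-8 from a fallback charset, which is taken from the message body's charset. When the body declares ISO-8859-1, each byte of a multi-byte UTF-8 sequence is read as a separate Latin-1 character and encoded to UTF-8 again, so the subject ends up double-encoded.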
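The double encoding can be reproduced outside Roundcube with PHP's mbstring extension. This is a minimal sketch that assumes `mb_convert_encoding()` behaves like the fallback conversion for ISO-8859-1; it is not Roundcube's actual code path:

```php
<?php
// The subject as the sending client emits it: raw UTF-8 bytes, no RFC 2047
// encoding. "\u{...}" escapes spell 포스포올 (3 UTF-8 bytes per syllable).
$subject = "\u{D3EC}\u{C2A4}\u{D3EC}\u{C62C}";

// Simulate the fallback: treat those bytes as ISO-8859-1 and convert to UTF-8.
// Every byte >= 0x80 becomes its own Latin-1 character, which takes 2 bytes
// in UTF-8, so the text is encoded a second time.
$garbled = mb_convert_encoding($subject, 'UTF-8', 'ISO-8859-1');

printf("original: %d bytes\n", strlen($subject)); // original: 12 bytes
printf("garbled:  %d bytes\n", strlen($garbled)); // garbled:  24 bytes

// Converting back the other way recovers the original bytes: the damage is
// a charset mix-up, not data loss.
$restored = mb_convert_encoding($garbled, 'ISO-8859-1', 'UTF-8');
var_dump($restored === $subject); // bool(true)
```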

Solution

Add UTF-8 detection before fallback conversion:

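// Raw header that is already valid UTF-8 and contains non-ASCII bytes:
// return it unchanged instead of converting from the fallback charset.
if (mb_check_encoding($input, 'UTF-8') && preg_match('/[\x80-\xFF]/', $input)) {
    return $input;
}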

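The header is accepted as UTF-8 only when its bytes form valid UTF-8 sequences and it contains at least one non-ASCII byte; everything else (pure ASCII, or bytes that are not valid UTF-8) still goes through the fallback conversion. The check is unlikely to misfire on genuine Latin-1 data: random Latin-1 bytes have only about a 1-in-15 chance of forming valid UTF-8 sequences.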
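A self-contained sketch of the guarded fallback follows. The function name `decode_raw_header()` is hypothetical, and `mb_convert_encoding()` stands in for the charset conversion Roundcube actually performs; only the guard condition is taken from the patch:

```php
<?php
/**
 * Hypothetical stand-in for the non-MIME fallback branch of
 * rcube_mime::decode_mime_string(). $input is a header value without
 * RFC 2047 encoded-words; $fallback is the charset declared for the body.
 */
function decode_raw_header(string $input, string $fallback): string
{
    // Guard from the patch: valid UTF-8 that is not plain ASCII is kept as-is.
    if (mb_check_encoding($input, 'UTF-8') && preg_match('/[\x80-\xFF]/', $input)) {
        return $input;
    }

    // Otherwise keep the previous behaviour: convert from the fallback charset.
    return mb_convert_encoding($input, 'UTF-8', $fallback);
}

$korean = "\u{D3EC}\u{C2A4}\u{D3EC}\u{C62C}"; // 포스포올 as raw UTF-8
$latin1 = "Caf\xE9";                          // "Café" encoded as ISO-8859-1

var_dump(decode_raw_header($korean, 'ISO-8859-1') === $korean);     // bool(true)
var_dump(decode_raw_header($latin1, 'ISO-8859-1') === "Caf\u{E9}"); // bool(true)
var_dump(mb_check_encoding($latin1, 'UTF-8'));                      // bool(false)
```

The Latin-1 string fails the UTF-8 check because `0xE9` is a UTF-8 lead byte that must be followed by two continuation bytes (`0x80`-`0xBF`), and here the string simply ends, so the header is still converted from the fallback charset as before.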

Standards

RFC 6532 (2012) permits raw UTF-8 in email header fields; messages that use it are transported with the SMTPUTF8 extension defined in RFC 6531.
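For contrast, this sketch builds the same Subject header in both forms: the RFC 2047 encoded-word (pure ASCII on the wire) and the raw UTF-8 form that RFC 6532 allows. It is illustrative only; header folding is ignored and the subject is short enough to fit in a single encoded-word:

```php
<?php
$subject = "\u{D3EC}\u{C2A4}\u{D3EC}\u{C62C}"; // 포스포올

// RFC 2047 form: charset name, "B" (base64) encoding, then the payload.
$encoded = 'Subject: =?UTF-8?B?' . base64_encode($subject) . '?=';

// RFC 6532 form: the UTF-8 bytes appear directly in the header field.
$raw = 'Subject: ' . $subject;

echo $encoded, "\n";
echo $raw, "\n";

// Extracting and decoding the encoded-word payload yields the raw bytes.
preg_match('/=\?UTF-8\?B\?([A-Za-z0-9+\/=]+)\?=/', $encoded, $m);
var_dump(base64_decode($m[1]) === $subject); // bool(true)
```

Only the raw form contains bytes above 0x7F, which is exactly the case the new UTF-8 check targets.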

Add UTF-8 detection before charset conversion to prevent
double-encoding of raw UTF-8 headers (e.g., Korean from Outlook).