Add a method to access the raw version of attachments #125

Open
cosarara wants to merge 2 commits into stalwartlabs:main from cosarara:raw-body

@cosarara
Our system sometimes receives emails whose attachments carry the wrong MIME type. For instance, a PNG attachment might arrive with its Content-Type set to text/plain.
mail-parser helpfully decodes text/plain parts into valid UTF-8, which means that non-UTF-8 byte sequences get replaced by “�” (U+FFFD), destroying the content. After that, there is no way to recover the original bytes of the file if we later decide to reinterpret the attachment as non-text.
I have patched the library to add a raw_body next to the body for attachments, which always returns the bytes regardless of PartType. We are using this as an internal fork for now, but it would be much nicer to get the feature upstreamed (in this form or some other) so that we can move back to upstream stalwartlabs/mail-parser.
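
To illustrate why the decoded text cannot be turned back into the file, here is a minimal standalone sketch using only the Rust standard library. It does not call mail-parser; `decode_as_text` and `looks_like_png` are hypothetical helpers, and `String::from_utf8_lossy` stands in for whatever lossy text decoding happens to a part labelled text/plain.

```rust
use std::borrow::Cow;

/// The 8-byte PNG file signature. Its first byte, 0x89, is not valid UTF-8.
const PNG_SIGNATURE: [u8; 8] = [0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A];

/// Hypothetical stand-in for lossy text decoding (an assumption, not
/// mail-parser's actual code path): invalid sequences become U+FFFD.
fn decode_as_text(raw: &[u8]) -> Cow<'_, str> {
    String::from_utf8_lossy(raw)
}

/// Hypothetical content sniffing a consumer might do later on.
fn looks_like_png(bytes: &[u8]) -> bool {
    bytes.starts_with(&PNG_SIGNATURE)
}

fn main() {
    let decoded = decode_as_text(&PNG_SIGNATURE);

    // The invalid leading byte has been replaced by U+FFFD.
    assert!(decoded.starts_with('\u{FFFD}'));

    // The raw bytes still sniff as PNG; the decoded text no longer does,
    // because U+FFFD re-encodes as EF BF BD rather than the original 0x89.
    assert!(looks_like_png(&PNG_SIGNATURE));
    assert!(!looks_like_png(decoded.as_bytes()));

    println!("raw: {} bytes, decoded: {} bytes", PNG_SIGNATURE.len(), decoded.len());
}
```

Every invalid byte maps to the same replacement character, so nothing in the decoded string records which byte was there originally.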

@sftse
Contributor

sftse commented Nov 26, 2025

Could you provide an example of what such an email looks like? We might be able to correct this independently of exposing raw access.

Not the maintainer, but a better-aligned fix might be to wrap the parts and expose methods for decoding to &str or for raw access:

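use std::borrow::Cow;

// Wrapper around a text part that keeps the undecoded bytes.
struct Text<'b> {
    // placeholder for the declared charset
    charset: (),
    raw: &'b [u8]
}

// Wrapper around an HTML part that keeps the undecoded bytes.
struct Html<'b> {
    // placeholder for the declared charset
    charset: (),
    raw: &'b [u8]
}

impl Text<'_> {
    // Bytes exactly as received, regardless of the declared charset.
    fn raw(&self) -> &[u8] {
        self.raw
    }
    // Decode on demand; can borrow when the bytes are already valid UTF-8.
    fn decode_to_utf8(&self) -> Cow<'_, str> {
        todo!()
    }
}

impl Html<'_> {
    // Bytes exactly as received, regardless of the declared charset.
    fn raw_html(&self) -> &[u8] {
        self.raw
    }
    // Note: after decoding, the charset named in a <meta> tag can be out of
    // sync with the actual UTF-8 encoding. If the consumer is a browser,
    // it likely wants raw_html instead.
    fn decode_to_utf8(&self) -> Cow<'_, str> {
        todo!()
    }
}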

This would be a less intrusive change for the tests. Instead of duplicating both the raw and decoded versions, there is one clone fewer in the good case, and it would be a clean fix for #109, where we might want raw HTML access.
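
As a rough illustration of the "one clone fewer in the good case" point, here is a filled-in version of the Text wrapper from the sketch above. It is a hypothetical sketch, not mail-parser's actual types, and it assumes the charset is always UTF-8 so that `String::from_utf8_lossy` can stand in for real charset handling.

```rust
use std::borrow::Cow;

/// Hypothetical wrapper following the sketch above; not mail-parser's API.
struct Text<'b> {
    raw: &'b [u8],
}

impl<'b> Text<'b> {
    /// Bytes exactly as received.
    fn raw(&self) -> &'b [u8] {
        self.raw
    }

    /// Assumption: the charset is UTF-8. Valid input is borrowed without
    /// copying; invalid input is copied with U+FFFD substituted.
    fn decode_to_utf8(&self) -> Cow<'b, str> {
        String::from_utf8_lossy(self.raw)
    }
}

fn main() {
    // Good case: valid UTF-8 is returned as a borrow, with no allocation.
    let good = Text { raw: b"hello" };
    assert!(matches!(good.decode_to_utf8(), Cow::Borrowed("hello")));

    // Bad case: a mislabelled binary part allocates a lossy copy,
    // but the original bytes stay reachable through raw().
    let bad = Text { raw: &[0x89u8, b'P', b'N', b'G'] };
    assert!(matches!(bad.decode_to_utf8(), Cow::Owned(_)));
    assert_eq!(bad.raw(), &[0x89u8, b'P', b'N', b'G'][..]);
}
```

Because decoding happens on demand, a caller that only needs the bytes never pays for the conversion, and one that needs text pays only when the input is not already valid.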

@cosarara
Author

Attached a crafted example file where a PNG's header says text/plain:
png_as_text_plain.eml

2 participants