
Email addresses in HTML content are removed when sanitizing text coming from a plaintext email #126

Closed
@istrasoft

Description


When the string to sanitize comes from a plaintext email, the original content contains items such as these:

blah blah

From: Mark <mailto:[email protected]>
Sent: Wednesday, August 16, 2017 19:47
To: John <[email protected]>
Subject: Re: Document Test

Hello John

If the email was an HTML email, the < and > around "<[email protected]>" are already escaped as &lt; and &gt;, but if the email was plaintext, they are not.
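The difference comes down to escaping. Below is a minimal sketch, assuming the caller knows the body is plaintext. It escapes the whole body with Python's standard `html.escape` before it reaches the sanitizer, so the angle brackets become entities, just as they already are in an HTML email. The helper name `plaintext_to_safe_html` and the sample address are hypothetical.

```python
import html


def plaintext_to_safe_html(text: str) -> str:
    """Escape a plaintext email body so the sanitizer sees text, not tags.

    Hypothetical helper: &, < and > are replaced by &amp;, &lt; and &gt;.
    quote=False leaves quote characters untouched, since they are harmless
    outside attribute values.
    """
    return html.escape(text, quote=False)


# Hypothetical header line modelled on the one quoted in the issue.
print(plaintext_to_safe_html("To: John <john@example.com>"))
# To: John &lt;john@example.com&gt;
```

This only works when the input is known to be plaintext. For mixed or unknown input, escaping everything would also neutralize legitimate markup.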

In this specific case, the sanitizer treats <[email protected]> as an invalid HTML tag and removes it, along with all the content that follows it.

If the "Keep child nodes of removed elements" option is enabled, only these email tags are lost.

It would be great if, after a tag is tested against the whitelist, an additional test tried to match it against these two standard, safe forms: <mailto:address> and <address>.
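The requested extra check could look roughly like the following. This is a sketch, not the sanitizer's actual API. The regex, the function names `is_email_pseudo_tag` and `escape_email_pseudo_tags`, and the choice to escape (rather than keep) the matched text are all assumptions. `is_email_pseudo_tag` is the fallback test for a tag that failed the whitelist. `escape_email_pseudo_tags` does the same job as a pre-pass over plaintext-derived input.

```python
import re

# Assumption: a deliberately simple address pattern, not full RFC 5322.
_ADDRESS = r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"

# Matches exactly the two forms from the issue: <mailto:addr> and <addr>.
_EMAIL_TAG = re.compile(r"<(?:mailto:)?" + _ADDRESS + r">")


def is_email_pseudo_tag(tag_text: str) -> bool:
    """Fallback test: True if the whole rejected tag is <addr> or <mailto:addr>."""
    return _EMAIL_TAG.fullmatch(tag_text) is not None


def escape_email_pseudo_tags(text: str) -> str:
    """Pre-pass: replace each <addr> / <mailto:addr> with &lt;...&gt; text."""
    return _EMAIL_TAG.sub(lambda m: "&lt;" + m.group(0)[1:-1] + "&gt;", text)


# Hypothetical header lines modelled on the quoted email.
sample = "From: Mark <mailto:mark@example.com>\nTo: John <john@example.com>"
print(escape_email_pseudo_tags(sample))
```

Escaping rather than whitelisting keeps the address visible to the reader while guaranteeing it can never be interpreted as markup. Real tags such as `<b>` or `<a href="mailto:...">` do not match the pattern and still go through the normal whitelist.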
