Commit 59442ba

Merge branch 'release/3.15.0'
2 parents 9791b2d + 0abd896 commit 59442ba

10 files changed, with 111 additions and 59 deletions.
.travis.yml

Lines changed: 3 additions & 3 deletions
@@ -5,9 +5,9 @@ language: python

 python:
   - "2.7"
-  - "3.4"
-  - "3.5"
-  - "3.6"
+  - "3.7"
+  - "3.8"
+  - "3.9"

 before_install:
   - sudo apt-get -qq update

README.md

Lines changed: 26 additions & 23 deletions
@@ -8,8 +8,6 @@

 # mail-parser

-## Overview
-
 mail-parser is not only a wrapper for [email](https://docs.python.org/2/library/email.message.html) Python Standard Library.
 It give you an easy way to pass from raw mail to Python object that you can use in your code.
 It's the key module of [SpamScope](https://github.com/SpamScope/spamscope).
@@ -28,15 +26,29 @@ $ apt-cache show libemail-outlook-message-perl

 mail-parser supports Python 3.

-## mail-parser on Web
+
+# Apache 2 Open Source License
+mail-parser can be downloaded, used, and modified free of charge. It is available under the Apache 2 license.
+
+If you want support the project:
+
+
+[![Donate](https://www.paypal.com/en_US/i/btn/btn_donateCC_LG.gif "Donate")](https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=VEPXYP745KJF2)
+
+![Bitcoin Donate](https://i.stack.imgur.com/MnQ6V.png)
+
+![](https://github.com/SpamScope/mail-parser/raw/develop/docs/bitcoin-qrcode.png)
+
+
+# mail-parser on Web
 - [Splunk app](https://splunkbase.splunk.com/app/4129/)
 - [FreeBSD port](https://www.freshports.org/mail/py-mail-parser/)
 - [Arch User Repository](https://aur.archlinux.org/packages/mailparser/)


-## Description
+# Description

-mail-parser takes as input a raw email and generates a parsed object. The properties of this object are the same name of
+mail-parser takes as input a raw email and generates a parsed object. The properties of this object are the same name of
 [RFC headers](https://www.iana.org/assignments/message-headers/message-headers.xhtml):

 - bcc
@@ -107,27 +119,18 @@ $ mail.to_raw (raw header)

 The command line tool use the JSON format.

-### Defects
+## Defects
 These defects can be used to evade the antispam filter. An example are the mails with a malformed boundary that can hide a not legitimate epilogue (often malware).
 This library can take these epilogues.


-### Apache 2 Open Source License
-mail-parser can be downloaded, used, and modified free of charge. It is available under the Apache 2 license.
-
-If you want support the project:
-
-
-[![Donate](https://www.paypal.com/en_US/i/btn/btn_donateCC_LG.gif "Donate")](https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=VEPXYP745KJF2)
-
-
-## Authors
+# Authors

-### Main Author
+## Main Author
 **Fedele Mantuano**: [LinkedIn](https://www.linkedin.com/in/fmantuano/)


-## Installation
+# Installation

 Clone repository

@@ -149,7 +152,7 @@ or use `pip`:
 $ pip install mail-parser
 ```

-## Usage in a project
+# Usage in a project

 Import `mailparser` module:

@@ -196,7 +199,7 @@ It's possible to write the attachments on disk with the method:
 mail.write_attachments(base_path)
 ```

-## Usage from command-line
+# Usage from command-line

 If you installed mailparser with `pip` or `setup.py` you can use it with command-line.

@@ -216,7 +219,7 @@ optional arguments:
   -s STRING, --string STRING
                         Raw email string (default: None)
   -k, --stdin           Enable parsing from stdin (default: False)
-  -l {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}, --log-level {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}
+  -l {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}, --log-level {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}
                         Set log level (default: WARNING)
   -j, --json            Show the JSON of parsed mail (default: False)
   -b, --body            Print the body of mail (default: False)
@@ -253,11 +256,11 @@ $ mailparser -f example_mail -j

 This example will show you the tokenized mail in a JSON pretty format.

-From [raw mail](https://gist.github.com/fedelemantuano/5dd702004c25a46b2bd60de21e67458e) to
+From [raw mail](https://gist.github.com/fedelemantuano/5dd702004c25a46b2bd60de21e67458e) to
 [parsed mail](https://gist.github.com/fedelemantuano/e958aa2813c898db9d2d09469db8e6f6).


-## Exceptions
+# Exceptions

 Exceptions hierarchy of mail-parser:

docs/bitcoin-qrcode.png

477 Bytes

mailparser/const.py

Lines changed: 5 additions & 3 deletions
@@ -65,16 +65,18 @@
         r'envelope-from|\s*[(]?envelope-sender|\s+'
         r'from|\s+by|\s+id|\s+for|\s+with(?! cipher)|;))'
     ),
-
     # assumes emails are always inside <>
     r'(?:envelope-from\s+<(?P<envelope_from>.+?)>)',
     r'(?:envelope-sender\s+<(?P<envelope_sender>.+?)>)',

     # datetime comes after ; at the end
     r';\s*(?P<date>.*)',
-
+
     # sendgrid datetime
-    r'(?P<date>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{9} \+0000 UTC) m=\+\d+\.\d+'
+    (
+        r'(?P<date>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:'
+        r'\d{2}\.\d{9} \+0000 UTC) m=\+\d+\.\d+'
+    )
 ]

 RECEIVED_COMPILED_LIST = [
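
Review note: the sendgrid change above only wraps the pattern across two adjacent raw strings, which Python joins into one literal, so the regex itself should be unchanged. A minimal check, assuming a made-up header fragment (the `sample` value and the `SENDGRID_DATE` name are hypothetical and not taken from the repository):

```python
import re

# Same pattern as the re-wrapped list entry; adjacent string literals
# inside the parentheses are concatenated by the Python compiler.
SENDGRID_DATE = (
    r'(?P<date>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:'
    r'\d{2}\.\d{9} \+0000 UTC) m=\+\d+\.\d+'
)

# Hypothetical fragment in the shape the pattern expects.
sample = "2019-03-11 14:45:02.123456789 +0000 UTC m=+1234.567890123"

match = re.search(SENDGRID_DATE, sample)
date = match.group("date") if match else None
print(date)
```

The named group keeps only the timestamp and drops the trailing `m=+...` counter.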

mailparser/mailparser.py

Lines changed: 38 additions & 5 deletions
@@ -42,6 +42,7 @@
     msgconvert,
     ported_open,
     ported_string,
+    random_string,
     receiveds_parsing,
     write_attachments,
 )
@@ -353,14 +354,31 @@ def parse(self):
                 charset = p.get_content_charset('utf-8')
                 charset_raw = p.get_content_charset()
                 log.debug("Charset {!r} part {!r}".format(charset, i))
+                content_disposition = ported_string(
+                    p.get('content-disposition'))
+                log.debug("content-disposition {!r} part {!r}".format(
+                    content_disposition, i))
                 content_id = ported_string(p.get('content-id'))
                 log.debug("content-id {!r} part {!r}".format(
                     content_id, i))
-                filename = decode_header_part(
-                    p.get_filename("{}".format(content_id)))
+                content_subtype = ported_string(p.get_content_subtype())
+                log.debug("content subtype {!r} part {!r}".format(
+                    content_subtype, i))
+                filename = decode_header_part(p.get_filename())

-                # this is an attachment
+                is_attachment = False
                 if filename:
+                    is_attachment = True
+                else:
+                    if content_id and content_subtype not in ('html', 'plain'):
+                        is_attachment = True
+                        filename = content_id
+                    elif content_subtype in ('rtf'):
+                        is_attachment = True
+                        filename = "{}.rtf".format(random_string())
+
+                # this is an attachment
+                if is_attachment:
                     log.debug("Email part {!r} is an attachment".format(i))
                     log.debug("Filename {!r} part {!r}".format(filename, i))
                     binary = False
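
Review note: the new branches can be read as a small decision function. The sketch below restates them outside the parser; `classify_part` is a hypothetical name, and `uuid.uuid4().hex` stands in for the commit's `random_string()` helper, whose implementation is not shown in this diff. One detail worth flagging: `('rtf')` is a plain string, not a tuple, so `content_subtype in ('rtf')` is a substring test. The sketch uses a one-element tuple instead, which is an assumption about the intent.

```python
import uuid


def classify_part(filename, content_id, content_subtype):
    """Return (is_attachment, filename), following the hunk's branches."""
    if filename:
        return True, filename
    if content_id and content_subtype not in ("html", "plain"):
        # Unnamed part that carries a Content-ID and is not a text body.
        return True, content_id
    if content_subtype in ("rtf",):  # one-element tuple, not a string
        # uuid4 is a stand-in for the commit's random_string() helper.
        return True, "{}.rtf".format(uuid.uuid4().hex)
    return False, filename


# Why the tuple matters: membership in a string matches substrings.
substring_hit = "rt" in ("rtf")
tuple_hit = "rt" in ("rtf",)
```

With the string form, a subtype such as `rt` would also be treated as RTF. That is unlikely to matter for real MIME subtypes, but it is worth knowing when reading the condition.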
@@ -412,8 +430,23 @@ def parse(self):
                 # this isn't an attachments
                 else:
                     log.debug("Email part {!r} is not an attachment".format(i))
-                    payload = ported_string(
-                        p.get_payload(decode=True), encoding=charset)
+
+                    # Get the payload using get_payload method with decode=True
+                    # As Python truly decodes only 'base64',
+                    # 'quoted-printable', 'x-uuencode',
+                    # 'uuencode', 'uue', 'x-uue'
+                    # And for other encodings it breaks the characters so
+                    # we need to decode them with encoding python is appying
+                    # To maintain the characters
+                    payload = p.get_payload(decode=True)
+                    cte = p.get('Content-Transfer-Encoding')
+                    if cte:
+                        cte = cte.lower()
+                    if not cte or cte in ['7bit', '8bit']:
+                        payload = payload.decode('raw-unicode-escape')
+                    else:
+                        payload = ported_string(payload, encoding=charset)
+
                     if payload:
                         if p.get_content_subtype() == 'html':
                             self._text_html.append(payload)
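
Review note: a small standard-library reproduction of the case this hunk targets. To my understanding, when a message is parsed from a `str` and its body holds non-ASCII characters, CPython's `Message.get_payload(decode=True)` returns the text encoded with `raw-unicode-escape`, so decoding those bytes as UTF-8 loses the characters. The message below is hypothetical; the branch mirrors the commit, with a plain `bytes.decode` standing in for `ported_string`.

```python
import email

# Hypothetical one-part message; the body contains U+00E8 (e with grave).
raw = (
    "Content-Type: text/plain; charset=utf-8\n"
    "Content-Transfer-Encoding: 8bit\n"
    "\n"
    "caff\u00e8\n"
)
part = email.message_from_string(raw)

payload = part.get_payload(decode=True)  # bytes, not text
cte = (part.get("Content-Transfer-Encoding") or "").lower()

if not cte or cte in ("7bit", "8bit"):
    # Reverse the escape encoding applied by the stdlib.
    text = payload.decode("raw-unicode-escape")
else:
    # Stand-in for ported_string(payload, encoding=charset).
    text = payload.decode(part.get_content_charset("utf-8"), "ignore")

# What decoding the same bytes as UTF-8 with errors ignored would give.
lossy = payload.decode("utf-8", "ignore")
```

One caveat with this approach: `raw-unicode-escape` also interprets a literal `\uXXXX` sequence in an ASCII body, so `b"\\u0041".decode("raw-unicode-escape")` yields `"A"`. Whether that matters for real mail is a judgment call for the maintainers.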

mailparser/utils.py

Lines changed: 24 additions & 9 deletions
@@ -101,19 +101,19 @@ def ported_string(raw_data, encoding='utf-8', errors='ignore'):
         return six.text_type()

     if isinstance(raw_data, six.text_type):
-        return raw_data.strip()
+        return raw_data

     if six.PY2:
         try:
-            return six.text_type(raw_data, encoding, errors).strip()
+            return six.text_type(raw_data, encoding, errors)
         except LookupError:
-            return six.text_type(raw_data, "utf-8", errors).strip()
+            return six.text_type(raw_data, "utf-8", errors)

     if six.PY3:
         try:
-            return six.text_type(raw_data, encoding).strip()
+            return six.text_type(raw_data, encoding)
         except (LookupError, UnicodeDecodeError):
-            return six.text_type(raw_data, "utf-8", errors).strip()
+            return six.text_type(raw_data, "utf-8", errors)


 def decode_header_part(header):
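
Review note: the visible effect of this hunk is that `ported_string` no longer trims whitespace, so bodies keep their leading and trailing spaces. Later hunks in this file move the `.strip()` into `decode_header_part` and `get_header`. Below is a Python 3 only reduction of the post-commit function. The name `ported_string_py3` is hypothetical, and the empty-input guard is an assumption, because the condition above `return six.text_type()` is outside the hunk.

```python
def ported_string_py3(raw_data, encoding="utf-8", errors="ignore"):
    """Python 3 reduction of the post-commit ported_string (sketch)."""
    if not raw_data:  # assumed guard; the real condition is not in the diff
        return ""
    if isinstance(raw_data, str):
        return raw_data  # no .strip() any more
    try:
        return str(raw_data, encoding)
    except (LookupError, UnicodeDecodeError):
        # Unknown codec or undecodable bytes: retry as UTF-8, dropping
        # whatever cannot be decoded.
        return str(raw_data, "utf-8", errors)


kept = ported_string_py3(b"test ")
fallback = ported_string_py3(b"abc", "no-such-codec")
```

Callers that relied on the old trimming, such as the fingerprint test updated further down, now see different values for the same mail.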
@@ -141,7 +141,7 @@ def decode_header_part(header):
         log.error("Failed decoding header part: {}".format(header))
         output += header

-    return output
+    return output.strip()


 def ported_open(file_):
@@ -290,7 +290,23 @@ def parse_received(received):
     if len(values_by_clause) == 0:
         # we weren't able to match anything...
         msg = "Unable to match any clauses in %s" % (received)
-        log.error(msg)
+
+        # Modification #1: Commenting the following log as
+        # this raised exception is caught above and then
+        # raw header is updated in response
+        # We dont want to get so many errors in our error
+        # logger as we are not even trying to parse the
+        # received headers
+        # Wanted to make it configurable via settiings,
+        # but this package does not depend on django and
+        # making configurable setting
+        # will make it django dependent,
+        # so better to keep it working with only python
+        # dependent and on any framework of python
+        # commenting it just for our use
+
+        # log.error(msg)
+
         raise MailParserReceivedParsingError(msg)
     return values_by_clause
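
Review note: the comment block describes a raise-without-logging pattern, where the caller catches the exception and falls back to the raw header. A minimal sketch of that shape follows, with every name hypothetical: `ReceivedParsingError` stands in for `MailParserReceivedParsingError`, the toy regex loop stands in for the real clause patterns, and the `{"raw": ...}` fallback is an assumption about what "raw header is updated in response" means.

```python
import logging
import re

log = logging.getLogger("sketch")


class ReceivedParsingError(ValueError):
    """Hypothetical stand-in for MailParserReceivedParsingError."""


def parse_received(received):
    # Toy matcher; the real function applies the compiled clause regexes.
    values = {}
    for clause in ("from", "by", "with"):
        found = re.search(r"\b%s\s+(\S+)" % clause, received)
        if found:
            values[clause] = found.group(1)
    if not values:
        # Raise only; the caller decides whether and how loudly to log.
        raise ReceivedParsingError(
            "Unable to match any clauses in %s" % received)
    return values


def parse_all(headers):
    parsed = []
    for header in headers:
        try:
            parsed.append(parse_received(header))
        except ReceivedParsingError as exc:
            log.debug("%s", exc)  # quieter than log.error
            parsed.append({"raw": header})
    return parsed


result = parse_all(["from a.example by b.example with ESMTP", "garbage"])
```

Logging at debug level in the caller, instead of commenting the call out, would keep the message available without flooding the error log. No framework-level setting is needed, since the standard `logging` configuration already controls which levels are emitted.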

@@ -468,7 +484,7 @@ def get_header(message, name):
         headers = [decode_header_part(i) for i in headers]
         if len(headers) == 1:
             # in this case return a string
-            return headers[0]
+            return headers[0].strip()
         # in this case return a list
         return headers
     return six.text_type()
@@ -551,7 +567,6 @@ def write_sample(binary, payload, path, filename): # pragma: no cover
     """
     if not os.path.exists(path):
         os.makedirs(path)
-
     sample = os.path.join(path, filename)

     if binary:

mailparser/version.py

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@
 limitations under the License.
 """

-__version__ = "3.14.0"
+__version__ = "3.15.0"

 if __name__ == "__main__":
     print(__version__)

setup.py

Lines changed: 2 additions & 0 deletions
@@ -64,6 +64,8 @@
         "Programming Language :: Python :: 3.5",
         "Programming Language :: Python :: 3.6",
         "Programming Language :: Python :: 3.7",
+        "Programming Language :: Python :: 3.8",
+        "Programming Language :: Python :: 3.9",
     ],
     install_requires=requires,
     entry_points={'console_scripts': [

tests/test_mail_parser.py

Lines changed: 11 additions & 14 deletions
@@ -200,14 +200,14 @@ def test_fingerprints_body(self):
         mail = mailparser.parse_from_file(mail_test_1)
         md5, sha1, sha256, sha512 = fingerprints(
             mail.body.encode("utf-8"))
-        self.assertEqual(md5, "1bbdb7dcf511113bbc0c1b214aeac392")
-        self.assertEqual(sha1, "ce9e62b50fa4e2168278880b14460b905b24eb4b")
-        self.assertEqual(sha256, ("1e9b96e3f1bc74702f9703391e8ba0715b849"
-                                  "7127a7ff857013ab33385898574"))
-        self.assertEqual(sha512, ("ad858f7b5ec5549e55650fd13df7683e403489"
-                                  "77522995851fb6b625ac54744cf3a4bf652784"
-                                  "dba971ef99afeec4e6caf2fdd10be72eabb730"
-                                  "c312ffbe1c4de3"))
+        self.assertEqual(md5, "55852a2efe95e7249887c92cc02123f8")
+        self.assertEqual(sha1, "62fef1e38327ed09363624c3aff8ea11723ee05f")
+        self.assertEqual(sha256, ("cd4af1017f2e623f6d38f691048b6"
+                                  "a28d8b1f44a0478137b4337eac6de78f71a"))
+        self.assertEqual(sha512, ("4a573c7929b078f2a2c1c0f869d418b0c020d4"
+                                  "d37196bd6dcc209f9ccb29ca67355aa5e47b97"
+                                  "c8bf90377204f59efde7ba1fc071b6f250a665"
+                                  "72f63b997e92e8"))

     def test_fingerprints_unicodeencodeerror(self):
         mail = mailparser.parse_from_file(mail_test_7)
@@ -456,7 +456,7 @@ def test_parse_from_file_msg(self):
         m = mailparser.parse_from_file_msg(mail_outlook_1)
         email = m.mail
         self.assertIn("attachments", email)
-        self.assertEqual(len(email["attachments"]), 5)
+        self.assertEqual(len(email["attachments"]), 6)
         self.assertIn("from", email)
         self.assertEqual(email["from"][0][1], "NueblingV@w-vwa.de")
         self.assertIn("subject", email)
@@ -564,11 +564,7 @@ def test_ported_string(self):
         s = ported_string(raw_data)
         self.assertEqual(s, six.text_type())

-        raw_data = "test "
-        s = ported_string(raw_data)
-        self.assertEqual(s, "test")
-
-        raw_data = u"test "
+        raw_data = u"test"
         s = ported_string(raw_data)
         self.assertEqual(s, "test")

@@ -671,5 +667,6 @@ def test_write_uuencode_attachment(self):
         shutil.rmtree(temp_dir)
         self.assertEqual(md5.hexdigest(), '4f2cf891e7cfb349fca812091f184ecc')

+
 if __name__ == '__main__':
     unittest.main(verbosity=2)

tox.ini

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 [tox]
-envlist = begin, py27, py37, end
+envlist = begin, py27, py39, end

 [testenv:begin]
 commands = coverage erase
