Bug report
Bug description:
Hi!
While investigating memory issues in my Lambda function, I found unexpectedly high memory usage coming from email.message_from_bytes.
When parsing an .eml that contains almost no text but a 30 MB attachment, memory usage jumps by about 236 MB!
Peak usage ends up around 9 times the size of the file!
Here is my test:
from email import message_from_bytes
import resource
print('Init ram: {}kb'.format(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss))
data = None
with open('file.eml', 'rb') as f:
    data = f.read()
print('File loaded: {}kb'.format(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss))
print(' (file size: {}kb)'.format(len(data) / 1024))
mail = message_from_bytes(data)
print('After message_from_bytes: {}kb'.format(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss))
And the output:
Init ram: 7168kb
File loaded: 37120kb
(file size: 29900kb)
After message_from_bytes: 279296kb
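For reference, here is the arithmetic behind those figures as a small sketch. The numbers are copied from the output above; the variable names are only illustrative, and note that ru_maxrss is a peak (high-water mark) value, so this shows growth of the peak rather than memory still held afterwards.

```python
# Figures copied from the output above (ru_maxrss is in kilobytes on Linux).
loaded_kb = 37120    # peak RSS after reading the file
file_kb = 29900      # size of the .eml
parsed_kb = 279296   # peak RSS after message_from_bytes

# Growth of the peak RSS attributable to parsing.
growth_kb = parsed_kb - loaded_kb

print('Growth during parsing: {}kb ({:.1f} MiB)'.format(growth_kb, growth_kb / 1024))
print('Growth / file size: {:.1f}x'.format(growth_kb / file_kb))
print('Peak / file size: {:.1f}x'.format(parsed_kb / file_kb))
```

So parsing alone adds roughly 8 times the file size, and the total peak is a bit over 9 times the file size.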
The EML in question contains an attachment (a CSV file) encoded in Base64. I suspect that BytesParser is converting that content to binary data, but I find it surprising that doing so takes 9 times the file size.
Wouldn't it be faster and more efficient to convert the content only when it is accessed, and to offer a way to not convert it at all (getting it raw, in Base64)?
(Maybe there already is one and I missed it?)
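To illustrate what I mean by raw versus converted access, here is a minimal sketch using a tiny hand-built message (the build_sample_eml helper, the boundary and the data.csv name are made up for this example). As far as I can tell, get_payload() without arguments returns the Base64 text as-is, and get_payload(decode=True) decodes it at call time. This only shows the access-side behaviour; it says nothing about what the parser allocates internally.

```python
import base64
from email import message_from_bytes


def build_sample_eml(attachment: bytes) -> bytes:
    """Build a tiny multipart message with one Base64 attachment (made-up sample)."""
    encoded = base64.encodebytes(attachment).decode("ascii").replace("\n", "\r\n")
    return (
        "MIME-Version: 1.0\r\n"
        'Content-Type: multipart/mixed; boundary="BOUNDARY"\r\n'
        "\r\n"
        "--BOUNDARY\r\n"
        "Content-Type: text/plain\r\n"
        "\r\n"
        "hello\r\n"
        "--BOUNDARY\r\n"
        'Content-Type: text/csv; name="data.csv"\r\n'
        "Content-Transfer-Encoding: base64\r\n"
        'Content-Disposition: attachment; filename="data.csv"\r\n'
        "\r\n"
        + encoded
        + "--BOUNDARY--\r\n"
    ).encode("ascii")


csv_bytes = b"a,b\n1,2\n"
mail = message_from_bytes(build_sample_eml(csv_bytes))

# Find the attachment part by its filename.
attachment = [part for part in mail.walk() if part.get_filename() == "data.csv"][0]

raw = attachment.get_payload()                 # str, still Base64-encoded
decoded = attachment.get_payload(decode=True)  # bytes, decoded when called

print(type(raw).__name__, type(decoded).__name__)
```

If that reading is right, the raw Base64 is already reachable, so the open question is mainly why parsing itself needs so much memory.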
I tested this in:
- Python 3.10.13
- Python 3.12.1
And got the same results.
CPython versions tested on:
3.10
Operating systems tested on:
Linux