email.message_from_bytes heavy memory use #115512

Open
@cnicodeme

Bug report

Bug description:

Hi!

While investigating some memory issues on my Lambda, I discovered unexpectedly high memory usage coming from email.message_from_bytes.

When opening an .eml file that contains almost no text but a 30 MB attachment, the memory usage jumps by 238 MB!
That is roughly 9 times the size of the file!

Here is my test:

from email import message_from_bytes
import resource

# On Linux, ru_maxrss is the peak resident set size of the process, in kilobytes.
print('Init ram: {}kb'.format(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss))

# Read the whole .eml file into memory.
with open('file.eml', 'rb') as f:
    data = f.read()

print('File loaded: {}kb'.format(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss))
print('    (file size: {}kb)'.format(len(data) / 1024))

# Parse the raw bytes into a Message object.
mail = message_from_bytes(data)

print('After message_from_bytes: {}kb'.format(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss))

And the output:

Init ram: 7168kb
File loaded: 37120kb
    (file size: 29900kb)
After message_from_bytes: 279296kb
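
Since ru_maxrss reports the peak resident set size rather than the current one, a complementary measurement with tracemalloc may help separate what the parser allocates temporarily from what the resulting message keeps alive. The following is only a sketch: the helper name measure_parse and the synthetic sample are assumptions made for illustration, and tracemalloc only sees allocations made through Python's allocators, so its numbers will not match RSS exactly.

```python
import tracemalloc
from email import message_from_bytes


def measure_parse(data: bytes):
    """Parse `data` and return (message, current_bytes, peak_bytes).

    `current_bytes` is what is still allocated right after parsing,
    `peak_bytes` is the highest amount traced while parsing.
    """
    tracemalloc.start()
    try:
        mail = message_from_bytes(data)
        current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return mail, current, peak


# Synthetic stand-in for the real file; replace with the bytes of file.eml.
sample = b"Subject: test\r\n\r\n" + b"x" * 100_000
mail, current, peak = measure_parse(sample)
print("input size: {} bytes".format(len(sample)))
print("still allocated after parsing: {} bytes".format(current))
print("peak while parsing: {} bytes".format(peak))
```

A large gap between the peak and the amount still allocated would point at temporary copies made during parsing rather than at the size of the final message object.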

The EML in question contains an attachment (a CSV file) encoded in Base64. I suspect that BytesParser is converting that content to binary data, but I find it surprising that doing this takes 9 times the file size.
Wouldn't it be faster and more efficient to convert the content only when it is accessed, and to provide a way to not convert it at all (getting it raw, in Base64)?

(Maybe there already is a way and I missed it?)
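
On the question of getting the attachment raw: as far as I can tell, with the default compat32 policy, get_payload() on a non-multipart part returns the payload as it was stored (still Base64 text), and decoding only happens when decode=True is passed. The sketch below uses a small synthetic message (the boundary, filename, and CSV content are made up for illustration), so it shows the API behavior only and does not explain the memory numbers above.

```python
import base64
from email import message_from_bytes

# Hypothetical tiny attachment standing in for the real 30 MB CSV.
original = b"col_a,col_b\n1,2\n"
encoded = base64.encodebytes(original)  # Base64 text ending with a newline

raw_message = b"".join([
    b"MIME-Version: 1.0\r\n",
    b'Content-Type: multipart/mixed; boundary="BOUNDARY"\r\n',
    b"\r\n",
    b"--BOUNDARY\r\n",
    b"Content-Type: text/plain\r\n",
    b"\r\n",
    b"See attachment.\r\n",
    b"--BOUNDARY\r\n",
    b'Content-Type: text/csv; name="data.csv"\r\n',
    b'Content-Disposition: attachment; filename="data.csv"\r\n',
    b"Content-Transfer-Encoding: base64\r\n",
    b"\r\n",
    encoded,
    b"--BOUNDARY--\r\n",
])

mail = message_from_bytes(raw_message)

# Locate the attachment part by its filename.
attachment = next(
    part for part in mail.walk() if part.get_filename() == "data.csv"
)

raw_text = attachment.get_payload()            # str, still Base64-encoded
decoded = attachment.get_payload(decode=True)  # bytes, decoded on request

print(type(raw_text).__name__, type(decoded).__name__)
```

If this holds for the real file, the Base64 text can be streamed or decoded in chunks by the caller instead of calling get_payload(decode=True) on the whole attachment.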

I tested this in:

  • Python 3.10.13
  • Python 3.12.1

And got the same results.

CPython versions tested on:

3.10

Operating systems tested on:

Linux

Labels: stdlib (Python modules in the Lib dir), topic-email, type-bug (An unexpected behavior, bug, or error)
