-
-
Notifications
You must be signed in to change notification settings - Fork 31.7k
gh-67022: Document bytes/str inconsistency in email.header.decode_header() and add .decode_header_to_string() as a sane alternative #92900
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, I think this would help users of the legacy API, although I think we should also steer people to the new API. What does @bitdancer think?
.. note:: | ||
|
||
This function exists for for backwards compatibility only. For | ||
new code we recommend using :mod:`email.header.decode_header_to_string`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about adding a link to the non-legacy API, or an example using that newer API?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean the non-legacy API as described in https://docs.python.org/3/library/email.html, e.g. email.parser
?
To my knowledge, there is not any function/method in that API which can be straightforwardly used instead of email.header.decode_header
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
>>> from email.headerregistry import HeaderRegistry
>>> decoder = HeaderRegistry()
>>> decoder('To', '=?utf-8?q?M=C3=A4x?= <[email protected]>')
'Mäx <[email protected]>'
>>> decoder('To', '=?utf-8?q?M=C3=A4x?= <[email protected]>').addresses
(Address(display_name='Mäx', username='foo', domain='bar.com'),)
You really don't want to use the legacy decode_header. It has many bugs that the new API fixes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Obviously this needs to be better documented...
Misc/NEWS.d/next/Library/2022-01-11-21-40-14.bpo-22833.WB-JWw.rst
Outdated
Show resolved
Hide resolved
…de_header() This function's possible return types have been surprising and error-prone for the entirety of its Python 3.x history. It can return either: 1. `typing.List[typing.Tuple[bytes, typing.Optional[str]]]` of length >1 2. or `typing.List[typing.Tuple[str, None]]`, of length exactly 1 This means that any user of this function must be prepared to accept either `bytes` or `str` for the first member of the 2-tuples it returns, which is a very surprising behavior in Python 3.x, particularly given that the second member of the tuple is supposed to represent the charset/encoding of the first member. This patch documents the behavior of this function, and adds test cases to demonstrate it. As discussed in bpo-22833, this cannot be changed in a backwards-compatible way, and some users of this function depend precisely on the existing behavior.
This function takes an email header, possibly with portions encoded according to RFC2047, and converts it to a standard Python string. It is intended to provide a sane, Pythonic replacement for `email.header.decode_header()`, which has two major problems: 1. May return either bytes or str (bpo-22833/pythongh-67022), an inconsistent and error-prone interface 2. Exposes details of an email header value's encoding which most users will not care about or want to deal with. Many users likely just want to decode an email header value to a Python string. It turns out that `email.header` already contained most of the code necessary to do this, and providing `decode_header_to_string` as a documented wrapper function points users in the right direction.
I have made the requested changes; please review again, @warsaw.
😂 |
I have made the requested changes; please review again |
This function's possible return types have been surprising and error-prone
for the entirety of its Python 3.x history. It can return either:
This function can't be rewritten to be more consistent in a backwards-compatible way, because some users of this function depend on the existing return type(s).
This PR addresses the inconsistency as suggested by @JelleZijlstra in #67022 (comment):
The "sane", Pythonic way to handle the decoding of an email/MIME message header value is simply to convert the whole header to a
str
; the details of exactly which parts of that header were encoded in which charsets are not relevant to the users. Fortunately, theemail.header
module already contains a mechanism to do this, via the__str__
method ofemail.header.header
, so we can simply create a wrapper function to guide users in the right direction.Example of the old/inconsistent (
decode_header
) vs. new/sane (decode_header_to_string
) functions:(Closes #30548 and replaces it.)