-
-
Notifications
You must be signed in to change notification settings - Fork 31.7k
gh-118716: Do not enforce ASCII to format address #118717
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
You need to update this test: cpython/Lib/test/test_email/test_email.py Lines 3241 to 3245 in 26bab42
|
The commit that added this test pre-dates that rfc: 8debacb. |
IMO the legacy API (which is what you are modifying) should not support utf-8 addresses, since it provides no mechanisms for correctly interoperating with smtplib the way the new API does. Therefore I recommend rejecting this PR. See my comments on the issue for more. |
Could you add a news entry? https://blurb-it.herokuapp.com/add_blurb |
Lib/email/utils.py
Outdated
@@ -83,8 +83,6 @@ def formataddr(pair, charset='utf-8'): | |||
'utf-8'. | |||
""" | |||
name, address = pair | |||
# The address MUST (per RFC) be ascii, so raise a UnicodeError if it isn't. | |||
address.encode('ascii') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I object to this change. I don't think the legacy API should be modified.
@@ -1214,14 +1214,14 @@ def test_get_local_part_valid_and_invalid_qp_in_atom_list(self): | |||
'@example.com') | |||
self.assertEqual(local_part.local_part, r'\example\\ example') | |||
|
|||
def test_get_local_part_unicode_defect(self): | |||
def test_get_local_part_unicode(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not, IMO, the best solution. But fixing it right is non-trivial. The issue here is that this IS a defect if policy.utf8=False, and that should be maintained. The problem is that currently the header value parser does not have access to the policy. Which was a design mistake.
The reason you want to maintain the old behavior (aside from backward compatibility, which definitely counts) is that the parser is designed to be used in multiple scenarios, and one of those scenarios might be a place were we want to be emulating a non-SMTPUTF8 parser and looking at where a parsed message does not pass muster.
Now, that said, I'm not seeing any good way to fix this right. I think the best compromise might be to leave this defect alone, but to change the Address object so that it specifically ignores this defect (does not raise an error in that case). The smtp library should be raising an error if an attempt is made to actually send using a utf8 address to a server that does not support SMTPUTF8. With this compromise backward compatibility will be maintained, while still allowing utf8 addresses to be specified using Address (which should only be used by library consumers, not the library itself). I believe I am correct that such addresses will not otherwise be rejected by the email library.
This leaves a dangling issue that if we parse a message using utf8=True we technically should not get a defect, but unless and until we teach the header value parser about the policy we can't really fix that.
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
…r_spec RFC 6530 allows UTF-8 strings in the local part.
ac72c64
to
2bbb200
Compare
I have made the requested changes; please review again |
Thanks for making the requested changes! @bitdancer: please review the changes made to this pull request. |
RFC 6530 allows UTF-8 strings