Skip to content

gh-83461: Don't allow datetime parsing to accept non-ASCII digits #131008

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 20 commits into
base: main
Choose a base branch
from

Conversation

StanFromIreland
Copy link
Contributor

@StanFromIreland StanFromIreland commented Mar 9, 2025

@picnixz picnixz added the type-security A security issue label Mar 9, 2025
@picnixz
Copy link
Member

picnixz commented Mar 9, 2025

The original issue was marked with a security bug but I haven't looked at the entire thread so we might consider it a simple bug fix

@StanFromIreland
Copy link
Contributor Author

@picnixz

No it needs to be relabeled:

-versions -type-security +type-bug

Copy link
Member

@picnixz picnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a need to update the C implementation?

@picnixz picnixz removed the type-security A security issue label Mar 9, 2025
@StanFromIreland
Copy link
Contributor Author

C calls the Python implementation.

Co-authored-by: Bénédikt Tran <[email protected]>
@StanFromIreland StanFromIreland requested a review from picnixz March 9, 2025 12:27
Copy link
Member

@picnixz picnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually wonder but what about languages for which their input has non-ASCII strings such as Japanese?

Copy link
Member

@picnixz picnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the ASCII flag restriction is too broad here as it applies to all formats, not just the digit part.

@StanFromIreland StanFromIreland requested a review from picnixz March 9, 2025 20:19
picnixz
picnixz previously requested changes Mar 9, 2025
Copy link
Member

@picnixz picnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please carefully address the following suggestions and please make sure that typos lines are correctly formatted.

@bedevere-app
Copy link

bedevere-app bot commented Mar 9, 2025

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@picnixz
Copy link
Member

picnixz commented Mar 9, 2025

Note: I'll review more this PR once I'm back because it will be easier when I'm on a laptop and not on mobile (so on Wednesday/Thursday)

@picnixz picnixz self-requested a review March 9, 2025 20:57
Copy link
Member

@picnixz picnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last nit and LGTM (ok on my side).

@StanFromIreland
Copy link
Contributor Author

Anything else left to do @pganssle ?

@bedevere-app
Copy link

bedevere-app bot commented Mar 23, 2025

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

And if you don't make the requested changes, you will be put in the comfy chair!

@StanFromIreland
Copy link
Contributor Author

I have made the requested changes; please review again

@bedevere-app
Copy link

bedevere-app bot commented Mar 23, 2025

Thanks for making the requested changes!

@pganssle, @picnixz: please review the changes made to this pull request.

@bedevere-app bedevere-app bot requested review from pganssle and picnixz March 23, 2025 15:28
@pganssle
Copy link
Member

pganssle commented Mar 26, 2025

I think my comments on how to organize the documentation were not clear enough, so rather than another round of back-and-forth, I went ahead and pushed some changes to the documentation directly.

Also, I realized that the summary of "reject non-ASCII digits" isn't quite right, because there are actually locales where codes like %c and %X will generate strings with non-ASCII digits, e.g.:

>>> from datetime import datetime
>>> import locale; locale.setlocale(locale.LC_ALL, 'fa_IR.utf8')
>>> print(datetime.now().strftime('%x'))
۲۵/۰۳/۲۶
>>> print(datetime.now().strftime("c"))
چهارشنبه ۲۶ مارس ۲۵، ۱۰:۳۷:۵۱

And in fact in 3.14 we fixed strptime to properly parse these, see #125406.

TBH, I'm a little apprehensive about so widely advertising this change, because part of the rationale for doing it is that support for this kind of thing was patchy in the first place, so it's better for it to either all work or not work at all. Fixing a bug like that doesn't seem like the kind of thing that needs a What's New entry and a permanent log of the change in the datetime documentation on the same level as API changes like changes to the function prototype.

Given that we're trying to improve locale support in other ways, I could imagine us adding support for these things more explicitly in the future, so I wonder if we want to set the expectation that this will never work.

@StanFromIreland @picnixz Would one or both of you mind taking a look at my changes and seeing if you agree?

@serhiy-storchaka As the person who has been working on the locale improvements, any thoughts on this subject?

@serhiy-storchaka
Copy link
Member

I think that formats like %Y should only accept ASCII digits, but %OY and %EY should accept non-ASCII digits (the set of accepted digits can depend on the locale). The O and E prefixes are not yet officially supported in Python. strftime() may support them if the underlying C function support them (they are a part of the C and Posix standards now, but this depends on the version and platform). strptime() is implemented in Python and doesn't officially support them, but I added unofficial support for some directives).

Currently, a heuristic is used to determine the format for %c, %x and %X in terms of simpler directives. I added (unofficially) support of the O prefix to support non-ASCII digits and plan to disallow support of non-ASCII digits in directives without the O or E prefix, but this is a change which cannot be backported. I have not yet finished with changes which can be backported.

First, we need to add official support of the O and E prefixes (even if it is platform dependent). We need to use nl_langinfo() for this. Then we can disallow non-ASCII digits for directives without these prefixes.

Actually, I started to work on strptime() and nl_langinfo() issues after seen this issue. Most of the work is already done, but please wait a little longer.

@picnixz
Copy link
Member

picnixz commented Apr 1, 2025

Also, I realized that the summary of "reject non-ASCII digits" isn't quite right, because there are actually locales where codes like %c and %X will generate strings with non-ASCII digits, e.g.:

Oh I wasn't aware of this so thank you for the corrections.

Copy link
Member

@picnixz picnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe in Lib/_striptime.py, you can add a small comment saying that the O* formats are locale-specific? (just above `for d in 'dmyHIMS'?

@@ -563,6 +563,9 @@ Functions
When used with the :func:`strptime` function, ``%U`` and ``%W`` are only used in
calculations when the day of the week and the year are specified.

(5)
The :func:`strptime` function does not accept non-ASCII digits for numeric values.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we mention the "non-locale-specific numeric format codes" here or, since it's not officially supported, we can be a bit lazy?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pganssle Do you want to do this too? Or should I?

@StanFromIreland
Copy link
Contributor Author

@pganssle friendly ping :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants