gh-57095: Add note about input splitting in datetime.*.strptime #131049

Open, wants to merge 3 commits into base: main

@StanFromIreland (Contributor) commented Mar 10, 2025

@StanFromIreland (Contributor, Author)

Maybe @abalkin should be removed from the CODEOWNERS and marked (inactive) in the devguide experts page? What do you think, @pganssle? He has not been active for quite a few years.

@encukou (Member) commented Mar 11, 2025

I don't think the behaviour is guaranteed. Can you find it in some standard?

IMO, it would be better to expand the note about what varies across platforms -- “The full set of format codes”, “handling of unsupported format specifiers”... and also handling of ambiguous/wrong inputs.

(Even if all platforms currently agree, I don't think we should commit to their behaviour. As musl shows, a new platform can always appear that considers it fair game to reinterpret whatever is not specified in C or POSIX.)

@pganssle (Member)

> Maybe @abalkin should be removed from the CODEOWNERS and marked (inactive) in the devguide experts page? What do you think, @pganssle? He has not been active for quite a few years.

He can choose to do that if he wants; I think the only thing it is hurting is his inbox 😅

WRT the change, I needed a close look to understand what this meant, which makes me think that this is not a very clear statement. It seems like what you are trying to say is something about how ambiguous format specs are parsed, correct? For example, if a code can match one digit or two, and matching two digits will fail to parse but matching one digit will succeed, it chooses one?

What if you use %H%M with 111? Does it parse to 11, 1? 1, 11? Fail? How about 131? 071?
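
These cases can be checked empirically rather than guessed. Below is a minimal sketch, assuming CPython's `datetime.datetime.strptime`; the helper name `probe` is hypothetical. It deliberately does not state the results for the undelimited inputs, since those are what this thread is trying to pin down.

```python
from datetime import datetime


def probe(fmt, text):
    """Return (hour, minute) parsed from text, or None if parsing fails."""
    try:
        parsed = datetime.strptime(text, fmt)
    except ValueError:
        return None
    return parsed.hour, parsed.minute


if __name__ == "__main__":
    # Undelimited inputs from the questions above; the outcome is what we
    # want to observe, so no expected value is hard-coded here.
    for text in ("111", "131", "071"):
        print(f"%H%M with {text!r} -> {probe('%H%M', text)}")

    # With a separator there is only one way to split the digits.
    print(f"%H:%M with '1:11' -> {probe('%H:%M', '1:11')}")
```

Running this on each supported platform would show whether the split is consistent in practice.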

> I don't think the behaviour is guaranteed. Can you find it in some standard?
>
> IMO, it would be better to expand the note about what varies across platforms -- “The full set of format codes”, “handling of unsupported format specifiers”... and also handling of ambiguous/wrong inputs.

I agree in principle. I think if we document this behavior, we should add tests and then maybe add language like, "The behavior when parsing ambiguous strings is platform dependent. On all currently supported platforms, ". We can add a comment to the test to change the documentation if the test starts failing on a supported platform because of their implementation of strptime. (Assuming that this is not already clearly specified in the POSIX standard).

@StanFromIreland (Contributor, Author)

Maybe the wording from the GNU docs would be better?

> The user has to make sure, though, that the input can be parsed in a unambiguous way. The string "1999112" can be parsed using the format "%Y%m%d" as 1999-1-12, 1999-11-2, or even 19991-1-2. It is necessary to add appropriate separators to reliably get results.
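
The effect of the separators that the GNU text recommends can be illustrated as follows. This is a sketch, assuming CPython's `datetime.datetime.strptime`; the date strings are two of the candidate readings from the quote, rewritten with `-` separators.

```python
from datetime import datetime

# With separators, each field boundary is explicit, so only one reading exists.
first = datetime.strptime("1999-1-12", "%Y-%m-%d")
second = datetime.strptime("1999-11-2", "%Y-%m-%d")

print(first.date())   # 1999-01-12
print(second.date())  # 1999-11-02

# Without separators, the result depends on how the implementation splits the
# digits, so the outcome is printed rather than assumed.
try:
    print(datetime.strptime("1999112", "%Y%m%d").date())
except ValueError as exc:
    print("not parseable:", exc)
```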
