Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Accept __getitem__-iterables in builtins.enumerate #12294

Closed
wants to merge 6 commits into from

Conversation

jorenham
Copy link
Contributor

@jorenham jorenham commented Jul 9, 2024

I noticed that the following example isn't accounted for in the builtins stubs, and currently rejected by typecheckers (although I only checked with pyright):

>>> class Ints:
...     def __init__(self, n: int, /):
...         self._n = n
...     def __getitem__(self, i: int, /) -> int:
...         if 0 <= i < self._n:
...             return i
...         raise IndexError
... 
>>> list(enumerate(Ints(4)))
[(0, 0), (1, 1), (2, 2), (3, 3)]

This comment has been minimized.

This comment has been minimized.

@hauntsaninja
Copy link
Collaborator

hauntsaninja commented Jul 9, 2024

See also #7813 / #8597, nothing specific with builtins, it's just legacy behaviour of iteration in Python 2.1 and earlier.

@jorenham
Copy link
Contributor Author

jorenham commented Jul 9, 2024

Ok I wasn't aware of the earlier discussion about this.
But what exactly makes this "legacy behaviour", as opposed to a feature?

And why should it only be allowed for iter, but not also in tuple and list?
I wouldn't mind changing this PR to only those two, for instance.

@JelleZijlstra
Copy link
Member

I think my comment here explains why we don't need this: #7813 (comment).

The mypy-primer output on this PR also does not suggest this change makes type checking better for users.

@jorenham
Copy link
Contributor Author

jorenham commented Jul 9, 2024

@JelleZijlstra I agree that it wouldn't be a good idea to replace Iterable with the _Iterand in this PR.

But my (revised) suggestion is to only change the constructor signatures of list and tuple (i.e. by modifying this PR, or opening a new one).

It's a subtle change, and avoids their constructors from being unnecessarily restrictive.
ANd since the iter signature has already been fixed, I don't see why we shouldn't fix it for tuple and list as well 🤷🏻.

@JelleZijlstra
Copy link
Member

I don't really see why list() and tuple() would need special casing. Several of the concerns in the linked thread still apply (potential for more confusing type checking errors and performance regressions in type checkers).

@jorenham
Copy link
Contributor Author

So am I correct to understand (and please do correct me if I'm wrong) that, after considering that

  1. there are currently several false negatives in the builtins stubs that affect all typecheckers,
  2. a solution to this problem was already successfully applied to iter a while ago, and
  3. this PR demonstrates that the same solution can easily be applied to fix the other issues,
  4. but fixing these issues could make some type-checker errors more confusing, and
  5. could potentially make typesheckers a bit slower,

you decide that the issues (1-3) addressed in this PR are considered to be less of a problem than the downsides (4-5)?

I understand that there probably aren't many usecases for the "getitem-iterators".

But I'm a bit of a typing purist, and always considered my "typing utopia" to be in perfect harmony with the runtime, i.e. with noo false-positives (i.e. broad annotations like Any),
and no false-negatives (i.e. all that's allowed at runtime, is allowed while type-checking) 🏖️.

@JelleZijlstra
Copy link
Member

this PR demonstrates that the same solution can easily be applied to fix the other issues

The failing CI and many new mypy-primer errors say otherwise. I'm not necessarily opposed to making a change here if it demonstrably makes the situation better for users, but this PR as it stands is not that.

@hauntsaninja
Copy link
Collaborator

(As the person who added the iter case, I didn't really intend for it to proliferate. The nice thing about allowing the broader type for iter specifically is now there's an easy solution for anyone forced to iterate over some old-style iterable: just wrap it in iter(...) and go about your day, no need to slippery slope all the way to thrashing every use of Iterable in the ecosystem. And yes, as Jelle says, the case for slipping down the slope a little bit is especially weak if it hurts current users)

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

@jorenham
Copy link
Contributor Author

Ok I realize now that this isn't gonna work for the builtin collection types, because that could be incompatible with their user-defined subtypes.

But @hauntsaninja 's logic regarding iter also applies to builtin functions like enumerate.
And at least for enumerate, this doesn't appear to trigger mypy.

Although the mypy primer makes the earlier argument about confusing error messages more clear.
But perhaps this can be circumvented by using a separate @overload 🤔.

@jorenham jorenham changed the title Consider both Iterable and _GetItemIterable as "iterand" types in builtins Accept __getitem__-iterables in builtins.enumerate Jul 10, 2024

This comment has been minimized.

@jorenham
Copy link
Contributor Author

Oh haha nevermind.
I these cases pyright only shows the annotation of the overload that is the closest match, so I expected mypy do do something similar, but alas.

…rable` for prevent cluttered error messages"

This reverts commit 37c95e6.
Copy link
Contributor

Diff from mypy_primer, showing the effect of this PR on open source code:

discord.py (https://github.com/Rapptz/discord.py)
- discord/abc.py:542: error: Argument 1 to "enumerate" has incompatible type "object"; expected "Iterable[Never]"  [arg-type]
+ discord/abc.py:542: error: Argument 1 to "enumerate" has incompatible type "object"; expected "Iterable[Never] | _GetItemIterable[Never]"  [arg-type]

cki-lib (https://gitlab.com/cki-project/cki-lib)
- tests/test_yaml.py:169: error: Argument 1 to "enumerate" has incompatible type "object"; expected "Iterable[Never]"  [arg-type]
+ tests/test_yaml.py:169: error: Argument 1 to "enumerate" has incompatible type "object"; expected "Iterable[Never] | _GetItemIterable[Never]"  [arg-type]

materialize (https://github.com/MaterializeInc/materialize)
- misc/python/materialize/cli/cloudbench.py:175: error: Argument 1 to "enumerate" has incompatible type "BenchSuccessResult | BenchFailureLogs | None"; expected "Iterable[str]"  [arg-type]
+ misc/python/materialize/cli/cloudbench.py:175: error: Argument 1 to "enumerate" has incompatible type "BenchSuccessResult | BenchFailureLogs | None"; expected "Iterable[str] | _GetItemIterable[str]"  [arg-type]

@jorenham
Copy link
Contributor Author

Would it be accepteble like this @JelleZijlstra?

And if so, what are you thoughts on doing this as well for zip, sorted, all/any and maybe also min/max/sum?

I don't think that it'll work for map and filter, as they are non-final types, so have the same issue as the collection types.
For reversed the same mypy bug that's blocking #11646 also applies, so for now that's not going to work either.

@JelleZijlstra
Copy link
Member

Thinking about it more, I don't see much point in changing this for just a few specific iterators. It's going to be confusing if "getitem iterables" are accepted in the type system by some but not all functions. It makes sense for iter() to be special because it is specifically for turning iterables into iterators, but enumerate() isn't special in the same way.

There's also the issue @TeamSpen210 brought up on the previous thread: with the union, some types (e.g., dict[int, T]) can match both union members, and it becomes ambiguous which one will be picked by type checkers. This could in theory lead to wrong types being inferred.

@srittau
Copy link
Collaborator

srittau commented Jul 11, 2024

As others have pointed out, the old-style iteration protocol isn't currently supported by typeshed, except in the case of iter(). Introducing it would have a far-reaching ripple effect for what is effectively a legacy feature that's easily fixed or worked around. While there was a good reason for adding support to iter() (to provide an easy way to adapt from the old-style protocol to the "new"-style protocol), allowing it in other places would be a step back.

@srittau srittau closed this Jul 11, 2024
@jorenham
Copy link
Contributor Author

jorenham commented Jul 11, 2024

I still don't understand what exactly makes this a "legacy feature", or "old-style".
As far as I know, it isn't deprecated?

@jorenham
Copy link
Contributor Author

And @JelleZijlstra that union issue you mention doesn't apply here, as the union is used to annotate a function parameter, and not passed as type argument to a covariant type parameter.

@srittau
Copy link
Collaborator

srittau commented Jul 11, 2024

I still don't understand what exactly makes this a "legacy feature", or "old-style".

Because a better alternative has been introduced 23 years ago and __getitem__() iteration has multiple issues and doesn't gel well with typing. Please accept that the typeshed maintainers have no interest in implementing this feature.

@JelleZijlstra
Copy link
Member

that union issue you mention doesn't apply here, as the union is used to annotate a function parameter, and not passed as type argument to a covariant type parameter

Why do you think that matters? I haven't tried it but I believe your PR would make type checkers incorrectly accept x: dict[int, str]; y: Iterable[tuple[int, str]] = enumerate(x).

@jorenham
Copy link
Contributor Author

So now that there aren't any technical obstacles left, would you care to reconsider?

@JelleZijlstra
Copy link
Member

I'm not interested in making this change; I don't think the increase in complexity is worth it. Another maintainer can speak up and merge a version of this change if they have a different view; I won't mind.

@jorenham
Copy link
Contributor Author

Ok that's fine. Could you re-open it then? Or should I make a new PR?

@AlexWaygood
Copy link
Member

Could you re-open it then?

it doesn't seem like you've persuaded any maintainers yet that this change is worth it. If any of us is persuaded at any point, rest assured we will not hesitate to reopen this PR.

@jorenham
Copy link
Contributor Author

I'm rather surprised that this bugfix, which at this point has no technical objections, requires persuasion at all. Additionally, it was closed for reasons that don't apply anymore, yet re-opening it is out of the question because "I haven't persuaded any maintainers yet".

Shouldn't it be the other way around? A bug should only remain in the code if there's a good reason for it.
And I've heard good reasons, but at this point they don't apply anymore.

The only remaining objection is that of the "confusing error messages". But that's the responsibility of type-checkers. The responsibility of typing stubs is to provide type hints that are correct and complete.
But I'm sure that there are cases where blurred these lines in unavoidable, so realistically the term "responsibility" should be replaced with something like "priority".
And even so, my point still stands.

And yes, this is in fact a fix for a bug, not just a "change": Currently type-checkers falsely reject valid usage of enumerate.

That should be more than enough reason to, at the very least, re-open this PR.
Otherwise, exactly which points remain that require more explanation?
Or perhaps starting a fresh PR is the way to go here?

@AlexWaygood
Copy link
Member

Or perhaps starting a fresh PR is the way to go here?

I think you'll find that to be counterproductive when it comes to convincing us

@Akuli
Copy link
Collaborator

Akuli commented Jul 19, 2024

Do you have a practical example of why this change would be useful to you? What is the type that you want to enumerate over and only has __getitem__?

I agree with other maintainers that this is a slippery slope: if we change enumerate() to accept __getitem__-only iterables, then over time, more and more things get extended to accept them and Iterable becomes useless. We must say no at some point.

@jorenham
Copy link
Contributor Author

Do you have a practical example of why this change would be useful to you? What is the type that you want to enumerate over and only has __getitem__?

The first example that comes to mind is ctypes.pointer.

if we change enumerate() to accept __getitem__-only iterables, then over time, more and more things get extended to accept them and Iterable becomes useless.

I don't see this would make Iterable useless:
Just like there are types that implement __getitem__ (for int keys) but no __iter__, there are also types that implement __iter__ but not __getitem__.
In case of both, the __iter__ signature should be considered first (e.g. using overloads), which is not unlike how __add__ precedes __radd__, so it shouldn't be difficult to understand for developers.

Which brings me to another advantage of not having to define an __iter__ if not needed:
If runtime-protocols for sequence-like types don't require an __iter__ anymore, then their (initial) isinstance checks will be faster.

@Akuli
Copy link
Collaborator

Akuli commented Jul 22, 2024

The first example that comes to mind is ctypes.pointer.

I think ctypes is a good example. With enumerate(), you can presumably do some things that would be done with pointer arithmetic in C. I'm not very familiar with ctypes, but I can see how it would be useful.

How do you use enumerate() here though? A pointer does not know where the memory region ends, so do you just consume the first few elements of the enumerate() iterator and then throw it away? Does this mean that allowing __getitem__ iterables would be more useful for enumerate() than list()?

If runtime-protocols for sequence-like types don't require an __iter__ anymore, then their (initial) isinstance() checks will be faster.

This doesn't work. A dict[int, str] shouldn't be a valid Sequence[str] only because you canindex it with integers. (This doesn't create a bug in this PR as long as the __iter__ overload is first, as it should be anyway.) I also tend to be very sceptical when someone claims something affects performance, because in my experience, bottlenecks are almost always in surprising places that can only be found by measuring.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants