Accept `getitem`-iterables in `builtins.enumerate` #12294

jorenham · 2024-07-09T00:05:50Z

I noticed that the following example isn't accounted for in the builtins stubs, and currently rejected by typecheckers (although I only checked with pyright):

>>> class Ints:
...     def __init__(self, n: int, /):
...         self._n = n
...     def __getitem__(self, i: int, /) -> int:
...         if 0 <= i < self._n:
...             return i
...         raise IndexError
... 
>>> list(enumerate(Ints(4)))
[(0, 0), (1, 1), (2, 2), (3, 3)]

hauntsaninja · 2024-07-09T00:57:42Z

See also #7813 / #8597, nothing specific with builtins, it's just legacy behaviour of iteration in Python 2.1 and earlier.

jorenham · 2024-07-09T01:32:58Z

Ok I wasn't aware of the earlier discussion about this.
But what exactly makes this "legacy behaviour", as opposed to a feature?

And why should it only be allowed for iter, but not also in tuple and list?
I wouldn't mind changing this PR to only those two, for instance.

JelleZijlstra · 2024-07-09T02:26:04Z

I think my comment here explains why we don't need this: #7813 (comment).

The mypy-primer output on this PR also does not suggest this change makes type checking better for users.

jorenham · 2024-07-09T03:16:04Z

@JelleZijlstra I agree that it wouldn't be a good idea to replace Iterable with the _Iterand in this PR.

But my (revised) suggestion is to only change the constructor signatures of list and tuple (i.e. by modifying this PR, or opening a new one).

It's a subtle change, and avoids their constructors from being unnecessarily restrictive.
ANd since the iter signature has already been fixed, I don't see why we shouldn't fix it for tuple and list as well 🤷🏻.

JelleZijlstra · 2024-07-10T00:30:50Z

I don't really see why list() and tuple() would need special casing. Several of the concerns in the linked thread still apply (potential for more confusing type checking errors and performance regressions in type checkers).

jorenham · 2024-07-10T03:20:09Z

So am I correct to understand (and please do correct me if I'm wrong) that, after considering that

there are currently several false negatives in the builtins stubs that affect all typecheckers,
a solution to this problem was already successfully applied to iter a while ago, and
this PR demonstrates that the same solution can easily be applied to fix the other issues,
but fixing these issues could make some type-checker errors more confusing, and
could potentially make typesheckers a bit slower,

you decide that the issues (1-3) addressed in this PR are considered to be less of a problem than the downsides (4-5)?

I understand that there probably aren't many usecases for the "getitem-iterators".

But I'm a bit of a typing purist, and always considered my "typing utopia" to be in perfect harmony with the runtime, i.e. with noo false-positives (i.e. broad annotations like Any),
and no false-negatives (i.e. all that's allowed at runtime, is allowed while type-checking) 🏖️.

JelleZijlstra · 2024-07-10T05:02:57Z

this PR demonstrates that the same solution can easily be applied to fix the other issues

The failing CI and many new mypy-primer errors say otherwise. I'm not necessarily opposed to making a change here if it demonstrably makes the situation better for users, but this PR as it stands is not that.

hauntsaninja · 2024-07-10T06:04:55Z

(As the person who added the iter case, I didn't really intend for it to proliferate. The nice thing about allowing the broader type for iter specifically is now there's an easy solution for anyone forced to iterate over some old-style iterable: just wrap it in iter(...) and go about your day, no need to slippery slope all the way to thrashing every use of Iterable in the ecosystem. And yes, as Jelle says, the case for slipping down the slope a little bit is especially weak if it hurts current users)

…ter`

stdlib/builtins.pyi

jorenham · 2024-07-10T14:39:28Z

Ok I realize now that this isn't gonna work for the builtin collection types, because that could be incompatible with their user-defined subtypes.

But @hauntsaninja 's logic regarding iter also applies to builtin functions like enumerate.
And at least for enumerate, this doesn't appear to trigger mypy.

Although the mypy primer makes the earlier argument about confusing error messages more clear.
But perhaps this can be circumvented by using a separate @overload 🤔.

…or prevent cluttered error messages

jorenham · 2024-07-10T14:49:43Z

Oh haha nevermind.
I these cases pyright only shows the annotation of the overload that is the closest match, so I expected mypy do do something similar, but alas.

…rable` for prevent cluttered error messages" This reverts commit 37c95e6.

github-actions · 2024-07-10T14:56:30Z

Diff from mypy_primer, showing the effect of this PR on open source code:

discord.py (https://github.com/Rapptz/discord.py)
- discord/abc.py:542: error: Argument 1 to "enumerate" has incompatible type "object"; expected "Iterable[Never]"  [arg-type]
+ discord/abc.py:542: error: Argument 1 to "enumerate" has incompatible type "object"; expected "Iterable[Never] | _GetItemIterable[Never]"  [arg-type]

cki-lib (https://gitlab.com/cki-project/cki-lib)
- tests/test_yaml.py:169: error: Argument 1 to "enumerate" has incompatible type "object"; expected "Iterable[Never]"  [arg-type]
+ tests/test_yaml.py:169: error: Argument 1 to "enumerate" has incompatible type "object"; expected "Iterable[Never] | _GetItemIterable[Never]"  [arg-type]

materialize (https://github.com/MaterializeInc/materialize)
- misc/python/materialize/cli/cloudbench.py:175: error: Argument 1 to "enumerate" has incompatible type "BenchSuccessResult | BenchFailureLogs | None"; expected "Iterable[str]"  [arg-type]
+ misc/python/materialize/cli/cloudbench.py:175: error: Argument 1 to "enumerate" has incompatible type "BenchSuccessResult | BenchFailureLogs | None"; expected "Iterable[str] | _GetItemIterable[str]"  [arg-type]

jorenham · 2024-07-10T15:08:15Z

Would it be accepteble like this @JelleZijlstra?

And if so, what are you thoughts on doing this as well for zip, sorted, all/any and maybe also min/max/sum?

I don't think that it'll work for map and filter, as they are non-final types, so have the same issue as the collection types.
For reversed the same mypy bug that's blocking #11646 also applies, so for now that's not going to work either.

JelleZijlstra · 2024-07-11T03:57:26Z

Thinking about it more, I don't see much point in changing this for just a few specific iterators. It's going to be confusing if "getitem iterables" are accepted in the type system by some but not all functions. It makes sense for iter() to be special because it is specifically for turning iterables into iterators, but enumerate() isn't special in the same way.

There's also the issue @TeamSpen210 brought up on the previous thread: with the union, some types (e.g., dict[int, T]) can match both union members, and it becomes ambiguous which one will be picked by type checkers. This could in theory lead to wrong types being inferred.

srittau · 2024-07-11T09:51:58Z

As others have pointed out, the old-style iteration protocol isn't currently supported by typeshed, except in the case of iter(). Introducing it would have a far-reaching ripple effect for what is effectively a legacy feature that's easily fixed or worked around. While there was a good reason for adding support to iter() (to provide an easy way to adapt from the old-style protocol to the "new"-style protocol), allowing it in other places would be a step back.

jorenham · 2024-07-11T13:21:51Z

I still don't understand what exactly makes this a "legacy feature", or "old-style".
As far as I know, it isn't deprecated?

jorenham · 2024-07-11T13:30:16Z

And @JelleZijlstra that union issue you mention doesn't apply here, as the union is used to annotate a function parameter, and not passed as type argument to a covariant type parameter.

srittau · 2024-07-11T13:41:19Z

I still don't understand what exactly makes this a "legacy feature", or "old-style".

Because a better alternative has been introduced 23 years ago and __getitem__() iteration has multiple issues and doesn't gel well with typing. Please accept that the typeshed maintainers have no interest in implementing this feature.

JelleZijlstra · 2024-07-11T14:09:49Z

that union issue you mention doesn't apply here, as the union is used to annotate a function parameter, and not passed as type argument to a covariant type parameter

Why do you think that matters? I haven't tried it but I believe your PR would make type checkers incorrectly accept x: dict[int, str]; y: Iterable[tuple[int, str]] = enumerate(x).

jorenham · 2024-07-11T14:25:30Z

@JelleZijlstra Yes you've got a point here.

But it's easy to fix this using overloads, so that we can specifiy that Iterable[T] is to be matched first.

Here's a demonstration:
https://pyright-play.net/?strict=true&code=GYJw9gtgBAxmA28CmMAuBLMA7AzgOgEMAjGKdCABzBFSgElUkRjkAoUSKVATwvSwDmZStVoAFcKjBx4AGigBlJPGDyAKryQA1AiHlgAbk3hgCAE1as1UALxQNFbboAUAcjWuAlFYD6cW-aaOiBuan5grvJwBrroBFioNmogAK5I3qww8AQ4OFA%2BAOJIqAxIEKXMRMjOEmBSMgDaYXAAup4AXKxQ3VBmSMD5PgLF6IwQPj7OOMqqZO1kCfIA9J5QALQAfPbh83h7llk5eUhYKRBMBIwATM4nZxeMTW2dPVAAAobGphavfQMTWCQAHcJs4sjh5KMLlUkPMKiwkE95DhULpUPN%2BLQ7AAGVabRQzXb7V4fIwgEzmLo9P6DQEgybgyGMSrIeaFYqlcrMhFIqAotEYhIBXHrLZKFREvBU7o0gHA0GMsjcmFw5XIJ5QAA%2B%2BSKJTG8JhvP5NEFWKgIvx4uAL1ePRAxRSICwUDu52YjGcUJZSGRqJoNmNqFWUAAxFxNBiBFhqEhLAAPeZmdBoBqY30gFoBADe2PmrlcAF9WNxVdD1agUhR1WmFqgWpm7K6HkhnHHvNwrqXvQ0K1XETXMfWAk33Ugbm2gA

Mypy accepts this as well.

jorenham · 2024-07-11T14:37:23Z

So now that there aren't any technical obstacles left, would you care to reconsider?

JelleZijlstra · 2024-07-11T15:25:24Z

I'm not interested in making this change; I don't think the increase in complexity is worth it. Another maintainer can speak up and merge a version of this change if they have a different view; I won't mind.

jorenham · 2024-07-11T16:32:52Z

Ok that's fine. Could you re-open it then? Or should I make a new PR?

AlexWaygood · 2024-07-11T16:40:21Z

Could you re-open it then?

it doesn't seem like you've persuaded any maintainers yet that this change is worth it. If any of us is persuaded at any point, rest assured we will not hesitate to reopen this PR.

jorenham · 2024-07-11T17:25:56Z

I'm rather surprised that this bugfix, which at this point has no technical objections, requires persuasion at all. Additionally, it was closed for reasons that don't apply anymore, yet re-opening it is out of the question because "I haven't persuaded any maintainers yet".

Shouldn't it be the other way around? A bug should only remain in the code if there's a good reason for it.
And I've heard good reasons, but at this point they don't apply anymore.

The only remaining objection is that of the "confusing error messages". But that's the responsibility of type-checkers. The responsibility of typing stubs is to provide type hints that are correct and complete.
But I'm sure that there are cases where blurred these lines in unavoidable, so realistically the term "responsibility" should be replaced with something like "priority".
And even so, my point still stands.

And yes, this is in fact a fix for a bug, not just a "change": Currently type-checkers falsely reject valid usage of enumerate.

That should be more than enough reason to, at the very least, re-open this PR.
Otherwise, exactly which points remain that require more explanation?
Or perhaps starting a fresh PR is the way to go here?

AlexWaygood · 2024-07-11T17:42:59Z

Or perhaps starting a fresh PR is the way to go here?

I think you'll find that to be counterproductive when it comes to convincing us

Akuli · 2024-07-19T15:40:24Z

Do you have a practical example of why this change would be useful to you? What is the type that you want to enumerate over and only has __getitem__?

I agree with other maintainers that this is a slippery slope: if we change enumerate() to accept __getitem__-only iterables, then over time, more and more things get extended to accept them and Iterable becomes useless. We must say no at some point.

jorenham · 2024-07-19T18:29:04Z

Do you have a practical example of why this change would be useful to you? What is the type that you want to enumerate over and only has __getitem__?

The first example that comes to mind is ctypes.pointer.

if we change enumerate() to accept __getitem__-only iterables, then over time, more and more things get extended to accept them and Iterable becomes useless.

I don't see this would make Iterable useless:
Just like there are types that implement __getitem__ (for int keys) but no __iter__, there are also types that implement __iter__ but not __getitem__.
In case of both, the __iter__ signature should be considered first (e.g. using overloads), which is not unlike how __add__ precedes __radd__, so it shouldn't be difficult to understand for developers.

Which brings me to another advantage of not having to define an __iter__ if not needed:
If runtime-protocols for sequence-like types don't require an __iter__ anymore, then their (initial) isinstance checks will be faster.

Akuli · 2024-07-22T18:31:16Z

The first example that comes to mind is ctypes.pointer.

I think ctypes is a good example. With enumerate(), you can presumably do some things that would be done with pointer arithmetic in C. I'm not very familiar with ctypes, but I can see how it would be useful.

How do you use enumerate() here though? A pointer does not know where the memory region ends, so do you just consume the first few elements of the enumerate() iterator and then throw it away? Does this mean that allowing __getitem__ iterables would be more useful for enumerate() than list()?

If runtime-protocols for sequence-like types don't require an __iter__ anymore, then their (initial) isinstance() checks will be faster.

This doesn't work. A dict[int, str] shouldn't be a valid Sequence[str] only because you canindex it with integers. (This doesn't create a bug in this PR as long as the __iter__ overload is first, as it should be anyway.) I also tend to be very sceptical when someone claims something affects performance, because in my experience, bottlenecks are almost always in surprising places that can only be found by measuring.