exponentially slow typechecking when overloaded numpy functions used in generic containers like list/dict #14718

Open
@Hnasar

Bug Report

If a list or dict literal contains overloaded numpy functions, it takes far too long to typecheck.

We have a lot of code that uses numpy and I found that one of our 300-line modules was taking 2 hours to typecheck. Using --line-checking-stats I noticed that a single line had stats in the billions (1000x more than other lines). It was something like this:

    def get_real_df(self) -> DataFrame:
        groupby_agg = {
            "size": np.sum,
            "vol": np.sum,
            "price": np.sum,
            "last_size": np.sum,
            "last_vol": np.max,
            "is_active": np.all,
            ...
        }
        ... # <snip>
        return df.groupby("id").agg(groupby_agg)

To Reproduce

from numpy import sum
bad = [sum, sum, sum]  # try adding even more 'sum, ...' 
reveal_type(bad)

Expected Behavior

This should type check quickly.

Actual Behavior

This takes 4 seconds.

bad.py:8: note: Revealed type is "builtins.list[Overload(def [_SCT <: numpy.generic] (a: Union[numpy._typing._array_like._SupportsArray[numpy.dtype[_SCT`-1]], numpy._typing._nested_sequence._NestedSequence[numpy._typing._array_like._SupportsArray[numpy.dtype[_SCT`-1]]]], axis: None =, dtype: None =, out: None =, keepdims: builtins.bool =, initial: Union[builtins.int, builtins.float, builtins.complex, numpy.number[Any], numpy.bool_] =, where: Union[numpy._typing._array_like._SupportsArray[numpy.dtype[numpy.bool_]], numpy._typing._nested_sequence._NestedSequence[numpy._typing._array_like._SupportsArray[numpy.dtype[numpy.bool_]]], builtins.bool, numpy._typing._nested_sequence._NestedSequence[builtins.bool]] =) -> _SCT`-1, def [_SCT <: numpy.generic] (a: Union[numpy._typing._array_like._SupportsArray[numpy.dtype[_SCT`-1]], numpy._typing._nested_sequence._NestedSequence[numpy._typing._array_like._SupportsArray[numpy.dtype[_SCT`-1]]]], axis: None =, dtype: None =, out: None =, keepdims: builtins.bool =, initial: Union[builtins.int, builtins.float, builtins.complex, numpy.number[Any], numpy.bool_] =, where: Union[numpy._typing._array_like._SupportsArray[numpy.dtype[numpy.bool_]], numpy._typing._nested_sequence._NestedSequence[numpy._typing._array_like._SupportsArray[numpy.dtype[numpy.bool_]]], builtins.bool, numpy._typing._nested_sequence._NestedSequence[builtins.bool]] =) -> _SCT`-1, def [_SCT <: numpy.generic] (a: Union[numpy._typing._array_like._SupportsArray[numpy.dtype[_SCT`-1]], numpy._typing._nested_sequence._NestedSequence[numpy._typing._array_like._SupportsArray[numpy.dtype[_SCT`-1]]]], axis: None =, dtype: None =, out: None =, keepdims: builtins.bool =, initial: Union[builtins.int, builtins.float, builtins.complex, numpy.number[Any], numpy.bool_] =, where: Union[numpy._typing._array_like._SupportsArray[numpy.dtype[numpy.bool_]], numpy._typing._nested_sequence._NestedSequence[numpy._typing._array_like._SupportsArray[numpy.dtype[numpy.bool_]]], builtins.bool, 
numpy._typing._nested_sequence._NestedSequence[builtins.bool]] =) -> _SCT`-1, def [_SCT <: numpy.generic] (a: Union[numpy._typing._array_like._SupportsArray[numpy.dtype[_SCT`-1]], numpy._typing._nested_sequence._NestedSequence[numpy._typing._array_like._SupportsArray[numpy.dtype[_SCT`-1]]]], axis: None =, dtype: None =, out: None =, keepdims: builtins.bool =, initial: Union[builtins.int, builtins.float, builtins.complex, numpy.number[Any], numpy.bool_] =, where: Union[numpy._typing._array_like._SupportsArray[numpy.dtype[numpy.bool_]], numpy._typing._nested_sequence._NestedSequence[numpy._typing._array_like._SupportsArray[numpy.dtype[numpy.bool_]]], builtins.bool, numpy._typing._nested_sequence._NestedSequence[builtins.bool]] =) -> _SCT`-1, def (a: Union[numpy._typing._array_like._SupportsArray[numpy.dtype[Any]], numpy._typing._nested_sequence._NestedSequence[numpy._typing._array_like._SupportsArray[numpy.dtype[Any]]], builtins.bool, builtins.int, builtins.float, builtins.complex, builtins.str, builtins.bytes, numpy._typing._nested_sequence._NestedSequence[Union[builtins.bool, builtins.int, builtins.float, builtins.complex, builtins.str, builtins.bytes]]], axis: Union[None, Union[typing.SupportsIndex, typing.Sequence[typing.SupportsIndex]]] =, dtype: Union[numpy.dtype[Any], None, Type[Any], numpy._typing._dtype_like._SupportsDType[numpy.dtype[Any]], builtins.str, Union[Tuple[Any, builtins.int], Tuple[Any, Union[typing.SupportsIndex, typing.Sequence[typing.SupportsIndex]]], builtins.list[Any], TypedDict('numpy._typing._dtype_like._DTypeDict', {'names': typing.Sequence[builtins.str], 'formats': typing.Sequence[Any], 'offsets'?: typing.Sequence[builtins.int], 'titles'?: typing.Sequence[Any], 'itemsize'?: builtins.int, 'aligned'?: builtins.bool}), Tuple[Any, Any]]] =, out: None =, keepdims: builtins.bool =, initial: Union[builtins.int, builtins.float, builtins.complex, numpy.number[Any], numpy.bool_] =, where: 
Union[numpy._typing._array_like._SupportsArray[numpy.dtype[numpy.bool_]], numpy._typing._nested_sequence._NestedSequence[numpy._typing._array_like._SupportsArray[numpy.dtype[numpy.bool_]]], builtins.bool, numpy._typing._nested_sequence._NestedSequence[builtins.bool]] =) -> Any, def (a: Union[numpy._typing._array_like._SupportsArray[numpy.dtype[Any]], numpy._typing._nested_sequence._NestedSequence[numpy._typing._array_like._SupportsArray[numpy.dtype[Any]]], builtins.bool, builtins.int, builtins.float, builtins.complex, builtins.str, builtins.bytes, numpy._typing._nested_sequence._NestedSequence[Union[builtins.bool, builtins.int, builtins.float, builtins.complex, builtins.str, builtins.bytes]]], axis: Union[None, Union[typing.SupportsIndex, typing.Sequence[typing.SupportsIndex]]] =, dtype: Union[numpy.dtype[Any], None, Type[Any], numpy._typing._dtype_like._SupportsDType[numpy.dtype[Any]], builtins.str, Union[Tuple[Any, builtins.int], Tuple[Any, Union[typing.SupportsIndex, typing.Sequence[typing.SupportsIndex]]], builtins.list[Any], TypedDict('numpy._typing._dtype_like._DTypeDict', {'names': typing.Sequence[builtins.str], 'formats': typing.Sequence[Any], 'offsets'?: typing.Sequence[builtins.int], 'titles'?: typing.Sequence[Any], 'itemsize'?: builtins.int, 'aligned'?: builtins.bool}), Tuple[Any, Any]]] =, out: None =, keepdims: builtins.bool =, initial: Union[builtins.int, builtins.float, builtins.complex, numpy.number[Any], numpy.bool_] =, where: Union[numpy._typing._array_like._SupportsArray[numpy.dtype[numpy.bool_]], numpy._typing._nested_sequence._NestedSequence[numpy._typing._array_like._SupportsArray[numpy.dtype[numpy.bool_]]], builtins.bool, numpy._typing._nested_sequence._NestedSequence[builtins.bool]] =) -> Any, def (a: Union[numpy._typing._array_like._SupportsArray[numpy.dtype[Any]], numpy._typing._nested_sequence._NestedSequence[numpy._typing._array_like._SupportsArray[numpy.dtype[Any]]], builtins.bool, builtins.int, builtins.float, builtins.complex, 
builtins.str, builtins.bytes, numpy._typing._nested_sequence._NestedSequence[Union[builtins.bool, builtins.int, builtins.float, builtins.complex, builtins.str, builtins.bytes]]], axis: Union[None, Union[typing.SupportsIndex, typing.Sequence[typing.SupportsIndex]]] =, dtype: Union[numpy.dtype[Any], None, Type[Any], numpy._typing._dtype_like._SupportsDType[numpy.dtype[Any]], builtins.str, Union[Tuple[Any, builtins.int], Tuple[Any, Union[typing.SupportsIndex, typing.Sequence[typing.SupportsIndex]]], builtins.list[Any], TypedDict('numpy._typing._dtype_like._DTypeDict', {'names': typing.Sequence[builtins.str], 'formats': typing.Sequence[Any], 'offsets'?: typing.Sequence[builtins.int], 'titles'?: typing.Sequence[Any], 'itemsize'?: builtins.int, 'aligned'?: builtins.bool}), Tuple[Any, Any]]] =, out: None =, keepdims: builtins.bool =, initial: Union[builtins.int, builtins.float, builtins.complex, numpy.number[Any], numpy.bool_] =, where: Union[numpy._typing._array_like._SupportsArray[numpy.dtype[numpy.bool_]], numpy._typing._nested_sequence._NestedSequence[numpy._typing._array_like._SupportsArray[numpy.dtype[numpy.bool_]]], builtins.bool, numpy._typing._nested_sequence._NestedSequence[builtins.bool]] =) -> Any, def (a: Union[numpy._typing._array_like._SupportsArray[numpy.dtype[Any]], numpy._typing._nested_sequence._NestedSequence[numpy._typing._array_like._SupportsArray[numpy.dtype[Any]]], builtins.bool, builtins.int, builtins.float, builtins.complex, builtins.str, builtins.bytes, numpy._typing._nested_sequence._NestedSequence[Union[builtins.bool, builtins.int, builtins.float, builtins.complex, builtins.str, builtins.bytes]]], axis: Union[None, Union[typing.SupportsIndex, typing.Sequence[typing.SupportsIndex]]] =, dtype: Union[numpy.dtype[Any], None, Type[Any], numpy._typing._dtype_like._SupportsDType[numpy.dtype[Any]], builtins.str, Tuple[Any, builtins.int], Tuple[Any, Union[typing.SupportsIndex, typing.Sequence[typing.SupportsIndex]]], builtins.list[Any], 
TypedDict('numpy._typing._dtype_like._DTypeDict', {'names': typing.Sequence[builtins.str], 'formats': typing.Sequence[Any], 'offsets'?: typing.Sequence[builtins.int], 'titles'?: typing.Sequence[Any], 'itemsize'?: builtins.int, 'aligned'?: builtins.bool}), Tuple[Any, Any]] =, out: None =, keepdims: builtins.bool =, initial: Union[builtins.int, builtins.float, builtins.complex, numpy.number[Any], numpy.bool_] =, where: Union[numpy._typing._array_like._SupportsArray[numpy.dtype[numpy.bool_]], numpy._typing._nested_sequence._NestedSequence[numpy._typing._array_like._SupportsArray[numpy.dtype[numpy.bool_]]], builtins.bool, numpy._typing._nested_sequence._NestedSequence[builtins.bool]] =) -> Any, def [_ArrayType <: numpy.ndarray[Any, numpy.dtype[Any]]] (a: Union[numpy._typing._array_like._SupportsArray[numpy.dtype[Any]], numpy._typing._nested_sequence._NestedSequence[numpy._typing._array_like._SupportsArray[numpy.dtype[Any]]], builtins.bool, builtins.int, builtins.float, builtins.complex, builtins.str, builtins.bytes, numpy._typing._nested_sequence._NestedSequence[Union[builtins.bool, builtins.int, builtins.float, builtins.complex, builtins.str, builtins.bytes]]], axis: Union[None, Union[typing.SupportsIndex, typing.Sequence[typing.SupportsIndex]]] =, dtype: Union[numpy.dtype[Any], None, Type[Any], numpy._typing._dtype_like._SupportsDType[numpy.dtype[Any]], builtins.str, Union[Tuple[Any, builtins.int], Tuple[Any, Union[typing.SupportsIndex, typing.Sequence[typing.SupportsIndex]]], builtins.list[Any], TypedDict('numpy._typing._dtype_like._DTypeDict', {'names': typing.Sequence[builtins.str], 'formats': typing.Sequence[Any], 'offsets'?: typing.Sequence[builtins.int], 'titles'?: typing.Sequence[Any], 'itemsize'?: builtins.int, 'aligned'?: builtins.bool}), Tuple[Any, Any]]] =, out: _ArrayType`-1 =, keepdims: builtins.bool =, initial: Union[builtins.int, builtins.float, builtins.complex, numpy.number[Any], numpy.bool_] =, where: 
Union[numpy._typing._array_like._SupportsArray[numpy.dtype[numpy.bool_]], numpy._typing._nested_sequence._NestedSequence[numpy._typing._array_like._SupportsArray[numpy.dtype[numpy.bool_]]], builtins.bool, numpy._typing._nested_sequence._NestedSequence[builtins.bool]] =) -> _ArrayType`-1)]"
Success: no issues found in 1 source file

real    0m4.010s

Workaround

If you add an explicit type then it typechecks quickly:

import numpy as np
from typing import Any, Callable

okay: list[Callable[..., Any]] = [np.sum, np.sum, np.sum]

Your Environment

  • Mypy version used: 1.0.0 (compiled: no -- because I wanted to profile with py-spy; pip install --force mypy --no-binary :all)
  • Mypy command-line flags: none
  • Mypy configuration options from mypy.ini (and other config files): none
  • Python version used: 3.10
  • Numpy version: 1.24.2

Note: I also tried mypy 0.982 and it was about 10x slower. The performance improvements (e.g. #13821) definitely helped!

Profile

I graphed the relation between the size of the list and the time to typecheck:
[image: plot of typecheck time against list size]
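
For anyone who wants to reproduce a measurement like this, here is a rough sketch of one way to do it. This is not the script behind the graph above; the helper names `make_source`, `time_mypy` and `sweep` are made up for illustration, and it assumes mypy and numpy are installed in the interpreter that runs it.

```python
import subprocess
import sys
import tempfile
import time
from pathlib import Path


def make_source(n: int, annotated: bool = False) -> str:
    """Return a module whose list literal holds n references to numpy.sum."""
    items = ", ".join(["sum"] * n)
    if annotated:
        # Workaround variant: same list, with the explicit annotation.
        header = "from typing import Any, Callable\nfrom numpy import sum\n"
        return header + f"bad: list[Callable[..., Any]] = [{items}]\n"
    return f"from numpy import sum\nbad = [{items}]\n"


def time_mypy(source: str) -> float:
    """Type check source in a temporary file and return wall-clock seconds."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "bad.py"
        path.write_text(source)
        start = time.perf_counter()
        # check=False: only the elapsed time matters, not mypy's exit status.
        subprocess.run(
            [sys.executable, "-m", "mypy", str(path)],
            capture_output=True,
            check=False,
        )
        return time.perf_counter() - start


def sweep(sizes: list[int], annotated: bool = False) -> list[tuple[int, float]]:
    """Time mypy once for each list size; returns (size, seconds) pairs."""
    return [(n, time_mypy(make_source(n, annotated))) for n in sizes]
```

Something like `sweep(list(range(1, 8)))` versus `sweep(list(range(1, 8)), annotated=True)` should show the two curves side by side. The timings include mypy startup, and the first run also pays for populating the cache with the numpy stubs, so it is probably worth discarding one warm-up run.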

and I ran it through py-spy:

py-spy record -o profile.svg -- python3.10 -m mypy bad.py

[image: py-spy flame graph of the mypy run]
