Describe the bug
Using DataFrame.apply to map rows to a custom type seems to be a valid/supported pattern in pandas (i.e., it works at runtime). However, the overloads of apply in the stubs currently do not seem to support this pattern at type-checking time.
To Reproduce
- Provide a minimal runnable pandas example that is not properly checked by the stubs.
The following example maps a data frame row-wise to a custom type SomeType. Ideally, the type checker would infer that list_of_instances is of type list[SomeType] (which it is at runtime).
from dataclasses import dataclass

import pandas as pd

@dataclass
class SomeType:
    a: int
    b: int

df = pd.DataFrame(
    {
        "a": [1, 2, 3],
        "b": [2, 3, 4],
    }
)

list_of_instances = list(df.apply(lambda row: SomeType(a=row["a"], b=row["b"]), axis=1))

for x in list_of_instances:
    assert isinstance(x, SomeType)
    print(x)
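As a possible workaround until the stubs cover this pattern, the same list can be built without DataFrame.apply. The sketch below is a suggestion, not something the stubs document: it assumes DataFrame.to_dict(orient="records") is typed loosely enough for the row values to be passed as constructor arguments, which I have not checked against every pandas-stubs release.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class SomeType:
    a: int
    b: int


df = pd.DataFrame(
    {
        "a": [1, 2, 3],
        "b": [2, 3, 4],
    }
)

# Build the instances with a plain comprehension over row dicts, so the
# element type comes from the SomeType(...) call and not from the
# DataFrame.apply overloads.
list_of_instances: list[SomeType] = [
    SomeType(a=rec["a"], b=rec["b"]) for rec in df.to_dict(orient="records")
]

for x in list_of_instances:
    assert isinstance(x, SomeType)
    print(x)
```

This sidesteps the overload resolution entirely, at the cost of materializing one dict per row.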
- Indicate which type checker you are using (mypy or pyright).
The behavior seems to be the same with mypy and pyright.
- Show the error message received from that type checker while checking your example.
mypy:
check_dataframe_apply.py:16: error: No overload variant of "apply" of "DataFrame" matches argument types "Callable[[Any], SomeType]", "int" [call-overload]
check_dataframe_apply.py:16: note: Possible overload variants:
check_dataframe_apply.py:16: note: def apply(self, f: Callable[..., MutableSequence[Any] | ndarray[Any, Any] | tuple[Any, ...] | Index[Any] | Series[Any]], axis: Literal['index', 0] = ..., raw: bool = ..., result_type: None = ..., args: Any = ..., **kwargs: Any) -> DataFrame
check_dataframe_apply.py:16: note: def [S1] apply(self, f: Callable[..., S1 | NAType], axis: Literal['index', 0] = ..., raw: bool = ..., result_type: None = ..., args: Any = ..., **kwargs: Any) -> Series[S1]
check_dataframe_apply.py:16: note: def apply(self, f: Callable[..., Mapping[Any, Any]], axis: Literal['index', 0] = ..., raw: bool = ..., result_type: None = ..., args: Any = ..., **kwargs: Any) -> Series[Any]
check_dataframe_apply.py:16: note: def [S1] apply(self, f: Callable[..., S1 | NAType], axis: Literal['index', 0] | Literal['columns', 1] = ..., raw: bool = ..., args: Any = ..., *, result_type: Literal['expand', 'reduce'], **kwargs: Any) -> Series[S1]
check_dataframe_apply.py:16: note: def apply(self, f: Callable[..., MutableSequence[Any] | ndarray[Any, Any] | tuple[Any, ...] | Index[Any] | Series[Any] | Mapping[Any, Any]], axis: Literal['index', 0] | Literal['columns', 1] = ..., raw: bool = ..., args: Any = ..., *, result_type: Literal['expand'], **kwargs: Any) -> DataFrame
check_dataframe_apply.py:16: note: def apply(self, f: Callable[..., MutableSequence[Any] | ndarray[Any, Any] | tuple[Any, ...] | Index[Any] | Mapping[Any, Any]], axis: Literal['index', 0] | Literal['columns', 1] = ..., raw: bool = ..., args: Any = ..., *, result_type: Literal['reduce'], **kwargs: Any) -> Series[Any]
check_dataframe_apply.py:16: note: def apply(self, f: Callable[..., MutableSequence[Any] | ndarray[Any, Any] | tuple[Any, ...] | Index[Any] | Series[Any] | str | bytes | date | datetime | timedelta | <7 more items> | complex | Mapping[Any, Any]], axis: Literal['index', 0] | Literal['columns', 1] = ..., raw: bool = ..., args: Any = ..., *, result_type: Literal['broadcast'], **kwargs: Any) -> DataFrame
check_dataframe_apply.py:16: note: def apply(self, f: Callable[..., Series[Any]], axis: Literal['index', 0] = ..., raw: bool = ..., args: Any = ..., *, result_type: Literal['reduce'], **kwargs: Any) -> Series[Any]
check_dataframe_apply.py:16: note: def [S1] apply(self, f: Callable[..., S1 | NAType], raw: bool = ..., result_type: None = ..., args: Any = ..., *, axis: Literal['columns', 1], **kwargs: Any) -> Series[S1]
check_dataframe_apply.py:16: note: def apply(self, f: Callable[..., MutableSequence[Any] | ndarray[Any, Any] | tuple[Any, ...] | Index[Any] | Mapping[Any, Any]], raw: bool = ..., result_type: None = ..., args: Any = ..., *, axis: Literal['columns', 1], **kwargs: Any) -> Series[Any]
check_dataframe_apply.py:16: note: def apply(self, f: Callable[..., Series[Any]], raw: bool = ..., result_type: None = ..., args: Any = ..., *, axis: Literal['columns', 1], **kwargs: Any) -> DataFrame
check_dataframe_apply.py:16: note: def apply(self, f: Callable[..., Series[Any]], raw: bool = ..., args: Any = ..., *, axis: Literal['columns', 1], result_type: Literal['reduce'], **kwargs: Any) -> DataFrame
pyright:
No overloads for "apply" match the provided arguments Pylance(reportGeneralTypeIssues)
frame.pyi(1344, 9): Overload 11 is the closest match
Argument of type "(row: Any) -> SomeType" cannot be assigned to parameter "f" of type "(...) -> Series[Any]" in function "apply"
  Type "(row: Any) -> SomeType" cannot be assigned to type "(...) -> Series[Any]"
    Function return type "SomeType" is incompatible with type "Series[Any]"
      "SomeType" is incompatible with "Series[Any]" Pylance(reportGeneralTypeIssues)
Please complete the following information:
- OS: [e.g. Windows, Linux, MacOS]: Linux
- OS Version [e.g. 22]: Ubuntu 20.04
- python version: 3.10.13
- version of type checker: mypy 1.11.2
- version of installed pandas-stubs: 2.2.2.240909 (latest as of writing)
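To illustrate the inference this report is asking for, here is a small generic helper that maps rows to an arbitrary return type. The name apply_rows and its signature are hypothetical and are not part of pandas or pandas-stubs. The helper uses DataFrame.iterrows instead of apply, so it only sketches the desired typing and does not replace apply. I would expect a type checker to infer list[SomeType] for the result, but I have not verified that against the stubs version listed above.

```python
from dataclasses import dataclass
from typing import Callable, TypeVar

import pandas as pd

T = TypeVar("T")


@dataclass
class SomeType:
    a: int
    b: int


def apply_rows(df: pd.DataFrame, f: Callable[[pd.Series], T]) -> list[T]:
    """Hypothetical helper: call f on every row and collect the results."""
    # iterrows yields (index label, row as Series); only the row is used here.
    return [f(row) for _, row in df.iterrows()]


df = pd.DataFrame(
    {
        "a": [1, 2, 3],
        "b": [2, 3, 4],
    }
)

# The return type of the lambda binds T, so the result is a list of SomeType.
list_of_instances = apply_rows(df, lambda row: SomeType(a=row["a"], b=row["b"]))

for x in list_of_instances:
    assert isinstance(x, SomeType)
```

Note that iterrows builds one Series per row and may upcast mixed dtypes, so the values seen by f can differ from those produced by apply on frames with heterogeneous columns.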