Commit 5ab232c
Merge branch 'main' into bug#60583
2 parents: 66aeb81 + 8943c97
File tree: 18 files changed (+452 −73 lines)


doc/source/whatsnew/v2.3.0.rst (+1)

```diff
@@ -37,6 +37,7 @@ Other enhancements
   updated to work correctly with NumPy >= 2 (:issue:`57739`)
 - :meth:`Series.str.decode` result now has ``StringDtype`` when ``future.infer_string`` is True (:issue:`60709`)
 - :meth:`~Series.to_hdf` and :meth:`~DataFrame.to_hdf` now round-trip with ``StringDtype`` (:issue:`60663`)
+- Improved ``repr`` of :class:`.NumpyExtensionArray` to account for NEP51 (:issue:`61085`)
 - The :meth:`Series.str.decode` has gained the argument ``dtype`` to control the dtype of the result (:issue:`60940`)
 - The :meth:`~Series.cumsum`, :meth:`~Series.cummin`, and :meth:`~Series.cummax` reductions are now implemented for ``StringDtype`` columns (:issue:`60633`)
 - The :meth:`~Series.sum` reduction is now implemented for ``StringDtype`` columns (:issue:`59853`)
```

doc/source/whatsnew/v3.0.0.rst (+3)

```diff
@@ -61,11 +61,13 @@ Other enhancements
 - :meth:`Series.cummin` and :meth:`Series.cummax` now supports :class:`CategoricalDtype` (:issue:`52335`)
 - :meth:`Series.plot` now correctly handle the ``ylabel`` parameter for pie charts, allowing for explicit control over the y-axis label (:issue:`58239`)
 - :meth:`DataFrame.plot.scatter` argument ``c`` now accepts a column of strings, where rows with the same string are colored identically (:issue:`16827` and :issue:`16485`)
+- :class:`ArrowDtype` now supports ``pyarrow.JsonType`` (:issue:`60958`)
 - :class:`DataFrameGroupBy` and :class:`SeriesGroupBy` methods ``sum``, ``mean``, ``median``, ``prod``, ``min``, ``max``, ``std``, ``var`` and ``sem`` now accept ``skipna`` parameter (:issue:`15675`)
 - :class:`Rolling` and :class:`Expanding` now support ``nunique`` (:issue:`26958`)
 - :class:`Rolling` and :class:`Expanding` now support aggregations ``first`` and ``last`` (:issue:`33155`)
 - :func:`read_parquet` accepts ``to_pandas_kwargs`` which are forwarded to :meth:`pyarrow.Table.to_pandas` which enables passing additional keywords to customize the conversion to pandas, such as ``maps_as_pydicts`` to read the Parquet map data type as python dictionaries (:issue:`56842`)
 - :meth:`.DataFrameGroupBy.transform`, :meth:`.SeriesGroupBy.transform`, :meth:`.DataFrameGroupBy.agg`, :meth:`.SeriesGroupBy.agg`, :meth:`.SeriesGroupBy.apply`, :meth:`.DataFrameGroupBy.apply` now support ``kurt`` (:issue:`40139`)
+- :meth:`DataFrame.apply` supports using third-party execution engines like the Bodo.ai JIT compiler (:issue:`60668`)
 - :meth:`DataFrameGroupBy.transform`, :meth:`SeriesGroupBy.transform`, :meth:`DataFrameGroupBy.agg`, :meth:`SeriesGroupBy.agg`, :meth:`RollingGroupby.apply`, :meth:`ExpandingGroupby.apply`, :meth:`Rolling.apply`, :meth:`Expanding.apply`, :meth:`DataFrame.apply` with ``engine="numba"`` now supports positional arguments passed as kwargs (:issue:`58995`)
 - :meth:`Rolling.agg`, :meth:`Expanding.agg` and :meth:`ExponentialMovingWindow.agg` now accept :class:`NamedAgg` aggregations through ``**kwargs`` (:issue:`28333`)
 - :meth:`Series.map` can now accept kwargs to pass on to func (:issue:`59814`)
@@ -783,6 +785,7 @@ Reshaping
 ^^^^^^^^^
 - Bug in :func:`qcut` where values at the quantile boundaries could be incorrectly assigned (:issue:`59355`)
 - Bug in :meth:`DataFrame.combine_first` not preserving the column order (:issue:`60427`)
+- Bug in :meth:`DataFrame.explode` producing incorrect result for :class:`pyarrow.large_list` type (:issue:`61091`)
 - Bug in :meth:`DataFrame.join` inconsistently setting result index name (:issue:`55815`)
 - Bug in :meth:`DataFrame.join` when a :class:`DataFrame` with a :class:`MultiIndex` would raise an ``AssertionError`` when :attr:`MultiIndex.names` contained ``None``. (:issue:`58721`)
 - Bug in :meth:`DataFrame.merge` where merging on a column containing only ``NaN`` values resulted in an out-of-bounds array access (:issue:`59421`)
```

pandas/_libs/lib.pyx (+2 −2)

```diff
@@ -1518,7 +1518,7 @@ cdef object _try_infer_map(object dtype):
 
 def infer_dtype(value: object, skipna: bool = True) -> str:
     """
-    Return a string label of the type of a scalar or list-like of values.
+    Return a string label of the type of the elements in a list-like input.
 
     This method inspects the elements of the provided input and determines
     classification of its data type. It is particularly useful for
@@ -1527,7 +1527,7 @@ def infer_dtype(value: object, skipna: bool = True) -> str:
 
     Parameters
     ----------
-    value : scalar, list, ndarray, or pandas type
+    value : list, ndarray, or pandas type
         The input data to infer the dtype.
     skipna : bool, default True
         Ignore NaN values when inferring the type.
```

pandas/api/__init__.py (+2)

```diff
@@ -1,6 +1,7 @@
 """public toolkit API"""
 
 from pandas.api import (
+    executors,
     extensions,
     indexers,
     interchange,
@@ -9,6 +10,7 @@
 )
 
 __all__ = [
+    "executors",
     "extensions",
     "indexers",
     "interchange",
```

pandas/api/executors/__init__.py (+7, new file)

```diff
@@ -0,0 +1,7 @@
+"""
+Public API for function executor engines to be used with ``map`` and ``apply``.
+"""
+
+from pandas.core.apply import BaseExecutionEngine
+
+__all__ = ["BaseExecutionEngine"]
```

pandas/compat/__init__.py (+2)

```diff
@@ -35,6 +35,7 @@
     pa_version_under17p0,
     pa_version_under18p0,
     pa_version_under19p0,
+    pa_version_under20p0,
 )
 
 if TYPE_CHECKING:
@@ -168,4 +169,5 @@ def is_ci_environment() -> bool:
     "pa_version_under17p0",
     "pa_version_under18p0",
     "pa_version_under19p0",
+    "pa_version_under20p0",
 ]
```

pandas/core/apply.py (+104)

```diff
@@ -74,6 +74,110 @@
 ResType = dict[int, Any]
 
 
+class BaseExecutionEngine(abc.ABC):
+    """
+    Base class for execution engines for map and apply methods.
+
+    An execution engine receives all the parameters of a call to
+    ``apply`` or ``map``, such as the data container, the function,
+    etc. and takes care of running the execution.
+
+    Supporting different engines allows functions to be JIT compiled,
+    run in parallel, and others. Besides the default executor which
+    simply runs the code with the Python interpreter and pandas.
+    """
+
+    @staticmethod
+    @abc.abstractmethod
+    def map(
+        data: Series | DataFrame | np.ndarray,
+        func: AggFuncType,
+        args: tuple,
+        kwargs: dict[str, Any],
+        decorator: Callable | None,
+        skip_na: bool,
+    ):
+        """
+        Executor method to run functions elementwise.
+
+        In general, pandas uses ``map`` for running functions elementwise,
+        but ``Series.apply`` with the default ``by_row='compat'`` will also
+        call this executor function.
+
+        Parameters
+        ----------
+        data : Series, DataFrame or NumPy ndarray
+            The object to use for the data. Some methods implement a ``raw``
+            parameter which will convert the original pandas object to a
+            NumPy array, which will then be passed here to the executor.
+        func : function or NumPy ufunc
+            The function to execute.
+        args : tuple
+            Positional arguments to be passed to ``func``.
+        kwargs : dict
+            Keyword arguments to be passed to ``func``.
+        decorator : function, optional
+            For JIT compilers and other engines that need to decorate the
+            function ``func``, this is the decorator to use. While the
+            executor may already know which is the decorator to use, this
+            is useful as for a single executor the user can specify for
+            example ``numba.jit`` or ``numba.njit(nogil=True)``, and this
+            decorator parameter will contain the exact decorator from the
+            executor the user wants to use.
+        skip_na : bool
+            Whether the function should be called for missing values or not.
+            This is specified by the pandas user as ``map(na_action=None)``
+            or ``map(na_action='ignore')``.
+        """
+
+    @staticmethod
+    @abc.abstractmethod
+    def apply(
+        data: Series | DataFrame | np.ndarray,
+        func: AggFuncType,
+        args: tuple,
+        kwargs: dict[str, Any],
+        decorator: Callable,
+        axis: Axis,
+    ):
+        """
+        Executor method to run functions by an axis.
+
+        While we can see ``map`` as executing the function for each cell
+        in a ``DataFrame`` (or ``Series``), ``apply`` will execute the
+        function for each column (or row).
+
+        Parameters
+        ----------
+        data : Series, DataFrame or NumPy ndarray
+            The object to use for the data. Some methods implement a ``raw``
+            parameter which will convert the original pandas object to a
+            NumPy array, which will then be passed here to the executor.
+        func : function or NumPy ufunc
+            The function to execute.
+        args : tuple
+            Positional arguments to be passed to ``func``.
+        kwargs : dict
+            Keyword arguments to be passed to ``func``.
+        decorator : function, optional
+            For JIT compilers and other engines that need to decorate the
+            function ``func``, this is the decorator to use. While the
+            executor may already know which is the decorator to use, this
+            is useful as for a single executor the user can specify for
+            example ``numba.jit`` or ``numba.njit(nogil=True)``, and this
+            decorator parameter will contain the exact decorator from the
+            executor the user wants to use.
+        axis : {0 or 'index', 1 or 'columns'}
+            0 or 'index' should execute the function passing each column as
+            parameter. 1 or 'columns' should execute the function passing
+            each row as parameter. The default executor engine passes rows
+            as pandas ``Series``. Other executor engines should probably
+            expect functions to be implemented this way for compatibility.
+            But passing rows as other data structures is technically possible
+            as far as the function ``func`` is implemented accordingly.
+        """
+
+
 def frame_apply(
     obj: DataFrame,
     func: AggFuncType,
```
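The new ``BaseExecutionEngine`` contract is two static methods: ``map`` runs a function elementwise, ``apply`` runs it per column or row. A minimal sketch of an engine implementing that contract, using plain Python lists and dicts as stand-ins for pandas objects (the class names ``SketchEngine``/``LoopEngine`` are illustrative, not part of this commit):

```python
import abc


class SketchEngine(abc.ABC):
    """Illustrative stand-in for the BaseExecutionEngine interface above."""

    @staticmethod
    @abc.abstractmethod
    def map(data, func, args, kwargs, decorator, skip_na): ...

    @staticmethod
    @abc.abstractmethod
    def apply(data, func, args, kwargs, decorator, axis): ...


class LoopEngine(SketchEngine):
    """Runs everything in a plain Python loop (no JIT, no parallelism)."""

    @staticmethod
    def map(data, func, args, kwargs, decorator, skip_na):
        # Elementwise execution; skip_na mirrors map(na_action='ignore').
        if decorator is not None:
            func = decorator(func)
        return [
            x if (skip_na and x is None) else func(x, *args, **kwargs)
            for x in data
        ]

    @staticmethod
    def apply(data, func, args, kwargs, decorator, axis):
        # Column-wise execution: here `data` is a dict of column -> values.
        if decorator is not None:
            func = decorator(func)
        return {name: func(col, *args, **kwargs) for name, col in data.items()}


result = LoopEngine.map([1, None, 3], lambda x: x * 2, (), {}, None, True)
print(result)  # [2, None, 6]
```

A real engine would additionally handle NumPy arrays (the ``raw`` path) and the ``axis`` argument; this sketch only shows where each parameter plugs in.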

pandas/core/arrays/arrow/array.py (+4 −1)

```diff
@@ -1938,7 +1938,10 @@ def _explode(self):
         """
         # child class explode method supports only list types; return
         # default implementation for non list types.
-        if not pa.types.is_list(self.dtype.pyarrow_dtype):
+        if not (
+            pa.types.is_list(self.dtype.pyarrow_dtype)
+            or pa.types.is_large_list(self.dtype.pyarrow_dtype)
+        ):
             return super()._explode()
         values = self
         counts = pa.compute.list_value_length(values._pa_array)
```

pandas/core/arrays/numpy_.py (+12)

```diff
@@ -2,6 +2,7 @@
 
 from typing import (
     TYPE_CHECKING,
+    Any,
     Literal,
 )
 
@@ -29,6 +30,8 @@
 from pandas.core.strings.object_array import ObjectStringArrayMixin
 
 if TYPE_CHECKING:
+    from collections.abc import Callable
+
     from pandas._typing import (
         AxisInt,
         Dtype,
@@ -565,3 +568,12 @@ def _wrap_ndarray_result(self, result: np.ndarray):
 
             return TimedeltaArray._simple_new(result, dtype=result.dtype)
         return type(self)(result)
+
+    def _formatter(self, boxed: bool = False) -> Callable[[Any], str | None]:
+        # NEP 51: https://github.com/numpy/numpy/pull/22449
+        if self.dtype.kind in "SU":
+            return "'{}'".format
+        elif self.dtype == "object":
+            return repr
+        else:
+            return str
```
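Under NEP 51, NumPy scalars repr as e.g. ``np.int64(3)`` rather than ``3``, which made the old repr of ``NumpyExtensionArray`` noisy. The new ``_formatter`` sidesteps that by choosing a formatter per dtype kind; the dispatch restated standalone (``pick_formatter`` is an illustrative helper, not pandas API):

```python
def pick_formatter(kind: str):
    """Mirror the dtype dispatch in the new NumpyExtensionArray._formatter."""
    if kind in "SU":      # bytes/str dtypes: quote the value explicitly
        return "'{}'".format
    elif kind == "O":     # object dtype: defer to each element's own repr
        return repr
    else:                 # numeric etc.: plain str avoids the NEP 51 scalar repr
        return str


print(pick_formatter("U")("abc"))  # 'abc' (with quotes)
print(pick_formatter("i")(3))      # 3
```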

pandas/core/dtypes/dtypes.py (+1 −1)

```diff
@@ -2265,7 +2265,7 @@ def type(self):
         elif pa.types.is_null(pa_type):
             # TODO: None? pd.NA? pa.null?
             return type(pa_type)
-        elif isinstance(pa_type, pa.ExtensionType):
+        elif isinstance(pa_type, pa.BaseExtensionType):
             return type(self)(pa_type.storage_type).type
         raise NotImplementedError(pa_type)
```
