What's new in 3.0.0 (Month XX, 2024)

These are the changes in pandas 3.0.0. See :ref:`release` for a full changelog including other versions of pandas.

{{ header }}

Enhancements

Enhancement1

Enhancement2

Other enhancements

Notable bug fixes

These are bug fixes that might have notable behavior changes.

Improved behavior in groupby for observed=False

A number of bugs have been fixed due to improved handling of unobserved groups (:issue:`55738`). All remarks in this section equally impact :class:`.SeriesGroupBy`.

In previous versions of pandas, a single grouping with :meth:`.DataFrameGroupBy.apply` or :meth:`.DataFrameGroupBy.agg` would pass the unobserved groups to the provided function, resulting in 0 below.

.. ipython:: python

    df = pd.DataFrame(
        {
            "key1": pd.Categorical(list("aabb"), categories=list("abc")),
            "key2": [1, 1, 1, 2],
            "values": [1, 2, 3, 4],
        }
    )
    df
    gb = df.groupby("key1", observed=False)
    gb[["values"]].apply(lambda x: x.sum())

However, this was not the case when using multiple groupings, resulting in ``NaN`` below.

.. code-block:: ipython

    In [1]: gb = df.groupby(["key1", "key2"], observed=False)
    In [2]: gb[["values"]].apply(lambda x: x.sum())
    Out[2]:
               values
    key1 key2
    a    1        3.0
         2        NaN
    b    1        3.0
         2        4.0
    c    1        NaN
         2        NaN

Now using multiple groupings will also pass the unobserved groups to the provided function.

.. ipython:: python

    gb = df.groupby(["key1", "key2"], observed=False)
    gb[["values"]].apply(lambda x: x.sum())
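
As a rough sketch of how ``observed`` shapes the result index, the snippet below uses the built-in ``sum`` reduction instead of ``apply``. It rebuilds the ``df`` from the example above; the names ``with_unobserved`` and ``observed_only`` are illustrative only.

.. code-block:: python

    import pandas as pd

    df = pd.DataFrame(
        {
            "key1": pd.Categorical(list("aabb"), categories=list("abc")),
            "key2": [1, 1, 1, 2],
            "values": [1, 2, 3, 4],
        }
    )

    # observed=False keeps every combination of key1 categories and key2
    # values, including combinations that never occur in the data.
    with_unobserved = df.groupby(["key1", "key2"], observed=False)["values"].sum()

    # observed=True keeps only the combinations that actually occur.
    observed_only = df.groupby(["key1", "key2"], observed=True)["values"].sum()

    print(len(with_unobserved), len(observed_only))

Passing ``observed=True`` is one way to sidestep unobserved groups entirely when they are not needed.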

Similarly:

These improvements also fixed certain bugs in groupby:

notable_bug_fix2

Backwards incompatible API changes

Datetime resolution inference

Converting a sequence of strings, datetime objects, or np.datetime64 objects to a datetime64 dtype now performs inference on the appropriate resolution (AKA unit) for the output dtype. This affects :class:`Series`, :class:`DataFrame`, :class:`Index`, :class:`DatetimeIndex`, and :func:`to_datetime`.

Previously, these would always give nanosecond resolution:

.. code-block:: ipython

    In [1]: dt = pd.Timestamp("2024-03-22 11:36").to_pydatetime()
    In [2]: pd.to_datetime([dt]).dtype
    Out[2]: dtype('<M8[ns]')
    In [3]: pd.Index([dt]).dtype
    Out[3]: dtype('<M8[ns]')
    In [4]: pd.DatetimeIndex([dt]).dtype
    Out[4]: dtype('<M8[ns]')
    In [5]: pd.Series([dt]).dtype
    Out[5]: dtype('<M8[ns]')

This now infers the microsecond unit "us" from the pydatetime object, matching the scalar :class:`Timestamp` behavior.

.. ipython:: python

    In [1]: dt = pd.Timestamp("2024-03-22 11:36").to_pydatetime()
    In [2]: pd.to_datetime([dt]).dtype
    In [3]: pd.Index([dt]).dtype
    In [4]: pd.DatetimeIndex([dt]).dtype
    In [5]: pd.Series([dt]).dtype

Similarly, when passed a sequence of ``np.datetime64`` objects, the resolution of the passed objects will be retained (or, for lower-than-second resolution, second resolution will be used).

When passing strings, the resolution will depend on the precision of the string, again matching the :class:`Timestamp` behavior. Previously:

.. code-block:: ipython

    In [2]: pd.to_datetime(["2024-03-22 11:43:01"]).dtype
    Out[2]: dtype('<M8[ns]')
    In [3]: pd.to_datetime(["2024-03-22 11:43:01.002"]).dtype
    Out[3]: dtype('<M8[ns]')
    In [4]: pd.to_datetime(["2024-03-22 11:43:01.002003"]).dtype
    Out[4]: dtype('<M8[ns]')
    In [5]: pd.to_datetime(["2024-03-22 11:43:01.002003004"]).dtype
    Out[5]: dtype('<M8[ns]')

The inferred resolution now matches that of the input strings:

.. ipython:: python

    In [2]: pd.to_datetime(["2024-03-22 11:43:01"]).dtype
    In [3]: pd.to_datetime(["2024-03-22 11:43:01.002"]).dtype
    In [4]: pd.to_datetime(["2024-03-22 11:43:01.002003"]).dtype
    In [5]: pd.to_datetime(["2024-03-22 11:43:01.002003004"]).dtype

In cases with mixed-resolution inputs, the highest resolution is used:

.. code-block:: ipython

    In [2]: pd.to_datetime([pd.Timestamp("2024-03-22 11:43:01"), "2024-03-22 11:43:01.002"]).dtype
    Out[2]: dtype('<M8[ns]')
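
Code that relies on a specific resolution may prefer to set the unit explicitly rather than depend on inference. A minimal sketch, assuming pandas 2.0 or later (where ``as_unit`` is available); the variable names are illustrative:

.. code-block:: python

    from datetime import datetime

    import pandas as pd

    dt = datetime(2024, 3, 22, 11, 36)

    # The inferred unit may differ between pandas versions, so pin it
    # explicitly when downstream code expects a particular resolution.
    idx = pd.to_datetime([dt]).as_unit("ns")
    ser = pd.Series([dt]).dt.as_unit("ns")

    print(idx.dtype, ser.dtype)

Converting with ``astype("datetime64[ns]")`` is an alternative spelling of the same idea.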

Changed behavior in :meth:`DataFrame.value_counts` and :meth:`.DataFrameGroupBy.value_counts` when sort=False

In previous versions of pandas, :meth:`DataFrame.value_counts` with ``sort=False`` would sort the result by row labels (as was documented). This was unintuitive and inconsistent with :meth:`Series.value_counts`, which would maintain the order of the input. Now :meth:`DataFrame.value_counts` will maintain the order of the input.

.. ipython:: python

    df = pd.DataFrame(
        {
            "a": [2, 2, 2, 2, 1, 1, 1, 1],
            "b": [2, 1, 3, 1, 2, 3, 1, 1],
        }
    )
    df

Old behavior

.. code-block:: ipython

    In [3]: df.value_counts(sort=False)
    Out[3]:
    a  b
    1  1    2
       2    1
       3    1
    2  1    2
       2    1
       3    1
    Name: count, dtype: int64

New behavior

.. ipython:: python

    df.value_counts(sort=False)
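
Callers that relied on the label-sorted result can sort explicitly. The following is a sketch rather than an official migration recipe; it rebuilds the ``df`` from above, and ``counts`` is an illustrative name:

.. code-block:: python

    import pandas as pd

    df = pd.DataFrame(
        {
            "a": [2, 2, 2, 2, 1, 1, 1, 1],
            "b": [2, 1, 3, 1, 2, 3, 1, 1],
        }
    )

    # Sorting by the row labels afterwards reproduces the previous
    # label-sorted order, whatever order value_counts itself returns.
    counts = df.value_counts(sort=False).sort_index()
    print(counts)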

This change also applies to :meth:`.DataFrameGroupBy.value_counts`. Here, there are two options for sorting: one ``sort`` passed to :meth:`DataFrame.groupby` and one passed directly to :meth:`.DataFrameGroupBy.value_counts`. The former determines whether to sort the groups, the latter whether to sort the counts. All non-grouping columns will maintain the order of the input within groups.

Old behavior

.. code-block:: ipython

    In [5]: df.groupby("a", sort=True).value_counts(sort=False)
    Out[5]:
    a  b
    1  1    2
       2    1
       3    1
    2  1    2
       2    1
       3    1
    dtype: int64

New behavior

.. ipython:: python

    df.groupby("a", sort=True).value_counts(sort=False)

Increased minimum version for Python

pandas 3.0.0 supports Python 3.10 and higher.

Increased minimum versions for dependencies

Some minimum supported versions of dependencies were updated. If installed, we now require:

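+---------+-----------------+----------+---------+
| Package | Minimum Version | Required | Changed |
+=========+=================+==========+=========+
| numpy   | 1.23.5          | X        | X       |
+---------+-----------------+----------+---------+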

For optional libraries the general recommendation is to use the latest version. The following table lists the lowest version per library that is currently being tested throughout the development of pandas. Optional libraries below the lowest tested version may still work, but are not considered supported.

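+------------------------+---------------------+
| Package                | New Minimum Version |
+========================+=====================+
| pytz                   | 2023.4              |
+------------------------+---------------------+
| fastparquet            | 2023.10.0           |
+------------------------+---------------------+
| adbc-driver-postgresql | 0.10.0              |
+------------------------+---------------------+
| mypy (dev)             | 1.9.0               |
+------------------------+---------------------+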

See :ref:`install.dependencies` and :ref:`install.optional_dependencies` for more.

pytz now an optional dependency

pandas now uses :py:mod:`zoneinfo` from the standard library as the default timezone implementation when passing a timezone string to various methods. (:issue:`34916`)

Old behavior:

.. code-block:: ipython

    In [1]: ts = pd.Timestamp(2024, 1, 1).tz_localize("US/Pacific")
    In [2]: ts.tz
    <DstTzInfo 'US/Pacific' LMT-1 day, 16:07:00 STD>

New behavior:

.. ipython:: python

    ts = pd.Timestamp(2024, 1, 1).tz_localize("US/Pacific")
    ts.tz

``pytz`` timezone objects are still supported when passed directly, but they will no longer be returned by default from string inputs. Moreover, ``pytz`` is no longer a required dependency of pandas, but it can be installed with the pip extra ``pip install pandas[timezone]``.

Additionally, pandas no longer raises ``pytz`` exceptions for timezone operations leading to ambiguous or nonexistent times. These cases will now raise a ``ValueError``.
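
Code that should not depend on which implementation a string resolves to can pass a timezone object explicitly. A minimal sketch, assuming time zone data is available on the system; ``America/Los_Angeles`` is used here as the canonical name for the zone aliased as ``US/Pacific``:

.. code-block:: python

    from datetime import timedelta
    from zoneinfo import ZoneInfo

    import pandas as pd

    # Passing a tzinfo object directly, instead of a string, fixes the
    # implementation regardless of the default used for strings.
    tz = ZoneInfo("America/Los_Angeles")
    ts = pd.Timestamp(2024, 1, 1).tz_localize(tz)

    # Comparing UTC offsets avoids depending on the concrete tzinfo class.
    print(ts.utcoffset() == timedelta(hours=-8))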

Other API changes

Deprecations

Copy keyword

The copy keyword argument in the following methods is deprecated and will be removed in a future version:

Copy-on-Write utilizes a lazy copy mechanism that defers copying the data until necessary. Use .copy to trigger an eager copy. The copy keyword has no effect starting with 3.0, so it can be safely removed from your code.
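
As an illustration (a sketch with made-up data, not tied to any particular method from the list above), dropping the ``copy`` keyword and calling ``.copy()`` only where an eager copy is wanted might look like this:

.. code-block:: python

    import pandas as pd

    df = pd.DataFrame({"a": [1, 2, 3]})

    # No copy keyword: modifying the result does not change df.
    renamed = df.rename(columns={"a": "x"})
    renamed.loc[0, "x"] = 100

    # An explicit, eager copy for cases where one is wanted up front.
    snapshot = df.copy()
    snapshot.loc[0, "a"] = -1

    # df is unaffected by either modification.
    print(df.loc[0, "a"])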

Other Deprecations

Removal of prior version deprecations/changes

Enforced deprecation of aliases M, Q, Y, etc. in favour of ME, QE, YE, etc. for offsets

Renamed the following offset aliases (:issue:`57986`):

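+---------------------------------+---------------+-----------+
| offset                          | removed alias | new alias |
+=================================+===============+===========+
| :class:`MonthEnd`               | M             | ME        |
+---------------------------------+---------------+-----------+
| :class:`BusinessMonthEnd`       | BM            | BME       |
+---------------------------------+---------------+-----------+
| :class:`SemiMonthEnd`           | SM            | SME       |
+---------------------------------+---------------+-----------+
| :class:`CustomBusinessMonthEnd` | CBM           | CBME      |
+---------------------------------+---------------+-----------+
| :class:`QuarterEnd`             | Q             | QE        |
+---------------------------------+---------------+-----------+
| :class:`BQuarterEnd`            | BQ            | BQE       |
+---------------------------------+---------------+-----------+
| :class:`YearEnd`                | Y             | YE        |
+---------------------------------+---------------+-----------+
| :class:`BYearEnd`               | BY            | BYE       |
+---------------------------------+---------------+-----------+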
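
For example, a month-end range can be requested with the new alias or with the offset object itself. This sketch assumes pandas 2.2 or later, where the ``ME`` alias is available; the variable names are illustrative:

.. code-block:: python

    import pandas as pd

    # New-style alias string.
    by_alias = pd.date_range("2024-01-01", periods=3, freq="ME")

    # Equivalent offset object, which does not depend on alias spelling.
    by_offset = pd.date_range(
        "2024-01-01", periods=3, freq=pd.offsets.MonthEnd()
    )

    print(by_alias.equals(by_offset))

Passing the offset object is one option for code that has to run on pandas versions from both before and after the rename.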

Other Removals

Performance improvements

Bug fixes

Categorical

Datetimelike

Timedelta

Timezones

Numeric

Conversion

Strings

Interval

Indexing

Missing

MultiIndex

I/O

Period

Plotting

Groupby/resample/rolling

Reshaping

Sparse

ExtensionArray

Styler

- Fixed bug in :meth:`Styler.to_latex` when styling column headers in combination with a hidden index or hidden index levels.

Other

Contributors