BUG: .max() raises exception on Series with object dtype and mixture of Timestamp and NaT: TypeError: '>=' not supported between instances of 'Timestamp' and 'float' #58707
Labels: Bug, Dtype Conversions, Reshaping, Timezones
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
The above code raises the exception:
TypeError: '>=' not supported between instances of 'Timestamp' and 'float'
This code is a simplified version of some prod code that caused issues at my work.
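A minimal sketch of the scenario described in this report, for inspecting the dtypes involved. The frame names df1 and df2 come from the description; the column name "ts", the name combined, and the exact values are assumptions for illustration and are not the original prod code. Whether .max() raises may depend on the installed pandas version, so the call is wrapped rather than assumed to fail.

```python
import pandas as pd

# Hypothetical data: one tz-aware timestamp in df1, one missing value in df2.
# The column name "ts" is an assumption made for this sketch.
df1 = pd.DataFrame(
    {"ts": [pd.Timestamp("2024-05-13 12:00:00", tz="America/New_York")]}
)
df2 = pd.DataFrame({"ts": [pd.NaT]})

combined = pd.concat([df1, df2], ignore_index=True)

# Print the dtypes before and after concatenation to see where they diverge.
print("df1:", df1["ts"].dtype)
print("df2:", df2["ts"].dtype)
print("combined:", combined["ts"].dtype)

# The report describes a TypeError here on pandas 2.2.2; other versions may differ.
try:
    result = combined["ts"].max()
    print("max:", result)
except TypeError as exc:
    result = None
    print("TypeError:", exc)
```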
I believe what's happening is that df1 has dtype datetime64[ns, America/New_York] for the column and df2 has dtype datetime64[ns], and when you concat them, the resulting dtype is object. Then, .max() coerces pd.NaT to NaN, and you get a comparison between a Timestamp and a float.
Expected Behavior
I expect the code to return pd.Timestamp('2024-05-13 12:00:00', tz='America/New_York').
I think Pandas should be robust and handle this case, even though the dtypes aren't perfectly "correct". I think the right place to fix this is in the .max() function: the same operation with Python's built-in max (builtins.max) works fine, so you would also expect the Pandas equivalent to work.
You could maybe also make an argument that pd.concat should special-case this and return a column with dtype datetime64[ns, America/New_York], but I'm less sure about that.
Longer-term, I feel like Pandas should support datetime columns with heterogeneous timezones; the requirement that timezones be the same for a whole column feels like an artificial constraint, and many real-world datasets will naturally have heterogeneous timezones.
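Until .max() handles this case, one possible stopgap is to drop missing values first and fall back to the built-in max. This is only a sketch of a workaround, not a proposed fix for pandas: the helper name max_ignoring_nat, the column name "ts", and the sample data are all hypothetical.

```python
import builtins

import pandas as pd


def max_ignoring_nat(series: pd.Series):
    """Return the largest non-missing value of `series`, or pd.NaT if none.

    Hypothetical helper for object-dtype columns that mix Timestamp and NaT.
    Mixing tz-aware and tz-naive timestamps would still raise a TypeError,
    because Python cannot order them against each other.
    """
    valid = series.dropna()
    if len(valid) == 0:
        return pd.NaT
    # Iterating a Series yields Timestamp objects, which compare by instant.
    return builtins.max(valid)


# Illustrative data mirroring the scenario in this report.
df1 = pd.DataFrame(
    {"ts": [pd.Timestamp("2024-05-13 12:00:00", tz="America/New_York")]}
)
df2 = pd.DataFrame({"ts": [pd.NaT]})
combined = pd.concat([df1, df2], ignore_index=True)

latest = max_ignoring_nat(combined["ts"])
print(latest)
```

Dropping the missing values before comparing avoids any ordering between a Timestamp and a missing-value placeholder, at the cost of an extra pass over the column.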
Installed Versions
pandas : 2.2.2
numpy : 1.26.4
pytz : 2024.1
dateutil : 2.9.0.post0
setuptools : 69.1.1
pip : 24.0
Cython : None
pytest : 8.1.2
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 5.1.0
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.3
IPython : 8.22.2
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.3
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : 0.9.0
xarray : None
xlrd : None
zstandard : None
tzdata : 2024.1
qtpy : None
pyqt5 : None