BUG: Concat for two dataframes row-wise fails if one of columns is datetime and iterrows was used #56779

Open
2 of 3 tasks
ilyakochik opened this issue Jan 8, 2024 · 7 comments
Labels
Bug · datetime.date (stdlib datetime.date support) · Dtype Conversions (Unexpected or buggy dtype conversions)

Comments

@ilyakochik

ilyakochik commented Jan 8, 2024

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
import datetime

data = {'Column1': [1, 2, 3, 33],
        'Column2': [4, 5, 6, 66],
        'Column3': [7, 8, 9, 99]}
df = pd.DataFrame(data)


# once the below is uncommented, the pd.concat would fail...
# df["Column2"] = datetime.datetime(2018, 1, 1)

# ... if this is uncommented, it would work though
# df["Column2"] = datetime.date(2018, 1, 1)

random_rows = pd.DataFrame([row for _, row in df.sample(n=3).iterrows()])
df = pd.concat([df, random_rows])
print(df)

Issue Description

pd.concat of two DataFrames fails when both of the following conditions are met:

  • one of the columns holds datetime values (with date values everything works)
  • the second DataFrame passed to concat was built using .iterrows()

Changing iterrows() to itertuples() (in case type information was being lost along the way) doesn't solve the problem.
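A quick way to narrow down a failure like this is to compare the dtype of the datetime column in both frames before calling concat. The following is a minimal diagnostic sketch based on the reproducible example; the `random_state` argument and the `dtype_report` name are additions for illustration, and the printed dtypes depend on the installed pandas version, so no output is shown.

```python
import datetime

import pandas as pd

df = pd.DataFrame(
    {
        "Column1": [1, 2, 3, 33],
        "Column2": [4, 5, 6, 66],
        "Column3": [7, 8, 9, 99],
    }
)
# Broadcast a scalar datetime over the column, as in the report.
df["Column2"] = datetime.datetime(2018, 1, 1)

# Rebuild three sampled rows through iterrows(), as in the report.
# random_state is only here to make the sketch repeatable.
random_rows = pd.DataFrame(
    [row for _, row in df.sample(n=3, random_state=0).iterrows()]
)

# Collect the dtype each frame ended up with for the datetime column.
dtype_report = {
    "original": df["Column2"].dtype,
    "rebuilt": random_rows["Column2"].dtype,
}
for name, dtype in dtype_report.items():
    print(f"{name}: {dtype}")
```

If the two lines differ (for example in the datetime unit), the frames no longer agree on the column's dtype, which is worth including in the report.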

Expected Behavior

It works with a date object, so I would expect it to work with a datetime object as well; the failure looks like a bug.

Installed Versions

INSTALLED VERSIONS

commit : a671b5a
python : 3.11.7.final.0
python-bits : 64
OS : Darwin
OS-release : 23.1.0
Version : Darwin Kernel Version 23.1.0: Mon Oct 9 21:28:12 PDT 2023; root:xnu-10002.41.9~6/RELEASE_ARM64_T8103
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 2.1.4
numpy : 1.26.3
pytz : 2023.3.post1
dateutil : 2.8.2
setuptools : 68.2.2
pip : 23.3.1
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader : None
bs4 : None
bottleneck : 1.3.5
dataframe-api-compat: None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : None
numexpr : 2.8.7
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2023.3
qtpy : None
pyqt5 : None

@ilyakochik ilyakochik added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Jan 8, 2024
@ilyakochik ilyakochik changed the title BUG: Concat for two data frames row-wise fails if one of columns is datetime and iterrows was used BUG: Concat for two dataframes row-wise fails if one of columns is datetime and iterrows was used Jan 8, 2024
@phofl
Member

phofl commented Jan 8, 2024

Can you post the traceback?

@rhshadrach
Member

I am not able to replicate on main.

@ilyakochik
Author

> Can you post the traceback?

cd /Users/ilko/Code/clay/backend ; /usr/bin/env /Users/ilko/Code/clay/backend/.conda/bin/python /Users/ilko/.vscode/extensions/ms-python.python-2023.22.1/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher 63255 -- main.py --init
Traceback (most recent call last):
  File "/Users/ilko/Code/clay/backend/main.py", line 17, in <module>
    df = pd.concat([df, random_rows])
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ilko/Code/clay/backend/.conda/lib/python3.11/site-packages/pandas/core/reshape/concat.py", line 393, in concat
    return op.get_result()
           ^^^^^^^^^^^^^^^
  File "/Users/ilko/Code/clay/backend/.conda/lib/python3.11/site-packages/pandas/core/reshape/concat.py", line 680, in get_result
    new_data = concatenate_managers(
               ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ilko/Code/clay/backend/.conda/lib/python3.11/site-packages/pandas/core/internals/concat.py", line 189, in concatenate_managers
    values = _concatenate_join_units(join_units, copy=copy)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ilko/Code/clay/backend/.conda/lib/python3.11/site-packages/pandas/core/internals/concat.py", line 486, in _concatenate_join_units
    concat_values = concat_compat(to_concat, axis=1)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ilko/Code/clay/backend/.conda/lib/python3.11/site-packages/pandas/core/dtypes/concat.py", line 132, in concat_compat
    return cls._concat_same_type(to_concat_eas)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ilko/Code/clay/backend/.conda/lib/python3.11/site-packages/pandas/core/arrays/datetimelike.py", line 2251, in _concat_same_type
    new_obj = super()._concat_same_type(to_concat, axis)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ilko/Code/clay/backend/.conda/lib/python3.11/site-packages/pandas/core/arrays/_mixins.py", line 230, in _concat_same_type
    return super()._concat_same_type(to_concat, axis=axis)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "arrays.pyx", line 190, in pandas._libs.arrays.NDArrayBacked._concat_same_type
ValueError: all the input array dimensions except for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 4 and the array at index 1 has size 3
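The traceback shows concat_compat being called with axis=1, while the later call `cls._concat_same_type(to_concat_eas)` is shown without an axis argument. One possible reading, which is an assumption and not confirmed anywhere in this thread, is that the two datetime blocks end up being joined along the wrong axis. The NumPy-only sketch below uses hypothetical stand-in arrays of shape (1, 4) and (1, 3) to show that this kind of ValueError appears when such arrays are joined along axis 0 instead of axis 1.

```python
import numpy as np

# Hypothetical stand-ins for two 2D blocks holding one column each:
# 4 rows from the original frame and 3 rows from the sampled frame.
left = np.zeros((1, 4), dtype="int64")
right = np.zeros((1, 3), dtype="int64")

# Joining along axis 1 (the row direction for these stand-ins) works.
stacked = np.concatenate([left, right], axis=1)
print(stacked.shape)

# Joining along axis 0 requires dimension 1 to match, which it does not.
axis0_error = None
try:
    np.concatenate([left, right], axis=0)
except ValueError as exc:
    axis0_error = exc
    print(exc)
```

The sizes 4 and 3 in the reported message match the row counts of the two frames in the example, which is what motivates this reading.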

@cameronbronstein

cameronbronstein commented Feb 16, 2024

I ran into a similar issue today when concatenating two dataframes that had a mix of datetime64[ns] and datetime64[us] dtypes in a datetime column.

@rhshadrach
Member

That appears to be the issue here @cameronbronstein. While concat on 2.2.x and main now succeeds, the DataFrame originally has Column2 as datetime64[us], and after the sampled rows are rebuilt into a DataFrame it becomes datetime64[ns]. I think this is unexpected. Further investigations and PRs to fix are welcome!
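Until the dtype change itself is addressed, one way to sidestep the mismatch is to cast the rebuilt frame back to the original frame's dtypes before concatenating. This is a workaround sketch under the assumption that the unit mismatch described above is the cause; it is not a fix proposed by the maintainers, and `random_state` and `combined` are names added for illustration.

```python
import datetime

import pandas as pd

df = pd.DataFrame(
    {
        "Column1": [1, 2, 3, 33],
        "Column2": [4, 5, 6, 66],
        "Column3": [7, 8, 9, 99],
    }
)
df["Column2"] = datetime.datetime(2018, 1, 1)

# Rebuild sampled rows through iterrows(), as in the report.
random_rows = pd.DataFrame(
    [row for _, row in df.sample(n=3, random_state=0).iterrows()]
)

# Cast every column back to the dtype it has in the original frame,
# so both inputs to concat agree (including the datetime unit).
random_rows = random_rows.astype(df.dtypes.to_dict())

combined = pd.concat([df, random_rows])
print(combined.dtypes)
```

If the round trip through iterrows() is not actually needed, passing `df.sample(n=3)` to concat directly avoids rebuilding the frame at all, since sample returns a DataFrame with the original dtypes.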

@rhshadrach rhshadrach added Dtype Conversions Unexpected or buggy dtype conversions datetime.date stdlib datetime.date support and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Mar 2, 2024
@emiford

emiford commented Apr 4, 2024

take

@emiford

emiford commented Apr 26, 2024

It looks like the unit of the datetime dtype falls back to the default ns while the DataFrame is constructed from the sampled rows. During construction, columns containing datetime objects go through maybe_infer_datetimelike in core/dtypes/cast.py, which calls lib.maybe_convert_objects, which in turn constructs a DatetimeIndex from the ndarray of Column2 values. When that array reaches _sequence_to_dt64 in core/arrays/datetimes.py during the DatetimeIndex construction, data_dtype = np.dtype('O') and out_unit = None, so out_unit gets set to ns.

I've played around with it a bit. When every column is a datetime type there is no issue, but when only one or a few are, they get converted to datetime64[ns]. It occurs with both iterrows and itertuples. I was able to fix the conversion in the issue example by fetching the unit from the datetime object in _sequence_to_dt64, but this causes problems in cases where conversion to datetime64[ns] is the expected behavior. I'm not sure what the right fix here would be.

Here is the check I implemented:
if (data_dtype is np.dtype('O') and isinstance(data, np.ndarray)):
    if (len(data) > 0 and isinstance(data[0], Timestamp) and data[0].tz is None):
        # catch ndarrays that should have a datetime dtype but don't
        out_unit = data[0].unit
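The inference path described in this comment can be observed from the public API without touching pandas internals. The sketch below is only an illustration, assuming pandas 2.x (where `Timestamp.as_unit` and `Timestamp.unit` are available): it builds an object-dtype ndarray of microsecond-resolution Timestamps and checks which dtype a DatetimeIndex built from it reports. The resulting unit may differ between pandas versions, so no output is shown.

```python
import numpy as np
import pandas as pd

# A Timestamp that explicitly carries microsecond resolution.
ts = pd.Timestamp("2018-01-01").as_unit("us")

# An object-dtype ndarray, similar to what a column of row Series
# values looks like before datetime inference runs.
objs = np.array([ts, ts, ts], dtype=object)

# Let pandas infer the datetime dtype from the objects.
inferred = pd.DatetimeIndex(objs)

print("scalar unit:", ts.unit)
print("inferred dtype:", inferred.dtype)
```

If the inferred dtype reports a different unit than the scalars, that matches the behavior described above, where the unit of the individual Timestamp objects is not carried over during inference.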

@emiford emiford removed their assignment Apr 29, 2024
5 participants