Description
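When a table with date columns is written from pandas and then read back, the loaded columns have the UTC timezone added to their dtype (datetime64[ns, UTC] instead of datetime64[ns]).
This is confusing.
Is this the expected behavior, or a bug?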
Package Version
crate 2.0.0
pandas 2.2.3
SQLAlchemy 2.0.39
sqlalchemy-cratedb 0.42.0.dev0
Test
import sqlalchemy as sa
import pandas as pd
data = {
    "date_1": ["2020-01-01", "2021-01-01", "2022-01-01", "2023-01-01", "2027-12-30"],
    "date_2": ["2020-09-24", "2020-10-24", "2020-11-24", "2020-12-24", "2027-09-24"],
}
df_data = pd.DataFrame.from_dict(data, dtype="datetime64[ns]")
print(df_data.dtypes)
print(df_data.sort_values(by="date_1").reset_index(drop=True))
dburi = "crate://panduser:[email protected]:4200?ssl=false"
engine = sa.create_engine(dburi, echo=False)
conn = engine.connect()
df_data.to_sql(
    "test_date",
    conn,
    if_exists="replace",
    index=False,
)
conn.exec_driver_sql("REFRESH TABLE test_date;")
df_load = pd.read_sql_table("test_date", conn)
print("\ndataframe after loading")
df_load = df_load.sort_values(by="date_1").reset_index(drop=True)
print(df_load.dtypes)
print(df_load)
Output:
date_1 datetime64[ns]
date_2 datetime64[ns]
dtype: object
date_1 date_2
0 2020-01-01 2020-09-24
1 2021-01-01 2020-10-24
2 2022-01-01 2020-11-24
3 2023-01-01 2020-12-24
4 2027-12-30 2027-09-24
dataframe after loading
date_1 datetime64[ns, UTC]
date_2 datetime64[ns, UTC]
dtype: object
date_1 date_2
0 2020-01-01 00:00:00+00:00 2020-09-24 00:00:00+00:00
1 2021-01-01 00:00:00+00:00 2020-10-24 00:00:00+00:00
2 2022-01-01 00:00:00+00:00 2020-11-24 00:00:00+00:00
3 2023-01-01 00:00:00+00:00 2020-12-24 00:00:00+00:00
4 2027-12-30 00:00:00+00:00 2027-09-24 00:00:00+00:00
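For reference, the two dtypes shown above differ only in timezone awareness. The following is a minimal, database-free sketch of that difference using pandas alone; it does not reproduce what the CrateDB dialect does internally, and the variable names are illustrative.

```python
import pandas as pd

# tz-naive timestamps, as in the DataFrame before writing
naive = pd.Series(pd.to_datetime(["2020-01-01", "2027-12-30"]))

# tz-aware timestamps: same wall-clock values, now tagged as UTC
aware = naive.dt.tz_localize("UTC")

print(naive.dtype)
print(aware.dtype)

# Dropping the timezone again recovers the original tz-naive values
roundtrip = aware.dt.tz_localize(None)
print(roundtrip.equals(naive))
```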
After loading, I remove the time zone like this:
df2 = df_load.select_dtypes("datetimetz")
df_load[df2.columns] = df2.apply(lambda x: x.dt.tz_convert(None))
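The same workaround could be wrapped in a small helper so it can be reused after every load. This is only a sketch: strip_timezones is a hypothetical name, not part of pandas or sqlalchemy-cratedb, and the sample frame stands in for the result of read_sql_table so that no database is needed.

```python
import pandas as pd


def strip_timezones(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with every tz-aware datetime column made tz-naive."""
    out = df.copy()
    for col in out.select_dtypes(include="datetimetz").columns:
        # tz_convert(None) converts to UTC, then drops the timezone information
        out[col] = out[col].dt.tz_convert(None)
    return out


# Stand-in for the frame returned by pd.read_sql_table (assumption: UTC-aware columns)
df_load = pd.DataFrame(
    {
        "date_1": pd.to_datetime(["2020-01-01", "2021-01-01"], utc=True),
        "label": ["a", "b"],
    }
)

df_naive = strip_timezones(df_load)
print(df_naive.dtypes)
```

Working on a copy leaves the loaded frame untouched; non-datetime columns such as label pass through unchanged.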