
[FEA] Ability to round-trip all pandas column dtypes #14149

@galipremsagar


Is your feature request related to a problem? Please describe.
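With the current Column design and `to_pandas` implementation, a cudf Series can only be converted to NumPy dtypes or pandas nullable dtypes. However, pandas also supports Arrow-backed dtypes (`pd.ArrowDtype`), and cudf does not preserve which kind of dtype a series started with: in the session below, the nullable `Int64` series and the Arrow-backed `int64[pyarrow]` series both come back as plain NumPy `int64`.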

```
In [1]: import pandas as pd

In [2]: np_series = pd.Series([1, 2, 3], dtype='int64')

In [3]: pd_series = pd.Series([1, 2, 3], dtype=pd.Int64Dtype())

In [4]: import pyarrow as pa

In [5]: arrow_series = pd.Series([1, 2, 3], dtype=pd.ArrowDtype(pa.int64()))

In [6]: np_series
Out[6]:
0    1
1    2
2    3
dtype: int64

In [7]: pd_series
Out[7]:
0    1
1    2
2    3
dtype: Int64

In [8]: arrow_series
Out[8]:
0   1
1   2
2   3
dtype: int64[pyarrow]

In [9]: import cudf

In [10]: cudf.from_pandas(np_series).to_pandas()
Out[10]:
0    1
1    2
2    3
dtype: int64

In [11]: cudf.from_pandas(pd_series).to_pandas()
Out[11]:
0    1
1    2
2    3
dtype: int64

In [12]: cudf.from_pandas(arrow_series).to_pandas()
Out[12]:
0    1
1    2
2    3
dtype: int64
```
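To make the three dtype families from this session concrete, here is a minimal sketch, assuming pandas 1.5 or newer (where `pd.ArrowDtype` was introduced), that classifies a pandas dtype as NumPy, Arrow-backed, or another extension dtype such as the nullable `Int64`. `dtype_flavor` is a hypothetical helper for illustration, not part of pandas or cudf; a round-trip fix would need to record something like this before converting to cudf.

```python
import numpy as np
import pandas as pd


def dtype_flavor(dtype) -> str:
    """Classify a pandas dtype into the families discussed in this issue.

    Hypothetical helper for illustration; not part of the pandas or cudf API.
    """
    # ArrowDtype is itself an ExtensionDtype subclass, so check it first.
    if isinstance(dtype, pd.ArrowDtype):
        return "arrow"
    # Plain NumPy dtypes such as int64 or float64.
    if isinstance(dtype, np.dtype):
        return "numpy"
    # Remaining pandas extension dtypes, e.g. the nullable Int64Dtype.
    if isinstance(dtype, pd.api.extensions.ExtensionDtype):
        return "extension"
    raise TypeError(f"unrecognized dtype: {dtype!r}")


print(dtype_flavor(pd.Series([1, 2, 3], dtype="int64").dtype))
print(dtype_flavor(pd.Series([1, 2, 3], dtype=pd.Int64Dtype()).dtype))
```

Called on `arrow_series.dtype` from the session above, it would return `"arrow"`; that case requires pyarrow to be installed.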

Describe the solution you'd like
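I would like cudf to round-trip pandas dtypes faithfully: a Series created with a NumPy, pandas nullable, or Arrow-backed dtype should come back from `cudf.from_pandas(...).to_pandas()` with the same dtype it started with.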
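One way to picture the requested behavior is to remember the original dtype and cast the converted result back to it. The sketch below does this with pandas alone: `lossy_roundtrip` is a hypothetical stand-in for `cudf.from_pandas(s).to_pandas()` that mimics the dtype collapse shown above, and `roundtrip_preserving_dtype` is likewise a hypothetical wrapper, not cudf API.

```python
import pandas as pd


def lossy_roundtrip(s: pd.Series) -> pd.Series:
    """Hypothetical stand-in for cudf.from_pandas(s).to_pandas().

    Mimics the behavior reported in this issue: values survive but the
    dtype collapses to NumPy int64. Assumes integer data with no missing values.
    """
    return pd.Series(s.to_numpy(dtype="int64"), index=s.index)


def roundtrip_preserving_dtype(s: pd.Series, convert=lossy_roundtrip) -> pd.Series:
    """Hypothetical wrapper: record the original dtype, then cast back to it."""
    original_dtype = s.dtype
    result = convert(s)
    # astype re-applies the original dtype (e.g. Int64) to the converted values.
    return result.astype(original_dtype)


s = pd.Series([1, 2, 3], dtype=pd.Int64Dtype())
print(lossy_roundtrip(s).dtype)             # dtype lost
print(roundtrip_preserving_dtype(s).dtype)  # dtype restored
```

Casting afterwards is only a workaround: a NumPy `int64` intermediate cannot hold missing values, so a series containing `pd.NA` would not survive this path (`to_numpy(dtype="int64")` raises a `ValueError` on missing values). Preserving the dtype inside cudf's own conversion, as requested here, avoids that limitation.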

Labels: Python (Affects Python cuDF API.), cudf.pandas (Issues specific to cudf.pandas), feature request (New feature or request)

Project status: Todo