Description
Describe the bug
When using the cuDF-Pandas magic command to accelerate Pandas with zero code changes, CSV files are read differently. In Pandas, missing values are read as NaN; in cuDF-Pandas, they are read as <NA>. This makes a big difference afterward if users combine columns with df[COL1].astype('str')+'_'+df[COL2].astype('str'). In Pandas this produces a string that keeps both the NaN and the number: for example, when COL1=NaN and COL2=3, the combined string is nan_3.0. In cuDF-Pandas the result is <NA> whenever either value is missing, so the information in the non-missing column is lost.
Steps/Code to reproduce bug
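import pandas as pd  # run with cuDF-Pandas enabled via its magic command

csv_content = """a,b
1,4
2,5
3,6
,7
"""
with open("example.csv", "w") as f:
    f.write(csv_content)
# Column "a" has a missing value in the last row
df = pd.read_csv("example.csv")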
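Until the behavior is aligned, one possible workaround is to fill missing entries with an explicit placeholder while converting to string, so the combined column does not depend on how the reader represents missing values. The following is only a sketch: the helper names to_str_with_na and combine_columns are hypothetical, it reads the same CSV from memory instead of a file, and it has been reasoned through for plain Pandas only, so its behavior under cuDF-Pandas is an assumption.

```python
import io

import pandas as pd

csv_content = "a,b\n1,4\n2,5\n3,6\n,7\n"


def to_str_with_na(s, na_rep="nan"):
    """Convert a Series to strings, writing na_rep where the value is missing.

    Hypothetical helper: the missing-value mask is taken from the original
    Series, so the result does not rely on how astype('str') renders NaN/<NA>.
    """
    return s.astype("str").where(s.notna(), na_rep)


def combine_columns(df, col1, col2, sep="_"):
    """Join two columns as strings, keeping rows where one side is missing."""
    return to_str_with_na(df[col1]) + sep + to_str_with_na(df[col2])


df = pd.read_csv(io.StringIO(csv_content))
combined = combine_columns(df, "a", "b")
print(combined.tolist())
```

With this approach the last row keeps the value from column b even though column a is missing, at the cost of choosing the placeholder text explicitly.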
Expected behavior
It would be nice if cuDF-Pandas matched the behavior of Pandas and read missing values as NaN.
Environment overview (please complete the following information)
RAPIDS '25.02.02'