Open
Issue
When writing DataFrames containing string columns to ArcticDB, the following error occurs:
TypeError: Cannot interpret '<StringDtype(storage='python', na_value=nan)>' as a data type
Root Cause
In recent pandas releases, string columns may be inferred as the new StringDtype (reported in the error above with storage='python' and na_value=nan) instead of the traditional object dtype. ArcticDB does not recognize this dtype during serialization.
This behavior is more likely to trigger in:
- pandas 2.1+ with the future.infer_string option enabled (opt-in in pandas 2.x, the default from pandas 3.0)
- Python 3.12+ environments
- DataFrames created with explicit string values
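To check whether a given DataFrame is affected, the column dtypes can be inspected before writing. The following is a minimal sketch: `string_dtype_columns` is a hypothetical helper name, not part of pandas or ArcticDB, and it assumes only that the offending columns are instances of `pd.StringDtype`, as the error message suggests.

```python
import pandas as pd


def string_dtype_columns(df: pd.DataFrame) -> list:
    """Return the labels of columns whose dtype is a pandas StringDtype.

    Hypothetical helper, intended for diagnosis only.
    """
    return [
        label
        for label, dtype in df.dtypes.items()
        if isinstance(dtype, pd.StringDtype)
    ]


# A column explicitly stored with the "string" extension dtype is reported.
explicit = pd.DataFrame({"col": pd.array(["string_value"], dtype="string")})
print(string_dtype_columns(explicit))

# A column explicitly stored as object dtype is not reported.
legacy = pd.DataFrame({"col": pd.Series(["string_value"], dtype=object)})
print(string_dtype_columns(legacy))
```

Running the same check on a DataFrame built without an explicit dtype shows which inference path the installed pandas version takes.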
Affected Code
Any code that creates a DataFrame with string columns and writes to ArcticDB:
df = pd.DataFrame([{"col": "string_value"}])
lib.write(symbol, df) # Fails if "col" is inferred as StringDtype
Workaround
Explicitly cast string columns to object dtype before writing:
df["col"] = df["col"].astype(object)
lib.write(symbol, df) # Works
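When several columns are affected, the same cast can be applied to every StringDtype column at once. This is a sketch under stated assumptions: `cast_string_columns_to_object` is a hypothetical helper, it looks only at columns (not the index), and it assumes column labels are unique.

```python
import pandas as pd


def cast_string_columns_to_object(df: pd.DataFrame) -> pd.DataFrame:
    """Return a new DataFrame with every StringDtype column cast to object.

    Hypothetical helper; columns with any other dtype are left unchanged.
    """
    string_cols = [
        label
        for label, dtype in df.dtypes.items()
        if isinstance(dtype, pd.StringDtype)
    ]
    # Passing a mapping to astype converts only the listed columns.
    return df.astype({label: object for label in string_cols})


df = pd.DataFrame(
    {
        "col": pd.array(["string_value", "other"], dtype="string"),
        "n": [1, 2],
    }
)
converted = cast_string_columns_to_object(df)
print(converted.dtypes)
```

The converted frame can then be passed to `lib.write(symbol, converted)` in place of the original DataFrame.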
Recommendation
ArcticDB should either:
1. Add native support for pandas StringDtype
2. Or automatically convert StringDtype columns to object dtype internally during write operations