ArcticDB Incompatibility with pandas 2.x StringDtype #2867

@wy-z

## Issue

When writing DataFrames containing string columns to ArcticDB, the following error occurs:

TypeError: Cannot interpret '<StringDtype(storage='python', na_value=nan)>' as a data type

## Root Cause

In pandas 2.x, string columns may end up as StringDtype (here the Python-backed variant with na_value=nan, as shown in the error) instead of the traditional object dtype, either through inference or an explicit cast. ArcticDB does not recognize this dtype during serialization.
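The message has the same shape as the TypeError NumPy raises when `np.dtype()` is handed a pandas extension dtype. The sketch below reproduces that outside ArcticDB. That ArcticDB's write path reaches an equivalent call is an assumption, not something confirmed here, and `numpy_dtype_error` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd


def numpy_dtype_error(dtype):
    """Return the message NumPy raises for `dtype`, or None if it is accepted."""
    try:
        np.dtype(dtype)
    except TypeError as exc:
        return str(exc)
    return None


object_msg = numpy_dtype_error(object)            # plain object dtype is accepted
string_msg = numpy_dtype_error(pd.StringDtype())  # pandas extension dtype is rejected
print(object_msg)
print(string_msg)
```

The exact text inside the quotes depends on the pandas version and on which StringDtype variant the column uses.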

This behavior is more likely to trigger in:

  • pandas 2.1+ with the future.infer_string option enabled (opt-in in 2.x; string inference becomes the default in pandas 3.0)
  • Python 3.12+ environments
  • DataFrames created with explicit string values
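To check whether a given environment is affected, it may help to list which columns actually carry a StringDtype before writing. This is a minimal sketch; `string_dtype_columns` is a hypothetical helper, not part of pandas or ArcticDB.

```python
import pandas as pd


def string_dtype_columns(df: pd.DataFrame) -> list:
    """Return the labels of columns whose dtype is a pandas StringDtype."""
    return [
        col for col, dtype in df.dtypes.items()
        if isinstance(dtype, pd.StringDtype)
    ]


df = pd.DataFrame({"col": ["string_value"], "n": [1]})
# Explicit cast: always StringDtype, regardless of inference settings.
df["explicit"] = df["col"].astype("string")

# Whether "col" itself appears depends on the pandas version and options.
print(pd.__version__, string_dtype_columns(df))
```

If `col` shows up in the output, inference is producing StringDtype in that environment and the write below is expected to hit the error.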

## Affected Code

Any code that creates a DataFrame with string columns and writes it to ArcticDB:

import pandas as pd

# lib is an open ArcticDB library; symbol is the symbol name to write to
df = pd.DataFrame([{"col": "string_value"}])
lib.write(symbol, df)  # Fails if "col" is inferred as StringDtype

## Workaround

Explicitly cast string columns to object dtype before writing:

df["col"] = df["col"].astype(object)
lib.write(symbol, df)  # Works
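For frames with many string columns, the same cast can be applied to every StringDtype column at once. This is a sketch under a few assumptions: `cast_string_columns_to_object` is a hypothetical helper, column labels are unique, and only columns (not the index) are converted.

```python
import pandas as pd


def cast_string_columns_to_object(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with every StringDtype column cast to object."""
    targets = {
        col: object
        for col, dtype in df.dtypes.items()
        if isinstance(dtype, pd.StringDtype)
    }
    # Nothing to convert: still return a copy so callers can mutate safely.
    return df.astype(targets) if targets else df.copy()


df = pd.DataFrame({"s": pd.array(["a", "b"], dtype="string"), "n": [1, 2]})
converted = cast_string_columns_to_object(df)
print(converted.dtypes)
```

The converted frame can then be passed to `lib.write(symbol, converted)` in place of the original. Missing values in the string columns are carried over unchanged; how ArcticDB treats them is not covered here.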

## Recommendation

ArcticDB should either:
1. Add native support for pandas StringDtype, or
2. Automatically convert StringDtype columns to object dtype internally during write operations
