Add support for nested data types (lists and dictionaries) #281

@DanielAvdar

Feature Request: Support for Nested Data Types

Description

I would like to request support for converting nested data types like lists and dictionaries when using the pandas PyArrow backend. Currently, the library handles primitive types, datetime types, and some object types well, but there's no explicit support for nested data structures.

Use Case

Working with hierarchical or structured data that contains:

  • Lists of values within a DataFrame cell
  • Dictionary/map structures within a DataFrame cell
  • Nested combinations of the above

Expected Behavior

When calling convert_to_pyarrow() on a DataFrame with columns containing lists or dictionaries, they should be properly converted to their corresponding PyArrow types:

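  • Lists should convert to a PyArrow-backed list dtype, which pandas displays as, for example, list<item: string>[pyarrow]
  • Dictionaries should convert to a PyArrow-backed struct or map dtype, for example struct<x: int64>[pyarrow] or map<string, int64>[pyarrow]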

Example

import pandas as pd
from pandas_pyarrow import convert_to_pyarrow

# Create a pandas DataFrame with nested data types
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [['a', 'b'], ['c', 'd'], ['e', 'f']],  # list column
    'C': [{'x': 1}, {'y': 2}, {'z': 3}]  # dict column
})

# Convert to PyArrow dtypes
adf = convert_to_pyarrow(df)

# Expected output should properly handle the nested types
print(adf.dtypes)

Technical Considerations

  • PyArrow provides list, struct, and map types that Python lists and dictionaries could map to
  • Element types inside nested structures would need to be inferred
  • Deeply nested structures might require recursive handling

Benefits

Adding support for these data types would make the library more versatile for real-world data analysis scenarios where hierarchical or structured data is common.

I appreciate your consideration of this feature request!
