Use nullable types by default and avoid object type #10

@ehariri

To keep the Pandas APIs jittable, Pandas should avoid choosing data types based on values and should avoid the object dtype.

One example is that Pandas should default to nullable types rather than NumPy or object types. Currently, if you pass a value that Pandas treats as NA (e.g. None) without wrapping the data in pd.array, the result is not a nullable type but a NumPy-backed array, often an object array.
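A minimal sketch of the behaviour described above, assuming a pandas 1.x or 2.x install where `pd.array` infers nullable dtypes. The variable names are illustrative only.

```python
import pandas as pd

# Plain Python lists containing None fall back to NumPy-backed dtypes.
plain_ints = pd.Series([1, 2, None])
plain_bools = pd.Series([True, False, None])

# Wrapping the same data in pd.array lets pandas infer a nullable dtype.
nullable_ints = pd.Series(pd.array([1, 2, None]))
nullable_bools = pd.Series(pd.array([True, False, None]))

print(plain_ints.dtype, plain_bools.dtype)
print(nullable_ints.dtype, nullable_bools.dtype)
```

The first pair shows the value-dependent fallback this issue wants to avoid; the second pair is what the issue proposes as the default.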

Arrays with NAs should automatically use the matching nullable type. For example, pd.DataFrame({'A': [1, 2, None], 'B': [True, False, None]}) should give column A the type Int64 rather than float64 and column B the type boolean rather than object. Users can still specify non-nullable data types explicitly if necessary.
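For reference, here is a sketch of what stock pandas does with that DataFrame today, plus two existing ways to reach the proposed dtypes: `DataFrame.convert_dtypes()` after the fact, or explicit `pd.array(..., dtype=...)` columns. This only illustrates current workarounds; it is not the proposed default itself.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, None], "B": [True, False, None]})

# Current behaviour: dtypes are picked from the values.
current = {col: str(dtype) for col, dtype in df.dtypes.items()}

# Workaround 1: convert to nullable dtypes after construction.
converted = df.convert_dtypes()

# Workaround 2: request nullable dtypes explicitly per column.
explicit = pd.DataFrame(
    {
        "A": pd.array([1, 2, None], dtype="Int64"),
        "B": pd.array([True, False, None], dtype="boolean"),
    }
)

print(current)
print(converted.dtypes)
print(explicit.dtypes)
```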

For the output of I/O calls like read_csv, the data type should always be a nullable type and should not be determined by the values.
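The value dependence in I/O can be sketched as follows. This assumes pandas 2.0 or later, where `read_csv` accepts `dtype_backend="numpy_nullable"`; the CSV contents are made up for illustration.

```python
import io

import pandas as pd

complete_csv = "a,b\n1,10\n2,20\n3,30\n"
missing_csv = "a,b\n1,10\n,20\n3,30\n"  # column "a" has one empty field

# Default reader: the dtype of "a" changes once a value is missing.
default_complete = pd.read_csv(io.StringIO(complete_csv))
default_missing = pd.read_csv(io.StringIO(missing_csv))

# Nullable backend: "a" has the same dtype in both cases.
nullable_complete = pd.read_csv(io.StringIO(complete_csv), dtype_backend="numpy_nullable")
nullable_missing = pd.read_csv(io.StringIO(missing_csv), dtype_backend="numpy_nullable")

print(default_complete["a"].dtype, default_missing["a"].dtype)
print(nullable_complete["a"].dtype, nullable_missing["a"].dtype)
```

With the opt-in backend the column type no longer depends on whether a given file happens to contain missing values, which is the property the issue asks for by default.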
