ENH: Consistent naming conventions for string dtype aliases #58141
Description
Feature Type
- Adding new functionality to pandas
- Changing existing functionality in pandas
- Removing existing functionality in pandas
Problem Description
Right now the string aliases for our types are inconsistent:
>>> import pandas as pd
>>> pd.Series(range(3), dtype="int8") # NumPy type
>>> pd.Series(range(3), dtype="Int8") # Pandas extension type
>>> pd.Series(range(3), dtype="int8[pyarrow]") # Arrow type
Strings have a similar inconsistency with "string", "string[pyarrow]", and "string[pyarrow_numpy]".
Feature Description
I think we should create "int8[numpy]" and "int8[pandas]" aliases to stay consistent with pyarrow. This would also decouple "int8" from NumPy, so perhaps in the future a backend setting could determine whether NumPy or pyarrow types are returned.
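To make the proposal concrete, here is a minimal pure-Python sketch of how a "data_type[backend]" alias could be split, with a bare alias such as "int8" falling back to a configurable default. Everything here is hypothetical: resolve_alias, KNOWN_BACKENDS, and the default_backend parameter (standing in for a global option) are not pandas APIs.

```python
# Hypothetical set of backends a "data_type[backend]" alias could name.
KNOWN_BACKENDS = {"numpy", "pandas", "pyarrow"}


def resolve_alias(alias: str, default_backend: str = "numpy") -> tuple[str, str]:
    """Split a hypothetical "data_type[backend]" alias into (logical, backend).

    A bare alias such as "int8" names no backend, so the backend comes from
    ``default_backend``, which stands in for a global option.
    """
    if alias.endswith("]") and "[" in alias:
        logical, _, rest = alias.partition("[")
        backend = rest[:-1].strip()  # drop the trailing "]"
    else:
        logical, backend = alias, default_backend
    if backend not in KNOWN_BACKENDS:
        raise ValueError(f"unknown backend {backend!r} in alias {alias!r}")
    return logical, backend
```

For example, resolve_alias("int8[pandas]") returns ("int8", "pandas"), while a bare "int8" resolves to whatever default_backend is set to, which is the decoupling described above.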
The pattern thus becomes "data_type[backend]", with the exception of "string[pyarrow_numpy]", which combines the backend and nullability semantics. I am less sure what to do in that case; maybe it should be called "string[pyarrow, numpy]", where the second argument is the nullability semantics?
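One possible reading of the two-argument form is sketched below: an optional second bracketed argument carries the missing-value semantics, and legacy spellings are rewritten into the new pattern before parsing. DtypeSpec, parse_dtype_alias, and LEGACY_ALIASES are hypothetical names for illustration, and treating an omitted second argument as "backend default" (None) is an assumption, not something pandas defines.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical rewrites of existing aliases into the proposed pattern.
LEGACY_ALIASES = {
    "Int8": "int8[pandas]",
    "string[pyarrow_numpy]": "string[pyarrow, numpy]",
}


@dataclass(frozen=True)
class DtypeSpec:
    logical: str                         # logical type, e.g. "int8" or "string"
    backend: Optional[str] = None        # physical storage, e.g. "pyarrow"
    na_semantics: Optional[str] = None   # None means "use the backend default"


def parse_dtype_alias(alias: str) -> DtypeSpec:
    """Parse a hypothetical "data_type[backend, nullability]" alias."""
    alias = LEGACY_ALIASES.get(alias, alias)
    if "[" not in alias:
        return DtypeSpec(alias)
    if not alias.endswith("]"):
        raise ValueError(f"malformed alias {alias!r}")
    logical, _, inner = alias[:-1].partition("[")
    if "[" in inner or "]" in inner:
        raise ValueError(f"nested brackets in {alias!r}")
    args = [part.strip() for part in inner.split(",")]
    if not 1 <= len(args) <= 2 or not all(args):
        raise ValueError(f"expected 1 or 2 arguments in {alias!r}")
    na_semantics = args[1] if len(args) == 2 else None
    return DtypeSpec(logical, args[0], na_semantics)
```

Under this sketch, parse_dtype_alias("string[pyarrow_numpy]") yields DtypeSpec(logical="string", backend="pyarrow", na_semantics="numpy"), keeping the logical type, storage, and nullability as separate fields.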
In any case, I am just hoping we can start to detach the logical type from the physical storage and nullability semantics with a well-defined pattern.
Alternative Solutions
n/a
Additional Context
No response