Description
Some comparisons between different classes of string (e.g. string[pyarrow] and str) raise. Resolving this is straightforward, except for deciding which class should be returned. I would expect it to always be that of the left operand: string[pyarrow] == str should return string[pyarrow], whereas str == string[pyarrow] should return str. Is this the consensus?
We currently run into issues with how Python handles subclasses with comparison dunders.
import numpy as np
import pandas as pd

lhs = pd.array(["x", pd.NA, "y"], dtype="string[pyarrow]")
rhs = pd.array(["x", pd.NA, "y"], dtype=pd.StringDtype("pyarrow", np.nan))
print(lhs.__eq__(rhs))
# <ArrowExtensionArray>
# [True, <NA>, True]
# Length: 3, dtype: bool[pyarrow]
print(lhs == rhs)
# [ True False  True]
The two results above differ because ArrowStringArrayNumpySemantics is a proper subclass of ArrowStringArray, and therefore Python first calls rhs.__eq__(lhs) when evaluating lhs == rhs.
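The same dispatch rule can be reproduced without pandas. The following is a minimal sketch using two toy classes (Base and Child are hypothetical names, not pandas classes) in which each __eq__ reports which implementation ran:

```python
class Base:
    """Toy stand-in for the parent array class (hypothetical)."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return f"Base.__eq__ called on {self.name}"


class Child(Base):
    """Toy stand-in for the subclass; overrides the comparison."""

    def __eq__(self, other):
        return f"Child.__eq__ called on {self.name}"


lhs = Base("lhs")
rhs = Child("rhs")

# Calling the dunder directly always uses the left operand's method.
explicit = lhs.__eq__(rhs)
print(explicit)  # Base.__eq__ called on lhs

# The == operator tries the right operand first, because its type is a
# proper subclass of the left operand's type.
operator_result = lhs == rhs
print(operator_result)  # Child.__eq__ called on rhs
```

This mirrors the difference between lhs.__eq__(rhs) and lhs == rhs seen with the two string array classes.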
We can avoid this by special-casing this situation in ArrowStringArrayNumpySemantics, but I wanted to open an issue for discussion before proceeding.
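One possible shape for such a special case, sketched with toy classes rather than pandas internals (Base and Child are hypothetical names): the subclass returns NotImplemented when the other operand is exactly the parent class, so Python falls back to the parent's method. This is only an illustration of the mechanism, not necessarily how pandas would implement it.

```python
class Base:
    """Toy stand-in for the parent array class (hypothetical)."""

    def __eq__(self, other):
        return "Base semantics"


class Child(Base):
    """Toy stand-in for the subclass with the special case."""

    def __eq__(self, other):
        if type(other) is Base:
            # Defer to the parent class for mixed comparisons.
            return NotImplemented
        return "Child semantics"


base, child = Base(), Child()

mixed_left_base = base == child
print(mixed_left_base)  # Base semantics

mixed_left_child = child == base
print(mixed_left_child)  # Base semantics

same_child = child == Child()
print(same_child)  # Child semantics
```

Note that in this sketch both operand orders end up with the parent's behavior: Child.__eq__ receives the same arguments for base == child and child == base, so it cannot tell which operand was on the left. That bears on the question of whether the result should follow the left operand.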