API (string dtype): comparisons between different string classes

Some comparisons between different classes of string (e.g. `string[pyarrow]` and `str`) raise. Fixing this is straightforward, except for deciding which class the result should have. I would expect the result to always follow the left operand, e.g. `string[pyarrow] == str` should return `string[pyarrow]` whereas `str == string[pyarrow]` should return `str`. Is this the consensus?

We also currently run into issues with how Python dispatches comparison dunders when one operand is an instance of a subclass of the other operand's class.

```python
import numpy as np
import pandas as pd

lhs = pd.array(["x", pd.NA, "y"], dtype="string[pyarrow]")
rhs = pd.array(["x", pd.NA, "y"], dtype=pd.StringDtype("pyarrow", np.nan))

print(lhs.__eq__(rhs))
# <ArrowExtensionArray>
# [True, <NA>, True]
# Length: 3, dtype: bool[pyarrow]

print(lhs == rhs)
# [ True False  True]
```

The two results above differ because `ArrowStringArrayNumpySemantics` is a proper subclass of `ArrowStringArray`, so when evaluating `lhs == rhs` Python tries `rhs.__eq__(lhs)` first.
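The dispatch rule can be reproduced without pandas. Below is a minimal sketch using two toy classes (`Base` and `Derived` are hypothetical names, not pandas classes) whose `__eq__` methods return a label instead of a boolean, so it is visible which method actually ran:

```python
class Base:
    def __eq__(self, other):
        # Label the result so we can see which implementation was used.
        return "Base.__eq__"


class Derived(Base):
    def __eq__(self, other):
        return "Derived.__eq__"


lhs, rhs = Base(), Derived()

# Calling the dunder directly bypasses Python's operator dispatch.
explicit = lhs.__eq__(rhs)

# With the operator, the right operand's type is a subclass of the
# left operand's type, so Python tries rhs.__eq__(lhs) first.
operator_result = lhs == rhs

print(explicit)         # Base.__eq__
print(operator_result)  # Derived.__eq__
```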

We can avoid this by special-casing this situation in `ArrowStringArrayNumpySemantics`, but I wanted to open an issue for discussion before proceeding.
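One possible shape for such a special case, sketched with toy classes rather than the real pandas arrays (`Parent` and `Child` are hypothetical stand-ins, and this is an assumption about the approach, not the actual implementation): the subclass returns `NotImplemented` when the other operand is exactly the parent class, which makes Python fall back to the parent's `__eq__`.

```python
class Parent:
    def __eq__(self, other):
        return "Parent semantics"


class Child(Parent):
    def __eq__(self, other):
        # Hypothetical special case: defer to the parent class when the
        # other operand is exactly a Parent instance.
        if type(other) is Parent:
            return NotImplemented
        return "Child semantics"


lhs, rhs = Parent(), Child()

# Python still tries rhs.__eq__(lhs) first, gets NotImplemented,
# and then falls back to lhs.__eq__(rhs).
left_is_parent = lhs == rhs

# Here Child.__eq__ runs first as well and defers in the same way.
left_is_child = rhs == lhs

same_class = Child() == Child()

print(left_is_parent)  # Parent semantics
print(left_is_child)   # Parent semantics
print(same_class)      # Child semantics
```

Note that in this sketch both operand orders end up with the parent's result: for either order the first call Python makes is `rhs.__eq__(lhs)`, so the subclass method cannot tell which side of the operator it was on. A strict "left operand wins" rule would need something beyond this check.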

cc @WillAyd @jorisvandenbossche 

API (string dtype): comparisons between different string classes #60639