Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: displaying string dtypes not showing storage option #50151

Merged
merged 5 commits into from
Dec 13, 2022

Conversation

phofl
Copy link
Member

@phofl phofl commented Dec 9, 2022

cc @mroeschke not saying that this is a good fix, but this shows that we have to use repr for the string dtypes in some way. Is there a more general rule to accomplish this?

@phofl phofl added Bug Output-Formatting __repr__ of pandas objects, to_string Strings String extension data type and string data labels Dec 9, 2022
@mroeschke
Copy link
Member

Looks to be a doctest error

=================================== FAILURES ===================================
_____________ [doctest] pandas.core.generic.NDFrame.convert_dtypes _____________
6466         Convert the DataFrame to use best possible dtypes.
6467 
6468         >>> dfn = df.convert_dtypes()
6469         >>> dfn
6470            a  b      c     d     e      f
6471         0  1  x   True     h    10   <NA>
6472         1  2  y  False     i  <NA>  100.5
6473         2  3  z   <NA>  <NA>    20  200.0
6474 
6475         >>> dfn.dtypes
Differences (unified diff with -expected +actual):
    @@ -1,7 +1,7 @@
    -a      Int32
    -b     string
    -c    boolean
    -d     string
    -e      Int64
    -f    Float64
    +a             Int32
    +b    string[python]
    +c           boolean
    +d    string[python]
    +e             Int64
    +f           Float64
     dtype: object

def test_to_string_string_dtype():
# GH#50099
if pa_version_under6p0:
pytest.skip()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Could you use the pytest.mark.skipif decorator?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, changed

@mroeschke
Copy link
Member

Is there a more general rule to accomplish this?

Yeah I don't think at the moment. Not sure why StringDtype (before StorageDtype was refactored away) has this str vs repr difference.

Looks like I actually hacked around this when creating ArrowDtype lol

class StorageExtensionDtype(ExtensionDtype):
    """ExtensionDtype that may be backed by more than one implementation."""

    name: str
    _metadata = ("storage",)

    def __init__(self, storage=None) -> None:
        self.storage = storage

    def __repr__(self) -> str:
        return f"{self.name}[{self.storage}]"

    def __str__(self) -> str:
        return self.name
class ArrowDtype(StorageExtensionDtype):
    def __repr__(self) -> str:
        return self.name

    @property
    def name(self) -> str:  # type: ignore[override]
        """
        A string identifying the data type.
        """
        return f"{str(self.pyarrow_dtype)}[{self.storage}]"

@phofl
Copy link
Member Author

phofl commented Dec 12, 2022

Thx, fixed the doctest.

Ok then can keep as is :)

@mroeschke mroeschke added this to the 2.0 milestone Dec 13, 2022
Copy link
Member

@mroeschke mroeschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM can merge on green

@phofl phofl merged commit d050408 into pandas-dev:main Dec 13, 2022
Copy link
Contributor

@jrbourbeau jrbourbeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @phofl!

@jorisvandenbossche
Copy link
Member

Not sure why StringDtype (before StorageDtype was refactored away) has this str vs repr difference.

As far as I remember, this was actually done intentionally. And I personally I would prefer to keep this. But let's discuss on the issue (will reopen that one).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Output-Formatting __repr__ of pandas objects, to_string Strings String extension data type and string data
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: DataFrame.dtypes doesn't include backend for string columns
4 participants