String dtype: more informative repr (keeping brief __str__) #61148
Attempt to address #59342
With the current version of the PR, the reprs for the different dtype variants are:
Some questions to decide on:
- Use `<...>` or not? (We are somewhat inconsistent internally for similar reprs; e.g. the Index repr does not use it, the ExtensionArray repr does.)
  - Using `<...>` makes it clearer that it is not necessarily exactly executable code, I think.
- Keep `__str__` as is (i.e. just `"str"` or `"string"`), or do we include the storage for the `"string"` case (to preserve the current repr behaviour)? I.e. make it have the options `"str"`, `"string[pyarrow]"` or `"string[python]"`.
  - Do we change the `.dtype.name` attribute or not (which right now is defined to be "str" or "string")?
  - Some dtypes include their parameters in `.name` (e.g. "datetime64[s, UTC]"), while for CategoricalDtype we do not (there it is just "category").
  - [...] `"string[python]"`, while we still allow that as string alias for `dtype` arguments (e.g. in constructors or in `astype()`).
- How to display `pd.NA` and `np.nan`?
  - Currently they use their own reprs, which means they are displayed as `<NA>` and `nan`.
  - The alternative is to display them as `pd.NA` and `np.nan`. This makes it a more "executable" repr, which could be nice, but on the other hand I also don't want to encourage that too much.
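
To make the trade-offs above a bit more concrete, here is a minimal sketch in plain Python of an informative `__repr__` sitting next to a brief `__str__`. It is not the pandas implementation: `SketchStringDtype` and `verbose_str` are hypothetical names, the exact repr layout is an assumption, and `None` / `float("nan")` stand in for `pd.NA` / `np.nan` so the snippet runs without pandas.

```python
import math


class SketchStringDtype:
    """Hypothetical stand-in for a string dtype; NOT the pandas class."""

    def __init__(self, storage="python", na_value=None):
        # Assumption: None plays the role of pd.NA, float("nan") of np.nan.
        self.storage = storage
        self.na_value = na_value

    def _uses_nan(self):
        return isinstance(self.na_value, float) and math.isnan(self.na_value)

    @property
    def name(self):
        # Brief name: "str" for the NaN variant, "string" for the NA variant.
        return "str" if self._uses_nan() else "string"

    def __str__(self):
        # Brief form, kept unchanged.
        return self.name

    def __repr__(self):
        # Informative form, wrapped in <...> to signal it is not executable code.
        na = "nan" if self._uses_nan() else "<NA>"
        return f"<StringDtype(storage={self.storage!r}, na_value={na})>"


def verbose_str(dtype):
    """Alternative under discussion: include the storage for the "string" case."""
    if dtype.name == "str":
        return "str"
    return f"string[{dtype.storage}]"


na_variant = SketchStringDtype("pyarrow")
nan_variant = SketchStringDtype("python", float("nan"))

print(str(na_variant))          # string
print(repr(na_variant))         # <StringDtype(storage='pyarrow', na_value=<NA>)>
print(str(nan_variant))         # str
print(repr(nan_variant))        # <StringDtype(storage='python', na_value=nan)>
print(verbose_str(na_variant))  # string[pyarrow]
```

In this sketch, switching the NA display to the "executable" spelling would only mean returning `"pd.NA"` / `"np.nan"` for `na` inside `__repr__`; whether `.name` should follow `verbose_str` is exactly the open question.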