Skip to content

Add IntoDType alias, replace DType | type[DType] #2626

Closed
@dangotbanned

Description

@dangotbanned

Most places that we currently use DType | type[DType] will eventually raise an error if the type[...] is nested.

But this isn't reflected in the typing or in the docstrings, which I think isn't helpful

Example

import polars as pl
import narwhals as nw

data = {"a": ["A", "B", "A"], "b": [1, 2, 3], "c": [9, 2, 4], "d": [8, 7, 8]}
df = pl.DataFrame(data)
frame = nw.from_native(df)

good = nw.col("a").cast(nw.Struct([nw.Field("a", nw.String)]))
bad = nw.col("a").cast(nw.Struct)
>>> frame.select(good)
┌──────────────────┐
|Narwhals DataFrame|
|------------------|
|  shape: (3, 1)   |
|  ┌───────────┐   |
|a|
|---|
|struct[1] │   |
|  ╞═══════════╡   |
|  │ {"A"}     │   |
|  │ {"B"}     │   |
|  │ {"A"}     │   |
|  └───────────┘   |
└──────────────────┘

The error we get kinda hints towards the issue in the individual case.
But if we hit this path when dealing with multiple DTypes - I imagine the error wouldn't help much at all

>>> frame.select(bad)
AttributeError: type object 'Struct' has no attribute 'fields'
Traceback

/narwhals/narwhals/_polars/expr.py:65, in PolarsExpr.cast(self, dtype)
     64 def cast(self, dtype: DType | type[DType]) -> Self:
---> 65     dtype_pl = narwhals_to_native_dtype(dtype, self._version, self._backend_version)
     66     return self._with_native(self.native.cast(dtype_pl))

/narwhals/narwhals/_polars/utils.py:214, in narwhals_to_native_dtype(dtype, version, backend_version)
    207     return pl.List(narwhals_to_native_dtype(dtype.inner, version, backend_version))
    208 if isinstance_or_issubclass(dtype, dtypes.Struct):
    209     fields = [
    210         pl.Field(
    211             field.name,
    212             narwhals_to_native_dtype(field.dtype, version, backend_version),
    213         )
--> 214         for field in dtype.fields
    215     ]
    216     return pl.Struct(fields)
    217 if isinstance_or_issubclass(dtype, dtypes.Array):  # pragma: no cover

AttributeError: type object 'Struct' has no attribute 'fields'

Proposal

We should narrow the type[DType] side of the union to something like:

from __future__ import annotations

from typing import TYPE_CHECKING

from narwhals import dtypes

if TYPE_CHECKING:
    from typing_extensions import TypeAlias

# NOTE: Needs to use forward references in the real thing    
NonNestedDType: TypeAlias = (
    dtypes.NumericType
    | dtypes.TemporalType
    | dtypes.String
    | dtypes.Boolean
    | dtypes.Binary
    | dtypes.Unknown
    | dtypes.Object
)

IntoDType: TypeAlias = dtypes.DType | type[NonNestedDType]

That would align nicely with the python type aliases we have already:

narwhals/narwhals/typing.py

Lines 355 to 360 in a0d82cd

NumericLiteral: TypeAlias = "int | float | Decimal"
TemporalLiteral: TypeAlias = "dt.date | dt.datetime | dt.time | dt.timedelta"
NonNestedLiteral: TypeAlias = (
"NumericLiteral | TemporalLiteral | str | bool | bytes | None"
)
PythonLiteral: TypeAlias = "NonNestedLiteral | list[Any] | tuple[Any, ...]"

And we now get some helpful red squiggles on the failing case 🥳:

Image

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions