Closed
Description
Most places that we currently use DType | type[DType]
will eventually raise an error if the type[...]
is nested.
But this isn't reflected in the typing or in the docstrings, which I think isn't helpful
Example
import polars as pl
import narwhals as nw
data = {"a": ["A", "B", "A"], "b": [1, 2, 3], "c": [9, 2, 4], "d": [8, 7, 8]}
df = pl.DataFrame(data)
frame = nw.from_native(df)
good = nw.col("a").cast(nw.Struct([nw.Field("a", nw.String)]))
bad = nw.col("a").cast(nw.Struct)
>>> frame.select(good)
┌──────────────────┐
|Narwhals DataFrame|
|------------------|
| shape: (3, 1) |
| ┌───────────┐ |
| │ a │ |
| │ --- │ |
| │ struct[1] │ |
| ╞═══════════╡ |
| │ {"A"} │ |
| │ {"B"} │ |
| │ {"A"} │ |
| └───────────┘ |
└──────────────────┘
The error we get kinda hints towards the issue in the individual case.
But if we hit this path when dealing with multiple DType
s - I imagine the error wouldn't help much at all
>>> frame.select(bad)
AttributeError: type object 'Struct' has no attribute 'fields'
Traceback
/narwhals/narwhals/_polars/expr.py:65, in PolarsExpr.cast(self, dtype)
64 def cast(self, dtype: DType | type[DType]) -> Self:
---> 65 dtype_pl = narwhals_to_native_dtype(dtype, self._version, self._backend_version)
66 return self._with_native(self.native.cast(dtype_pl))
/narwhals/narwhals/_polars/utils.py:214, in narwhals_to_native_dtype(dtype, version, backend_version)
207 return pl.List(narwhals_to_native_dtype(dtype.inner, version, backend_version))
208 if isinstance_or_issubclass(dtype, dtypes.Struct):
209 fields = [
210 pl.Field(
211 field.name,
212 narwhals_to_native_dtype(field.dtype, version, backend_version),
213 )
--> 214 for field in dtype.fields
215 ]
216 return pl.Struct(fields)
217 if isinstance_or_issubclass(dtype, dtypes.Array): # pragma: no cover
AttributeError: type object 'Struct' has no attribute 'fields'
Proposal
We should narrow the type[DType]
side of the union to something like:
from __future__ import annotations
from typing import TYPE_CHECKING
from narwhals import dtypes
if TYPE_CHECKING:
from typing_extensions import TypeAlias
# NOTE: Needs to use forward references in the real thing
NonNestedDType: TypeAlias = (
dtypes.NumericType
| dtypes.TemporalType
| dtypes.String
| dtypes.Boolean
| dtypes.Binary
| dtypes.Unknown
| dtypes.Object
)
IntoDType: TypeAlias = dtypes.DType | type[NonNestedDType]
That would align nicely with the python type aliases we have already:
Lines 355 to 360 in a0d82cd
And we now get some helpful red squiggles on the failing case 🥳: