Commit aa48faa

feat: Implement partial "lazy" support for DuckDB (even with this PR, DuckDB support is work-in-progress!) (#1725)
1 parent e56f91d commit aa48faa

81 files changed, +2064 -217 lines changed

README.md

+1 -2
@@ -14,8 +14,7 @@
 Extremely lightweight and extensible compatibility layer between dataframe libraries!

 - **Full API support**: cuDF, Modin, pandas, Polars, PyArrow
-- **Lazy-only support**: Dask
-- **Interchange-level support**: DuckDB, Ibis, Vaex, anything which implements the DataFrame Interchange Protocol
+- **Lazy-only support**: Dask. Work in progress: DuckDB, Ibis, PySpark.

 Seamlessly support all, without depending on any!

docs/backcompat.md

+4
@@ -111,6 +111,10 @@ before making any change.

 ### After `stable.v1`

+
+- Since Narwhals 1.21, passing a `DuckDBPyRelation` to `from_native` returns a `LazyFrame`. In
+  `narwhals.stable.v1`, it returns a `DataFrame` with `level='interchange'`.
+
 - Since Narwhals 1.15, `Series` is generic in the native Series, meaning that you can
   write:
   ```python

docs/basics/dataframe_conversion.md

+11 -5
@@ -14,6 +14,7 @@ To illustrate, we create dataframes in various formats:
 ```python exec="1" source="above" session="conversion"
 import narwhals as nw
 from narwhals.typing import IntoDataFrame
+from typing import Any

 import duckdb
 import polars as pl
@@ -45,11 +46,15 @@ print(df_to_pandas(df_polars))

 ### Via PyCapsule Interface

-Similarly, if your library uses Polars internally, you can convert any user-supplied dataframe to Polars format using Narwhals.
+Similarly, if your library uses Polars internally, you can convert any user-supplied dataframe
+which implements `__arrow_c_stream__`:

 ```python exec="1" source="above" session="conversion" result="python"
-def df_to_polars(df: IntoDataFrame) -> pl.DataFrame:
-    return nw.from_arrow(nw.from_native(df), native_namespace=pl).to_native()
+def df_to_polars(df_native: Any) -> pl.DataFrame:
+    if hasattr(df_native, "__arrow_c_stream__"):
+        return nw.from_arrow(df_native, native_namespace=pl).to_native()
+    msg = f"Expected object which implements '__arrow_c_stream__', got: {type(df_native)}"
+    raise TypeError(msg)


 print(df_to_polars(df_duckdb))  # You can only execute this line of code once.
@@ -66,8 +71,9 @@ If you need to ingest the same dataframe multiple times, then you may want to go
 This may be less efficient than the PyCapsule approach above (and always requires PyArrow!), but is more forgiving:

 ```python exec="1" source="above" session="conversion" result="python"
-def df_to_polars(df: IntoDataFrame) -> pl.DataFrame:
-    return pl.DataFrame(nw.from_native(df).to_arrow())
+def df_to_polars(df_native: IntoDataFrame) -> pl.DataFrame:
+    df = nw.from_native(df_native).lazy().collect()
+    return pl.DataFrame(nw.from_native(df, eager_only=True).to_arrow())


 df_duckdb = duckdb.sql("SELECT * FROM df_polars")
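
Note on the PyCapsule hunk: the new `df_to_polars` only accepts objects that expose `__arrow_c_stream__` and raises `TypeError` otherwise. The following is a minimal, dependency-free sketch of that guard, not the documented Narwhals code. `WithStream`, `WithoutStream`, and `require_arrow_c_stream` are hypothetical names introduced purely for illustration, and the error message reports the type of the object that was actually checked.

```python
from typing import Any


class WithStream:
    """Hypothetical stand-in for an object exposing the Arrow PyCapsule stream hook."""

    def __arrow_c_stream__(self, requested_schema: Any = None) -> Any:
        # Illustration only: a real implementation returns a PyCapsule.
        raise NotImplementedError


class WithoutStream:
    """Hypothetical stand-in for an object with no PyCapsule support."""


def require_arrow_c_stream(obj: Any) -> Any:
    """Return `obj` unchanged if it implements `__arrow_c_stream__`, else raise."""
    if hasattr(obj, "__arrow_c_stream__"):
        return obj
    msg = f"Expected object which implements '__arrow_c_stream__', got: {type(obj)}"
    raise TypeError(msg)


require_arrow_c_stream(WithStream())  # accepted: the hook is present
try:
    require_arrow_c_stream(WithoutStream())
except TypeError as exc:
    print(exc)  # the message names the offending type
```

The guard only checks that the attribute exists; it never calls the hook, so it does not consume a one-shot stream such as the DuckDB relation in the hunk.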

docs/extending.md

+3 -4
@@ -15,17 +15,16 @@ Currently, Narwhals has **full API** support for the following libraries:
 It also has **lazy-only** support for [Dask](https://github.com/dask/dask), and **interchange** support
 for [DuckDB](https://github.com/duckdb/duckdb) and [Ibis](https://github.com/ibis-project/ibis).

+We are working towards full "lazy-only" support for DuckDB, Ibis, and PySpark.
+
 ### Levels of support

 Narwhals comes with three levels of support:

 - **Full API support**: cuDF, Modin, pandas, Polars, PyArrow
-- **Lazy-only support**: Dask
+- **Lazy-only support**: Dask. Work in progress: DuckDB, Ibis, PySpark.
 - **Interchange-level support**: DuckDB, Ibis, Vaex, anything which implements the DataFrame Interchange Protocol

-The lazy-only layer is a major item on our 2025 roadmap, and hope to be able to bring libraries currently in
-the "interchange" level into that one.
-
 Libraries for which we have full support can benefit from the whole
 [Narwhals API](./api-reference/index.md).

narwhals/_arrow/dataframe.py

+4
@@ -16,6 +16,7 @@
 from narwhals._arrow.utils import validate_dataframe_comparand
 from narwhals._expression_parsing import evaluate_into_exprs
 from narwhals.dependencies import is_numpy_array
+from narwhals.exceptions import ColumnNotFoundError
 from narwhals.utils import Implementation
 from narwhals.utils import flatten
 from narwhals.utils import generate_temporary_column_name
@@ -669,6 +670,9 @@ def unique(
         import pyarrow.compute as pc

         df = self._native_frame
+        if subset is not None and any(x not in self.columns for x in subset):
+            msg = f"Column(s) {subset} not found in {self.columns}"
+            raise ColumnNotFoundError(msg)
         subset = subset or self.columns

         if keep in {"any", "first", "last"}:

narwhals/_dask/dataframe.py

+4
@@ -11,6 +11,7 @@
 from narwhals._dask.utils import parse_exprs_and_named_exprs
 from narwhals._pandas_like.utils import native_to_narwhals_dtype
 from narwhals._pandas_like.utils import select_columns_by_name
+from narwhals.exceptions import ColumnNotFoundError
 from narwhals.typing import CompliantLazyFrame
 from narwhals.utils import Implementation
 from narwhals.utils import flatten
@@ -197,6 +198,9 @@ def unique(
         *,
         keep: Literal["any", "none"] = "any",
     ) -> Self:
+        if subset is not None and any(x not in self.columns for x in subset):
+            msg = f"Column(s) {subset} not found in {self.columns}"
+            raise ColumnNotFoundError(msg)
         native_frame = self._native_frame
         if keep == "none":
             subset = subset or self.columns
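
Note on the `unique` hunks: both the PyArrow and Dask backends gain the same up-front check that every name in `subset` exists before falling back to all columns. A minimal standalone sketch of that logic follows. `validate_subset` is a hypothetical helper name, and the local `ColumnNotFoundError` is only a stand-in for `narwhals.exceptions.ColumnNotFoundError` so the snippet runs without Narwhals installed.

```python
from __future__ import annotations

from collections.abc import Sequence


class ColumnNotFoundError(Exception):
    """Local stand-in for narwhals.exceptions.ColumnNotFoundError (assumption)."""


def validate_subset(subset: Sequence[str] | None, columns: Sequence[str]) -> list[str]:
    """Return the columns to deduplicate on, raising if a requested one is missing."""
    if subset is not None and any(name not in columns for name in subset):
        msg = f"Column(s) {subset} not found in {columns}"
        raise ColumnNotFoundError(msg)
    # Mirrors `subset = subset or self.columns`: None (or empty) means all columns.
    return list(subset or columns)


print(validate_subset(None, ["a", "b"]))   # ['a', 'b']
print(validate_subset(["a"], ["a", "b"]))  # ['a']
try:
    validate_subset(["a", "z"], ["a", "b"])
except ColumnNotFoundError as exc:
    print(exc)  # Column(s) ['a', 'z'] not found in ['a', 'b']
```

Validating before touching the native frame means a misspelled column fails with the same Narwhals-level error on every backend, rather than with whatever exception the underlying library happens to raise.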
