feat(DRAFT): Adds `(Expr|Series).first()` #2528

dangotbanned · 2025-05-10T20:30:21Z

Will close #2526

What type of PR is this? (check all applicable)

Related issues

Closes [Enh]: add Expr.first #2526
Eager support blocked by chore: Simplify PandasLikeGroupBy #2680
- (thread)
Lazy support blocked by feat(DRAFT): Add Expr.sort_by #2547
- (thread)

Checklist

Code follows style guide (ruff)
Tests added
Documented the changes

If you have comments or can explain your changes, please do so below

Towards (#2526)

See #2526 (comment)

https://duckdb.org/docs/stable/sql/functions/aggregates#firstarg

- Less sure about this one - `head(1)` also seemed like an option

All have *an* implementation now

dangotbanned · 2025-05-10T20:32:22Z

Anyone feel free to hop on this - just thought I'd get something up for every backend quickly 🙂

Lack of coverage is expected for now (https://github.com/narwhals-dev/narwhals/actions/runs/14948882535/job/41995794107)

MarcoGorelli · 2025-05-10T21:18:32Z

narwhals/_duckdb/expr.py

+    def first(self) -> Self:
+        def fn(_input: duckdb.Expression) -> duckdb.Expression:
+            return FunctionExpression("first", _input)
+
+        return self._with_callable(fn)


initial feedback: first is an orderable aggregation, so we'd need to require some order_by=...
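To make the "orderable aggregation" point concrete, here is a minimal plain-Python sketch. The helpers `first_unordered` and `first_ordered` are hypothetical names for illustration and are not part of narwhals or any backend. It only shows that the result of "first" depends on physical row order unless an ordering key is applied.

```python
# Plain-Python illustration only; none of these helpers exist in narwhals.
rows = [
    {"idx": 1, "a": 2},
    {"idx": 0, "a": 8},
    {"idx": 2, "a": 1},
]
shuffled = list(reversed(rows))  # same rows, different physical order


def first_unordered(rows, col):
    """Take the first value in whatever order the rows happen to arrive."""
    return rows[0][col] if rows else None


def first_ordered(rows, col, order_by):
    """Sort by `order_by` first, so the result no longer depends on row order."""
    ordered = sorted(rows, key=lambda row: row[order_by])
    return ordered[0][col] if ordered else None


# Without an ordering key the two physical orders disagree.
unordered_results = (first_unordered(rows, "a"), first_unordered(shuffled, "a"))
# With an ordering key they agree.
ordered_results = (first_ordered(rows, "a", "idx"), first_ordered(shuffled, "a", "idx"))
```

Backends that do not guarantee row order behave like `first_unordered`, which is presumably why an `order_by` is being asked for.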

Thanks @MarcoGorelli, so first step will be

narwhals/narwhals/expr.py

Lines 71 to 75 in dd41833

def _with_orderable_aggregation(
    self, to_compliant_expr: Callable[[Any], Any]
) -> Self:
    return self.__class__(
        to_compliant_expr, self._metadata.with_orderable_aggregation()

Then see what to do in each backend

One thing I thought worth mentioning was that I don't think pl.Expr.first makes any stability guarantees.
Does that matter at all, or do you just want to enforce it in narwhals for the least surprises?

duckdb seems to have the same behavior as polars would

@MarcoGorelli these are the two other cases we have for _with_orderable_aggregation:

narwhals/narwhals/expr.py

Lines 785 to 786 in b7001e4

return self._with_orderable_aggregation(
    lambda plx: self._to_compliant_expr(plx).arg_min()

narwhals/narwhals/expr.py

Lines 808 to 809 in b7001e4

return self._with_orderable_aggregation(
    lambda plx: self._to_compliant_expr(plx).arg_max()

We currently don't support them in LazyExpr:

narwhals/narwhals/_compliant/expr.py

Lines 879 to 884 in b7001e4

class LazyExpr(  # type: ignore[misc]
    CompliantExpr[CompliantLazyFrameT, NativeExprT],
    Protocol38[CompliantLazyFrameT, NativeExprT],
):
    arg_min: not_implemented = not_implemented()
    arg_max: not_implemented = not_implemented()

I'm just pushing what I think is how to enforce the order_by in (bd4ab89)
But I'm quite unsure 😄

https://docs.pola.rs/api/python/stable/reference/series/api/polars.Series.first.html

pola-rs/polars#19093

https://github.com/narwhals-dev/narwhals/actions/runs/14949511597/job/41997113260?pr=2528

narwhals/_compliant/series.py

https://github.com/narwhals-dev/narwhals/actions/runs/14949533546/job/41997163953?pr=2528

dangotbanned · 2025-05-10T22:16:35Z

narwhals/series.py

+        Examples:
+            >>> import polars as pl
+            >>> import narwhals as nw
+            >>>
+            >>> s_native = pl.Series([1, 2, 3])
+            >>> s_nw = nw.from_native(s_native, series_only=True)
+            >>> s_nw.first()
+            1
+            >>> s_nw.filter(s_nw > 5).first() is None
+            True


I don't like the None example, but this was the only way I saw to get a repr 😞

I think it's important to have an example for that case though - since pandas and pyarrow would raise an index error normally
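As a rough sketch of the out-of-bounds handling described here (plain Python, with hypothetical helper names rather than the actual backend code), the difference between raising on an empty input and returning `None` looks like this:

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def first_strict(values: Sequence[T]) -> T:
    """Plain positional access: raises IndexError on an empty sequence."""
    return values[0]


def first_or_none(values: Sequence[T]) -> Optional[T]:
    """Return the first element, or None when `values` is empty."""
    return values[0] if len(values) > 0 else None


try:
    first_strict([])
    raised = False
except IndexError:
    raised = True

non_empty = first_or_none([1, 2, 3])
empty = first_or_none([])
```

The second helper mirrors the `s_nw.filter(s_nw > 5).first() is None` case in the docstring example.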

The description is exactly https://docs.pola.rs/api/python/stable/reference/expressions/api/polars.Expr.first.html

#2528 (comment)

Still need to add `dask`, `duckdb` equivalent of (bd4ab89)

dangotbanned · 2025-05-11T15:38:41Z

tests/expr_and_series/first_test.py

+@pytest.mark.parametrize(("col", "expected"), [("a", 8), ("b", 58), ("c", 2.5)])
+def test_first_expr_eager(
+    constructor_eager: ConstructorEager, col: str, expected: PythonLiteral
+) -> None:
+    df = nw.from_native(constructor_eager(data))
+    expr = nw.col(col).first()
+    result = df.select(expr)
+    assert_equal_data(result, {col: [expected]})


Feel like I got a bit unlucky with this being the first test I wrote 😅

So there's a wrinkle with how the .over(order_by=...) changes the meaning of the aggregation.

This is all good:

import polars as pl

data = {
    "a": [8, 2, 1, None],
    "b": [58, 5, 6, 12],
    "c": [2.5, 1.0, 3.0, 0.9],
    "d": [2, 1, 4, 3],
    "idx": [0, 1, 2, 3],
}
df = pl.DataFrame(data)

>>> df.select(pl.col("a").first())
shape: (1, 1)
┌─────┐
│ a   │
│ --- │
│ i64 │
╞═════╡
│ 8   │
└─────┘

polars is still fine when doing this lazily:

>>> df.lazy().select(pl.col("a").first()).collect()
shape: (1, 1)
┌─────┐
│ a   │
│ --- │
│ i64 │
╞═════╡
│ 8   │
└─────┘

We can also use a .sort_by before .first:

>>> df.lazy().select(pl.col("a").sort_by("idx").first()).collect()
shape: (1, 1)
┌─────┐
│ a   │
│ --- │
│ i64 │
╞═════╡
│ 8   │
└─────┘

But if we do that after, the sort column has the pre-agg shape:

>>> df.lazy().select(pl.col("a").first().sort_by("idx")).collect()
ShapeError: `sort_by` produced different length (4) than the Series that has to be sorted (1)

If we do .over(,order_by=...), we end up broadcasting instead of aggregating:

>>> df.lazy().select(pl.col("a").first().over(pl.lit(1), order_by="idx")).collect()
shape: (4, 1)
┌─────┐
│ a   │
│ --- │
│ i64 │
╞═════╡
│ 8   │
│ 8   │
│ 8   │
│ 8   │
└─────┘
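The aggregate-versus-broadcast distinction can be sketched in plain Python using the same `a` and `idx` columns. Both function names are hypothetical and only model the output shapes seen in the polars results; they are not how any backend implements this.

```python
a = [8, 2, 1, None]
idx = [0, 1, 2, 3]


def first_as_aggregation(values):
    """Model of `.first()` alone: the output has at most one row."""
    return values[:1]


def first_over_order_by(values, order_by):
    """Model of `.first().over(..., order_by=...)`: one value per input row."""
    # Positions sorted by the ordering key; the values themselves are never compared.
    order = sorted(range(len(values)), key=lambda i: order_by[i])
    first = values[order[0]] if values else None
    return [first] * len(values)


aggregated = first_as_aggregation(a)       # length 1
broadcast = first_over_order_by(a, idx)    # length 4
```

The value is the same in both cases, but the shape is not, which is the wrinkle being discussed.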

@MarcoGorelli would we want to land (#2534) first so that we have a way to specify this as an aggregation?

I do hope there's another way we can do this with the existing Expr methods though 🙏

The example for .min() is something I'd expect to be able to do with first():

narwhals/narwhals/expr.py

Lines 724 to 742 in 6c110ca

def min(self) -> Self:
    """Returns the minimum value(s) from a column(s).

    Returns:
        A new expression.

    Examples:
        >>> import pandas as pd
        >>> import narwhals as nw
        >>> df_native = pd.DataFrame({"a": [1, 2], "b": [4, 3]})
        >>> df = nw.from_native(df_native)
        >>> df.select(nw.min("a", "b"))
        ┌──────────────────┐
        |Narwhals DataFrame|
        |------------------|
        |        a  b      |
        |     0  1  3      |
        └──────────────────┘
    """

The example for .min() is something I'd expect to be able to do with first():

min is not an orderable op. I think the right op to compare with is arg_min, which has the same broadcasting behavior; see expected in our test:

narwhals/tests/expr_and_series/arg_min_test.py

Lines 44 to 56 in 6c110ca

def test_expr_arg_min_over() -> None:
    # This is tricky. But, we may be able to support it for
    # other backends too one day.
    pytest.importorskip("polars")
    import polars as pl

    if POLARS_VERSION < (1, 10):
        pytest.skip()

    df = nw.from_native(pl.LazyFrame({"a": [9, 8, 7], "i": [0, 2, 1]}))
    result = df.select(nw.col("a").arg_min().over(order_by="i"))
    expected = {"a": [1, 1, 1]}
    assert_equal_data(result, expected)
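One way to read the `expected = {"a": [1, 1, 1]}` value in this test is to work it through by hand. The plain-Python sketch below is only an interpretation of the test data, not the polars implementation: order `a` by `i`, take the position of the minimum in that order, then repeat it for every row.

```python
a = [9, 8, 7]
i = [0, 2, 1]

# Reorder `a` by the ordering column `i`: rows with i = 0, 1, 2 in turn.
ordered = [value for _, value in sorted(zip(i, a))]

# Position of the minimum within the ordered values.
arg_min_ordered = ordered.index(min(ordered))

# For comparison: position of the minimum in the original row order.
arg_min_unordered = a.index(min(a))

# `.over(...)` keeps the input length, so the single result is repeated.
broadcast = [arg_min_ordered] * len(a)
```

Under this reading the ordering changes the answer (position 1 rather than 2), and the window context is what turns the single value into three rows.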

The thing is that arg_min is not supported in an over context for any backend other than polars.
For first, I am having a harder time figuring it out for the eager backends than for the lazy ones 🥲 since we do:

pandas

for s in results:
    s._scatter_in_place(sorting_indices, s)
return results

However, s is a length-1 series and does not get broadcast

pyarrow

result = self(df.drop([token], strict=True))
sorting_indices = pc.sort_indices(df.get_column(token).native)
return [s._with_native(s.native.take(sorting_indices)) for s in result]

take fails due to index out of bound (as s has length 1)
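The failure mode can be reproduced without pandas or pyarrow. In this plain-Python sketch, `take` is a hypothetical stand-in for the native take/scatter step, and broadcasting before reordering is shown only as one possible direction, not as what the PR ends up doing.

```python
def take(values, indices):
    """Stand-in for a native `take`: pick `values[i]` for each index."""
    return [values[i] for i in indices]


sorting_indices = [0, 2, 1]  # computed from the full-length frame
aggregated = [9]             # `first()` has already reduced the column to length 1

try:
    take(aggregated, sorting_indices)
    failed = False
except IndexError:
    # Indices 1 and 2 do not exist in a length-1 result.
    failed = True

# Repeating the scalar to the frame length first makes the reorder well-defined.
broadcast = aggregated * len(sorting_indices)
reordered = take(broadcast, sorting_indices)
```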

@FBruzzesi I know arg_min is closer, I mentioned it in (#2528 (comment)) 😉

I guess the point I'm trying to make is that adding the constraint of an .over(order_by=...) changes the expression from what .first() does in polars.

This is what we'd need to suggest, since that's the way to maintain the aggregation in polars AFAICT

We can also use a .sort_by before .first:

>>> df.lazy().select(pl.col("a").sort_by("idx").first()).collect()
shape: (1, 1)
┌─────┐
│ a   │
│ --- │
│ i64 │
╞═════╡
│ 8   │
└─────┘

I'm just a little lost since the rules we've been working on are for after the aggregation - whereas this is flipped 🤔

(#2528 (comment))

take fails due to index out of bound (as s has length 1)

@FBruzzesi

Ah yeah I'm getting that locally as well - I'll push the tests as-is for now

https://github.com/narwhals-dev/narwhals/actions/runs/15000709204/job/42146640207?pr=2528

Exactly the behavior needed for `first()` #2528 (comment)

narwhals/_spark_like/expr.py

1/3 for #2528 (comment)

2/3 for #2528 (comment)

The version that worked isn't supported by `narwhals` anymore https://github.com/narwhals-dev/narwhals/actions/runs/15662564393/job/44122396713?pr=2528

One day I'll work out the right condition 😅 https://github.com/narwhals-dev/narwhals/actions/runs/15775635670/job/44469350175

dangotbanned added 10 commits May 10, 2025 20:34

chore: Add CompliantExpr.first

ff661ae

Towards (#2526)

feat: "Implement" PolarsExpr.First

1b77bd7

feat: Add EagerExpr.first

e84cba3

chore: Repeat for *Series

25ef241

feat: Add (Arrow|PandasLike)Series.first()

78822aa

chore: Mark LazyExpr.first as not_implemented for now

4075c50

See #2526 (comment)

feat: Add SparkLikeExpr.first

45f24b9

feat: Add DuckDBExpr.first

4041dd1

https://duckdb.org/docs/stable/sql/functions/aggregates#firstarg

feat: Add DaskExpr.first

bb9912d

- Less sure about this one - `head(1)` also seemed like an option

revert: 4075c50

6a53aa1

All have *an* implementation now

dangotbanned added the enhancement New feature or request label May 10, 2025

dangotbanned changed the title ~~feat(DRAFT): Adds Expr.first()~~ feat(DRAFT): Adds (Expr|Series).first() May 10, 2025

MarcoGorelli reviewed May 10, 2025


dangotbanned added 7 commits May 10, 2025 22:26

feat: Add nw.Series.first

4efc939

test: Add Series.first tests

fc149c1

fix: I guess the stubs were wrong then?

7489e61

fix: Handle the out-of-bounds case

d2719a4

https://docs.pola.rs/api/python/stable/reference/series/api/polars.Series.first.html

fix: polars backcompat

0af11db

pola-rs/polars#19093

docs: Add Series.first

afe20f0

lol version typo

6c0bd6f

https://github.com/narwhals-dev/narwhals/actions/runs/14949511597/job/41997113260?pr=2528

dangotbanned commented May 10, 2025

narwhals/_compliant/series.py (outdated)

cov

e0fdf78

https://github.com/narwhals-dev/narwhals/actions/runs/14949533546/job/41997163953?pr=2528

dangotbanned commented May 10, 2025


dangotbanned added 5 commits May 11, 2025 12:11

chore: Add nw.Expr.first

aa7c510

The description is exactly https://docs.pola.rs/api/python/stable/reference/expressions/api/polars.Expr.first.html

Merge remote-tracking branch 'upstream/main' into expr-first

4fdc0aa

feat: Maybe SparkLike requires order_by?

bd4ab89

#2528 (comment)

test: Try out eager backends

9f7f5a9

Still need to add `dask`, `duckdb` equivalent of (bd4ab89)

Merge branch 'main' into expr-first

ddb50d2

dangotbanned commented May 11, 2025


dangotbanned added 4 commits May 13, 2025 16:02

Merge remote-tracking branch 'upstream/main' into expr-first

9c36285

test: xfail ibis

ad8e3f7

feat: Add IbisExpr.first

628f71e

test: Don't xfail for pandas<1.0.0

deacc71

https://github.com/narwhals-dev/narwhals/actions/runs/15000709204/job/42146640207?pr=2528

dangotbanned added a commit that referenced this pull request May 13, 2025

test: Add test_sort_by_orderable_agg

ab887fe

Exactly the behavior needed for `first()` #2528 (comment)

dangotbanned mentioned this pull request May 13, 2025

feat(DRAFT): Add Expr.sort_by #2547

Draft

10 tasks

dangotbanned added 3 commits May 14, 2025 14:06

Merge branch 'main' into expr-first

5c52ee4

Merge branch 'main' into expr-first

eec2a4f

Merge branch 'main' into expr-first

e003bab

dangotbanned mentioned this pull request May 18, 2025

[Enh]: A richer Expr internal representation #2571

Open

7 tasks

Merge remote-tracking branch 'upstream/main' into expr-first

fb2dc1c

dangotbanned mentioned this pull request May 20, 2025

feat: add ~~required~~ keep argument to Expr.mode #1793

Open

Merge remote-tracking branch 'upstream/main' into expr-first

211673b

dangotbanned commented Jun 3, 2025

narwhals/_spark_like/expr.py (outdated)

dangotbanned mentioned this pull request Jun 6, 2025

chore: Refactor window functions for lazy backends #2649

Merged

10 tasks

dangotbanned added 7 commits June 13, 2025 15:59

fix: Use reverted partition_by, _sort

652615f

Merge remote-tracking branch 'upstream/main' into expr-first

68fdfe8

fix: Update DuckDBExpr.first

ecaca9a

1/3 for #2528 (comment)

fix: Update IbisExpr.first

ea30f26

2/3 for #2528 (comment)

fix: Update SparkLikeExpr.first

12987ee

Merge remote-tracking branch 'upstream/main' into expr-first

7d70a42

test: Update pandas xfail

5446095

The version that worked isn't supported by `narwhals` anymore https://github.com/narwhals-dev/narwhals/actions/runs/15662564393/job/44122396713?pr=2528

dangotbanned mentioned this pull request Jun 15, 2025

chore: Simplify PandasLikeGroupBy #2680

Open

9 tasks

dangotbanned added the blocked label Jun 20, 2025

dangotbanned added 5 commits June 20, 2025 10:17

Merge branch 'main' into expr-first

b927340

test: Don't xfail for pandas 1.1.3<=...<1.1.5

f62c085

One day I'll work out the right condition 😅 https://github.com/narwhals-dev/narwhals/actions/runs/15775635670/job/44469350175

Merge branch 'main' into expr-first

45d20c8

Merge remote-tracking branch 'upstream/main' into expr-first

72ab185

fix: Upgrade DuckDBExpr.first again

e72b115

dangotbanned mentioned this pull request Jun 30, 2025

[Enh]: add Expr.first #2526

Open

2 tasks


