Related
- api: plan for `narwhals.stable.v2` #1657 (reply in thread)
- https://docs.pola.rs/api/python/stable/reference/expressions/api/polars.Expr.sort_by.html
Description
Originally posted by @dangotbanned in #1657 (reply in thread)
I'm surprised this didn't get mentioned before, but wouldn't polars.Expr.sort_by be an alternative to use in some of these cases?
I was looking at Order-dependence which has this example:
import pandas as pd
from sqlframe.duckdb import DuckDBSession
import narwhals as nw
session = DuckDBSession()
data = {"a": [1, 3, 4], "i": [0, 1, 2]}
df_pd = pd.DataFrame({"a": [1, 3, 4], "i": [0, 1, 2]})
sqlframe_df = session.createDataFrame(df_pd)
ldf = nw.from_native(sqlframe_df)
>>> ldf.with_columns(a_cum_sum=nw.col("a").cum_sum().over(order_by="i")).collect("pandas")
┌──────────────────┐
|Narwhals DataFrame|
|------------------|
| a i a_cum_sum|
|0 1 0 1|
|1 3 1 4|
|2 4 2 8|
└──────────────────┘
Translating that Expr directly to polars isn't valid at runtime:
import polars as pl
ldf = pl.LazyFrame(data)
>>> ldf.with_columns(a_cum_sum=pl.col("a").cum_sum().over(order_by="i")).collect()
TypeError: Expr.over() missing 1 required positional argument: 'partition_by'
But we can use the shorter sort_by:
>>> ldf.with_columns(a_cum_sum=pl.col("a").cum_sum().sort_by("i")).collect()
shape: (3, 3)
┌─────┬─────┬───────────┐
│ a ┆ i ┆ a_cum_sum │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═══════════╡
│ 1 ┆ 0 ┆ 1 │
│ 3 ┆ 1 ┆ 4 │
│ 4 ┆ 2 ┆ 8 │
└─────┴─────┴───────────┘
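One thing that might be worth double-checking: `i` is already sorted in the example above, so it can't distinguish "accumulate, then reorder by `i`" from "accumulate in `i` order, keeping each row where it was". Below is a rough pure-Python model of those two readings, not polars or narwhals code. The function names are hypothetical, and it assumes `cum_sum().sort_by("i")` reorders the already-computed values while `over(order_by="i")` leaves rows in place.

```python
from itertools import accumulate


def cum_sum_then_reorder(values, keys):
    """Accumulate in row order, then reorder the results by `keys`.

    Assumed model of `cum_sum().sort_by(...)`; not polars itself.
    """
    sums = list(accumulate(values))
    order = sorted(range(len(keys)), key=keys.__getitem__)
    return [sums[idx] for idx in order]


def cum_sum_ordered_by(values, keys):
    """Accumulate in `keys` order, writing each total back to its original row.

    Assumed model of `cum_sum().over(order_by=...)`; not narwhals itself.
    """
    order = sorted(range(len(keys)), key=keys.__getitem__)
    out = [None] * len(values)
    running_totals = accumulate(values[idx] for idx in order)
    for idx, total in zip(order, running_totals):
        out[idx] = total
    return out


a = [1, 3, 4]
sorted_i = [0, 1, 2]    # same as the example above
shuffled_i = [2, 0, 1]  # same keys, different row order

print(cum_sum_then_reorder(a, sorted_i), cum_sum_ordered_by(a, sorted_i))
print(cum_sum_then_reorder(a, shuffled_i), cum_sum_ordered_by(a, shuffled_i))
```

If that model is right, the two agree when `i` is already sorted and diverge once it isn't, so a quick run of the real expressions on shuffled data would settle it.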
@MarcoGorelli I feel like I must be missing something 😄
The API for sort_by matches what we need for determining order-dependence and has all the arguments you'd expect for the SQL-based backends:
- `*by`
- `descending`
- `nulls_last`
Also seems to have been available as far back as 0.18.0, but without nulls_last support until 0.20.*
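To illustrate why those three arguments line up with what a SQL-based backend needs, here is a minimal sketch of rendering them as an `ORDER BY` clause. `order_by_clause` and `_broadcast` are hypothetical helpers made up for this comment, not part of narwhals, polars, or sqlframe, and the sketch does no identifier quoting.

```python
from __future__ import annotations

from collections.abc import Sequence


def _broadcast(flag: bool | Sequence[bool], n: int) -> list[bool]:
    """Expand a single bool to `n` flags, or validate a per-column sequence."""
    if isinstance(flag, bool):
        return [flag] * n
    flags = list(flag)
    if len(flags) != n:
        raise ValueError(f"expected {n} flags, got {len(flags)}")
    return flags


def order_by_clause(
    *by: str,
    descending: bool | Sequence[bool] = False,
    nulls_last: bool | Sequence[bool] = False,
) -> str:
    """Render sort_by-style arguments as a SQL ORDER BY clause (hypothetical)."""
    if not by:
        raise ValueError("at least one column is required")
    desc = _broadcast(descending, len(by))
    last = _broadcast(nulls_last, len(by))
    terms = [
        f"{col} {'DESC' if d else 'ASC'} NULLS {'LAST' if nl else 'FIRST'}"
        for col, d, nl in zip(by, desc, last)
    ]
    return "ORDER BY " + ", ".join(terms)


print(order_by_clause("i"))
print(order_by_clause("i", "j", descending=[True, False], nulls_last=True))
```

Each keyword maps onto one piece of an `ORDER BY` term, which is the correspondence I had in mind.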
Personally, I quite like how we'd be able to suggest this in an error without mentioning windows/over