[Enh]: Add Expr.sort_by #2534

Open
@dangotbanned

Description


Originally posted by @dangotbanned in #1657 (reply in thread)

I'm surprised this didn't get mentioned before, but wouldn't polars.Expr.sort_by be an alternative to use in some of these cases?

I was looking at Order-dependence which has this example:

import pandas as pd
from sqlframe.duckdb import DuckDBSession

import narwhals as nw

session = DuckDBSession()
data = {"a": [1, 3, 4], "i": [0, 1, 2]}

df_pd = pd.DataFrame(data)
sqlframe_df = session.createDataFrame(df_pd)
ldf = nw.from_native(sqlframe_df)

>>> ldf.with_columns(a_cum_sum=nw.col("a").cum_sum().over(order_by="i")).collect("pandas")
┌──────────────────┐
|Narwhals DataFrame|
|------------------|
|   a  i  a_cum_sum|
|0  1  0          1|
|1  3  1          4|
|2  4  2          8|
└──────────────────┘
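
As a rough, backend-free sketch of what that order-dependent expression computes, assuming the totals are aligned back to the original rows (`cum_sum_ordered` is a hypothetical helper for illustration, not part of narwhals or polars):

```python
from itertools import accumulate


def cum_sum_ordered(values, order_keys):
    """Hypothetical helper: cumulative sum of `values` taken in `order_keys` order."""
    # Row positions sorted by the ordering key (sorted() is stable for ties).
    order = sorted(range(len(values)), key=lambda pos: order_keys[pos])
    # Running totals computed in that sorted order.
    running = accumulate(values[pos] for pos in order)
    # Scatter each total back to the row it belongs to.
    result = [None] * len(values)
    for pos, total in zip(order, running):
        result[pos] = total
    return result


data = {"a": [1, 3, 4], "i": [0, 1, 2]}
print(cum_sum_ordered(data["a"], data["i"]))  # [1, 4, 8]
```

With the data above, "i" is already ascending, so the result matches a plain cumulative sum.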

Translating that Expr directly to polars isn't valid at runtime:

import polars as pl

ldf = pl.LazyFrame(data)

>>> ldf.with_columns(a_cum_sum=pl.col("a").cum_sum().over(order_by="i")).collect()
TypeError: Expr.over() missing 1 required positional argument: 'partition_by'

But we can use the shorter sort_by:

>>> ldf.with_columns(a_cum_sum=pl.col("a").cum_sum().sort_by("i")).collect()
shape: (3, 3)
┌─────┬─────┬───────────┐
│ a   ┆ i   ┆ a_cum_sum │
│ --- ┆ --- ┆ ---       │
│ i64 ┆ i64 ┆ i64       │
╞═════╪═════╪═══════════╡
│ 1   ┆ 0   ┆ 1         │
│ 3   ┆ 1   ┆ 4         │
│ 4   ┆ 2   ┆ 8         │
└─────┴─────┴───────────┘

@MarcoGorelli I feel like I must be missing something 😄

The API for sort_by matches what we need for determining order-dependence and has all the arguments you'd expect for the SQL-based backends:

  • *by
  • descending
  • nulls_last

sort_by also seems to have been available as far back as polars 0.18.0, but without nulls_last support until 0.20.*

Personally, I quite like how we'd be able to suggest this in an error message without mentioning windows/over.

Labels: enhancement (New feature or request)
