enh: Expr.filter for spark-like backends #2203

Open
@MarcoGorelli

Description

For LazyFrames, we can support Expr.filter so long as it's followed by an aggregation. This should already be validated at the Narwhals level, so I don't think anything needs changing there.
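
For reference, the user-facing pattern this targets looks something like this (a minimal sketch; `lf` stands in for any Narwhals LazyFrame backed by a spark-like engine, and the column names are illustrative):

import narwhals as nw

# Filtered aggregation: only rows where a == 1 contribute to the sum.
result = lf.select(
    nw.col('b').filter(nw.col('a') == 1).sum().alias('c'),
)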

For spark-like backends, Column.filter doesn't exist, but we can use F.expr to accomplish the same thing:

from sqlframe.duckdb import DuckDBSession
import sqlframe.duckdb.functions as F
import pandas as pd

df = DuckDBSession().createDataFrame(pd.DataFrame({'a': [1, 1, 2], 'b': [4, 5, 6]}))
df = df.select(
    # the SQL FILTER clause restricts which rows feed each aggregation
    F.expr('sum(b) filter (where a==1)').alias('c'),
    F.expr('sum(b) filter (where a!=1)').alias('d'),
)
df.show()
df.show()
+---+---+
| c | d |
+---+---+
| 9 | 6 |
+---+---+

If anyone fancies implementing Expr.filter for _spark_like using the above, then my hope is that it will "just work", similar to how it currently works for Dask.
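
As an aside, if assembling the raw SQL string for F.expr proves awkward, the same result can be had without string-building via F.when: when without otherwise yields NULL for non-matching rows, and sum skips NULLs. This is just a sketch of the equivalence, not necessarily how the final implementation should look:

from sqlframe.duckdb import DuckDBSession
import sqlframe.duckdb.functions as F
import pandas as pd

df = DuckDBSession().createDataFrame(pd.DataFrame({'a': [1, 1, 2], 'b': [4, 5, 6]}))

# Equivalent to `sum(b) filter (where a==1)` / `... (where a!=1)`:
# F.when with no .otherwise produces NULL for non-matching rows,
# and sum ignores NULLs.
df.select(
    F.sum(F.when(F.col('a') == 1, F.col('b'))).alias('c'),
    F.sum(F.when(F.col('a') != 1, F.col('b'))).alias('d'),
).show()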

Metadata

Labels

enhancement (New feature or request), help wanted (Extra attention is needed), low priority, pyspark (Issue is related to pyspark backend)
