feat: Add option to disable FILTER when using distinct
#10567
Open
Description
Is your feature request related to a problem?
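.distinct(on=[...]) applies a filter on NULLs by default, so each column in the result holds the first non-NULL value in its group, and a single output row can combine values from different input rows.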
import ibis
from ibis import _

t1 = ibis.memtable(
{
"person_id": [1, 1],
"contacted_at": [ibis.date("2024-01-01").execute(), None],
"updated_at": [None, ibis.date("2024-01-01").execute()],
}
)
# Expected: one whole row per person_id; actual: values merged across rows.
t1.order_by(_.updated_at.desc()).distinct(on=["person_id"])
Output:
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ person_id ┃ contacted_at ┃ updated_at ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ int64 │ date │ date │
├───────────┼──────────────┼────────────┤
│ 1 │ 2024-01-01 │ 2024-01-01 │
└───────────┴──────────────┴────────────┘
What is the motivation behind your request?
I have a table of "updates", and I want to get exactly the first row for each person when ordering by the "updated_at" column. I don't want values from other rows mixed into the result.
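To make the difference concrete, here is a minimal plain-Python sketch of the two semantics. It is only an illustration and not how ibis or DuckDB implement distinct. The function names are hypothetical, and the rows are assumed to be already sorted by "updated_at" descending with NULLs last.

```python
from datetime import date

# Assumption: rows are already sorted by updated_at descending, NULLs (None) last.
rows = [
    {"person_id": 1, "contacted_at": None, "updated_at": date(2024, 1, 1)},
    {"person_id": 1, "contacted_at": date(2024, 1, 1), "updated_at": None},
]


def first_non_null_per_column(rows, key):
    """Per group, take each column's first non-NULL value independently.

    Hypothetical helper mimicking the behavior reported in this issue.
    """
    result = {}
    for row in rows:
        # Start each group with every column set to None.
        merged = result.setdefault(row[key], dict.fromkeys(row))
        for col, value in row.items():
            if merged[col] is None and value is not None:
                merged[col] = value
    return list(result.values())


def first_row_per_group(rows, key):
    """Per group, keep the first row whole, NULLs included.

    Hypothetical helper mimicking the behavior requested in this issue.
    """
    result = {}
    for row in rows:
        result.setdefault(row[key], dict(row))
    return list(result.values())


merged_rows = first_non_null_per_column(rows, "person_id")
exact_rows = first_row_per_group(rows, "person_id")
```

Here merged_rows contains a row whose "contacted_at" comes from the second input row, which matches the output shown above, while exact_rows keeps "contacted_at" as None because the first row is returned untouched. The second result is what I'm after.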
Describe the solution you'd like
.distinct(on=[...], filter_nulls=False)
What version of ibis are you running?
9.5.0
What backend(s) are you using, if any?
DuckDB
Code of Conduct
- I agree to follow this project's Code of Conduct
Status
backlog