feat: Add option to disable FILTER when using distinct #10567

Open
@Riezebos

Description

Is your feature request related to a problem?

By default, .distinct(on=[...]) filters out NULLs column by column, so the single row it returns per group can combine values from different input rows:

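import ibis
from ibis import _

ibis.options.interactive = True  # assumed: renders the table shown under "Output"

# Two updates for the same person; each row has a NULL in a different column.
t1 = ibis.memtable(
    {
        "person_id": [1, 1],
        "contacted_at": [ibis.date("2024-01-01").execute(), None],
        "updated_at": [None, ibis.date("2024-01-01").execute()],
    }
)
# Keep one row per person_id, ordered so the latest update comes first.
t1.order_by(_.updated_at.desc()).distinct(on=["person_id"])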

Output:

┏━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ person_id ┃ contacted_at ┃ updated_at ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ int64     │ date         │ date       │
├───────────┼──────────────┼────────────┤
│         1 │ 2024-01-01   │ 2024-01-01 │
└───────────┴──────────────┴────────────┘

What is the motivation behind your request?

I have a table of "updates" and want to get exactly the first row per person when ordering by the "updated_at" column. The result should not contain values taken from other rows.
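To make the difference concrete, here is a minimal plain-Python sketch of the two behaviours. It is only a model of what the output above suggests, not ibis's actual implementation; the function names are hypothetical, and the rows are assumed to be already sorted by updated_at descending with NULLs last.

```python
from datetime import date

# Rows from the example, assumed already sorted by updated_at descending
# with NULLs (None) placed last.
rows = [
    {"person_id": 1, "contacted_at": None, "updated_at": date(2024, 1, 1)},
    {"person_id": 1, "contacted_at": date(2024, 1, 1), "updated_at": None},
]


def first_non_null_per_column(rows, key):
    """Model of the observed behaviour: per group, each column
    independently takes its first non-NULL value."""
    out = {}
    for row in rows:
        merged = out.setdefault(row[key], dict.fromkeys(row))
        for col, val in row.items():
            if merged[col] is None and val is not None:
                merged[col] = val
    return list(out.values())


def first_row_per_group(rows, key):
    """Model of the requested behaviour: per group, keep the first
    row as-is, including its NULLs."""
    out = {}
    for row in rows:
        out.setdefault(row[key], row)
    return list(out.values())


mixed = first_non_null_per_column(rows, "person_id")
exact = first_row_per_group(rows, "person_id")
```

Under these assumptions, mixed holds a row whose contacted_at comes from the second input row, matching the output above, while exact keeps contacted_at as NULL, which is what this request asks for.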

Describe the solution you'd like

.distinct(on=[...], filter_nulls=False)

What version of ibis are you running?

9.5.0

What backend(s) are you using, if any?

DuckDB

Code of Conduct

  • I agree to follow this project's Code of Conduct
