
feat: Add option to disable FILTER when using distinct #10567

Open
@Riezebos

Is your feature request related to a problem?

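.distinct(on=[...]) applies a NULL filter to each column by default, so every non-key column returns its first non-NULL value within the group. As a result, a single output row can combine values from different input rows.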

import ibis
from ibis import _

# Two update rows for the same person; each has a NULL in a different column.
t1 = ibis.memtable(
    {
        "person_id": [1, 1],
        "contacted_at": [ibis.date("2024-01-01").execute(), None],
        "updated_at": [None, ibis.date("2024-01-01").execute()],
    }
)
t1.order_by(_.updated_at.desc()).distinct(on=["person_id"])

Output:

┏━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ person_id ┃ contacted_at ┃ updated_at ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ int64     │ date         │ date       │
├───────────┼──────────────┼────────────┤
│         1 │ 2024-01-01   │ 2024-01-01 │
└───────────┴──────────────┴────────────┘
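To make the difference concrete, here is a minimal plain-Python sketch of the two semantics. It is only a model of the behavior shown above, not ibis or DuckDB internals. The function names `first_non_null_per_column` and `first_whole_row` are hypothetical, and the rows are assumed to be already sorted by updated_at descending with NULLs last.

```python
from datetime import date

# Assumption: rows are pre-sorted by updated_at descending, NULLs last.
rows = [
    {"person_id": 1, "contacted_at": None, "updated_at": date(2024, 1, 1)},
    {"person_id": 1, "contacted_at": date(2024, 1, 1), "updated_at": None},
]


def first_non_null_per_column(rows, key):
    """Model of the reported behavior: each column keeps its first non-NULL value."""
    out = {}
    for row in rows:
        merged = out.setdefault(row[key], {key: row[key]})
        for col, value in row.items():
            # Only fill a column that has no non-NULL value yet.
            if merged.get(col) is None:
                merged[col] = value
    return list(out.values())


def first_whole_row(rows, key):
    """Model of the requested behavior: keep the first row per key, NULLs included."""
    out = {}
    for row in rows:
        out.setdefault(row[key], dict(row))
    return list(out.values())


print(first_non_null_per_column(rows, "person_id"))
print(first_whole_row(rows, "person_id"))
```

The first function reproduces the mixed row from the output above, while the second returns the first input row unchanged, with contacted_at left as NULL.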

What is the motivation behind your request?

I have a table of "updates", and I want to get exactly the first row for each person when ordering by the "updated_at" column. I don't want values from other rows in the result.

Describe the solution you'd like

.distinct(on=[...], filter_nulls=False)

What version of ibis are you running?

9.5.0

What backend(s) are you using, if any?

DuckDB

Code of Conduct

  • I agree to follow this project's Code of Conduct

Metadata

Assignees: No one assigned
Labels: feature (Features or general enhancements)
Type: No type
Projects: Status: backlog
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests