
bug: Incorrect results for first with over (order_by is silently ignored!) #11812

@MarcoGorelli

Description


What happened?

Thanks @mesejo and @NickCrews for your fix of #11656! It fixes some cases (and lets us remove two xfails for ibis in Narwhals), but other test cases remain unfixed and silently give incorrect output.

Using the latest commit on main:

import ibis

t = ibis.memtable({"a": [1, 1, 2], "b": [4, 5, 6], "c": [None, 7, 8], "i": [1, None, 2]})

# Within each group of "a", take the first "b" when rows are ordered by "i" (nulls first)
order_by = ibis.asc('i', nulls_first=True)
res = t.mutate(d=t.b.first(order_by=order_by).over(ibis.window(group_by='a'))).order_by('i')

print(res.to_pyarrow())
print(ibis.to_sql(res))

output:

pyarrow.Table
a: int64
b: int64
c: double
i: double
d: int64
----
a: [[1,2,1]]
b: [[4,6,5]]
c: [[null,8,7]]
i: [[1,2,null]]
d: [[4,6,4]]

sql:

SELECT
  *
FROM (
  SELECT
    "t0"."a",
    "t0"."b",
    "t0"."c",
    "t0"."i",
    FIRST_VALUE("t0"."b") OVER (PARTITION BY "t0"."a" ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS "d"
  FROM "ibis_pandas_memtable_rjymefsokzesllb7b7d5dofn5a" AS "t0"
) AS "t1"
ORDER BY
  "t1"."i" ASC

I'd have expected:

result:

pyarrow.Table
a: int64
b: int64
c: double
i: double
d: int64
----
a: [[1,2,1]]
b: [[4,6,5]]
c: [[null,8,7]]
i: [[1,2,null]]
d: [[5,6,5]]
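To make the expected semantics concrete, here is a small pure-Python sketch (not ibis code; `first_over_group`, `_sort_key` and `with_d` are hypothetical helpers) that computes `d` for the memtable above twice: once taking each partition's rows in input order, which is one way to end up with the observed `[4, 6, 4]`, and once sorting by `i` with nulls first, which gives the expected `[5, 6, 5]`. DuckDB does not guarantee any particular order without an ORDER BY, so the first case only illustrates one possible outcome of dropping `order_by`.

```python
from collections import defaultdict

# The memtable from the reproducer, one dict per row (None = null).
rows = [
    {"a": 1, "b": 4, "c": None, "i": 1},
    {"a": 1, "b": 5, "c": 7, "i": None},
    {"a": 2, "b": 6, "c": 8, "i": 2},
]


def _sort_key(value, nulls_first):
    """Order nulls before (or after) all non-null values, then by value."""
    is_null = value is None
    return (not is_null if nulls_first else is_null, 0 if is_null else value)


def first_over_group(rows, value, group, order=None, nulls_first=True):
    """For each row, return the first `value` in its `group` partition.

    With `order`, each partition is sorted ascending by that column (nulls
    placed first or last); without it, the partition's input order is used.
    """
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[group]].append(row)

    firsts = {}
    for key, part in partitions.items():
        if order is not None:
            part = sorted(part, key=lambda r: _sort_key(r[order], nulls_first))
        firsts[key] = part[0][value]
    return [firsts[row[group]] for row in rows]


def with_d(rows, d):
    """Attach column d, then mirror `.order_by('i')`: ascending, nulls last."""
    out = [dict(row, d=x) for row, x in zip(rows, d)]
    return sorted(out, key=lambda r: _sort_key(r["i"], nulls_first=False))


ignored = [r["d"] for r in with_d(rows, first_over_group(rows, "b", "a"))]
respected = [r["d"] for r in with_d(rows, first_over_group(rows, "b", "a", order="i"))]
print(ignored)    # [4, 6, 4]
print(respected)  # [5, 6, 5]
```

In group `a == 1` the row with `i = None` sorts first under `nulls_first=True`, so its `b` (5) is the first value for both rows of that group.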

sql: something like

SELECT
  *
FROM (
  SELECT
    "t0"."a",
    "t0"."b",
    "t0"."c",
    "t0"."i",
    FIRST_VALUE("t0"."b" ORDER BY "t0"."i" ASC NULLS FIRST) OVER (PARTITION BY "t0"."a" ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS "d"
  FROM "ibis_pandas_memtable_rjymefsokzesllb7b7d5dofn5a" AS "t0"
) AS "t1"
ORDER BY
  "t1"."i" ASC
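As a cross-check of the expected `d` column, here is a sketch using Python's standard-library `sqlite3` instead of DuckDB. It puts the ordering in the `OVER` clause (standard SQL window syntax) rather than inside `FIRST_VALUE(...)`, so it is an equivalent formulation, not the SQL ibis would necessarily emit. The table name `t` is an assumption, and it needs SQLite 3.25+ for window functions. SQLite sorts NULLs first in ascending order by default, so a plain `ORDER BY i` already matches `nulls_first=True`.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (a INTEGER, b INTEGER, c REAL, i REAL)")
con.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [(1, 4, None, 1), (1, 5, 7, None), (2, 6, 8, 2)],
)

query = """
SELECT
  a, b, c, i,
  FIRST_VALUE(b) OVER (
    PARTITION BY a
    ORDER BY i  -- SQLite puts NULLs first in ascending order
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ) AS d
FROM t
ORDER BY i IS NULL, i  -- ascending, nulls last, like the output above
"""
d = [row[4] for row in con.execute(query)]
print(d)  # [5, 6, 5]
```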

What version of ibis are you using?

11.0.0 (installed from GitHub at the latest commit on main, i.e. 2414952)

What backend(s) are you using, if any?

duckdb


Code of Conduct

  • I agree to follow this project's Code of Conduct

Metadata


Assignees

No one assigned

    Labels

    bug (Incorrect behavior inside of ibis)

    Type

    No type

    Projects

    Status

    backlog

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
