feat: `ArrowDataFrame.explode` #1644

FBruzzesi · 2024-12-22T11:38:43Z

What type of PR is this? (check all applicable)

Checklist

Code follows style guide (ruff)
Tests added
Documented the changes

If you have comments or can explain your changes, please do so below

I will leave this as draft until we decide how to move forward.

To summarize the discussion(s) in #1542:

- pyarrow native methods ignore nulls and empty lists in explode
- the workaround here is to have a "fast_path" for when nulls or empty lists are not present, and a dedicated path for when they are
- the issue is that the dedicated path enters Python to create the index via one `.to_pylist()` call
- pandas seems to enter Cython anyway to explode a list
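To make the difference concrete, here is a minimal pure-Python model of the two behaviours discussed in this summary. It is only an illustration of the semantics, not the code in this PR; the function names `explode_keep_rows` and `explode_drop_rows` are made up for the example.

```python
from __future__ import annotations

from typing import Any


def explode_keep_rows(rows: list[list[Any] | None]) -> list[tuple[int, Any]]:
    """Model of the behaviour the dedicated path aims for.

    A null or empty list still produces one output row holding None.
    """
    out: list[tuple[int, Any]] = []
    for i, row in enumerate(rows):
        if not row:  # covers both None and []
            out.append((i, None))
        else:
            out.extend((i, value) for value in row)
    return out


def explode_drop_rows(rows: list[list[Any] | None]) -> list[tuple[int, Any]]:
    """Model of the pyarrow-native behaviour: nulls and empty lists emit nothing."""
    return [(i, value) for i, row in enumerate(rows) if row for value in row]


data = [[1, 2], None, [], [3]]
kept = explode_keep_rows(data)
dropped = explode_drop_rows(data)
```

Each tuple is `(parent row index, exploded value)`. The parent indices are what the other columns have to be repeated by, which is why the null/empty case needs its own index.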


dangotbanned · 2025-03-25T11:04:11Z

@FBruzzesi I feel like this shouldn't have got lost!

ArrowDataFrame.explode is 1 of 3 remaining implementations we need

narwhals/narwhals/_arrow/dataframe.py

Line 350 in 2bcc6bb

explode = not_implemented()

narwhals/narwhals/_arrow/dataframe.py

Line 466 in 2bcc6bb

join_asof = not_implemented()

I might add a PR for ArrowDataFrame.clone - since it can just utilize arrow data being immutable

narwhals/narwhals/_arrow/dataframe.py

Line 669 in 2bcc6bb

clone = not_implemented()

FBruzzesi · 2025-03-25T11:08:12Z

> I feel like this shouldn't have got lost!

Thanks @dangotbanned ♥️ The main concern was the conversion to Python objects via `filled_counts.to_pylist()` in:

    if fast_path:
        indices = pc.list_parent_indices(native_frame[to_explode[0]])
        flatten_func = pc.list_flatten

    else:
        filled_counts = pc.max_element_wise(counts, 1, skip_nulls=True)
        indices = pa.array(
            [
                i
                for i, count in enumerate(filled_counts.to_pylist())
                for _ in range(count)
            ]
        )

dangotbanned · 2025-03-25T11:10:13Z

#1644 (comment)

Maybe we can figure out another path hidden somewhere in the stubs? 🤔


dangotbanned · 2025-03-25T13:20:53Z

Series[list].explode() should not return None for empty lists pola-rs/polars#17664

@FBruzzesi @MarcoGorelli

It seems like polars wants to make a breaking change in the next major version - resulting in the same behavior as pyarrow.

If we had that behavior as the goal - I think pc.list_flatten(..., recursive=True) would get us most of the way there.
Just something to keep in mind for the future 🙂


FBruzzesi and others added 19 commits December 8, 2024 22:46

feat: DataFrame and LazyFrame explode

3061fe9

arrow refactor

2326b08

raise for invalid type and docstrings

32af22e

Update narwhals/dataframe.py

3b52ab5

old versions

c3bf009

merge main

b427e79

Merge branch 'main' into feat/explode-method

c77dc62

almost all native

72314a2

doctest

7f04579

Merge branch 'main' into feat/explode-method

7be326e

Merge branch 'main' into feat/explode-method

5da1ad6

Merge branch 'main' into feat/explode-method

4a098b8

Merge branch 'feat/explode-method' of https://github.com/narwhals-dev…

380a6cb

…/narwhals into feat/explode-method

Merge branch 'main' into feat/explode-method

c7a47c9

better error message, fail for arrow with nulls

864e932

doctest-modules

cc72f6b

completely remove pyarrow implementation

1156beb

feat: ArrowDataFrame explode method

03081cb

merge main

8fc8c0a

dangotbanned mentioned this pull request Mar 17, 2025

chore: Spec CompliantLazyFrame #2232

Merged

12 tasks

dangotbanned added the labels enhancement (New feature or request) and pyarrow (Issue is related to pyarrow backend) Mar 25, 2025

dangotbanned added a commit that referenced this pull request Mar 25, 2025

feat: Add DataFrame.clone for pyarrow

310b080

Mentioned in #1644 (comment) #2207

dangotbanned mentioned this pull request Mar 25, 2025

feat: Add DataFrame.clone for pyarrow #2288

Merged

10 tasks

dangotbanned added 3 commits March 25, 2025 12:21

Merge remote-tracking branch 'upstream/main' into feat/pyarrow-explode

7369925

fix: remove not_implemented

d04fc7d

https://results.pre-commit.ci/run/github/760058710/1742905302.AsTci5pETIqquA1eJPcxNQ

refactor: move imports

fc79540

dangotbanned added 2 commits March 25, 2025 12:28

chore: use ArrowDataFrame.native

1f1ac63

fix(typing): Resolve most issues

79b8fd4

`.to_pylist` being called on a scalar is all that is left

dangotbanned and others added 3 commits March 25, 2025 13:41

pyright ignore

80fcc02

Just leaving as-is, since this'll probably change in the future #1644 (comment)

fix(typing): Avoid mypy redef

22ea311

> error: Incompatible redefinition (redefinition with type "Callable[[ChunkedArray[ListScalar[Any]]], ChunkedArray[Any]]", original type overloaded function) [misc] https://github.com/narwhals-dev/narwhals/actions/runs/14060304329/job/39369169923?pr=1644

Merge branch 'main' into feat/pyarrow-explode

8e1e025
