fix: Preserve pandas column name attribute #2363

FBruzzesi · 2025-04-08T21:35:41Z

What type of PR is this? (check all applicable)

Related issues

Closes bug: select and with_columns don't always preserve column index name for pandas-like #1483

Checklist

Code follows style guide (ruff)
Tests added
Documented the changes
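For context, here is a minimal plain-pandas sketch of the symptom described in #1483. It does not use narwhals internals; the variable names and the `"foo"` label are illustrative assumptions. It only shows that a frame rebuilt from individual Series starts with an unnamed column index, which is one way the name can get lost.

```python
import pandas as pd

# A frame whose column index carries a name ("foo" is an arbitrary example).
df = pd.DataFrame({"a": [1, 2, 3]})
df.columns.name = "foo"

# Build new columns as Series and glue them together horizontally,
# roughly what a select-like operation has to do.
b = (df["a"] + 1).rename("b")
c = (df["a"] * 2).rename("c")
rebuilt = pd.concat([c, b], axis=1)

print(df.columns.name)       # name on the original frame
print(rebuilt.columns.name)  # name on the rebuilt frame
```

The rebuilt frame has the expected columns, but its `columns.name` is `None`, so the name has to be carried over explicitly.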

FBruzzesi · 2025-04-08T21:36:13Z

narwhals/_dask/dataframe.py

@@ -110,7 +110,7 @@ def collect(
            from narwhals._pandas_like.dataframe import PandasLikeDataFrame

            return PandasLikeDataFrame(
-                result,
+                result.rename_axis(columns=self.native.columns.name),


This might be one step ahead of dask#11874
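The change above relies on `DataFrame.rename_axis(columns=...)`. A small sketch of that call on plain pandas objects follows; `native` and `result` are stand-in names chosen for illustration, not the actual objects in `collect`.

```python
import pandas as pd

# Stand-in for the original frame, whose column index is named.
native = pd.DataFrame({"a": [1, 2, 3]})
native.columns.name = "foo"

# Stand-in for a computed result that came back without the name.
result = pd.DataFrame({"a": [2, 3, 4]})

# rename_axis returns a new frame; `result` itself is left untouched.
restored = result.rename_axis(columns=native.columns.name)

print(restored.columns.name)
print(result.columns.name)
```

Because `rename_axis` is not in-place by default, only `restored` carries the name afterwards.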

narwhals/_pandas_like/dataframe.py

narwhals/_pandas_like/namespace.py

FBruzzesi · 2025-04-08T21:39:59Z

narwhals/translate.py

-                backend_version=parse_version(pd),
                implementation=Implementation.PANDAS,
+                backend_version=parse_version(pd),


Mental sanity for symmetry with other pandas-like and order of the spec 😅

FBruzzesi · 2025-04-08T21:40:54Z

tests/preserve_pandas_like_columns_name_attr_test.py

+
+    df = nw.from_native(df_native)
+
+    result = df.with_columns(b=nw.col("a") + 1, c=nw.col("a") * 2).select("c", "b")


I wanted to chain two methods here, mostly for the sake of it.
Might be worth reparametrizing it

FBruzzesi · 2025-04-08T21:41:26Z

narwhals/_dask/dataframe.py

Dask select and with_columns have no issues since we use .assign method for both!

MarcoGorelli · 2025-04-09T14:34:38Z

thanks! i'll do a little test to check there's no accidental / surprise overhead

FBruzzesi · 2025-04-10T14:40:55Z

@MarcoGorelli I tried a simple script to check rename_axis overhead (not in narwhals), and it seems to be a pretty low overhead operation:

import timeit
from pprint import pprint

import numpy as np
import pandas as pd


def benchmark_rename_axis_overhead(n_rows: int, n_cols: int, repetitions: int = 1000) -> float:
    """Return the mean time in seconds of one rename_axis call on an (n_rows, n_cols) frame."""
    df = pd.DataFrame(np.random.randn(n_rows, n_cols))

    return timeit.timeit(
        lambda: df.rename_axis(columns="new_name", copy=False),
        number=repetitions
    ) / repetitions


# Keep the number of cells fixed and vary the shape from tall to wide.
TOTAL_SIZE = 10_000_000

results = {
    (TOTAL_SIZE // 10**i, 10**i): benchmark_rename_axis_overhead(n_rows=TOTAL_SIZE // 10**i, n_cols=10**i)
    for i in range(1, 6)
}

pprint(results)

{(100, 100000): 1.2109115000015436e-05,
 (1000, 10000): 1.679945200001498e-05,
 (10000, 1000): 1.2669480999988991e-05,
 (100000, 100): 1.331965500000365e-05,
 (1000000, 10): 2.9313745999985485e-05}

Let me know what your thoughts are and if we need to check something more extensively

narwhals/_dask/dataframe.py

Single argument + fake typing 🎉

narwhals/_pandas_like/utils.py

Moved to #2368

FBruzzesi · 2025-04-17T18:07:01Z

~~Messed up the merge 🥲~~
Fixed

MarcoGorelli · 2025-04-20T10:07:28Z

thanks!

> Let me know what your thoughts are and if we need to check something more extensively

i'm still slightly concerned about there being extra copies in old versions, i'll take another look

FBruzzesi · 2025-04-20T10:18:08Z

> thanks!
>
> > Let me know what your thoughts are and if we need to check something more extensively
>
> i'm still slightly concerned about there being extra copies in old versions, i'll take another look

No rush! I keep merging main to avoid having to deal with a large conflicting diff if that happens to be the case

MarcoGorelli · 2025-05-03T18:13:23Z

narwhals/_pandas_like/dataframe.py

        return self.__class__(
-            df,
+            rename_axis(
+                df,
+                implementation=self._implementation,
+                backend_version=self._backend_version,
+                columns=self._native_columns_name,
+            ),


🤔 i'm wondering if we could/should just set df.columns.name = self._native_columns_name

else we need to trust pandas' copy-on-write / copy=False working properly, which i'm not sure i do

FBruzzesi · 2025-05-04

I trust your judgement here, but should I be concerned as a pandas user?

MarcoGorelli

thanks @FBruzzesi !

I don't trust pandas' rename_axis, nor do I trust that copy-on-write will take effect properly; we would need more comprehensive benchmarks to be sure

but, in select and with_columns, we create new objects, so it should be safe to just use .columns.name = ... there
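The simpler approach suggested here can be sketched as follows. `set_columns_name` is a hypothetical helper written for illustration, not a narwhals function, and the sketch assumes the frame passed in was freshly created, so mutating its column index cannot affect the caller's original data.

```python
import pandas as pd


def set_columns_name(new_frame: pd.DataFrame, name: object) -> pd.DataFrame:
    """Hypothetical helper: label the column index of a freshly built frame in place."""
    # No rename_axis call and no reliance on copy=False or copy-on-write:
    # only the name attribute of the column Index is assigned.
    new_frame.columns.name = name
    return new_frame


original = pd.DataFrame({"a": [1, 2, 3]})
original.columns.name = "foo"

# A new object, built from scratch, so it starts without a columns name.
fresh = pd.concat(
    [(original["a"] * 2).rename("c"), (original["a"] + 1).rename("b")], axis=1
)

out = set_columns_name(fresh, original.columns.name)
print(out.columns.name)
```

The helper returns the same object it was given, which is why it is only appropriate for frames the caller has just constructed.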

FBruzzesi added 2 commits April 8, 2025 18:28

concat

feaad95

seems to work

f28599f


FBruzzesi added the fix label Apr 8, 2025

FBruzzesi added 4 commits April 8, 2025 23:48

copy=False

fc982d3

ok pandas 3

f557dd5

Merge branch 'main' into fix/maintain-pandas-column-index-name

46c29e7

do not HACK non-pandas codepath

d5fea77

FBruzzesi added the pandas-like Issue is related to pandas-like backend label Apr 9, 2025

mypy

f3b4cf9

Merge branch 'main' into fix/maintain-pandas-column-index-name

9f3c415

dangotbanned mentioned this pull request Apr 10, 2025

[Enh]: Better benchmarking routine #805

Open

dangotbanned reviewed Apr 10, 2025

narwhals/_dask/dataframe.py (outdated)

refactor: utils.horizontal_concat -> Namespace._horizontal_concat

bd7fd09

Single argument + fake typing 🎉

dangotbanned reviewed Apr 10, 2025

narwhals/_pandas_like/utils.py (outdated)

dangotbanned mentioned this pull request Apr 10, 2025

refactor: Simplify PandasLikeNamespace.concat #2368

Merged

13 tasks

dangotbanned and others added 3 commits April 10, 2025 18:42

revert: undo last commit

987a688

Moved to #2368

re-solve conflicts

7f39ada

re-solve conflicts

0d105a3

FBruzzesi marked this pull request as draft April 17, 2025 18:06

fix pandas utils

13166e7

FBruzzesi marked this pull request as ready for review April 17, 2025 18:10

FBruzzesi added 3 commits April 18, 2025 16:24

Merge branch 'main' into fix/maintain-pandas-column-index-name

afaa9ec

Merge branch 'main' into fix/maintain-pandas-column-index-name

2cf041c

Merge branch 'main' into fix/maintain-pandas-column-index-name

984a57b

FBruzzesi and others added 4 commits April 21, 2025 16:08

Merge branch 'main' into fix/maintain-pandas-column-index-name

823293b

Merge branch 'main' into fix/maintain-pandas-column-index-name

788bc1f

merge main

1e745d4

Merge branch 'main' into fix/maintain-pandas-column-index-name

e1ef8b5

MarcoGorelli reviewed May 3, 2025

MarcoGorelli added 5 commits May 16, 2025 16:18

Merge remote-tracking branch 'upstream/main' into fix/maintain-pandas-column-index-name

28795f2

keep it simpler, dont trust pandas for rename_axis

5efaff2

remove unintended file

f202c57

reduce diff

e024c54

reduce diff

5387649

MarcoGorelli approved these changes May 16, 2025

FBruzzesi merged commit faa3e9d into main May 16, 2025
32 checks passed

FBruzzesi deleted the fix/maintain-pandas-column-index-name branch May 16, 2025 19:03
