BUG: Passing original properties of DataFrame and Series subclasses to their constructors #61101

Open
wants to merge 2 commits into
base: main

@YankoFelipe YankoFelipe commented Mar 11, 2025

While implementing a subclass of DataFrame, I found that in some operations, objects that subclass DataFrame and Series lose their original properties.

import pandas as pd

class SubclassedSeries(pd.Series):
    _metadata = ['original_property']
    def __init__(self, data=None, original_property=None, *args, **kwargs):
        super().__init__(data, *args, **kwargs)
        self.original_property = original_property
    @property
    def _constructor(self):
        return SubclassedSeries
    @property
    def _constructor_expanddim(self):
        return SubclassedDataFrame


class SubclassedDataFrame(pd.DataFrame):
    _metadata = ['original_property']
    def __init__(self, data=None, original_property=None, *args, **kwargs):
        super().__init__(data, *args, **kwargs)
        self.original_property = original_property
    @property
    def _constructor(self):
        return SubclassedDataFrame
    @property
    def _constructor_sliced(self):
        return SubclassedSeries

## __init__
df = SubclassedDataFrame({'lkey': ['foo', 'bar', 'baz', 'foo'], 'value': [1, 2, 3, 5]}, original_property='original_property')
print(f'__init__: {df.original_property}')

## Select
select_df = df[df['value'] == 1]
print(f'Select: {select_df.original_property}')

## loc
loc_df = df.loc[df['lkey'] == 'foo']
print(f'loc: {loc_df.original_property}')

## Concat
df = pd.concat([df, df])
print(f'Concat: {df.original_property}')

## Merge
df1 = SubclassedDataFrame({'lkey': ['foo', 'bar', 'baz', 'foo'],
                   'value': [1, 2, 3, 5]}, original_property='original_property')
df2 = SubclassedDataFrame({'rkey': ['foo', 'bar', 'baz', 'foo'],
                   'value': [5, 6, 7, 8]}, original_property='original_property')
merged_df = df1.merge(df2, left_on='lkey', right_on='rkey')
print(f'Merge: {merged_df.original_property}')

## Series
series = df['value']
print(f'Series: {series.original_property}')

## Sum
sum_df = df1 + df2
print(f'Sum: {sum_df.original_property}')

## Mul
print(f'Mul: {(df * 2).original_property}')
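
Until a fix lands, one possible stopgap is to re-apply the metadata by hand after a binary operation. The following is a minimal sketch, not the change made in this PR. It assumes that `__finalize__` copies the attributes named in `_metadata` from the source object, and `TaggedFrame` and `carry_metadata` are hypothetical names used only for illustration.

```python
import pandas as pd


class TaggedFrame(pd.DataFrame):
    """Hypothetical minimal subclass carrying one custom attribute."""

    _metadata = ["original_property"]

    def __init__(self, data=None, original_property=None, *args, **kwargs):
        super().__init__(data, *args, **kwargs)
        self.original_property = original_property

    @property
    def _constructor(self):
        return TaggedFrame


def carry_metadata(result, source):
    """Hypothetical helper: copy the `_metadata` attributes of `source` onto `result`."""
    # __finalize__ copies the names listed in _metadata (and attrs) from
    # `source` onto `result`, then returns `result`.
    return result.__finalize__(source)


df = TaggedFrame({"value": [1, 2, 3, 5]}, original_property="original_property")

# Re-attach the metadata explicitly instead of relying on the binary op.
doubled = carry_metadata(df * 2, df)
print(f"Mul: {doubled.original_property}")
```

This keeps the workaround at the call site, so it has to be repeated wherever an operation drops the attribute. That is the inconvenience a fix inside the constructors would avoid.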

This functionality is critical for my project so I decided to fix it. I had the following considerations:

My environment consists of SUSE 15.6 and Python 3.11.11.

Edit: Removed WIP comments about tests and code checks.

@YankoFelipe YankoFelipe requested a review from noatamir as a code owner March 11, 2025 15:26

Successfully merging this pull request may close these issues.

BUG: Subclassed DataFrame doesn't persist _metadata properties across binary operations