Problem with mutate and delay functions #85

@kieferk

Hi all,

I've noticed an issue with mutate when a variable is defined using a delay function after group_by. I think the underlying problem is simply that mutate does not work properly with group_by, but I haven't tested this extensively. For example:

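from dplython import DelayFunction, X, diamonds, group_by, head, mutate

@DelayFunction
def lead(series, i=1):
    # Note: a positive shift moves values down, so each row receives the
    # value from i rows earlier (the output below reflects this).
    index = series.index
    shifted = series.shift(i)
    shifted.index = index
    return shifted

diamonds >> group_by(X.cut) >> mutate(price_lead=lead(X.price)) >> head(6)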

   Unnamed: 0  carat        cut color clarity  depth  table  price     x  \
0           1   0.23      Ideal     E     SI2   61.5   55.0    326  3.95   
1           2   0.21    Premium     E     SI1   59.8   61.0    326  3.89   
2           3   0.23       Good     E     VS1   56.9   65.0    327  4.05   
3           4   0.29    Premium     I     VS2   62.4   58.0    334  4.20   
4           5   0.31       Good     J     SI2   63.3   58.0    335  4.34   
5           6   0.24  Very Good     J    VVS2   62.8   57.0    336  3.94   

      y     z  price_lead  
0  3.98  2.43         NaN  
1  3.84  2.31       326.0  
2  4.07  2.31       326.0  
3  4.23  2.63       327.0  
4  4.35  2.75       334.0  
5  3.96  2.48       335.0 

The lead delay function should operate independently on each group, but instead it is operating on the entire dataframe regardless of group.
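
For reference, the expected per-group behavior can be sketched in plain pandas, outside of dplython. This is only an illustration under assumptions: the small DataFrame below is hypothetical toy data standing in for diamonds, and the column name price_shifted is made up for the example. It uses pandas' own groupby shift, not anything from dplython's internals.

```python
import pandas as pd

# Hypothetical toy data standing in for the diamonds dataset.
df = pd.DataFrame({
    "cut": ["Ideal", "Premium", "Good", "Premium", "Good", "Ideal"],
    "price": [326, 326, 327, 334, 335, 336],
})

# Shift within each group: the first row of every cut has no predecessor
# in its own group, so it gets NaN instead of the previous row's price.
df["price_shifted"] = df.groupby("cut")["price"].shift(1)

print(df)
```

With a grouped shift, the first row of each cut comes out as NaN, whereas the output above only has a NaN in the very first row of the whole dataframe.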

I solved this in my own fork of dplython by removing mutate from the handled classes in the DplyFrame class. I assume however that you put it in handled classes for a reason, so I don't consider this a great fix (for example, arrange broke due to this and I had to change it to work again).

Curious to hear your opinion on this.

P.S. There are tons of changes and additions in that personal fork that I should make pull requests for, but a lot has changed including the formatting and so I've been lazy about it...
