
Better error message for DataFrame.apply #153

Open
@datapythonista

Description


The following example seems reasonable to me:

import pandas

df = pandas.DataFrame({'col1': ['1', '2', '3'],
                       'col2': ['9', '9', '9']})
df.apply(int)

It looks like it should convert the data in the DataFrame to integers by calling the int() function on every element.

This would be true for Series.apply, but the parameter of DataFrame.apply is a function that receives a whole Series at a time, not individual (scalar) values. The method whose function receives one value at a time is DataFrame.applymap.
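To make the difference concrete, here is a minimal sketch that runs the same data through the three methods. The `hasattr` fallback is my own assumption and not part of the original report: newer pandas releases offer `DataFrame.map` as the element-wise method, so the sketch uses whichever one is available.

```python
import pandas

df = pandas.DataFrame({'col1': ['1', '2', '3'],
                       'col2': ['9', '9', '9']})

# Series.apply calls the function once per element.
col1_as_int = df['col1'].apply(int)

# DataFrame.apply calls the function once per column and passes a whole
# Series; record the type of each argument it hands over.
received_types = []
df.apply(lambda column: received_types.append(type(column).__name__))

# Element-wise over the whole DataFrame. Assumption: prefer DataFrame.map
# when it exists, otherwise fall back to DataFrame.applymap.
elementwise = df.map if hasattr(df, 'map') else df.applymap
df_as_int = elementwise(int)

print(col1_as_int.tolist())
print(set(received_types))
print(df_as_int.dtypes)
```

The function passed to `DataFrame.apply` only ever sees `Series` objects, which is why `int` cannot work there.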

This is how pandas is designed and, while probably a bit confusing, it is reasonable. So the previous example actually fails. The error is:

TypeError: ("cannot convert the series to <class 'int'>", 'occurred at index col1')

Feel free to disagree, but personally I think the error message doesn't do a great job of telling the user what's wrong or giving hints on how to fix it. I think something like the following would be more useful:

TypeError: The function `int` passed to `DataFrame.apply` should expect a `Series` as the argument. To apply a function that receives a single item at a time use `DataFrame.applymap`.

While this may look straightforward, it isn't easy, and surely not as easy as replacing the error message. The current message is raised by the Series itself when something tries to convert it to an integer, as in int(pandas.Series()), so it has nothing to do with apply.
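A short sketch showing that the exception comes from the conversion itself, with no call to `DataFrame.apply` involved. The variable names are only illustrative:

```python
import pandas

series = pandas.Series(['1', '2', '3'])

try:
    int(series)  # plain conversion of a Series, apply is never called
except TypeError as error:
    message = str(error)

print(message)
```

Because the Series raises the error on its own, any improved wording has to be added somewhere around the call made by `apply`, not in the place where the current message is produced.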

I think it's doable to have an appropriate error message, but I'm not sure about the implications.
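As one possible starting point for discussion, the sketch below wraps the call and re-raises with the message proposed above. `apply_with_hint` is a hypothetical helper written for this illustration; it is not a pandas function and not necessarily how a real fix would be implemented.

```python
import pandas


def apply_with_hint(df, func):
    """Hypothetical helper: run df.apply(func), re-raising TypeError with a hint."""
    try:
        return df.apply(func)
    except TypeError as error:
        # Fall back to repr() for callables without a __name__ attribute.
        name = getattr(func, '__name__', repr(func))
        raise TypeError(
            f"The function `{name}` passed to `DataFrame.apply` should expect "
            "a `Series` as the argument. To apply a function that receives a "
            "single item at a time use `DataFrame.applymap`."
        ) from error


df = pandas.DataFrame({'col1': ['1', '2', '3'],
                       'col2': ['9', '9', '9']})

# A function that accepts a Series is unaffected by the wrapper.
column_lengths = apply_with_hint(df, len)

# A scalar function such as int now fails with the more helpful message.
try:
    apply_with_hint(df, int)
except TypeError as error:
    hint = str(error)
```

One implication is visible right away: the wrapper catches every `TypeError`, including ones raised inside a user function for unrelated reasons, so the hint could be misleading in those cases. Chaining with `from error` at least keeps the original exception in the traceback.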

Feel free to discuss your proposals on how to fix it here, or to try your approach and open a PR, and have the discussion there.
