Description
The following example seems reasonable to me:

```python
import pandas

df = pandas.DataFrame({'col1': ['1', '2', '3'],
                       'col2': ['9', '9', '9']})
df.apply(int)
```
It looks like it should convert the data in the DataFrame to integers, by calling the `int()` function for every element.

This would be true for `Series.apply`, but the function passed to `DataFrame.apply` receives a whole `Series` at a time, not individual (scalar) values. The function that receives one value at a time is `DataFrame.applymap`.
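To illustrate the difference, here is a quick sketch using the same `DataFrame` as above (both calls work, assuming a pandas version where `DataFrame.applymap` is available):

```python
import pandas

df = pandas.DataFrame({'col1': ['1', '2', '3'],
                       'col2': ['9', '9', '9']})

# DataFrame.apply calls the function once per column, passing a whole Series:
df.apply(lambda col: col.astype(int))

# DataFrame.applymap calls the function once per element, passing a scalar:
df.applymap(int)
```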
This is how pandas is designed, and while it's probably a bit confusing, it's reasonable. So, the previous example actually fails. The error is:

```
TypeError: ("cannot convert the series to <class 'int'>", 'occurred at index col1')
```
Feel free to disagree, but personally I think the error message doesn't do a great job of telling the user what's wrong, or of giving hints on how to fix it. I think something like the following would be more useful:

```
TypeError: The function `int` passed to `DataFrame.apply` should expect a `Series` as the argument. To apply a function that receives a single item at a time use `DataFrame.applymap`.
```
While this may look straightforward, it's surely not as easy as just replacing the error message. The current message is reported by the `Series` when it is being converted to an integer by `int(pandas.Series())`, so it has nothing to do with `apply`.
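The origin of the message can be reproduced without `apply` at all, just by calling `int()` on a `Series` directly:

```python
import pandas

s = pandas.Series(['1', '2', '3'])

# Series.__int__ raises the same message that apply surfaces:
# TypeError: cannot convert the series to <class 'int'>
int(s)
```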
I think it's doable to have an appropriate error message, but I'm not sure about the implications.
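As a starting point for the discussion, one possible direction (a hypothetical sketch written outside pandas internals; `apply_with_hint` is a made-up name, not a pandas API) would be to wrap the user function and re-raise with a hint:

```python
import pandas

def apply_with_hint(df, func):
    """Hypothetical wrapper (not a pandas API): re-raise TypeError with a hint."""
    def wrapped(series):
        try:
            return func(series)
        except TypeError as exc:
            # Simplified: a real fix would need to distinguish this case from
            # other TypeErrors raised inside the user function.
            name = getattr(func, '__name__', repr(func))
            raise TypeError(
                f"The function `{name}` passed to `DataFrame.apply` should "
                "expect a `Series` as the argument. To apply a function that "
                "receives a single item at a time use `DataFrame.applymap`."
            ) from exc
    return df.apply(wrapped)

df = pandas.DataFrame({'col1': ['1', '2', '3'],
                       'col2': ['9', '9', '9']})
apply_with_hint(df, int)  # raises the suggested TypeError
```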
Feel free to discuss proposals on how to fix it here, or to try your approach, open a PR, and have the discussion there.