When a column is of type int, the histogram behaves differently from a pandas DataFrame

When I call hist on a pandas column (Series) containing integers, I get a proper histogram where the x axis is divided into bins of value ranges.
When I do the same using a HandyFrame, I get a categorical histogram.

I dug into the code, and the reason handy behaves this way is that the integer column is not included in the self._continuous group of columns.

hist uses the continuous list to decide how to plot: any column that is not in it is treated as categorical. This is why a hist of integers in handy is not what one would expect from a hist of integers in pandas.

A workaround is to cast the integer column to float. I think this is a bug (I couldn't find anything about it in the docs).
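To make the mechanism concrete, here is a minimal pure-Python sketch of the dispatch described above. It is not HandySpark's actual code: `FLOAT_TYPES`, `continuous_columns`, and this `hist` are hypothetical names, and the assumption that only float types count as continuous is inferred from the behavior I observed.

```python
# Hypothetical sketch of the behavior described above; not HandySpark's source.
from collections import Counter

FLOAT_TYPES = {"float", "double"}  # assumption: only these count as continuous


def continuous_columns(schema):
    """Return the column names whose declared type is in FLOAT_TYPES."""
    return [name for name, dtype in schema.items() if dtype in FLOAT_TYPES]


def hist(values, name, continuous, bins=10):
    """Bin columns listed as continuous; count distinct values otherwise."""
    if name in continuous:
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1  # avoid a zero width for constant columns
        counts = [0] * bins
        for v in values:
            # Clamp so the maximum value falls into the last bin.
            idx = min(int((v - lo) / width), bins - 1)
            counts[idx] += 1
        return "binned", counts
    return "categorical", dict(Counter(values))


schema = {"bobo": "int", "float_bobo": "float"}
continuous = continuous_columns(schema)
data = list(range(100))

kind_int, per_value = hist(data, "bobo", continuous)
kind_float, per_bin = hist([float(v) for v in data], "float_bobo", continuous)
print(kind_int, len(per_value))   # one bar per distinct integer
print(kind_float, len(per_bin))   # a fixed number of range bins
```

With this kind of dispatch, the same values give one bar per distinct integer for the int column but a fixed number of range bins once the column is cast to float, which matches the difference I see between the two handy plots.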

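Here's a quick repro:

```
import numpy as np
import pandas as pd
from pyspark.sql import functions as F
from handyspark import *  # adds toHandy() to Spark DataFrames

# Assumes `spark` is an existing SparkSession.
pdf = pd.DataFrame({'bobo': np.random.randint(0, 100, 5000)})
df = spark.createDataFrame(pdf).withColumn('float_bobo', F.col('bobo').astype('float'))
hdf = df.toHandy()
pdf.bobo.hist()                # pandas: binned histogram
hdf.cols['bobo'].hist()        # handy, int column: categorical histogram
hdf.cols['float_bobo'].hist()  # handy, float column: binned histogram
```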

I forgot to congratulate you on this great lib, it really is cool!

Itamar

When column is of type int, histogram acts different from pandas dataframe #22

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

When column is of type int, histogram acts different from pandas dataframe #22

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions