Faulty histogram computation? #1757

Open
@daso94msg

Hello,

There are two things that I think are wrong in ydata_profiling/model/summary_algorithms.histogram_compute:

if len(bins) > hist_config.max_bins:
    bins = np.histogram_bin_edges(finite_values, bins=hist_config.max_bins)  
    weights = weights if weights and len(weights) == hist_config.max_bins else None
  1. I think the check should be len(bins) > hist_config.max_bins + 1. np.histogram_bin_edges includes the rightmost edge, so its return value always has one more element than the number of bins.
  2. Why are the weights set to None if len(weights) != hist_config.max_bins? The shape of the weights should correspond to the shape of the values, so it will almost never equal the number of bins. I can't think of a reason for this check at all.
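
To illustrate both points, here is a minimal NumPy-only sketch. The sample values, weights, and max_bins below are made up for the example and are not taken from ydata_profiling; it only shows how np.histogram_bin_edges and np.histogram behave on their own.

```python
import numpy as np

# Hypothetical sample data: one weight per value.
values = np.array([0.5, 1.5, 2.5, 3.5, 4.5, 5.5])
weights = np.array([1.0, 2.0, 1.0, 3.0, 1.0, 2.0])
max_bins = 3  # stand-in for hist_config.max_bins

# Point 1: asking for max_bins bins yields max_bins + 1 edges,
# because the rightmost edge is included.
edges = np.histogram_bin_edges(values, bins=max_bins)
n_edges = len(edges)

# Point 2: weights are matched against the values, not the bins.
counts, _ = np.histogram(values, bins=edges, weights=weights)

# Weights whose length equals the number of bins (but not the number
# of values) are rejected by np.histogram.
try:
    np.histogram(values, bins=edges, weights=np.ones(max_bins))
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True

print(n_edges, counts, mismatch_rejected)
```

So a comparison against max_bins is already true when exactly max_bins bins were requested, and weights of length max_bins would only be valid in the special case where there are also exactly max_bins values.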

@alexbarros: I see that you made that commit initially, could you elaborate?

Regards

David
