
Commit 0083ab2

Optimize min_count when expected_groups is not provided. (#236)
* Optimize min_count for all numpy

  For pure numpy arrays, min_count=1 (xarray default) is the same as
  min_count=None, with the right fill_value. This avoids one useless pass
  over the data, and one useless copy. We need to always accumulate count
  with dask, to make sure we get the right values at the end.

* Better?
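A minimal sketch of the caller-visible effect (an illustration, not part of this commit), using flox.core.groupby_reduce. With no expected_groups, the groups are discovered from `by` and are therefore all non-empty, so a fill_value alone no longer forces the internal min_count=1 masking pass for numpy inputs:

```python
# Illustration only: the fast path this commit enables for pure numpy inputs.
import numpy as np
from flox.core import groupby_reduce

array = np.array([1.0, 2.0, 3.0, 4.0])
by = np.array([0, 0, 1, 1])

# No expected_groups: groups come from `by`, so none of them can be empty,
# and fill_value alone no longer triggers the min_count=1 count pass.
result, groups = groupby_reduce(array, by, func="sum", fill_value=0)
print(groups, result)  # groups: [0 1], result: [3. 7.]
```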
1 parent: c398f4e

1 file changed: +2 -1 lines changed

flox/core.py (+2, -1)
@@ -2324,6 +2324,7 @@ def groupby_reduce(
     nby = len(bys)
     by_is_dask = tuple(is_duck_dask_array(b) for b in bys)
     any_by_dask = any(by_is_dask)
+    provided_expected = expected_groups is not None

     if (
         engine == "numbagg"
@@ -2440,7 +2441,7 @@ def groupby_reduce(
     # The only way to do this consistently is mask out using min_count
     # Consider np.sum([np.nan]) = np.nan, np.nansum([np.nan]) = 0
     if min_count is None:
-        if nax < by_.ndim or fill_value is not None:
+        if nax < by_.ndim or (fill_value is not None and provided_expected):
             min_count_: int = 1
         else:
             min_count_ = 0
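For contrast, a sketch (again an illustration, not from the commit) of the case the new provided_expected check still guards: when expected_groups is passed, a requested group may be absent from the data, so fill_value must still trigger min_count=1 to mask that empty group rather than returning the identity value (0 for a sum).

```python
# Illustration only (assumed behavior): with expected_groups provided,
# group 2 never appears in `by`, so min_count=1 masks it and it receives
# fill_value instead of 0.
import numpy as np
from flox.core import groupby_reduce

array = np.array([1.0, 2.0, 3.0, 4.0])
by = np.array([0, 0, 1, 1])

result, groups = groupby_reduce(
    array, by, func="sum", expected_groups=np.array([0, 1, 2]), fill_value=np.nan
)
print(groups, result)  # groups: [0 1 2], result: [ 3.  7. nan]
```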
