Skip to content

FIX keep inherited from builtin containers' types#9298

Open
adrinjalali wants to merge 2 commits into
dask:mainfrom
adrinjalali:fix-scatter-collection-subclass
Open

FIX keep inherited from builtin containers' types#9298
adrinjalali wants to merge 2 commits into
dask:mainfrom
adrinjalali:fix-scatter-collection-subclass

Conversation

@adrinjalali

Copy link
Copy Markdown

This turned up in scikit-learn, where a Bunch object would get downgraded to a dict, when the dask backend for joblib was used. We fixed the issue in scikit-learn by not using the Bunch specific features, but the core issue comes from here, and this PR fixes the issue.

I think it would make sense to fix the core issue here.

  • Tests added / passed
  • Passes pixi run lint

fyi @tomMoral @StefanieSenger


Claude generated summary, feel free to ignore.

Preserve collection subclasses across Client.scatter

Client.scatter decided whether to treat its argument as a single value or a
collection of items to scatter using isinstance. Because
isinstance(some_dict_subclass, dict) is True, a dict subclass (e.g.
scikit-learn's Bunch) was unpacked as a mapping and returned as {key: Future},
silently losing its type — list/set/tuple subclasses were reconstructed via
input_type(...), but dict subclasses were not.

This switches the collection-vs-single decision to exact-type checks, so any
subclass of a builtin collection is scattered as a single opaque value (pickled,
exact type preserved). Exact dict/list/set/tuple/frozenset keep their
existing "collection of items" behavior.

Background

scikit-learn's metadata routing passes a Bunch (a dict subclass with
attribute access) between estimators. Under joblib's dask backend, arguments
larger than ~1 KB are implicitly scattered, so the Bunch round-tripped through
scatter and arrived on the worker as a plain dict, breaking attribute access
(AttributeError: 'dict' object has no attribute ...).

This was first worked around on the joblib side
(joblib/joblib#1798) by wrapping
such objects before scattering. Review feedback there asked, reasonably, why this
isn't fixed in dask directly — the downgrade happens because scatter chooses to
unpack the object, so scatter should defend its own contract. This PR is that
fix; the joblib-side workaround can be dropped once it lands.

Non-regression for
scikit-learn/scikit-learn#34005.

Tests

  • test_scatter_collection_subclass: a dict subclass, list/set subclasses, and a
    namedtuple each scatter to a single Future with their exact type preserved on
    the worker (including attribute access).
  • Existing scatter/gather behavior for exact builtin collections is unchanged.

@github-actions

github-actions Bot commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

    40 files  ± 0      40 suites  ±0   14h 21m 27s ⏱️ + 6m 0s
 4 154 tests + 1   3 973 ✅  -  2    178 💤 ±0  3 ❌ +3 
80 839 runs  +19  76 601 ✅ +18  4 235 💤  - 2  3 ❌ +3 

For more details on these failures, see this check.

Results for commit eb2f68a. ± Comparison against base commit db3cb43.

♻️ This comment has been updated with latest results.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@adrinjalali

Copy link
Copy Markdown
Author

It doesn't look like the failures are related to the PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant