[Bug]: Multiprocessing fails on macOS/Windows (and future Linux) because spawn requires passing classes between processes #305

@huaxig

What happened

The current multiprocessing implementation fails when run with multiple workers on macOS and Windows, because these platforms use the spawn start method by default. You may see exceptions such as:

TypeError: cannot pickle 'generator' object

Root Cause

The issue is not just about specific unpicklable objects, but fundamentally that the spawn method (unlike fork) creates a fresh interpreter for the child process. This requires that the entire class instance and its state be serialized (pickled) and passed to the new process.
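The serialization step can be reproduced without starting a process, since the failure comes from pickle itself. The snippet below is a minimal sketch: `StreamingData` and `pickling_error` are hypothetical names used for illustration and are not taken from this project's code.

```python
import pickle


class StreamingData:
    """Hypothetical stand-in for a data class that holds a live generator."""

    def __init__(self, samples):
        self.samples = list(samples)
        # Generator expression: its execution state cannot be serialized.
        self.stream = (s for s in self.samples)


def pickling_error(obj):
    """Return the message raised while pickling obj, or None on success."""
    try:
        pickle.dumps(obj)
    except (TypeError, pickle.PicklingError) as exc:
        return str(exc)
    return None


# The instance fails because of its generator attribute; a plain list is fine.
print(pickling_error(StreamingData([1, 2, 3])))
print(pickling_error([1, 2, 3]))
```

Running a check like this against each object handed to a worker should point at the attribute that blocks spawn, without needing a macOS or Windows machine.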

Currently, our data classes are not designed to be passed between processes in this way. When spawn attempts to transfer a class instance to the worker process, it fails because the instance's state (including generators, itertools.cycle iterators, and streaming datasets) cannot be serialized.
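One possible way to make such a class serializable is to exclude the live iterator from the pickled state and rebuild it lazily in the worker. This is only a sketch under assumptions: `CyclingDataset` and its methods are hypothetical, and the real data classes may need a different strategy (for example, reopening a streaming dataset from its configuration).

```python
import itertools
import pickle


class CyclingDataset:
    """Hypothetical data class that owns an unpicklable generator."""

    def __init__(self, samples):
        self.samples = list(samples)
        self._stream = None  # created lazily, never pickled

    def _generate(self):
        # Generator function: the generator object it returns cannot be pickled.
        for sample in itertools.cycle(self.samples):
            yield sample

    def next_sample(self):
        if self._stream is None:
            self._stream = self._generate()
        return next(self._stream)

    def __getstate__(self):
        state = self.__dict__.copy()
        state["_stream"] = None  # drop the live generator before pickling
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)


dataset = CyclingDataset(["a", "b", "c"])
dataset.next_sample()  # the generator now exists in the parent
clone = pickle.loads(pickle.dumps(dataset))  # same round trip spawn performs
print(clone.next_sample())  # prints "a": the clone starts a fresh cycle
```

Note the trade-off: the copy restarts from the first sample, so the iteration position is not carried over. If workers must resume mid-stream, an explicit offset would have to be part of the pickled state.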

Future Compatibility (Python 3.14+):

This issue is critical for long-term support because Python is moving away from fork as the default. Since Python 3.12, forking a multi-threaded process emits a DeprecationWarning, and Python 3.14 changes the default start method on Linux from fork to forkserver. Like spawn, forkserver pickles the objects handed to the worker process, so this breakage also affects Linux users on Python 3.14 and later.

ref: https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods

Proposed Solution (Best Practice)

We should support cross-platform multiprocessing by explicitly passing these objects to the worker processes, as recommended in the Python documentation.

Action

  • Refactor the code to ensure all data classes and context objects are fully picklable.
  • Explicitly pass context objects to worker processes rather than relying on state inherited from the parent through fork.
  • Explicitly ensure all platforms use spawn.
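The action items above could be sketched roughly as follows. This is an illustration, not the project's implementation: `WorkerContext`, `process_shard`, and `run_with_spawn` are hypothetical names, and the sharding logic is a placeholder for the real work.

```python
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerContext:
    """Hypothetical context object holding only plain, picklable fields."""
    shard_id: int
    num_shards: int
    items: tuple


def process_shard(ctx):
    """Sum the items assigned to this worker's shard (runs in the child)."""
    return sum(ctx.items[ctx.shard_id::ctx.num_shards])


def run_with_spawn(items, num_workers=2):
    """Run one task per shard, using spawn regardless of the platform default."""
    spawn = mp.get_context("spawn")
    items = tuple(items)
    contexts = [WorkerContext(i, num_workers, items) for i in range(num_workers)]
    with ProcessPoolExecutor(max_workers=num_workers, mp_context=spawn) as pool:
        # Each context is pickled and sent to a worker as an explicit argument.
        return list(pool.map(process_shard, contexts))


if __name__ == "__main__":  # spawn re-imports this module, so guard the entry point
    print(run_with_spawn(range(10)))  # [20, 25]
```

Using `mp.get_context("spawn")` instead of `mp.set_start_method` keeps the choice local to this code path, so it does not change the start method for other libraries in the same process.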

Mitigation Plan

Until proper cross-platform support is implemented:

  • macOS / Linux (Temporary): If breakage occurs on newer Python versions, the mitigation is to roll back to Python 3.12 (this helps on Linux only, since macOS has defaulted to spawn since Python 3.8) or to explicitly force the fork start method, e.g.:
     import multiprocessing as mp
    
     if __name__ == '__main__':
         mp.set_start_method('fork', force=True)
    
  • Windows: There is currently no mitigation plan for Windows (Win32) as it does not support fork.
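Because fork is not available everywhere, a shared entry point could apply the mitigation only where the platform offers it. The helper below is a hypothetical sketch (`pick_start_method` is not an existing function in this project); it relies on `multiprocessing.get_all_start_methods()`, which lists the supported methods with the default first.

```python
import multiprocessing as mp


def pick_start_method(preferred="fork"):
    """Return `preferred` if this platform supports it, else the platform default."""
    available = mp.get_all_start_methods()  # default start method is listed first
    return preferred if preferred in available else available[0]


if __name__ == "__main__":
    method = pick_start_method("fork")
    # Must run before any process or pool is created.
    mp.set_start_method(method, force=True)
    print(method)
```

On Windows this falls back to spawn, so it avoids an error from requesting fork but does not work around the pickling failure itself.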


Labels: kind/bug, priority/important-soon
