NumPy vectorized functions cannot be dilled after they have been used #487

@arielshulman

I ran into strange behavior when using scikit-learn pipelines with an np.vectorize function and pickling them with dill.

I've managed to narrow the situation down to this -
When I pickle a simple function that has been vectorized with a non-default otype such as object or str, dill works fine if the pickling happens before the vectorized function is first called, but once the function has been called, dill fails with a PicklingError.

For example -

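import numpy as np
import dill

def f(x):
    return x
vf = np.vectorize(f, otypes=[object])  # non-default output type

arr = np.asarray(["a", "b", "c"])

dill.detect.trace(True)  # log each object dill visits

dill.dumps(vf)  # before the first call: succeeds

out = vf(arr)  # first call; the dumps below then fails
print(out)

dill.dumps(vf)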

and the output is -

T4: <class 'numpy.vectorize'>
# T4
D2: <dict object at 0x000001BB56AF1F00>
F1: <function f at 0x000001BB56E5A430>
F2: <function _create_function at 0x000001BB67356430>
# F2
Co: <code object f at 0x000001BB56B659D0, file "C:\Users\USER\dill_vect.py", line 4>
F2: <function _create_code at 0x000001BB673564C0>
# F2
# Co
D1: <dict object at 0x000001BB56AF1CC0>
# D1
D2: <dict object at 0x000001BB67255E80>
# D2
# F1
D2: <dict object at 0x000001BB6711CB80>
# D2
# D2
['a' 'b' 'c']
T4: <class 'numpy.vectorize'>
# T4
D2: <dict object at 0x000001BB56AF1F00>
F1: <function f at 0x000001BB56E5A430>
F2: <function _create_function at 0x000001BB67356430>
# F2
Co: <code object f at 0x000001BB56B659D0, file "C:\Users\USER\dill_vect.py", line 4>
F2: <function _create_code at 0x000001BB673564C0>
# F2
# Co
D1: <dict object at 0x000001BB56AF1CC0>
# D1
D2: <dict object at 0x000001BB67255E80>
# D2
# F1
D2: <dict object at 0x000001BB6711CB80>
Traceback (most recent call last):
  File "C:\Users\USER\dill_vect.py", line 17, in <module>
    dill.dumps(vf)
  File "C:\ProgramData\Anaconda3\lib\site-packages\dill\_dill.py", line 304, in dumps
    dump(obj, file, protocol, byref, fmode, recurse, **kwds)#, strictio)
  File "C:\ProgramData\Anaconda3\lib\site-packages\dill\_dill.py", line 276, in dump
    Pickler(file, protocol, **_kwds).dump(obj)
  File "C:\ProgramData\Anaconda3\lib\site-packages\dill\_dill.py", line 498, in dump
    StockPickler.dump(self, obj)
  File "C:\ProgramData\Anaconda3\lib\pickle.py", line 487, in dump
    self.save(obj)
  File "C:\ProgramData\Anaconda3\lib\pickle.py", line 603, in save
    self.save_reduce(obj=obj, *rv)
  File "C:\ProgramData\Anaconda3\lib\pickle.py", line 717, in save_reduce
    save(state)
  File "C:\ProgramData\Anaconda3\lib\pickle.py", line 560, in save
    f(self, obj)  # Call unbound method with explicit self
  File "C:\ProgramData\Anaconda3\lib\site-packages\dill\_dill.py", line 990, in save_module_dict
    StockPickler.save_dict(pickler, obj)
  File "C:\ProgramData\Anaconda3\lib\pickle.py", line 971, in save_dict
    self._batch_setitems(obj.items())
  File "C:\ProgramData\Anaconda3\lib\pickle.py", line 997, in _batch_setitems
    save(v)
  File "C:\ProgramData\Anaconda3\lib\pickle.py", line 560, in save
    f(self, obj)  # Call unbound method with explicit self
  File "C:\ProgramData\Anaconda3\lib\site-packages\dill\_dill.py", line 990, in save_module_dict
    StockPickler.save_dict(pickler, obj)
  File "C:\ProgramData\Anaconda3\lib\pickle.py", line 971, in save_dict
    self._batch_setitems(obj.items())
  File "C:\ProgramData\Anaconda3\lib\pickle.py", line 1002, in _batch_setitems
    save(v)
  File "C:\ProgramData\Anaconda3\lib\pickle.py", line 589, in save
    self.save_global(obj, rv)
  File "C:\ProgramData\Anaconda3\lib\pickle.py", line 1070, in save_global
    raise PicklingError(
_pickle.PicklingError: Can't pickle <ufunc 'f (vectorized)'>: it's not found as __main__.f (vectorized)
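
The last line of the traceback names a ufunc, 'f (vectorized)', which suggests that calling vf leaves a ufunc somewhere in the object's instance state. A minimal sketch for checking that, assuming the state is reachable through vars(vf); the helper name find_ufuncs is mine, and which attribute (if any) it reports depends on NumPy internals that may differ between versions:

```python
import numpy as np

def find_ufuncs(obj):
    """Return names of instance attributes that hold, or contain, a np.ufunc.

    Hypothetical diagnostic helper: it only looks at direct attribute values
    and at the values of dict-valued attributes.
    """
    hits = []
    for name, value in vars(obj).items():
        if isinstance(value, np.ufunc):
            hits.append(name)
        elif isinstance(value, dict) and any(
            isinstance(v, np.ufunc) for v in value.values()
        ):
            hits.append(name)
    return hits

def f(x):
    return x

vf = np.vectorize(f, otypes=[object])

before = find_ufuncs(vf)                 # state before the first call
out = vf(np.asarray(["a", "b", "c"]))    # first call
after = find_ufuncs(vf)                  # state after the first call

print("attributes holding a ufunc before the call:", before)
print("attributes holding a ufunc after the call: ", after)
```

If the second list is non-empty while the first is empty, the attribute it names is probably the object that dill cannot serialize on the second dumps.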

This test ran on Windows, but I've also tested it on Linux and the same problem occurs.
Package versions used for the test -
numpy==1.22.4 with dill==0.3.5.1, and also with dill==0.3.4
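
A possible workaround, sketched under the assumption that the failure comes from state that np.vectorize creates on first use: wrap the function in a small class that builds the np.vectorize object lazily and leaves it out of the pickled state. LazyVectorize is a hypothetical name, not part of NumPy or dill, and I have not verified this against the versions listed above.

```python
import numpy as np

class LazyVectorize:
    """Hypothetical wrapper: builds np.vectorize on first use and
    excludes it from the state handed to pickle/dill."""

    def __init__(self, pyfunc, **vectorize_kwargs):
        self.pyfunc = pyfunc
        self.vectorize_kwargs = vectorize_kwargs
        self._vf = None  # created on the first call

    def __call__(self, *args, **kwargs):
        if self._vf is None:
            self._vf = np.vectorize(self.pyfunc, **self.vectorize_kwargs)
        return self._vf(*args, **kwargs)

    def __getstate__(self):
        # Copy the instance dict and drop the np.vectorize object, so only
        # the plain function and its keyword arguments are serialized.
        state = self.__dict__.copy()
        state["_vf"] = None
        return state

def f(x):
    return x

vf = LazyVectorize(f, otypes=[object])
out = vf(np.asarray(["a", "b", "c"]))   # first call builds the np.vectorize object
state = vf.__getstate__()               # what a pickler would be asked to save
print(out)
print(state["_vf"])
```

After unpickling, the wrapper would rebuild the np.vectorize object on its next call, at the cost of losing whatever NumPy had cached.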
