Crashes when dealing with large datasets #34

Open
@juliaroquette

I am trying to use deepdish to save and load large datasets in HDF5 format, but deepdish.io.save crashes whenever the dataset is larger than about 2 GB.

For example, suppose we have a very large bytearray (3.2 GB):
import deepdish as dd
t = bytearray(8*1000*1000*400)
When I try:
dd.io.save('testeDeepdishLimit', t)
I get the error:

---------------------------------------------------------------------------
OverflowError                             Traceback (most recent call last)
<ipython-input-3-26ecd71b151a> in <module>()
----> 1 dd.io.save('testeDeepdishLimit',t)

~/anaconda3/lib/python3.6/site-packages/deepdish/io/hdf5io.py in save(path, data, compression)
    594         else:
    595             _save_level(h5file, group, data, name='data',
--> 596                         filters=filters, idtable=idtable)
    597             # Mark this to automatically unpack when loaded
    598             group._v_attrs[DEEPDISH_IO_UNPACK] = True

~/anaconda3/lib/python3.6/site-packages/deepdish/io/hdf5io.py in _save_level(handler, group, level, name, filters, idtable)
    302 
    303     else:
--> 304         _save_pickled(handler, group, level, name=name)
    305 
    306 

~/anaconda3/lib/python3.6/site-packages/deepdish/io/hdf5io.py in _save_pickled(handler, group, level, name)
    170                   DeprecationWarning)
    171     node = handler.create_vlarray(group, name, tables.ObjectAtom())
--> 172     node.append(level)
    173 
    174 

~/anaconda3/lib/python3.6/site-packages/tables/vlarray.py in append(self, sequence)
    535             nparr = None
    536 
--> 537         self._append(nparr, nobjects)
    538         self.nrows += 1
    539 

tables/hdf5extension.pyx in tables.hdf5extension.VLArray._append()

OverflowError: value too large to convert to int
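
For reference, the size of the object in the example can be compared with the largest value a signed 32-bit integer can hold. This is only a sanity check of the arithmetic: that the overflow comes from a 32-bit int conversion is an assumption consistent with the roughly 2 GB threshold, not something the traceback confirms, and the name `INT32_MAX` is introduced here for illustration.

```python
# Size in bytes of the bytearray from the example above.
n_bytes = 8 * 1000 * 1000 * 400

# Largest value of a signed 32-bit C int (assumed to be the relevant limit).
INT32_MAX = 2**31 - 1

print(n_bytes)              # 3200000000
print(INT32_MAX)            # 2147483647
print(n_bytes > INT32_MAX)  # True
```

The object is about 3.2 GB, which is above the 2 GiB boundary where the failures start.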

Is there any workaround for this issue?
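
One possible workaround, sketched under assumptions: the traceback shows the bytearray going through `_save_pickled`, so wrapping it in a NumPy array first might let deepdish store it as a native HDF5 array and avoid the pickling path. The helper name `as_uint8_array` is hypothetical, and whether this avoids the overflow for data above 2 GB is untested here.

```python
import numpy as np


def as_uint8_array(buf):
    """View a bytes-like object as a 1-D uint8 NumPy array without copying."""
    return np.frombuffer(buf, dtype=np.uint8)


# Small stand-in for the 3.2 GB bytearray in the example above.
t = bytearray(8 * 1000)
arr = as_uint8_array(t)
print(arr.dtype, arr.shape)  # uint8 (8000,)

# Assumed usage with deepdish (not executed in this sketch):
# import deepdish as dd
# dd.io.save('testeDeepdishLimit', arr)
# restored = bytearray(dd.io.load('testeDeepdishLimit').tobytes())
```

Because `np.frombuffer` shares memory with the original bytearray, the conversion itself does not need a second 3.2 GB copy.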
