
Allow save_result_array(s) methods to use h5 dataset compression filters #44

Open
@philipstarkey

Description


Original report (archived issue) by David Meyer (Bitbucket: dihm, GitHub: dihm).


We've recently had an experiment get up to speed and start producing a prodigious amount of data. While researching how to deal with it, I stumbled onto HDF5 dataset compression. It trades a performance hit on reads and writes for a moderate reduction in file size, and it is essentially transparent in use, provided the compression is specified when the dataset is first created.
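A minimal h5py sketch of the behaviour described above, assuming h5py and NumPy are installed. The file name, dataset names and gzip level are illustrative choices, not anything lyse prescribes. The filter is fixed at the create_dataset call, reads decompress transparently, and h5py picks a chunk shape automatically because filtered datasets must use chunked storage.

```python
import os
import tempfile

import h5py
import numpy as np

# Illustrative, fairly compressible array standing in for summary data.
data = np.tile(np.arange(1000, dtype=np.float64), 100)

path = os.path.join(tempfile.mkdtemp(), "demo.h5")
with h5py.File(path, "w") as f:
    # Compression is a creation-time property: the filter chosen here
    # applies to every chunk written to the dataset and cannot be added later.
    f.create_dataset("plain", data=data)
    f.create_dataset("packed", data=data, compression="gzip", compression_opts=4)

with h5py.File(path, "r") as f:
    plain, packed = f["plain"], f["packed"]
    # Reading is transparent: h5py decompresses on the fly.
    assert np.array_equal(packed[()], data)
    print(plain.compression, packed.compression)  # None gzip
    # Bytes actually allocated on disk for each dataset.
    print(plain.id.get_storage_size(), packed.id.get_storage_size())
```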

It would be nice if large summary datasets produced in lyse could be stored compressed from the outset. Because of the performance cost, we don't want compression to be the default, so it should be configurable for each individual save.

My initial thought was to simply pass keyword arguments through to the create_dataset call in both save_result_arrays and save_result_array. Thoughts?

```diff
@@ -209,7 +209,8 @@
                 _updated_data[self.h5_path] = {}
             _updated_data[self.h5_path][str(self.group), name] = value
 
-    def save_result_array(self, name, data, group=None, overwrite=True, keep_attrs=False):
+    def save_result_array(self, name, data, group=None, 
+                          overwrite=True, keep_attrs=False, **compress_args):
         if self.no_write:
             raise Exception('This run is read-only. '
                             'You can\'t save results to runs through a '
@@ -233,7 +234,7 @@
                 else:
                     raise Exception('Dataset %s exists. Use overwrite=True to overwrite.' % 
                                      group + '/' + name)
-            h5_file[group].create_dataset(name, data=data)
+            h5_file[group].create_dataset(name, data=data, **compress_args)
             for key, val in attrs.items():
                 h5_file[group][name].attrs[key] = val
 
@@ -264,11 +265,11 @@
                 self.save_result(name, value[0], **kwargs)
                 self.save_result('u_' + name, value[1], **kwargs)
 
-    def save_result_arrays(self, *args):
+    def save_result_arrays(self, *args, **compress_args):
         names = args[::2]
         values = args[1::2]
         for name, value in zip(names, values):
-            self.save_result_array(name, value)
+            self.save_result_array(name, value, **compress_args)
     
     def get_image(self,orientation,label,image):
         with h5py.File(self.h5_path) as h5_file:
```
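To show how the proposal might look from the calling side, here is a sketch built around a hypothetical MiniRun class that mimics only the keyword pass-through from the diff. It is not lyse's Run (the real methods also handle group, keep_attrs, attributes and the read-only check), and the group name "results" and the dataset names are made up for illustration. compression, compression_opts and shuffle are standard h5py create_dataset keywords.

```python
import os
import tempfile

import h5py
import numpy as np


class MiniRun:
    """Hypothetical stand-in for lyse's Run, reduced to the pass-through
    that the proposed diff adds. Not the real lyse API."""

    def __init__(self, h5_path, group="results"):
        self.h5_path = h5_path
        self.group = group

    def save_result_array(self, name, data, overwrite=True, **compress_args):
        with h5py.File(self.h5_path, "a") as h5_file:
            grp = h5_file.require_group(self.group)
            if name in grp:
                if not overwrite:
                    raise Exception("Dataset %s exists. Use overwrite=True to overwrite."
                                    % (self.group + "/" + name))
                del grp[name]
            # Extra keywords (compression, compression_opts, shuffle, ...)
            # are handed unchanged to h5py's create_dataset.
            grp.create_dataset(name, data=data, **compress_args)

    def save_result_arrays(self, *args, **compress_args):
        # args alternate: name, value, name, value, ...
        for name, value in zip(args[::2], args[1::2]):
            self.save_result_array(name, value, **compress_args)


path = os.path.join(tempfile.mkdtemp(), "shot.h5")
run = MiniRun(path)
trace = np.zeros(100_000)

run.save_result_array("small_trace", trace[:10])  # default: uncompressed
run.save_result_arrays("big_a", trace, "big_b", trace,
                       compression="gzip", compression_opts=4, shuffle=True)
```

One consequence of the save_result_arrays signature in the diff is that a single call applies the same filter settings to every array it saves; arrays that need different settings have to be saved in separate calls.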
