Documentation for Directly Writing to HDF5 #651

@CSSFrancis

Description

Hi, long-time lurker and big fan of blosc! Apologies if this is not the right place to raise this issue. I've been working on trying to push the saving speed for HDF5 files. From my understanding, when optimizing for speed it is best to bypass the filter implementation and handle the compression and saving separately.

e.g.:

https://www.blosc.org/posts/pytables-direct-chunking/
https://github.com/imaris/ImarisWriter

My impression was that this is largely done using the H5Dwrite_chunk function in the HDF5 library, which allows you to write already-compressed data directly into a chunk, bypassing the filter pipeline.
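
For reference, the C API signature (available since HDF5 1.10.2; older releases expose the equivalent H5DOwrite_chunk in the high-level library), if I have it right, is:

herr_t H5Dwrite_chunk(hid_t dset_id, hid_t dxpl_id, uint32_t filters,
                      const hsize_t *offset, size_t data_size, const void *buf);

where filters is (as I understand it) a mask marking which pipeline filters were skipped for this chunk, so passing 0 means the buffer has already been through every filter.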

Doing this seems fairly straightforward. I've started with the easier serial case:

  1. Compress some data using blosc
  2. Write the compressed data directly to an HDF5 dataset using H5Dwrite_chunk
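
Stripped down to the two calls, what I have in mind is roughly this (all names below are placeholders rather than my actual code):

// 1. Compress one chunk's worth of data; the destination buffer needs room
//    for BLOSC2_MAX_OVERHEAD extra bytes on top of the uncompressed size.
int csize = blosc1_compress(clevel, doshuffle, typesize,
                            chunk_nbytes, src_buf,
                            dest_buf, chunk_nbytes + BLOSC2_MAX_OVERHEAD);

// 2. Hand the compressed bytes to HDF5 at the chunk's logical offset,
//    skipping the filter pipeline entirely.
hsize_t offset[3] = {first_frame_in_chunk, 0, 0};
H5Dwrite_chunk(dset_id, H5P_DEFAULT, 0, offset, (size_t)csize, dest_buf);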

Currently the data is written to the HDF5 dataset, but reading it back with something like hdf5plugin doesn't work and a blosc decompression error is thrown. My first thought is that this is related to the filter parameters; is there documentation on what they should be? Currently I have something like:

cd_values[0] = 2;
cd_values[1] = 2;
cd_values[2] = m_hdfImagesDataType.getSize(); // 2 for 16-bit int

/* Get the size (in bytes) of one uncompressed chunk */
int bufsize = m_hdfImagesDataType.getSize();
for (int i = 0; i < 3; i++) {
	bufsize *= (unsigned int)m_hdfImagesChunkDimensions[i];
}
m_compressionLibrary = blosc1_set_compressor("blosclz"); // returns the compressor code (0 for blosclz)

cd_values[3] = bufsize;
cd_values[4] = m_clevel;              /* compression level */
cd_values[5] = m_bitshuffle;          /* 0: shuffle not active, 1: shuffle active */
cd_values[6] = m_compressionLibrary;  /* the actual compressor to use */


// Create the HDF5 data space and data set for images
m_hdfImagesDataSpace = H5::DataSpace(m_hdfImagesDataSpaceDimensionsCount, m_hdfImagesDataSpaceDimensions, NULL);
H5::DSetCreatPropList hdfImagesDataSetProperties;
hdfImagesDataSetProperties.setChunk(m_hdfImagesDataSpaceDimensionsCount, m_hdfImagesChunkDimensions);
// 32001 is the registered HDF5 filter ID for blosc
hdfImagesDataSetProperties.setFilter(32001, H5Z_FLAG_OPTIONAL, 7, cd_values);

m_hdfImagesDataSet = m_hdfImagesGroup.createDataSet("patterns", m_hdfImagesDataType, m_hdfImagesDataSpace, hdfImagesDataSetProperties);

....

int compressed_size = bytesPerChunkUncompressed + BLOSC2_MAX_OVERHEAD; // destination buffer capacity
int blosc_result = blosc1_compress(m_clevel, m_bitshuffle, type_bytes, bytesPerChunkUncompressed, (char*)imagePixelData, compressed_data, compressed_size);
if (blosc_result <= 0) {
	delete[] compressed_data; // Clean up
	return false;
}
hsize_t hdfDataSpaceNewImageDimensions[3] = {static_cast<hsize_t>(framesInBuffer), static_cast<hsize_t>(m_imageHeight), static_cast<hsize_t>(m_imageWidth)};
// Get the chunk offset for the data
hsize_t hdfOffset[3] = { static_cast<hsize_t>(framesInBuffer * m_outputImageCurrentCount), 0, 0};

compressed_size = blosc_result; // Update to the actual compressed size
H5Dwrite_chunk(m_hdfImagesDataSet.getId(), H5P_DEFAULT, 0, hdfOffset, compressed_size, compressed_data); // filters = 0 --> no filter applied?
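
For reference, my current reading of the hdf5-blosc filter source (blosc_filter.c) is that the seven cd_values are interpreted roughly as follows, though I may well be off:

cd_values[0]  filter revision number
cd_values[1]  blosc version format
cd_values[2]  typesize in bytes
cd_values[3]  uncompressed chunk size in bytes
cd_values[4]  compression level
cd_values[5]  shuffle (0: none, 1: byte shuffle, 2: bit shuffle)
cd_values[6]  compressor code (0: blosclz, 1: lz4, 2: lz4hc, 3: snappy, 4: zlib, 5: zstd)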

Is there a minimal example of this kind of workflow somewhere? If not, it would be great to add one to the documentation in some way; I can help with that :) assuming I can figure out exactly how to save the data in a way that is readable.
