Documentation for Directly Writing to HDF5 #651

@CSSFrancis

Description

Hi, long-time lurker and big fan of blosc! Apologies if this is not the right place to raise this issue. I've been working on trying to push the saving speed for HDF5 files. From my understanding, when optimizing for speed it is best to bypass the filter implementation and handle the compression and saving separately.

e.g.:

https://www.blosc.org/posts/pytables-direct-chunking/
https://github.com/imaris/ImarisWriter

My impression was that this is largely done using the H5Dwrite_chunk function in the HDF5 library, which allows you to write already-compressed data directly into a chunk, bypassing the filter pipeline.
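
For reference, the C API signature (available since HDF5 1.10.2; older releases expose the equivalent H5DOwrite_chunk in the high-level library), if I have it right, is:

herr_t H5Dwrite_chunk(hid_t dset_id, hid_t dxpl_id, uint32_t filters,
                      const hsize_t *offset, size_t data_size, const void *buf);

where filters is (as I understand it) a mask marking which pipeline filters were skipped for this chunk, so passing 0 means the buffer has already been through every filter.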

Doing this seems fairly straightforward. I've started with the easier serial case:

  1. Compress some data using blosc
  2. Write the compressed data directly to an HDF5 dataset using H5Dwrite_chunk
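
Stripped down to the two calls, what I have in mind is roughly this (all names below are placeholders rather than my actual code):

// 1. Compress one chunk's worth of data; the destination buffer needs room
//    for BLOSC2_MAX_OVERHEAD extra bytes on top of the uncompressed size.
int csize = blosc1_compress(clevel, doshuffle, typesize,
                            chunk_nbytes, src_buf,
                            dest_buf, chunk_nbytes + BLOSC2_MAX_OVERHEAD);

// 2. Hand the compressed bytes to HDF5 at the chunk's logical offset,
//    skipping the filter pipeline entirely.
hsize_t offset[3] = {first_frame_in_chunk, 0, 0};
H5Dwrite_chunk(dset_id, H5P_DEFAULT, 0, offset, (size_t)csize, dest_buf);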

Currently the data is written to the HDF5 dataset, but reading it back with something like hdf5plugin doesn't work and a blosc decompression error is thrown. My first thought is that this is related to the filter parameters; is there documentation on what they should be? Currently I have something like:

cd_values[0] = 2;
cd_values[1] = 2;
cd_values[2] = m_hdfImagesDataType.getSize(); // 2 for 16-bit int

/* Get the size (in bytes) of one uncompressed chunk */
int bufsize = m_hdfImagesDataType.getSize();
for (int i = 0; i < 3; i++) {
	bufsize *= (unsigned int)m_hdfImagesChunkDimensions[i];
}
m_compressionLibrary = blosc1_set_compressor("blosclz"); // returns the compressor code (0 for blosclz)

cd_values[3] = bufsize;
cd_values[4] = m_clevel;              /* compression level */
cd_values[5] = m_bitshuffle;          /* 0: shuffle not active, 1: shuffle active */
cd_values[6] = m_compressionLibrary;  /* the actual compressor to use */


// Create the HDF5 data space and data set for images
m_hdfImagesDataSpace = H5::DataSpace(m_hdfImagesDataSpaceDimensionsCount, m_hdfImagesDataSpaceDimensions, NULL);
H5::DSetCreatPropList hdfImagesDataSetProperties;
hdfImagesDataSetProperties.setChunk(m_hdfImagesDataSpaceDimensionsCount, m_hdfImagesChunkDimensions);
// 32001 is the registered HDF5 filter ID for blosc
hdfImagesDataSetProperties.setFilter(32001, H5Z_FLAG_OPTIONAL, 7, cd_values);

m_hdfImagesDataSet = m_hdfImagesGroup.createDataSet("patterns", m_hdfImagesDataType, m_hdfImagesDataSpace, hdfImagesDataSetProperties);

....

int compressed_size = bytesPerChunkUncompressed + BLOSC2_MAX_OVERHEAD; // destination buffer capacity
int blosc_result = blosc1_compress(m_clevel, m_bitshuffle, type_bytes, bytesPerChunkUncompressed, (char*)imagePixelData, compressed_data, compressed_size);
if (blosc_result <= 0) {
	delete[] compressed_data; // Clean up
	return false;
}
hsize_t hdfDataSpaceNewImageDimensions[3] = {static_cast<hsize_t>(framesInBuffer), static_cast<hsize_t>(m_imageHeight), static_cast<hsize_t>(m_imageWidth)};
// Get the chunk offset for the data
hsize_t hdfOffset[3] = { static_cast<hsize_t>(framesInBuffer * m_outputImageCurrentCount), 0, 0};

compressed_size = blosc_result; // Update to the actual compressed size
H5Dwrite_chunk(m_hdfImagesDataSet.getId(), H5P_DEFAULT, 0, hdfOffset, compressed_size, compressed_data); // filters = 0 --> no filter applied?
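
For reference, my current reading of the hdf5-blosc filter source (blosc_filter.c) is that the seven cd_values are interpreted roughly as follows, though I may well be off:

cd_values[0]  filter revision number
cd_values[1]  blosc version format
cd_values[2]  typesize in bytes
cd_values[3]  uncompressed chunk size in bytes
cd_values[4]  compression level
cd_values[5]  shuffle (0: none, 1: byte shuffle, 2: bit shuffle)
cd_values[6]  compressor code (0: blosclz, 1: lz4, 2: lz4hc, 3: snappy, 4: zlib, 5: zstd)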

Is there a minimal example of this kind of workflow somewhere? If not, it would be great to add one to the documentation in some way; I can help with that :) assuming I can figure out exactly how to save the data in a way that is readable.
