HDF5 crashes with inefficient compressors #108

Open
@lucasvr

I noticed that HDF5 crashes when I/O filters produce more data than the original dataset size.

When a dataset is created, its declared dimensions and data type are honored when it comes time to write the data with H5Dwrite. The I/O filter interface, however, allows a compressor to return either a size smaller than the input (in which case it successfully compressed the data) or slightly larger (in which case the compressor didn’t do a good job).

Now, let’s say we have a really bad compressor that requires 100x more room than the original data. What I observe is that HDF5 seems to truncate the data, so it cannot be retrieved afterwards. In some cases, HDF5 even crashes when the dataset handle is closed.

Here’s an example I/O filter that reproduces the problem.

// build with 'g++ liberror.cpp -o libtestcrash.so -shared -fPIC -Wall -g -ggdb'
#include <hdf5.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>
#include <string.h>

extern "C" {

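// Filter callback. On read (H5Z_FLAG_REVERSE) the buffer is passed through untouched.
// On write, the 100-byte chunk is replaced with a zero-filled 1,000,000-byte buffer.
size_t callback(unsigned int flags, size_t cd_nelmts, const unsigned int *cd_values, size_t nbytes, size_t *buf_size, void **buf)
{
    if (flags & H5Z_FLAG_REVERSE) {
        return *buf_size;
    } else {
        char *newbuf = (char *) calloc(1000*1000, sizeof(char));
        if (newbuf == NULL)
            return 0;  // a return value of 0 signals failure to HDF5
        free(*buf);
        *buf = newbuf;
        *buf_size = 1000*1000;
        return *buf_size;
    }
}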

const H5Z_class2_t H5Z_UDF_FILTER[1] = {{
    H5Z_CLASS_T_VERS, 0x2112, 1, 1, "crash_filter", NULL, NULL, callback,
}};

H5PL_type_t H5PLget_plugin_type(void) { return H5PL_TYPE_FILTER; }
const void *H5PLget_plugin_info(void) { return H5Z_UDF_FILTER; }
}

The corresponding application code is here:

// build with 'g++ mainerror.cpp -o mainerror -g -ggdb -Wall -lhdf5'
// run with 'HDF5_PLUGIN_PATH=$PWD ./mainerror file.h5'
// (file.h5 must already exist: H5Fopen does not create it)
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <hdf5.h>

// Abort with the failing line number if an HDF5 call returns a negative value.
#define CHECK(hid) do { if ((hid) < 0) { fprintf(stderr, "failed @line %d\n", __LINE__); exit(1); } } while (0)

int main(int argc, char **argv)
{
    if (argc != 2) {
        printf("Syntax: %s <file.h5>\n", argv[0]);
        exit(1);
    }
    hsize_t dims[2] = {10, 10};
    hid_t file_id = H5Fopen(argv[1], H5F_ACC_RDWR, H5P_DEFAULT);
    CHECK(file_id);
    hid_t space_id = H5Screate_simple(2, dims, NULL);
    CHECK(space_id);
    hid_t dcpl_id = H5Pcreate(H5P_DATASET_CREATE);
    CHECK(dcpl_id);
    CHECK(H5Pset_filter(dcpl_id, 0x2112, H5Z_FLAG_MANDATORY, 0, NULL));
    CHECK(H5Pset_chunk(dcpl_id, 2, dims));
    hid_t dset_id = H5Dcreate(file_id, "crash_dataset", H5T_STD_I8LE, space_id, H5P_DEFAULT, dcpl_id, H5P_DEFAULT);
    CHECK(dset_id);
    char *data = (char *) calloc(dims[0] * dims[1], sizeof(char));
    CHECK(H5Dwrite(dset_id, H5T_STD_I8LE, H5S_ALL, H5S_ALL, H5P_DEFAULT, data));
    CHECK(H5Dclose(dset_id));
    CHECK(H5Pclose(dcpl_id));
    CHECK(H5Sclose(space_id));
    CHECK(H5Fclose(file_id));
    free(data);
    return 0;
}

If you change the I/O filter code so that it allocates 10x10, or even 100x100, bytes instead of 1000x1000, the problem doesn’t occur.
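To put numbers on those three cases, the sketch below computes how much larger each filter output is than the raw chunk. The chunk in the reproducer is 10x10 elements of H5T_STD_I8LE, one byte each, so 100 bytes. The helper names `chunk_bytes` and `expansion_factor` are hypothetical and only exist for this illustration. Note that this only restates the three data points from the report; it does not identify where the failure threshold actually lies.

```cpp
#include <cstddef>

// Size in bytes of one chunk of rows x cols elements, elem_size bytes each.
constexpr std::size_t chunk_bytes(std::size_t rows, std::size_t cols, std::size_t elem_size)
{
    return rows * cols * elem_size;
}

// How many times larger the filter output is than the raw chunk (integer division).
constexpr std::size_t expansion_factor(std::size_t filter_output_bytes, std::size_t raw_chunk_bytes)
{
    return filter_output_bytes / raw_chunk_bytes;
}

// Raw chunk in the reproducer: 10 x 10 one-byte integers.
constexpr std::size_t kRawChunk = chunk_bytes(10, 10, 1);
static_assert(kRawChunk == 100, "10x10 chunk of 1-byte elements");

// Filter allocates 10x10 bytes: same size as the input, reported as working.
static_assert(expansion_factor(10 * 10, kRawChunk) == 1, "1x");
// Filter allocates 100x100 bytes: reported as working.
static_assert(expansion_factor(100 * 100, kRawChunk) == 100, "100x");
// Filter allocates 1000x1000 bytes: the case that truncates or crashes.
static_assert(expansion_factor(1000 * 1000, kRawChunk) == 10000, "10000x");
```

So the failing case in the reproducer corresponds to a 10,000x expansion, while a 100x expansion is still handled.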

Labels

Component - C Library: Core C library issues (usually in the src directory)
Priority - 0. Blocker: This MUST be merged for the release to happen
Priority - 1. High: These are important issues that should be resolved in the next release
Type - Security: Security issues, including library crashers and memory leaks
