# Support variable chunking and compression

PnetCDF contains an experimental variable chunking and compression feature
for classic NetCDF files.

For details about its design and implementation, please refer to:
Hou, Kaiyuan, et al. "Supporting Data Compression in PnetCDF."
2021 IEEE International Conference on Big Data (Big Data). IEEE, 2021.

## Enable variable chunking support

* To build PnetCDF with variable chunking support:
  + Add the `--enable-chunking` option to the configure command line. For
    example,
    ```
    ./configure --prefix=/PnetCDF/install/path --enable-chunking
    ```
* To build the deflate (zlib) filter for chunked variables:
  + Add the `--enable-zlib` option to the configure command line. The option
    `--with-zlib` can be used to specify the installation path of zlib if it
    is not in a standard location. For example,
    ```
    ./configure --prefix=/PnetCDF/install/path --enable-chunking --enable-zlib \
                --with-zlib=/zlib/install/path
    ```
* To build the SZ filter for chunked variables:
  + Add the `--enable-sz` option to the configure command line. The option
    `--with-sz` can be used to specify the installation path of SZ if it is
    not in a standard location. For example,
    ```
    ./configure --prefix=/PnetCDF/install/path --enable-chunking --enable-sz \
                --with-sz=/sz/install/path
    ```

## Enable variable chunking

To enable the chunked storage layout for variables, set the file info hint
`nc_chunking` to `enable`. The chunking feature requires the 64-bit data
format (CDF-5), so the file must be created with `NC_64BIT_DATA`.
For example,
```
    MPI_Info_create(&info);
    MPI_Info_set(info, "nc_chunking", "enable");  /* turn on chunking */
    ncmpi_create(MPI_COMM_WORLD, fname, NC_64BIT_DATA, info, &ncid);
```
Alternatively, the hint can be set through the environment variable
`PNETCDF_HINTS`:
```
export PNETCDF_HINTS="nc_chunking=enable"
```
When chunking is enabled, all non-scalar variables are stored in a chunked
storage layout. Scalar variables are not chunked.

Users can also set the default filter for chunked variables. For example,
```
    MPI_Info_set(info, "nc_chunk_default_filter", "zlib");
```
or
```
export PNETCDF_HINTS="nc_chunking=enable;nc_chunk_default_filter=zlib"
```
The available filters are `none` (default), `zlib` (deflate), and `sz`.

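PnetCDF parses these hints itself, so applications only need to set them. To make the `key=value;key=value` format of `PNETCDF_HINTS` concrete, the following C sketch looks up a single key in such a string. The helper `hint_value` is hypothetical and not part of the PnetCDF API; it assumes exactly the format shown above (pairs separated by `;`, no surrounding whitespace), which may be stricter than what PnetCDF actually accepts.

```c
#include <string.h>

/* Hypothetical helper, not part of PnetCDF: look up `key` in a hint string
 * of the form "key1=value1;key2=value2" (the PNETCDF_HINTS format shown
 * above) and copy its value into `val`, truncating to fit `len` bytes.
 * `len` must be at least 1. Returns 1 if the key is found, 0 otherwise
 * (including when `hints` is NULL). */
int hint_value(const char *hints, const char *key, char *val, size_t len)
{
    size_t klen = strlen(key);
    const char *p = hints;

    while (p != NULL && *p != '\0') {
        const char *end = strchr(p, ';');   /* end of this key=value pair */
        size_t plen = end ? (size_t)(end - p) : strlen(p);

        /* the pair must start with the whole key followed by '=' */
        if (plen > klen && strncmp(p, key, klen) == 0 && p[klen] == '=') {
            size_t vlen = plen - klen - 1;
            if (vlen >= len)
                vlen = len - 1;             /* truncate to the buffer size */
            memcpy(val, p + klen + 1, vlen);
            val[vlen] = '\0';
            return 1;
        }
        p = end ? end + 1 : NULL;           /* move on to the next pair */
    }
    return 0;
}
```

For example, `hint_value(getenv("PNETCDF_HINTS"), "nc_chunk_default_filter", buf, sizeof buf)` yields `zlib` for the setting above; when the variable is unset, `getenv` returns `NULL` and the helper returns 0.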
## Define chunk dimensions of variables

Applications can use the following APIs to set and get the chunk dimensions
of a variable.
```
    int ncmpi_var_set_chunk (int ncid, int varid, int *chunk_dim);
    int ncmpi_var_get_chunk (int ncid, int varid, int *chunk_dim);
```
For example, to store a 100 x 100 variable in 10 x 10 chunks:
```
    int dimids[2], varid;
    int chunk_dim[2] = {10, 10};  /* chunk length along each dimension */
    /* dimension names "Y" and "X" are illustrative */
    ncmpi_def_dim (ncid, "Y", 100, &dimids[0]);
    ncmpi_def_dim (ncid, "X", 100, &dimids[1]);
    ncmpi_def_var (ncid, name, type, 2, dimids, &varid);
    ncmpi_var_set_chunk (ncid, varid, chunk_dim);
```
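
To see what `chunk_dim = {10, 10}` means for a 100 x 100 variable, the following C sketch computes how many chunks cover a variable and which chunk an element falls into. It only illustrates the arithmetic: a partially filled chunk at the edge is counted as one chunk, and the row-major numbering of chunks in `chunk_of` is an assumption made for this example, not PnetCDF's documented on-disk order. Both helpers are hypothetical and use `size_t` lengths rather than the `int` arrays taken by `ncmpi_var_set_chunk`. For the example above, `num_chunks` gives 10 x 10 = 100 chunks, and element (15, 37) lies in chunk (1, 3), numbered 13 in row-major order.

```c
#include <stddef.h>

/* Hypothetical helper: number of chunks covering a variable of `ndims`
 * dimensions, i.e. the product of ceil(dim_len[i] / chunk_dim[i]).
 * A partially filled chunk at the edge counts as a whole chunk. */
size_t num_chunks(int ndims, const size_t *dim_len, const size_t *chunk_dim)
{
    size_t n = 1;
    for (int i = 0; i < ndims; i++)
        n *= (dim_len[i] + chunk_dim[i] - 1) / chunk_dim[i];
    return n;
}

/* Hypothetical helper: the element at index `idx` lies in chunk
 * idx[i] / chunk_dim[i] along each dimension. The chunk coordinate is
 * returned linearized in row-major order (an assumed ordering). */
size_t chunk_of(int ndims, const size_t *dim_len, const size_t *chunk_dim,
                const size_t *idx)
{
    size_t lin = 0;
    for (int i = 0; i < ndims; i++) {
        size_t nchunks = (dim_len[i] + chunk_dim[i] - 1) / chunk_dim[i];
        lin = lin * nchunks + idx[i] / chunk_dim[i];
    }
    return lin;
}
```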

For record variables, the chunk dimension along the record dimension is
always 1. By default, the chunk dimensions are the dimensions of the variable,
except along the record dimension, so PnetCDF creates one chunk per variable
for a fixed-size variable and one chunk per record for a record variable.

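The default rule can be written down as a short C sketch. `default_chunk_dims` is a hypothetical helper, not a PnetCDF function; it relies on the fact that in the classic NetCDF formats the record dimension of a record variable is always its first dimension.

```c
/* Hypothetical helper: default chunk dimensions as described above. Each
 * dimension is chunked at its full length, except the record dimension
 * (always the first dimension of a record variable), which is chunked
 * at length 1. `is_record` is nonzero for record variables. */
void default_chunk_dims(int ndims, const int *dim_len, int is_record,
                        int *chunk_dim)
{
    for (int i = 0; i < ndims; i++)
        chunk_dim[i] = dim_len[i];      /* one chunk spans the whole dimension */
    if (is_record && ndims > 0)
        chunk_dim[0] = 1;               /* one record per chunk */
}
```

With these defaults every chunk spans either a whole fixed-size variable or one whole record, which is why one chunk is created per variable or per record.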
## Define the filter for chunked variables

Applications can use the following APIs to set and get the filter of a
variable.
```
#define NC_FILTER_NONE     0
#define NC_FILTER_DEFLATE  2
#define NC_FILTER_SZ       3
int ncmpi_var_set_filter (int ncid, int varid, int filter);
int ncmpi_var_get_filter (int ncid, int varid, int *filter);
```
For example,
```
    ncmpi_var_set_filter (ncid, varid, NC_FILTER_DEFLATE);
```
Valid filter values are `NC_FILTER_NONE` (none), `NC_FILTER_DEFLATE` (zlib),
and `NC_FILTER_SZ` (sz).

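The `nc_chunk_default_filter` hint takes a filter name, while `ncmpi_var_set_filter` takes one of the constants above. The following C sketch maps between the two and checks that a value is one of the valid filters. The helpers `filter_from_name` and `filter_is_valid` are hypothetical; the constant values are copied from the listing above and guarded with `#ifndef` in case `pnetcdf.h` already defines them, and names are matched case-sensitively, which may differ from PnetCDF's own hint parsing.

```c
#include <string.h>

/* Filter values as listed above; guarded in case pnetcdf.h defines them. */
#ifndef NC_FILTER_NONE
#define NC_FILTER_NONE     0
#endif
#ifndef NC_FILTER_DEFLATE
#define NC_FILTER_DEFLATE  2
#endif
#ifndef NC_FILTER_SZ
#define NC_FILTER_SZ       3
#endif

/* Hypothetical helper: map a filter name as used by the
 * nc_chunk_default_filter hint ("none", "zlib", "sz") to the value passed
 * to ncmpi_var_set_filter. Returns -1 for an unknown name. */
int filter_from_name(const char *name)
{
    if (strcmp(name, "none") == 0) return NC_FILTER_NONE;
    if (strcmp(name, "zlib") == 0) return NC_FILTER_DEFLATE;
    if (strcmp(name, "sz") == 0)   return NC_FILTER_SZ;
    return -1;
}

/* Hypothetical helper: nonzero if `filter` is one of the valid values. */
int filter_is_valid(int filter)
{
    return filter == NC_FILTER_NONE || filter == NC_FILTER_DEFLATE ||
           filter == NC_FILTER_SZ;
}
```

For example, `ncmpi_var_set_filter(ncid, varid, filter_from_name("zlib"))` requests the deflate filter; check `filter_is_valid` first so that -1 is never passed, and note that using the deflate filter presumably also requires PnetCDF to have been configured with `--enable-zlib`.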
## Known problems

The experimental variable chunking feature has the following limitations.

* Only one filter can be applied to a chunked variable. Unlike HDF5, which
  allows multiple filters to be stacked on a chunked dataset, the current
  implementation in PnetCDF allows only a single filter per variable.
* There is no per-variable option to enable chunking. If chunking is enabled,
  all non-scalar variables are chunked, even those whose chunk dimensions are
  not explicitly set.
* Independent variable I/O is not supported. Variable reads and writes
  (get/put) must be collective in order to maintain the data consistency of
  filtered chunks. Non-blocking APIs can be used to mitigate the impact of
  this limitation.

Copyright (C) 2022, Northwestern University and Argonne National Laboratory

See the COPYRIGHT notice in the top-level directory.