Commit e2ec339

add documentation about chunked I/O driver
1 parent 13d9d15 commit e2ec339

doc/README.Chunk.md

Lines changed: 119 additions & 0 deletions
# Support variable chunking and compression

PnetCDF contains an experimental variable chunking and compression feature
for classic NetCDF files.

For details about its design and implementation, please refer to:
Hou, Kaiyuan, et al. "Supporting Data Compression in PnetCDF."
2021 IEEE International Conference on Big Data (Big Data). IEEE, 2021.

## Enable variable chunking support

* To build PnetCDF with variable chunking support
  + Add the `--enable-chunking` option to the configure command line. For example,
    ```
    ./configure --prefix=/PnetCDF/install/path --enable-chunking
    ```
* To build deflate filter support for chunked variables
  + Add the `--enable-zlib` option to the configure command line. The option
    `--with-zlib` can also be used to specify the installation path of
    zlib if it is not in a standard location. For example,
    ```
    ./configure --prefix=/PnetCDF/install/path --enable-chunking --enable-zlib \
                --with-zlib=/zlib/install/path
    ```
* To build sz filter support for chunked variables
  + Add the `--enable-sz` option to the configure command line. The option
    `--with-sz` can also be used to specify the installation path of
    sz if it is not in a standard location. For example,
    ```
    ./configure --prefix=/PnetCDF/install/path --enable-chunking --enable-sz \
                --with-sz=/sz/install/path
    ```

## Enable variable chunking

To enable the chunked storage layout for variables, set the file info hint
"nc_chunking" to "enable". The chunking feature requires the 64-bit data
format (CDF-5), selected with NC_64BIT_DATA.
For example,
```
MPI_Info_create(&info);
MPI_Info_set(info, "nc_chunking", "enable");  /* request the chunked layout */
ncmpi_create(MPI_COMM_WORLD, fname, NC_64BIT_DATA, info, &ncid);
```
Alternatively, the file info can be set through the environment variable
"PNETCDF_HINTS".
```
export PNETCDF_HINTS="nc_chunking=enable"
```
When chunking is enabled, all non-scalar variables will be stored in a chunked
storage layout. Scalar variables are not chunked.

Users can also set the default filter for chunked variables. For example,
```
MPI_Info_set(info, "nc_chunk_default_filter", "zlib");
```
or
```
export PNETCDF_HINTS="nc_chunking=enable;nc_chunk_default_filter=zlib"
```
The available filter options are none (the default), zlib (deflate), and sz.
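
As a rough illustration of the `key=value;key=value` format used by "PNETCDF_HINTS", the sketch below assembles the hint string in C and exports it for the current process. The helpers `build_chunk_hints` and `export_chunk_hints` are hypothetical names introduced here and are not part of PnetCDF. The sketch assumes a POSIX system (for `setenv`) and assumes the variable is set before the file is created or opened.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* for setenv() */
#endif
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not a PnetCDF API): write the hint string into buf.
 * filter is "zlib", "sz", or NULL to leave the default filter unchanged.
 * Returns 0 on success, -1 if buf is too small. */
static int build_chunk_hints(char *buf, size_t len, const char *filter)
{
    int n;
    assert(buf != NULL && len > 0);
    if (filter == NULL)
        n = snprintf(buf, len, "nc_chunking=enable");
    else
        n = snprintf(buf, len,
                     "nc_chunking=enable;nc_chunk_default_filter=%s", filter);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical helper: export the hints through PNETCDF_HINTS.
 * Returns 0 on success, -1 on failure. */
static int export_chunk_hints(const char *filter)
{
    char hints[128];
    if (build_chunk_hints(hints, sizeof(hints), filter) != 0)
        return -1;
    return setenv("PNETCDF_HINTS", hints, 1);  /* 1 = overwrite existing value */
}
```

Setting the hints through an MPI info object, as shown earlier, keeps them local to one file, whereas the environment variable applies to the whole process.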

## Define chunk dimension of variables

Applications can use the following APIs to set and get the chunk dimension of
a variable.
```
int ncmpi_var_set_chunk (int ncid, int varid, int *chunk_dim);
int ncmpi_var_get_chunk (int ncid, int varid, int *chunk_dim);
```
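For example:
```
int dimids[2], varid;
int chunk_dim[2] = {10, 10};
/* "Y" and "X" are placeholder names for two dimensions of length 100 */
ncmpi_def_dim (ncid, "Y", 100, &dimids[0]);
ncmpi_def_dim (ncid, "X", 100, &dimids[1]);
ncmpi_def_var (ncid, name, type, 2, dimids, &varid);
ncmpi_var_set_chunk (ncid, varid, chunk_dim);
```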
For record variables, the chunk length along the record dimension is always 1.
By default, the chunk dimensions equal the dimensions of the variable, except
along the record dimension. PnetCDF therefore creates one chunk per record for
a record variable and one chunk for a fixed-size variable.
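
To make the chunk arithmetic concrete, the following sketch computes the default chunk dimensions described here and counts the chunks that cover a variable. Both helpers are hypothetical and are not PnetCDF APIs. The count assumes that a partial chunk at the edge of a dimension is counted as a whole chunk, which this document does not state explicitly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a PnetCDF API): fill chunk[] with the default
 * chunk dimensions. is_record is nonzero when dims[0] is the record
 * dimension, which is always the first dimension in classic NetCDF files. */
static void default_chunk_dims(int ndims, const int *dims, int is_record,
                               int *chunk)
{
    assert(ndims > 0 && dims != NULL && chunk != NULL);
    for (int i = 0; i < ndims; i++)
        chunk[i] = dims[i];      /* one chunk spans the whole dimension */
    if (is_record)
        chunk[0] = 1;            /* always 1 along the record dimension */
}

/* Hypothetical helper: number of chunks needed to cover dims[].
 * Assumption: partial edge chunks count as whole chunks (ceiling division). */
static long long count_chunks(int ndims, const int *dims, const int *chunk)
{
    long long total = 1;
    assert(ndims > 0 && dims != NULL && chunk != NULL);
    for (int i = 0; i < ndims; i++) {
        assert(chunk[i] > 0);
        total *= ((long long)dims[i] + chunk[i] - 1) / chunk[i];
    }
    return total;
}
```

Under these assumptions, a 100 x 100 variable with 10 x 10 chunks is covered by 100 chunks, and a record variable holding 5 records of 100 x 100 elements uses 5 default chunks.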

## Define filter for chunked variables

Applications can use the following APIs to set and get the filter of
a variable.
```
#define NC_FILTER_NONE 0
#define NC_FILTER_DEFLATE 2
#define NC_FILTER_SZ 3
int ncmpi_var_set_filter (int ncid, int varid, int filter);
int ncmpi_var_get_filter (int ncid, int varid, int *filter);
```
For example:
```
ncmpi_var_set_filter (ncid, varid, NC_FILTER_DEFLATE);
```
Valid filter values are NC_FILTER_NONE (none), NC_FILTER_DEFLATE (zlib), and
NC_FILTER_SZ (sz).
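
The filter names accepted by the "nc_chunk_default_filter" hint and the constants listed here describe the same three choices. A minimal sketch of that correspondence follows. `filter_from_name` is a hypothetical helper, not a PnetCDF API, and the constant values are copied from the definitions in this section.

```c
#include <assert.h>
#include <string.h>

/* Values copied from this document. The guard avoids a redefinition if a
 * PnetCDF header that already provides them has been included. */
#ifndef NC_FILTER_NONE
#define NC_FILTER_NONE    0
#define NC_FILTER_DEFLATE 2
#define NC_FILTER_SZ      3
#endif

/* Hypothetical helper (not a PnetCDF API): map a filter name, as used in the
 * nc_chunk_default_filter hint, to its constant. Returns -1 if unknown. */
static int filter_from_name(const char *name)
{
    assert(name != NULL);
    if (strcmp(name, "none") == 0) return NC_FILTER_NONE;
    if (strcmp(name, "zlib") == 0) return NC_FILTER_DEFLATE;
    if (strcmp(name, "sz") == 0)   return NC_FILTER_SZ;
    return -1;  /* unrecognized filter name */
}
```

An application could pass the result to `ncmpi_var_set_filter` after checking that it is not negative.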

## Known problems

The experimental variable chunking feature has the following limitations.

* Only one filter can be applied to a chunked variable. Unlike HDF5, which allows
  the stacking of multiple filters on chunked datasets, the current
  implementation in PnetCDF only allows a single filter to be applied to a
  variable.
* No per-variable option for variable chunking. If chunking is enabled, all
  non-scalar variables will be chunked even if the chunk dimension is not
  defined.
* Independent variable I/O is not supported. Variable read/write (get/put)
  must be collective in order to maintain data consistency of filtered chunks.
  Non-blocking APIs can be used to mitigate the impact of this limitation.

Copyright (C) 2022, Northwestern University and Argonne National Laboratory

See the COPYRIGHT notice in the top-level directory.
