|
1 |
| -# ome2024-ngff-challenge |
| 1 | +# n4bi-repo-challenge-2025 |
2 | 2 |
|
3 | 3 | [![Actions Status][actions-badge]][actions-link]
|
4 |
| -[![PyPI version][pypi-version]][pypi-link] |
5 |
| -[![PyPI platforms][pypi-platforms]][pypi-link] |
6 | 4 | [![Image.SC Zulip][zulip-badge]][zulip-link]
|
7 | 5 |
|
8 | 6 | <!-- SPHINX-START -->
|
9 | 7 |
|
10 | 8 | <!-- prettier-ignore-start -->
|
11 |
| -[actions-badge]: https://github.com/ome/ome2024-ngff-challenge/workflows/CI/badge.svg |
12 |
| -[actions-link]: https://github.com/ome/ome2024-ngff-challenge/actions |
13 |
| -[conda-badge]: https://img.shields.io/conda/vn/conda-forge/ome2024-ngff-challenge |
14 |
| -[conda-link]: https://github.com/conda-forge/ome2024-ngff-challenge-feedstock |
| 9 | +[actions-badge]: https://github.com/nfdi4bioimage/n4bi-repo-challenge-2025/workflows/CI/badge.svg |
| 10 | +[actions-link]: https://github.com/nfdi4bioimage/n4bi-repo-challenge-2025/actions |
15 | 11 | [github-discussions-badge]: https://img.shields.io/static/v1?label=Discussions&message=Ask&color=blue&logo=github
|
16 |
| -[github-discussions-link]: https://github.com/ome/ome2024-ngff-challenge/discussions |
17 |
| -[pypi-link]: https://pypi.org/project/ome2024-ngff-challenge/ |
18 |
| -[pypi-platforms]: https://img.shields.io/pypi/pyversions/ome2024-ngff-challenge |
19 |
| -[pypi-version]: https://img.shields.io/pypi/v/ome2024-ngff-challenge |
20 |
| -[rtd-badge]: https://readthedocs.org/projects/ome2024-ngff-challenge/badge/?version=latest |
21 |
| -[rtd-link]: https://ome2024-ngff-challenge.readthedocs.io/en/latest/?badge=latest |
| 12 | +[github-discussions-link]: https://github.com/nfdi4bioimage/n4bi-repo-challenge-2025/discussions |
22 | 13 | [zulip-badge]: https://img.shields.io/badge/zulip-join_chat-brightgreen.svg
|
23 | 14 | [zulip-link]: https://imagesc.zulipchat.com/#narrow/stream/328251-NGFF
|
24 | 15 |
|
25 | 16 | <!-- prettier-ignore-end -->
|
26 | 17 |
|
27 |
| -Project planning and material repository for the 2024 challenge to generate 1 PB |
28 |
| -of OME-Zarr data. |
| 18 | +Project planning and material repository for the 2025 repository challenge to |
| 19 | +upload data to a variety of bioimaging repositories. |
29 | 20 |
|
30 |
| -**UPDATE (20th October 2024)**: The data converted for the challenge can now be |
31 |
| -browsed and searched at |
32 |
| -[https://ome.github.io/ome2024-ngff-challenge/](https://ome.github.io/ome2024-ngff-challenge/). |
33 |
| - |
34 |
| -## Challenge overview |
35 |
| - |
36 |
| -The high-level goal of the challenge is to generate OME-Zarr data according to a |
37 |
| -development version of the specification to drive forward the implementation |
38 |
| -work and establish a baseline for the conversion costs that members of the |
39 |
| -community can expect to incur. |
40 |
| - |
41 |
| -Data generated within the challenge will have: |
42 |
| - |
43 |
| -- all v2 arrays converted to v3, optionally sharding the data |
44 |
| -- all .zattrs metadata migrated to `zarr.json["attributes"]["ome"]` |
45 |
| -- a top-level `ro-crate-metadata.json` file with minimal metadata (specimen and |
46 |
| - imaging modality) |
47 |
| - |
48 |
| -You can example the contents of a sample dataset by using |
49 |
| -[the minio client](https://github.com/minio/mc): |
50 |
| - |
51 |
| -``` |
52 |
| -$ mc config host add uk1anon https://uk1s3.embassy.ebi.ac.uk "" "" |
53 |
| -Added `uk1anon` successfully. |
54 |
| -$ mc ls -r uk1anon/idr/share/ome2024-ngff-challenge/0.0.5/6001240.zarr/ |
55 |
| -[2024-08-01 14:24:35 CEST] 24MiB STANDARD 0/c/0/0/0/0 |
56 |
| -[2024-08-01 14:24:28 CEST] 598B STANDARD 0/zarr.json |
57 |
| -[2024-08-01 14:24:32 CEST] 6.0MiB STANDARD 1/c/0/0/0/0 |
58 |
| -[2024-08-01 14:24:28 CEST] 598B STANDARD 1/zarr.json |
59 |
| -[2024-08-01 14:24:29 CEST] 1.6MiB STANDARD 2/c/0/0/0/0 |
60 |
| -[2024-08-01 14:24:28 CEST] 592B STANDARD 2/zarr.json |
61 |
| -[2024-08-01 14:24:28 CEST] 1.2KiB STANDARD ro-crate-metadata.json |
62 |
| -[2024-08-01 14:24:28 CEST] 2.7KiB STANDARD zarr.json |
63 |
| -``` |
64 |
| - |
65 |
| -Other samples: |
66 |
| - |
67 |
| -- [4496763.zarr](https://ome.github.io/ome-ngff-validator/?source=https://uk1s3.embassy.ebi.ac.uk/idr/share/ome2024-ngff-challenge/4496763.zarr) |
68 |
| - Shape `4,25,2048,2048`, Size `589.81 MB`, from idr0047. |
69 |
| -- [9822152.zarr](https://ome.github.io/ome-ngff-validator/?source=https://uk1s3.embassy.ebi.ac.uk/idr/share/ome2024-ngff-challenge/idr0083/9822152.zarr) |
70 |
| - Shape `1,1,1,93184,144384`, Size `21.57 GB`, from idr0083. |
71 |
| -- [9846151.zarr](https://ome.github.io/ome-ngff-validator/?source=https://uk1s3.embassy.ebi.ac.uk/idr/share/ome2024-ngff-challenge/idr0048/9846151.zarr) |
72 |
| - Shape `1,3,1402,5192,2947`, Size `66.04 GB`, from idr0048. |
73 |
| -- [Week9_090907.zarr](https://ome.github.io/ome-ngff-validator/?source=https://uk1s3.embassy.ebi.ac.uk/idr/share/ome2024-ngff-challenge/idr0035/Week9_090907.zarr) |
74 |
| - plate from idr0035. |
75 |
| -- [l4_sample/color](https://ome.github.io/ome-ngff-validator/?source=https://data-humerus.webknossos.org/data/zarr3_experimental/scalable_minds/l4_sample/color) |
76 |
| - from WebKnossos. |
77 |
| -- Plates from idr0090: |
78 |
| - [190129.zarr](https://ome.github.io/ome-ngff-validator/?source=https://uk1s3.embassy.ebi.ac.uk/idr/share/ome2024-ngff-challenge/idr0090/190129.zarr) |
79 |
| - Size `1.0 TB`, |
80 |
| - [190206.zarr](https://ome.github.io/ome-ngff-validator/?source=https://uk1s3.embassy.ebi.ac.uk/idr/share/ome2024-ngff-challenge/idr0090/190206.zarr) |
81 |
| - Size `485 GB`, |
82 |
| - [190211.zarr](https://ome.github.io/ome-ngff-validator/?source=https://uk1s3.embassy.ebi.ac.uk/idr/share/ome2024-ngff-challenge/idr0090/190211.zarr) |
83 |
| - Size `704 GB`. |
84 |
| -- [76-45.zarr](https://ome.github.io/ome-ngff-validator/?source=https://uk1s3.embassy.ebi.ac.uk/idr/share/ome2024-ngff-challenge/idr0010/76-45.zarr) |
85 |
| - plate from idr0010 |
86 |
| - |
87 |
| - <details><summary>Expand for more details on creation of these samples</summary> |
88 |
| - |
89 |
| -<hr> |
90 |
| - |
91 |
| -`4496763.json` was created with ome2024-ngff-challenge commit `0e1809bf3b`. |
92 |
| - |
93 |
| -First the config details were generated with: |
94 |
| - |
95 |
| -``` |
96 |
| -$ ome2024-ngff-challenge --input-bucket=idr --input-endpoint=https://uk1s3.embassy.ebi.ac.uk --input-anon zarr/v0.4/idr0047A/4496763.zarr params_4496763.json --output-write-details |
97 |
| -``` |
98 |
| - |
99 |
| -The `params_4496763.json` file was edited to set "shards" to: |
100 |
| -`[4, 1, sizeY, sizeX]` for each pyramid resolution to create a single shard for |
101 |
| -each Z section. |
102 |
| - |
103 |
| -``` |
104 |
| -# params_4496763.json |
105 |
| -[{"shape": [4, 25, 2048, 2048], "chunks": [1, 1, 2048, 2048], "shards": [4, 1, 2048, 2048]}, {"shape": [4, 25, 1024, 1024], "chunks": [1, 1, 1024, 1024], "shards": [4, 1, 1024, 1024]}, {"shape": [4, 25, 512, 512], "chunks": [1, 1, 512, 512], "shards": [4, 1, 512, 512]}, {"shape": [4, 25, 256, 256], "chunks": [1, 1, 256, 256], "shards": [4, 1, 256, 256]}, {"shape": [4, 25, 128, 128], "chunks": [1, 1, 128, 128], "shards": [4, 1, 128, 128]}, {"shape": [4, 25, 64, 64], "chunks": [1, 1, 64, 64], "shards": [4, 1, 64, 64]}] |
106 |
| -``` |
107 |
| - |
108 |
| -This was then used to run the conversion: |
109 |
| - |
110 |
| -``` |
111 |
| -ome2024-ngff-challenge --input-bucket=idr --input-endpoint=https://uk1s3.embassy.ebi.ac.uk --input-anon zarr/v0.4/idr0047A/4496763.zarr 4496763.zarr --output-read-details params_4496763.json |
112 |
| -``` |
113 |
| - |
114 |
| -<hr> |
115 |
| - |
116 |
| -`9822152.zarr` was created with ome2024-ngff-challenge commit `f17a6de963`. |
117 |
| - |
118 |
| -The chunks and shard shapes are specified to be the same for all resolution |
119 |
| -levels. This is required since the smaller resolution levels of the source image |
120 |
| -at |
121 |
| -https://ome.github.io/ome-ngff-validator/?source=https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0083A/9822152.zarr |
122 |
| -have chunks that correspond to the resolution shape, e,g, `1,1,1,91,141` and |
123 |
| -this will fail to convert using a shard shape of `1,1,1,4096,4096`. |
124 |
| - |
125 |
| -Took 34 minutes to run conversion with this command: |
126 |
| - |
127 |
| -``` |
128 |
| -$ ome2024-ngff-challenge --input-bucket=idr --input-endpoint=https://uk1s3.embassy.ebi.ac.uk --input-anon zarr/v0.4/idr0083A/9822152.zarr 9822152.zarr --output-shards=1,1,1,4096,4096 --output-chunks=1,1,1,1024,1024 --log debug |
129 |
| -``` |
130 |
| - |
131 |
| -<hr> |
132 |
| - |
133 |
| -Took 9 hours to run this conversion (before multi-threading changes): |
134 |
| - |
135 |
| -``` |
136 |
| -$ ome2024-ngff-challenge 9846151.zarr/0 will/9846151_2D_chunks_3.zarr --output-shards=1,1,1,4096,4096 --output-chunks=1,1,1,1024,1024 --log debug |
137 |
| -``` |
138 |
| - |
139 |
| -<hr> |
140 |
| - |
141 |
| -Plate conversion, took 19 minutes, choosing a shard size that contained a whole |
142 |
| -image. Image shape is `1,3,1,1024,1280`. |
143 |
| - |
144 |
| -``` |
145 |
| -$ ome2024-ngff-challenge --input-bucket=bia-integrator-data --input-endpoint=https://uk1s3.embassy.ebi.ac.uk --input-anon S-BIAD847/0762bf96-4f01-454d-9b13-5c8438ea384f/0762bf96-4f01-454d-9b13-5c8438ea384f.zarr /data/will/idr0035/Week9_090907.zarr --output-shards=1,3,1,1024,2048 --output-chunks=1,1,1,1024,1024 --log debug |
146 |
| -``` |
147 |
| - |
148 |
| - </details> |
149 |
| - |
150 |
| -## CLI Commands |
151 |
| - |
152 |
| -### `resave`: convert your data |
153 |
| - |
154 |
| -The `ome2024-ngff-challenge` tool can be used to convert an OME-Zarr 0.4 dataset |
155 |
| -that is based on Zarr v2. The input data will **not be modified** in any way and |
156 |
| -a full copy of the data will be created at the chosen location. |
157 |
| - |
158 |
| -#### Getting started |
159 |
| - |
160 |
| -``` |
161 |
| -ome2024-ngff-challenge resave --cc-by input.zarr output.zarr |
162 |
| -``` |
163 |
| - |
164 |
| -is the most basic invocation of the tool. If you do not choose a license, the |
165 |
| -application will fail with: |
166 |
| - |
167 |
| -``` |
168 |
| -No license set. Choose one of the Creative Commons license (e.g., `--cc-by`) or skip RO-Crate creation (`--rocrate-skip`) |
169 |
| -``` |
170 |
| - |
171 |
| -#### Licenses |
172 |
| - |
173 |
| -There are several license options to choose from. We suggest one of: |
174 |
| - |
175 |
| -- `--cc-by`: credit must be given to the creator |
176 |
| -- `--cc0`: Add your data to the public domain |
177 |
| - |
178 |
| -Alternatively, you can choose your own license, e.g., |
179 |
| - |
180 |
| -`--rocrate-license=https://creativecommons.org/licenses/by-nc/4.0/` |
181 |
| - |
182 |
| -to restrict commercial use of your data. Additionally, you can disable metadata |
183 |
| -collection at all. |
184 |
| - |
185 |
| -**Note:** you will need to add metadata later for your dataset to be considered |
186 |
| -valid. |
187 |
| - |
188 |
| -#### Metadata |
189 |
| - |
190 |
| -There are four additional fields of metadata that are being collected for the |
191 |
| -challenge: |
192 |
| - |
193 |
| -- organism and modality: RECOMMENDED |
194 |
| -- name and description: SUGGESTED |
195 |
| - |
196 |
| -These can be set via the properties prefixed with `--rocrate-` since they will |
197 |
| -be stored in the standard [RO-Crate](https://w3id.org/ro/crate/) JSON file |
198 |
| -(`./ro-crate-metadata.json`) at the top-level of the Zarr dataset. |
199 |
| - |
200 |
| -``` |
201 |
| -ome2024-ngff-challenge resave --cc-by input.zarr output.zarr --rocrate-organism=NCBI:txid9606 # Human |
202 |
| -ome2024-ngff-challenge resave --cc-by input.zarr output.zarr --rocrate-modality=obo:FBbi_00000369 # SPIM |
203 |
| -ome2024-ngff-challenge resave --cc-by input.zarr output.zarr --rocrate-name="short name of dataset" |
204 |
| -ome2024-ngff-challenge resave --cc-by input.zarr output.zarr --rocrate-description="and a longer description" |
205 |
| -``` |
206 |
| - |
207 |
| -For other examples including several other NCBI and FBbi terms, please see: |
208 |
| - |
209 |
| -``` |
210 |
| -ome2024-ngff-challenge resave --help |
211 |
| -``` |
212 |
| - |
213 |
| -#### Re-running the script |
214 |
| - |
215 |
| -If you would like to re-run the script with different parameters, you can |
216 |
| -additionally set `--output-overwrite` to ignore a previous conversion: |
217 |
| - |
218 |
| -``` |
219 |
| -ome2024-ngff-challenge resave --cc-by input.zarr output.zarr --output-overwrite |
220 |
| -``` |
221 |
| - |
222 |
| -#### Writing in parallel |
223 |
| - |
224 |
| -By default, 16 chunks of data will be processed simultaneously in order to bound |
225 |
| -memory usage. You can increase this number based on your local resources: |
226 |
| - |
227 |
| -``` |
228 |
| -ome2024-ngff-challenge resave --cc-by input.zarr output.zarr --output-threads=128 |
229 |
| -``` |
230 |
| - |
231 |
| -#### Reading/writing remotely |
232 |
| - |
233 |
| -If you would like to avoid downloading and/or upload the Zarr datasets, you can |
234 |
| -set S3 parameters on the command-line which will then treat the input and/or |
235 |
| -output datasets as a prefix within an S3 bucket: |
236 |
| - |
237 |
| -``` |
238 |
| -ome2024-ngff-challenge resave --cc-by \ |
239 |
| - --input-bucket=BUCKET \ |
240 |
| - --input-endpoint=HOST \ |
241 |
| - --input-anon \ |
242 |
| - input.zarr \ |
243 |
| - output.zarr |
244 |
| -``` |
245 |
| - |
246 |
| -A small example you can try yourself: |
247 |
| - |
248 |
| -``` |
249 |
| -ome2024-ngff-challenge resave --cc-by \ |
250 |
| - --input-bucket=idr \ |
251 |
| - --input-endpoint=https://uk1s3.embassy.ebi.ac.uk \ |
252 |
| - --input-anon \ |
253 |
| - zarr/v0.4/idr0062A/6001240.zarr \ |
254 |
| - /tmp/6001240.zarr |
255 |
| -``` |
256 |
| - |
257 |
| -#### Reading/writing via a script |
258 |
| - |
259 |
| -Another R/W option is to have `resave.py` generate a script which you can |
260 |
| -execute later. If you pass `--output-script`, then rather than generate the |
261 |
| -arrays immediately, a file named `convert.sh` will be created which can be |
262 |
| -executed later. |
263 |
| - |
264 |
| -For example, running: |
265 |
| - |
266 |
| -``` |
267 |
| -ome2024-ngff-challenge resave --cc-by dev2/input.zarr /tmp/scripts.zarr --output-script |
268 |
| -``` |
269 |
| - |
270 |
| -produces a dataset with one `zarr.json` file and 3 `convert.sh` scripts: |
271 |
| - |
272 |
| -``` |
273 |
| -/tmp/scripts.zarr/0/convert.sh |
274 |
| -/tmp/scripts.zarr/1/convert.sh |
275 |
| -/tmp/scripts.zarr/2/convert.sh |
276 |
| -``` |
277 |
| - |
278 |
| -Each of the scripts contains a statement of the form: |
279 |
| - |
280 |
| -``` |
281 |
| -zarrs_reencode --chunk-shape 1,1,275,271 --shard-shape 2,236,275,271 --dimension-names c,z,y,x --validate dev2/input.zarr /tmp/scripts.zarr |
282 |
| -``` |
283 |
| - |
284 |
| -Running this script will require having installed `zarrs_tools` with: |
285 |
| - |
286 |
| -``` |
287 |
| -cargo install zarrs_tools |
288 |
| -export PATH=$PATH:$HOME/.cargo/bin |
289 |
| -``` |
290 |
| - |
291 |
| -#### Optimizing chunks and shards |
292 |
| - |
293 |
| -Zarr v3 supports shards, which are files that contain multiple chunks. The shape |
294 |
| -of a shard must be a multiple of the chunk size in every dimension. There is not |
295 |
| -yet a single heuristic for determining the chunk and shard sizes that will work |
296 |
| -for all data. **The default shard shape chosen by resave is the full shape of |
297 |
| -the image array.** |
298 |
| - |
299 |
| -In order to limit the size of a shard, if the shard exceeds 100,000,000 pixels |
300 |
| -then you must specify the shard-shape. You can specify the shard shape, using |
301 |
| ---output-shards, which will be used for all pyramid resolutions. This may cause |
302 |
| -issues if the chunk shape changes for lower resolutions (to match the smaller |
303 |
| -image shape). In this case, you should also specify the chunk-shape to be used |
304 |
| -for all resolutions: |
305 |
| - |
306 |
| -``` |
307 |
| -ome2024-ngff-challenge resave --cc-by input.zarr output.zarr --output-chunks=1,1,1,256,256 --output-shards=1,1,1,2048,2048 |
308 |
| -``` |
309 |
| - |
310 |
| -Alternatively, you can use a JSON file to review and manually optimize the |
311 |
| -chunking and sharding parameters on a per-resolution basis: |
312 |
| - |
313 |
| -``` |
314 |
| -ome2024-ngff-challenge resave --cc-by input.zarr parameters.json --output-write-details |
315 |
| -``` |
316 |
| - |
317 |
| -This will write a JSON file of the form: |
318 |
| - |
319 |
| -``` |
320 |
| -[{"shape": [...], "chunks": [...], "shards": [...]}, ... |
321 |
| -``` |
322 |
| - |
323 |
| -where the order of the dictionaries matches the order of the "datasets" field in |
324 |
| -the "multiscales". Edits to this file can be read back in using the |
325 |
| -`output-read-details` flag: |
326 |
| - |
327 |
| -``` |
328 |
| -ome2024-ngff-challenge resave --cc-by input.zarr output.zarr --output-read-details=parameters.json |
329 |
| -``` |
330 |
| - |
331 |
| -Note: Changes to the shape are ignored. |
332 |
| - |
333 |
| -#### More information |
334 |
| - |
335 |
| -See `ome2024-ngff-challenge resave -h` for more arguments and examples. |
336 |
| - |
337 |
| -### `lookup`: finding ontology terms (WIP) |
338 |
| - |
339 |
| -The `ome2024-ngff-challenge` tool can also be used to look up terms from the EBI |
340 |
| -OLS for setting metadata fields like `--rocrate-modality` and |
341 |
| -`--rocrate-organism`: |
342 |
| - |
343 |
| -``` |
344 |
| -ome2024-ngff-challenge lookup "homo sapiens" |
345 |
| -ONTOLOGY TERM LABEL DESCRIPTION |
346 |
| -ncbitaxon NCBITaxon_9606 Homo sapiens |
347 |
| -vto VTO_0011993 Homo sapiens |
348 |
| -snomed SNOMED_337915000 Homo sapiens |
349 |
| -... |
350 |
| -``` |
351 |
| - |
352 |
| -## Related work |
353 |
| - |
354 |
| -The following additional PRs are required to work with the data created by the |
355 |
| -scripts in this repository: |
356 |
| - |
357 |
| -- https://github.com/ome/ome-ngff-validator/pull/36 |
358 |
| -- https://github.com/ome/ome-zarr-py/pull/383 |
359 |
| -- https://github.com/hms-dbmi/vizarr/pull/172 |
360 |
| -- https://github.com/LDeakin/zarrs_tools/issues/8 |
361 |
| - |
362 |
| -Slightly less related but important at the moment: |
363 |
| - |
364 |
| -- https://github.com/google/neuroglancer/issues/606 |
365 |
| -- https://github.com/ome/napari-ome-zarr/pull/112 |
366 |
| -- https://github.com/zarr-developers/zarr-python/issues/2029 |
| 21 | +The data uploaded for the challenge can be browsed and searched at |
| 22 | +[https://nfdi4bioimage.github.io/n4bi-repo-challenge-2025/](https://nfdi4bioimage.github.io/n4bi-repo-challenge-2025/). |