
Commit 61621e1: Update README

1 parent d50f62e

1 file changed: +8 −352 lines

README.md

@@ -1,366 +1,22 @@
-# ome2024-ngff-challenge
+# n4bi-repo-challenge-2025

[![Actions Status][actions-badge]][actions-link]
-[![PyPI version][pypi-version]][pypi-link]
-[![PyPI platforms][pypi-platforms]][pypi-link]
[![Image.SC Zulip][zulip-badge]][zulip-link]

<!-- SPHINX-START -->

<!-- prettier-ignore-start -->
-[actions-badge]: https://github.com/ome/ome2024-ngff-challenge/workflows/CI/badge.svg
-[actions-link]: https://github.com/ome/ome2024-ngff-challenge/actions
-[conda-badge]: https://img.shields.io/conda/vn/conda-forge/ome2024-ngff-challenge
-[conda-link]: https://github.com/conda-forge/ome2024-ngff-challenge-feedstock
+[actions-badge]: https://github.com/nfdi4bioimage/n4bi-repo-challenge-2025/workflows/CI/badge.svg
+[actions-link]: https://github.com/nfdi4bioimage/n4bi-repo-challenge-2025/actions
[github-discussions-badge]: https://img.shields.io/static/v1?label=Discussions&message=Ask&color=blue&logo=github
-[github-discussions-link]: https://github.com/ome/ome2024-ngff-challenge/discussions
-[pypi-link]: https://pypi.org/project/ome2024-ngff-challenge/
-[pypi-platforms]: https://img.shields.io/pypi/pyversions/ome2024-ngff-challenge
-[pypi-version]: https://img.shields.io/pypi/v/ome2024-ngff-challenge
-[rtd-badge]: https://readthedocs.org/projects/ome2024-ngff-challenge/badge/?version=latest
-[rtd-link]: https://ome2024-ngff-challenge.readthedocs.io/en/latest/?badge=latest
+[github-discussions-link]: https://github.com/nfdi4bioimage/n4bi-repo-challenge-2025/discussions
[zulip-badge]: https://img.shields.io/badge/zulip-join_chat-brightgreen.svg
[zulip-link]: https://imagesc.zulipchat.com/#narrow/stream/328251-NGFF

<!-- prettier-ignore-end -->

-Project planning and material repository for the 2024 challenge to generate 1 PB
-of OME-Zarr data.
+Project planning and material repository for the 2025 repository challenge to
+upload data to a variety of bioimaging repositories.

**UPDATE (20th October 2024)**: The data converted for the challenge can now be
browsed and searched at
[https://ome.github.io/ome2024-ngff-challenge/](https://ome.github.io/ome2024-ngff-challenge/).

## Challenge overview

The high-level goal of the challenge is to generate OME-Zarr data according to a
development version of the specification to drive forward the implementation
work and establish a baseline for the conversion costs that members of the
community can expect to incur.

Data generated within the challenge will have:

- all v2 arrays converted to v3, optionally sharding the data
- all .zattrs metadata migrated to `zarr.json["attributes"]["ome"]`
- a top-level `ro-crate-metadata.json` file with minimal metadata (specimen and
  imaging modality)
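
The migration in the second bullet can be pictured with a short sketch. The JSON document below is a trimmed, illustrative example (not a real dataset's metadata): after conversion, the content that previously lived in `.zattrs` sits under the `"ome"` key of the v3 group attributes.

```python
import json

# Illustrative only: a trimmed v3 group document showing where the
# migrated OME metadata lives after conversion.
zarr_json = json.loads(
    '{"zarr_format": 3, "node_type": "group",'
    ' "attributes": {"ome": {"version": "0.5", "multiscales": []}}}'
)
ome = zarr_json["attributes"]["ome"]
print(sorted(ome))  # ['multiscales', 'version']
```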

You can examine the contents of a sample dataset by using
[the minio client](https://github.com/minio/mc):

```
$ mc config host add uk1anon https://uk1s3.embassy.ebi.ac.uk "" ""
Added `uk1anon` successfully.
$ mc ls -r uk1anon/idr/share/ome2024-ngff-challenge/0.0.5/6001240.zarr/
[2024-08-01 14:24:35 CEST]  24MiB STANDARD 0/c/0/0/0/0
[2024-08-01 14:24:28 CEST]   598B STANDARD 0/zarr.json
[2024-08-01 14:24:32 CEST] 6.0MiB STANDARD 1/c/0/0/0/0
[2024-08-01 14:24:28 CEST]   598B STANDARD 1/zarr.json
[2024-08-01 14:24:29 CEST] 1.6MiB STANDARD 2/c/0/0/0/0
[2024-08-01 14:24:28 CEST]   592B STANDARD 2/zarr.json
[2024-08-01 14:24:28 CEST] 1.2KiB STANDARD ro-crate-metadata.json
[2024-08-01 14:24:28 CEST] 2.7KiB STANDARD zarr.json
```

Other samples:

- [4496763.zarr](https://ome.github.io/ome-ngff-validator/?source=https://uk1s3.embassy.ebi.ac.uk/idr/share/ome2024-ngff-challenge/4496763.zarr)
  Shape `4,25,2048,2048`, Size `589.81 MB`, from idr0047.
- [9822152.zarr](https://ome.github.io/ome-ngff-validator/?source=https://uk1s3.embassy.ebi.ac.uk/idr/share/ome2024-ngff-challenge/idr0083/9822152.zarr)
  Shape `1,1,1,93184,144384`, Size `21.57 GB`, from idr0083.
- [9846151.zarr](https://ome.github.io/ome-ngff-validator/?source=https://uk1s3.embassy.ebi.ac.uk/idr/share/ome2024-ngff-challenge/idr0048/9846151.zarr)
  Shape `1,3,1402,5192,2947`, Size `66.04 GB`, from idr0048.
- [Week9_090907.zarr](https://ome.github.io/ome-ngff-validator/?source=https://uk1s3.embassy.ebi.ac.uk/idr/share/ome2024-ngff-challenge/idr0035/Week9_090907.zarr)
  plate from idr0035.
- [l4_sample/color](https://ome.github.io/ome-ngff-validator/?source=https://data-humerus.webknossos.org/data/zarr3_experimental/scalable_minds/l4_sample/color)
  from WebKnossos.
- Plates from idr0090:
  [190129.zarr](https://ome.github.io/ome-ngff-validator/?source=https://uk1s3.embassy.ebi.ac.uk/idr/share/ome2024-ngff-challenge/idr0090/190129.zarr)
  Size `1.0 TB`,
  [190206.zarr](https://ome.github.io/ome-ngff-validator/?source=https://uk1s3.embassy.ebi.ac.uk/idr/share/ome2024-ngff-challenge/idr0090/190206.zarr)
  Size `485 GB`,
  [190211.zarr](https://ome.github.io/ome-ngff-validator/?source=https://uk1s3.embassy.ebi.ac.uk/idr/share/ome2024-ngff-challenge/idr0090/190211.zarr)
  Size `704 GB`.
- [76-45.zarr](https://ome.github.io/ome-ngff-validator/?source=https://uk1s3.embassy.ebi.ac.uk/idr/share/ome2024-ngff-challenge/idr0010/76-45.zarr)
  plate from idr0010.

<details><summary>Expand for more details on creation of these samples</summary>

<hr>

`4496763.zarr` was created with ome2024-ngff-challenge commit `0e1809bf3b`.

First the config details were generated with:

```
$ ome2024-ngff-challenge --input-bucket=idr --input-endpoint=https://uk1s3.embassy.ebi.ac.uk --input-anon zarr/v0.4/idr0047A/4496763.zarr params_4496763.json --output-write-details
```

The `params_4496763.json` file was edited to set "shards" to
`[4, 1, sizeY, sizeX]` for each pyramid resolution to create a single shard for
each Z section.

```
# params_4496763.json
[{"shape": [4, 25, 2048, 2048], "chunks": [1, 1, 2048, 2048], "shards": [4, 1, 2048, 2048]}, {"shape": [4, 25, 1024, 1024], "chunks": [1, 1, 1024, 1024], "shards": [4, 1, 1024, 1024]}, {"shape": [4, 25, 512, 512], "chunks": [1, 1, 512, 512], "shards": [4, 1, 512, 512]}, {"shape": [4, 25, 256, 256], "chunks": [1, 1, 256, 256], "shards": [4, 1, 256, 256]}, {"shape": [4, 25, 128, 128], "chunks": [1, 1, 128, 128], "shards": [4, 1, 128, 128]}, {"shape": [4, 25, 64, 64], "chunks": [1, 1, 64, 64], "shards": [4, 1, 64, 64]}]
```
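
The edit just described can also be scripted. A sketch (the helper below is ours, not part of ome2024-ngff-challenge) that rewrites each resolution's "shards" entry so one shard covers a whole Z section:

```python
import json

# Sketch (not part of the tool): set "shards" to [sizeC, 1, sizeY, sizeX]
# so each shard holds a single Z section across all channels.
def shard_per_z_section(levels):
    for level in levels:
        size_c, _, size_y, size_x = level["shape"]
        level["shards"] = [size_c, 1, size_y, size_x]
    return levels

levels = json.loads(
    '[{"shape": [4, 25, 2048, 2048], "chunks": [1, 1, 2048, 2048],'
    ' "shards": [4, 25, 2048, 2048]}]'
)
print(shard_per_z_section(levels)[0]["shards"])  # [4, 1, 2048, 2048]
```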

This was then used to run the conversion:

```
ome2024-ngff-challenge --input-bucket=idr --input-endpoint=https://uk1s3.embassy.ebi.ac.uk --input-anon zarr/v0.4/idr0047A/4496763.zarr 4496763.zarr --output-read-details params_4496763.json
```

<hr>

`9822152.zarr` was created with ome2024-ngff-challenge commit `f17a6de963`.

The chunk and shard shapes are specified to be the same for all resolution
levels. This is required since the smaller resolution levels of the source image
at
https://ome.github.io/ome-ngff-validator/?source=https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0083A/9822152.zarr
have chunks that correspond to the resolution shape, e.g. `1,1,1,91,141`, and
this will fail to convert using a shard shape of `1,1,1,4096,4096`.

The conversion took 34 minutes with this command:

```
$ ome2024-ngff-challenge --input-bucket=idr --input-endpoint=https://uk1s3.embassy.ebi.ac.uk --input-anon zarr/v0.4/idr0083A/9822152.zarr 9822152.zarr --output-shards=1,1,1,4096,4096 --output-chunks=1,1,1,1024,1024 --log debug
```

<hr>

This conversion took 9 hours (before the multi-threading changes):

```
$ ome2024-ngff-challenge 9846151.zarr/0 will/9846151_2D_chunks_3.zarr --output-shards=1,1,1,4096,4096 --output-chunks=1,1,1,1024,1024 --log debug
```

<hr>

This plate conversion took 19 minutes, choosing a shard size that contains a
whole image. The image shape is `1,3,1,1024,1280`.

```
$ ome2024-ngff-challenge --input-bucket=bia-integrator-data --input-endpoint=https://uk1s3.embassy.ebi.ac.uk --input-anon S-BIAD847/0762bf96-4f01-454d-9b13-5c8438ea384f/0762bf96-4f01-454d-9b13-5c8438ea384f.zarr /data/will/idr0035/Week9_090907.zarr --output-shards=1,3,1,1024,2048 --output-chunks=1,1,1,1024,1024 --log debug
```

</details>

## CLI Commands

### `resave`: convert your data

The `ome2024-ngff-challenge` tool can be used to convert an OME-Zarr 0.4 dataset
that is based on Zarr v2. The input data will **not be modified** in any way; a
full copy of the data will be created at the chosen location.

#### Getting started

```
ome2024-ngff-challenge resave --cc-by input.zarr output.zarr
```

is the most basic invocation of the tool. If you do not choose a license, the
application will fail with:

```
No license set. Choose one of the Creative Commons license (e.g., `--cc-by`) or skip RO-Crate creation (`--rocrate-skip`)
```

#### Licenses

There are several license options to choose from. We suggest one of:

- `--cc-by`: credit must be given to the creator
- `--cc0`: add your data to the public domain

Alternatively, you can choose your own license, e.g.,

`--rocrate-license=https://creativecommons.org/licenses/by-nc/4.0/`

to restrict commercial use of your data. Additionally, you can disable metadata
collection altogether.

**Note:** you will need to add metadata later for your dataset to be considered
valid.

#### Metadata

There are four additional fields of metadata that are being collected for the
challenge:

- organism and modality: RECOMMENDED
- name and description: SUGGESTED

These can be set via the properties prefixed with `--rocrate-` since they will
be stored in the standard [RO-Crate](https://w3id.org/ro/crate/) JSON file
(`./ro-crate-metadata.json`) at the top-level of the Zarr dataset.

```
ome2024-ngff-challenge resave --cc-by input.zarr output.zarr --rocrate-organism=NCBI:txid9606 # Human
ome2024-ngff-challenge resave --cc-by input.zarr output.zarr --rocrate-modality=obo:FBbi_00000369 # SPIM
ome2024-ngff-challenge resave --cc-by input.zarr output.zarr --rocrate-name="short name of dataset"
ome2024-ngff-challenge resave --cc-by input.zarr output.zarr --rocrate-description="and a longer description"
```

For other examples including several other NCBI and FBbi terms, please see:

```
ome2024-ngff-challenge resave --help
```

#### Re-running the script

If you would like to re-run the script with different parameters, you can
additionally set `--output-overwrite` to ignore a previous conversion:

```
ome2024-ngff-challenge resave --cc-by input.zarr output.zarr --output-overwrite
```

#### Writing in parallel

By default, 16 chunks of data will be processed simultaneously in order to bound
memory usage. You can increase this number based on your local resources:

```
ome2024-ngff-challenge resave --cc-by input.zarr output.zarr --output-threads=128
```

#### Reading/writing remotely

If you would like to avoid downloading and/or uploading the Zarr datasets, you
can set S3 parameters on the command-line which will then treat the input and/or
output datasets as a prefix within an S3 bucket:

```
ome2024-ngff-challenge resave --cc-by \
  --input-bucket=BUCKET \
  --input-endpoint=HOST \
  --input-anon \
  input.zarr \
  output.zarr
```

A small example you can try yourself:

```
ome2024-ngff-challenge resave --cc-by \
  --input-bucket=idr \
  --input-endpoint=https://uk1s3.embassy.ebi.ac.uk \
  --input-anon \
  zarr/v0.4/idr0062A/6001240.zarr \
  /tmp/6001240.zarr
```

#### Reading/writing via a script

Another R/W option is to have `resave.py` generate scripts which you can
execute later. If you pass `--output-script`, then rather than generating the
arrays immediately, a file named `convert.sh` will be created for each array.

For example, running:

```
ome2024-ngff-challenge resave --cc-by dev2/input.zarr /tmp/scripts.zarr --output-script
```

produces a dataset with one `zarr.json` file and 3 `convert.sh` scripts:

```
/tmp/scripts.zarr/0/convert.sh
/tmp/scripts.zarr/1/convert.sh
/tmp/scripts.zarr/2/convert.sh
```

Each of the scripts contains a statement of the form:

```
zarrs_reencode --chunk-shape 1,1,275,271 --shard-shape 2,236,275,271 --dimension-names c,z,y,x --validate dev2/input.zarr /tmp/scripts.zarr
```

Running these scripts requires `zarrs_tools`, installed with:

```
cargo install zarrs_tools
export PATH=$PATH:$HOME/.cargo/bin
```

#### Optimizing chunks and shards

Zarr v3 supports shards, which are files that contain multiple chunks. The shape
of a shard must be a multiple of the chunk size in every dimension. There is not
yet a single heuristic for determining the chunk and shard sizes that will work
for all data. **The default shard shape chosen by resave is the full shape of
the image array.**
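
That multiple-of-chunk constraint can be checked mechanically. A minimal sketch (the function and shapes are illustrative, not part of the tool):

```python
# Sketch (ours, not part of the tool): a shard shape is usable only if
# every dimension is an exact multiple of the matching chunk dimension.
def shard_fits_chunks(shard, chunk):
    return len(shard) == len(chunk) and all(
        s % c == 0 for s, c in zip(shard, chunk)
    )

print(shard_fits_chunks((1, 1, 1, 2048, 2048), (1, 1, 1, 256, 256)))  # True
print(shard_fits_chunks((1, 1, 1, 2000, 2048), (1, 1, 1, 256, 256)))  # False
```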

If a shard would exceed 100,000,000 pixels, you must specify the shard shape
explicitly. You can set it with `--output-shards`, which will be used for all
pyramid resolutions. This may cause issues if the chunk shape changes for lower
resolutions (to match the smaller image shape). In that case, you should also
specify the chunk shape to be used for all resolutions:

```
ome2024-ngff-challenge resave --cc-by input.zarr output.zarr --output-chunks=1,1,1,256,256 --output-shards=1,1,1,2048,2048
```

Alternatively, you can use a JSON file to review and manually optimize the
chunking and sharding parameters on a per-resolution basis:

```
ome2024-ngff-challenge resave --cc-by input.zarr parameters.json --output-write-details
```

This will write a JSON file of the form:

```
[{"shape": [...], "chunks": [...], "shards": [...]}, ...
```

where the order of the dictionaries matches the order of the "datasets" field in
the "multiscales". Edits to this file can be read back in using the
`--output-read-details` flag:

```
ome2024-ngff-challenge resave --cc-by input.zarr output.zarr --output-read-details=parameters.json
```
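
A round-trip edit of that parameters file might look like the following sketch (file handling elided; the values are illustrative): the Y/X chunk sizes are halved while the shards remain multiples of the new chunks.

```python
import json

# Sketch of editing the per-resolution parameters before reading them
# back in (illustrative values; shape edits would be ignored anyway).
levels = json.loads(
    '[{"shape": [4, 25, 2048, 2048], "chunks": [1, 1, 2048, 2048],'
    ' "shards": [4, 1, 2048, 2048]}]'
)
for level in levels:
    level["chunks"][-2:] = [c // 2 for c in level["chunks"][-2:]]

print(levels[0]["chunks"])  # [1, 1, 1024, 1024]
```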

Note: Changes to the shape are ignored.

#### More information

See `ome2024-ngff-challenge resave -h` for more arguments and examples.

### `lookup`: finding ontology terms (WIP)

The `ome2024-ngff-challenge` tool can also be used to look up terms from the EBI
OLS for setting metadata fields like `--rocrate-modality` and
`--rocrate-organism`:

```
ome2024-ngff-challenge lookup "homo sapiens"
ONTOLOGY    TERM              LABEL         DESCRIPTION
ncbitaxon   NCBITaxon_9606    Homo sapiens
vto         VTO_0011993       Homo sapiens
snomed      SNOMED_337915000  Homo sapiens
...
```

## Related work

The following additional PRs are required to work with the data created by the
scripts in this repository:

- https://github.com/ome/ome-ngff-validator/pull/36
- https://github.com/ome/ome-zarr-py/pull/383
- https://github.com/hms-dbmi/vizarr/pull/172
- https://github.com/LDeakin/zarrs_tools/issues/8

Slightly less related but important at the moment:

- https://github.com/google/neuroglancer/issues/606
- https://github.com/ome/napari-ome-zarr/pull/112
- https://github.com/zarr-developers/zarr-python/issues/2029

+The data uploaded for the challenge can be browsed and searched at
+[https://nfdi4bioimage.github.io/n4bi-repo-challenge-2025/](https://nfdi4bioimage.github.io/n4bi-repo-challenge-2025/).
