Description
NGFF generation
Generation takes place on the pilot-zarr1-dev or pilot-zarr2-dev machines.
We need to generate NGFF data with https://github.com/IDR/bioformats2raw/releases/tag/v0.6.0-24, which has ZarrReader fixes, including those required for .pattern file data.
Install bioformats2raw via conda:
conda create -n bioformats2raw python=3.9
conda activate bioformats2raw
conda install -c ome bioformats2raw
This is just for getting the dependencies installed. Get the actual bioformats2raw build from the link above and unzip it into your home directory.
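For example, a minimal sketch of fetching and unpacking it (the exact asset name on that release is an assumption; check the release page linked above):
cd ~
wget https://github.com/IDR/bioformats2raw/releases/download/v0.6.0-24/bioformats2raw-0.6.0-24.zip  # asset name assumed
unzip bioformats2raw-0.6.0-24.zip  # should yield ~/bioformats2raw-0.6.0-24/bin/bioformats2raw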
We need to generate NGFF Filesets under the /data volume.
Create directories for the idr project and memo files (if not already there), and change into the idr directory. For example, for idr0051:
cd /data
sudo mkdir idr0051
sudo chown yourname idr0051
sudo mkdir memo
sudo chown yourname memo
cd idr0051
Find out where the pattern, screen or companion files are. For example: /nfs/bioimage/drop/idr0051-fulton-tailbudlightsheet/patterns/
Then run the conversion (using the bioformats2raw from above) in a screen (it is long running):
NB: it may be useful to convert a single Fileset to zarr initially, to determine its size on disk and tell whether you have enough space to convert all the others at once. If not, you may have to convert a smaller number at a time, then zip and upload them to BioStudies before deleting to make space available.
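For example, a minimal sketch of converting just the first pattern file and checking its size (assuming the idr0051 paths used below):
FIRST=`ls /nfs/bioimage/drop/idr0051-fulton-tailbudlightsheet/patterns/ | head -n 1`
~/bioformats2raw-0.6.0-24/bin/bioformats2raw --memo-directory ../memo /nfs/bioimage/drop/idr0051-fulton-tailbudlightsheet/patterns/$FIRST ${FIRST%.*}.ome.zarr
du -sh ${FIRST%.*}.ome.zarr  # size of one converted Fileset on disk
df -h /data                  # space remaining on the volume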
NB: please make sure that the --memo-directory specified here is writable by you.
screen -S idr0051ngff
for i in `ls /nfs/bioimage/drop/idr0051-fulton-tailbudlightsheet/patterns/`; do echo $i; ~/bioformats2raw-0.6.0-24/bin/bioformats2raw --memo-directory ../memo /nfs/bioimage/drop/idr0051-fulton-tailbudlightsheet/patterns/$i ${i%.*}.ome.zarr; done
($i is the pattern file; ${i%.*}.ome.zarr strips the .pattern file extension and adds .ome.zarr. This should work for pattern, screen and also companion file extensions.)
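A quick illustration of that parameter expansion (with a hypothetical file name):
i=plate1.pattern
echo ${i%.*}.ome.zarr  # prints: plate1.ome.zarr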
Upload to EBI s3 for testing
Upload 1 or 2 Plates or Images to EBI's s3, so we can validate that the data can be viewed and imported on s3.
Create a bucket from a local aws install: once aws is installed, just run aws configure and enter the Access key and Secret key; use the defaults for the other options.
$ aws --endpoint-url https://uk1s3.embassy.ebi.ac.uk s3 mb s3://idr0010
make_bucket: idr0010
Then update the policy and CORS config as described at https://github.com/IDR/deployment/blob/master/docs/object-store.md#policy (NB: replace idr0000 with e.g. idr0010 in the sample config).
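A sketch of applying those configs with the aws CLI (policy.json and cors.json here are hypothetical local files containing the sample configs from that page, with the bucket name substituted):
$ aws --endpoint-url https://uk1s3.embassy.ebi.ac.uk s3api put-bucket-policy --bucket idr0010 --policy file://policy.json
$ aws --endpoint-url https://uk1s3.embassy.ebi.ac.uk s3api put-bucket-cors --bucket idr0010 --cors-configuration file://cors.json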
Upload the data using mc, installed on the dev servers where the data is generated:
$ ssh pilot-zarr1-dev
$ wget https://dl.min.io/client/mc/release/linux-amd64/mc
$ chmod +x mc
$ ./mc config host add uk1s3 https://uk1s3.embassy.ebi.ac.uk
Enter Access Key: X8GE11ZK************
Enter Secret Key:
Added `uk1s3` successfully.
$ /home/wmoore/mc cp -r idr0010/ uk1s3/idr0010/zarr
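To upload just one or two Filesets for initial testing (as suggested above), you can copy a single zarr instead of the whole study (plate name hypothetical):
$ ./mc cp -r idr0010/plate1.ome.zarr uk1s3/idr0010/zarr/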
You should now be able to view and do some validation of the data with ome-ngff-validator and vizarr. E.g.:
https://ome.github.io/ome-ngff-validator/?source=https://uk1s3.embassy.ebi.ac.uk/idr0025/zarr/10x+images+plate+3.ome.zarr
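The validator URL is just the s3 address of the zarr passed as the source parameter, so for any uploaded Fileset you can build it like this (bucket and path assumed):
$ ZARR=https://uk1s3.embassy.ebi.ac.uk/idr0010/zarr/plate1.ome.zarr
$ echo "https://ome.github.io/ome-ngff-validator/?source=${ZARR}"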
Submission to BioStudies
Once the NGFF data has been validated to your satisfaction, we can upload to BioStudies.
We need to create a .zip file for each .ome.zarr Fileset.
Where space is short, it can be useful to use -m to move files into the zip and delete the originals. For a single zarr this looks like $ zip -mr image.ome.zarr.zip image.ome.zarr.
To zip all the zarr Filesets for a study, e.g.:
screen -S idr0010_zip
cd idr0010
for i in */; do zip -mr "${i%/}.zip" "$i"; done
This will create zips in the same dir as the zarrs, but we want a directory that contains just the zips for upload...
mkdir idr0010
mv *.zip idr0010/
Upload via Aspera, using the "secret directory".
Login to BioStudies with the IDR account.
Click on the FTP/Aspera button at https://www.ebi.ac.uk/biostudies/submissions/files
# install...
$ wget https://ak-delivery04-mul.dhe.ibm.com/sar/CMA/OSA/08q6g/0/ibm-aspera-cli-3.9.6.1467.159c5b1-linux-64-release.sh
$ chmod +x ibm-aspera-cli-3.9.6.1467.159c5b1-linux-64-release.sh
$ bash ibm-aspera-cli-3.9.6.1467.159c5b1-linux-64-release.sh
$ cd .aspera/cli/bin
$ ./ascp -P33001 -i ../etc/asperaweb_id_dsa.openssh -d /path/to/idr00xx [email protected]:xx/xxxxxxxxxxxxxxxxxxxxxxx
Here is some JavaScript you can run in the browser console to get the file names in the submission table:
let names = [];
[].forEach.call(document.querySelectorAll("div [role='row'] .ag-cell[col-id='name']"), function(div) {
  names.push(div.innerHTML.trim());
});
console.log(names.join("\n"));
console.log(names.length);
Create a tsv file that lists all the filesets for the submission, with the first column named Files. See https://www.ebi.ac.uk/bioimage-archive/help-file-list/. E.g. idr0054_files.tsv:
Files
idr0054/Tonsil 1.ome.zarr.zip
idr0054/Tonsil 2.ome.zarr.zip
idr0054/Tonsil 3.ome.zarr.zip
Upload this to the same location as above (via FTP or using the web UI).
This is used to specify which files are to be used in the submission.
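A sketch for generating such a file list from the directory of zips created above (idr0054 layout assumed):
echo "Files" > idr0054_files.tsv
for z in idr0054/*.zip; do echo "$z" >> idr0054_files.tsv; done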
You should be able to see all the uploaded files at https://www.ebi.ac.uk/biostudies/submissions/files
Create a new submission at https://www.ebi.ac.uk/biostudies/submissions/
- TBD: Name the submission idr00xx NGFF...
- Check for an existing submission for this study (with raw data). Existing IDR studies can be found with https://www.ebi.ac.uk/biostudies/BioImages/studies?facet.link_type=image+data+resource.
- Add links from this submission:
  - To the IDR itself, using link_type: image data resource
  - To the existing BIA submission if it exists (link_type?)
- The idr00XX_files.tsv file list created above can be added to the submission under the Study Component section, which is at the bottom of the submission form.
Once submitted, we need to ask EBI to process the submission: unzip each zarr and upload the data to s3. BioStudies will assign a uuid to each and provide a mapping from each zip file to uuid.zarr as csv:
Tonsil 2.ome.zarr.zip, https://uk1s3.embassy.ebi.ac.uk/bia-integrator-data/S-BIAD704/36cb5355-5134-4bdc-bde6-4e693055a8f9/36cb5355-5134-4bdc-bde6-4e693055a8f9.zarr/0
Tonsil 1.ome.zarr.zip, https://uk1s3.embassy.ebi.ac.uk/bia-integrator-data/S-BIAD704/5583fe0a-bbe6-4408-ab96-756e8e96af55/5583fe0a-bbe6-4408-ab96-756e8e96af55.zarr/0
Tonsil 3.ome.zarr.zip, https://uk1s3.embassy.ebi.ac.uk/bia-integrator-data/S-BIAD704/3b4a8721-1a28-4bc4-8443-9b6e145efbe9/3b4a8721-1a28-4bc4-8443-9b6e145efbe9.zarr/0
Spreadsheet for keeping track of the submission status: https://docs.google.com/spreadsheets/d/1P3dn-uL9KzE9O7XAKhpL8fUMTG3LWedMgjzSdnfAjQ4/edit#gid=0
This needs to be used to create the necessary symlinks below.
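A sketch of turning that csv into the mounted paths used for the symlink targets (mapping.csv is a hypothetical file name for the mapping above):
# each row: <name>.ome.zarr.zip, https://uk1s3.embassy.ebi.ac.uk/bia-integrator-data/S-BIAD704/<uuid>/<uuid>.zarr/0
while IFS=, read -r zipname url; do
  url=$(echo $url)                                 # trim the space after the comma
  path=/${url#https://uk1s3.embassy.ebi.ac.uk/}    # /bia-integrator-data/S-BIAD704/<uuid>/<uuid>.zarr/0
  echo "$zipname -> ${path%/0}"                    # location under the goofys mount below
done < mapping.csv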
If not already done, mount the bia-integrator-data bucket on the server machine and check to see if the files are available:
$ sudo mkdir /bia-integrator-data && sudo /opt/goofys --endpoint https://uk1s3.embassy.ebi.ac.uk/ -o allow_other bia-integrator-data /bia-integrator-data
$ ls /bia-integrator-data/S-BIAD704
36cb5355-5134-4bdc-bde6-4e693055a8f9 3b4a8721-1a28-4bc4-8443-9b6e145efbe9 5583fe0a-bbe6-4408-ab96-756e8e96af55
Make NGFF Filesets
Work in progress.
Use https://github.com/joshmoore/omero-mkngff to create filesets based on the mounted s3 NGFF Filesets.
See IDR/idr-utils#56 for a script for generating the inputs required for omero-mkngff.
conda create -n mkngff -c conda-forge -c ome omero-py bioformats2raw
conda activate mkngff
pip install 'omero-mkngff @ git+https://github.com/joshmoore/omero-mkngff@main'
omero login demo@localhost
omero mkngff setup > setup.sql
omero mkngff sql --secret=$SECRET 5287125 a.ome.zarr/ > my.sql
sudo -u postgres psql idr < setup.sql
sudo -u postgres psql idr < my.sql
sudo -u omero-server mkdir /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-2/2023-06/22/12-46-39.975_converted/
mv a.ome.zarr /tmp
ln -s /tmp/a.ome.zarr /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-2/2023-06/22/12-46-39.975_converted/a.ome.zarr
omero render test Image:14834721 # Failing here
Validation
See IDR/idr-utils#55. Check out that branch of idr-utils (if not merged yet).
The script there allows us to check the pixel data for the lowest resolution of each image in a study, validating that each plane is identical to the corresponding one in IDR.
This could take a while, so let's run it in a screen...
sudo -u omero-server -s
screen -S idr0012_check_pixels
source /opt/omero/server/venv3/bin/activate
omero login demo@localhost
cd /uod/idr/metadata/idr-utils/scripts
python check_pixels.py Plate:4299 /tmp/check_pixels_idr0012.log
Archived workflow below
The sections below describe a previous workflow (prior to the omero-mkngff approach).
Make a metadata-only copy of the data
Since we want to import NGFF data without chunks, we need to create a copy of the data without chunks for import. The easiest way to do this is to use aws to sync the data, ignoring chunks.
We want these to be owned by the omero-server user in a location they can access, so they can be imported. The location at import time isn't too important.
$ screen -S idr0010_aws_sync # can take a while if lots of data
$ mkdir idr0010
$ cd idr0010
$ aws s3 sync --no-sign-request --exclude '*' --include "*/.z*" --include "*.xml" --endpoint-url https://uk1s3.embassy.ebi.ac.uk s3://idr0010/zarr .
$ sudo mv -f ./* /ngff/idr0010/
$ cd /ngff/
$ sudo chown -R omero-server idr0010/
Import metadata-only data
We can now perform a regular import as usual. Instead of creating a bulk import config, use a for loop to iterate through each plate in the directory, setting the name (removing .ome.zarr, or .zarr for e.g. idr0036) so that the data isn't named METADATA.ome.xml and Plate names match the original data. You could also add a target Screen or Dataset (not shown) or move the data into a container with the webclient UI after import:
sudo -u omero-server -s
screen -S idr0010_ngff
source /opt/omero/server/venv3/bin/activate
export OMERODIR=/opt/omero/server/OMERO.server
omero login demo@localhost
cd /ngff/idr0010
for dir in *; do
omero import --transfer=ln_s --depth=100 --name=${dir/.ome.zarr/} --skip=all $dir --file /tmp/$dir.log --errs /tmp/$dir.err;
done
Update symlinks
Mount the s3 bucket on the IDR server machine (idr0125-pilot or idr0138-pilot):
sudo mkdir /idr0010 && sudo /opt/goofys --endpoint https://uk1s3.embassy.ebi.ac.uk/ -o allow_other idr0010 /idr0010
See IDR/idr-utils#54. Check out that branch of idr-utils (if not merged yet).
We need to specify the container (e.g. Screen, Plate, Dataset, Image or Fileset) and the path where the data is mounted. If the path to the data in each Fileset is e.g. filesetPrefix/plate1.zarr/.. and the path to each mounted plate is e.g. /path/to/plates/plate1.zarr, we can run the following command to create one symlink for each plate, from /ManagedRepository/filesetPrefix/plate1.zarr to /path/to/plates/plate1.zarr.
The script also renders a single Image from each Fileset before updating symlinks, which avoids subsequent ResourceErrors.
The script can be run repeatedly on the same data without issue, e.g. if it fails part-way through and needs a re-run to complete.
A --repo option has the default value /data/OMERO/ManagedRepository. You can also use the --dry-run and --report options:
$ sudo -u omero-server -s
$ source /opt/omero/server/venv3/bin/activate
$ omero login demo@localhost
$ python idr-utils/scripts/managed_repo_symlinks.py Screen:123 /path/to/plates/ --report
Fileset: 5286929 /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-6/2023-04/25/13-53-43.777/
fs_contents ['10-34.ome.zarr']
Link from /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-6/2023-04/25/13-53-43.777/10-34.ome.zarr to /idr0010/zarr/10-34.ome.zarr
...
Swap Filesets
See IDR/idr-utils#53. Check out that branch of idr-utils (if not merged yet).
The first Object (Screen, Plate, Image, Fileset) is the original data that we want to update to use the NGFF Fileset, and the second is the NGFF data we imported above. In the case of Screens, Filesets are swapped between pairs of Plates matched by name (you should check that Plate names match before running this script).
The 3rd required argument is a file where you can write the sql commands that are required to update Pixels objects (we can't yet update these via the OMERO API).
The script supports --dry-run and --report flags.
$ source /opt/omero/server/venv3/bin/activate
$ omero login demo@localhost
$ python idr-utils/scripts/swap_filesets.py Screen:1202 Screen:3204 /tmp/idr0012_filesetswap.sql --report
This will write a psql command for each Fileset that we then need to execute...
$ export OMERODIR=/opt/omero/server/OMERO.server
$ omero config get --show-password
# Use the password, host etc to run the sql file generated above...
$ PGPASSWORD=****** psql -U omero -d idr -h 192.168.10.102 -f /tmp/idr0012_filesetswap.sql
The psql commands are one per Fileset and look like:
UPDATE pixels SET name = '.zattrs', path = 'demo_2/Blitz-0-Ice.ThreadPool.Server-16/2023-04/12/10-20-20.483/10x_images_plate_2.ome.zarr' where image in (select id from Image where fileset = 5286921);
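To sanity-check the result of one of these updates, you can query the same rows back (connection details as above; fileset id from the example):
$ PGPASSWORD=****** psql -U omero -d idr -h 192.168.10.102 -c "SELECT path, name FROM pixels WHERE image IN (SELECT id FROM image WHERE fileset = 5286921) LIMIT 5;"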
You can then view Images from the original data which is now using an NGFF Fileset!
Cleanup
We can now delete the uk1s3 data and buckets created above for testing.
The original Filesets will remain as "orphans".
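For example, assuming the aws CLI configured earlier, a test bucket and all of its contents can be removed with:
$ aws --endpoint-url https://uk1s3.embassy.ebi.ac.uk s3 rb s3://idr0010 --force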