Merged
Changes from 9 commits
3 changes: 3 additions & 0 deletions tools/image_processing/bia-ftplinks/.shed.yml
@@ -3,6 +3,9 @@ owner: bgruening
categories:
- Imaging
description: Tool to query ftp links for study from bioimage archive
long_description: |
Tool to query ftp links for study from bioimage archive.
homepage_url: https://www.ebi.ac.uk/biostudies/bioimages/studies
remote_repository_url: https://github.com/bgruening/galaxytools/tree/master/tools
type: unrestricted
auto_tool_repositories:
56 changes: 29 additions & 27 deletions tools/image_processing/bia-ftplinks/biaftplink.xml
@@ -1,53 +1,55 @@
<tool id="bia_download" name="FTP Link for Bioimage Archive" version="@VERSION@+galaxy0" profile="22.05">
<description>Download images from Bioimage Archive</description>
<description>Download images (TIFF) from Bioimage Archive</description>
<macros>
<import>macros.xml</import>
</macros>
<expand macro="requirements" />
<command detect_errors="aggressive">
<![CDATA[
wget -r 'ftp://ftp.ebi.ac.uk/biostudies/$mode/$path'/Files;
#if '$ftp_output'
#set study = $path.split('/')[-1].rstrip('/')
curl https://www.ebi.ac.uk/biostudies/api/v1/studies/$study/info -s |jq -r .ftpLink >>ftpLink.txt
#end if
curl -s https://www.ebi.ac.uk/biostudies/api/v1/studies/$accession/info | jq -r .ftpLink >> ftpLink.txt &&
wget -q -r -l 0 -i ftpLink.txt &&
Owner: How do we make sure that we only get TIFF files?

Author (Collaborator): We get the whole FTP directory of that accession.

Author (Collaborator): There is no particular directory structure, so I download everything and filter for TIFF files afterwards.

Owner: Can you not filter the files first, rather than downloading everything and discarding most of it afterwards?
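
One way to filter at download time, as suggested above, is wget's accept list (`-A`), which keeps only files whose names match the given suffixes. A minimal sketch, assuming zip archives still have to be fetched so that the later unzip step has something to unpack; `fetch_tiffs` is a hypothetical helper name, not part of this PR:

```shell
# Sketch only: restrict the recursive download to TIFF files (and archives).
# fetch_tiffs is a hypothetical helper, not part of the tool.
fetch_tiffs() {
    # $1: file with one FTP link per line (ftpLink.txt in this tool)
    # $2: optional comma-separated accept list of file name suffixes
    wget -q -r -l 0 -A "${2:-tif,tiff,zip}" -i "$1"
}

# Usage (needs network access):
#   fetch_tiffs ftpLink.txt
#   fetch_tiffs ftpLink.txt 'tif,tiff'   # skip archives as well
```

Matching is case-sensitive by default; wget's `--ignore-case` option would also accept names such as `.TIF`. TIFF files packed inside archives can still only be filtered after unpacking.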


find . -type f -name "*.zip" | while read zip_file; do
unzip -o "\$zip_file" -d "\$(dirname "\$zip_file")" &> /dev/null;
done

]]>
</command>
<inputs>
<param name="mode" type="text" label="Storage mode" help="The storage mode, can be either nfs or fire."/>
<param name="path" type="text" label="The path of accession. e.g. S-BIAD/570/S-BIAD570 "/>
<param name="accession" type="text" label="The accession ID of BioImages-Core or BioStudies-JCB" help="for eg. S-BIAD570, S-JCBD-201309038"/>
<param name="ftplink_output" type="boolean" label="Generate FTP links?" help="If set, a file containing FTP links associated with the accession will be generated." />
</inputs>
<outputs>
<data name="images" format="tiff">
<discover_datasets pattern="__name_and_ext__" format="tif,tiff" directory="ftp.ebi.ac.uk" visible="true" recurse="true" />
</data>
<data format="txt" name="ftplinks" from_work_dir="ftpLink.txt" label="FTP Links">
<collection name="images" type="list" label="${tool.name} on ${on_string}: TIFF Images">
<discover_datasets pattern="(?P&lt;designation&gt;.*)\.tif" format="tif" directory="ftp.ebi.ac.uk" recurse="true" />
Owner: Do you know if the format is tiff or ome.tiff?

Author (Collaborator): The UI filtering options show only tif as an option, so I kept it as it is.

Btw, there are also several other image formats such as lif, lsm, and mrc. I skipped them because there is no Galaxy datatype for them.

Owner: We can change the UI. The question is whether the downloaded files are proper OME.tiff files. https://github.com/galaxyproject/galaxy/blob/5c686c62ecb5c23fa31ebae30db4f5a56c63630a/lib/galaxy/datatypes/images.py#L309

Author (Collaborator): How can I check that?
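
A rough way to check this from the shell: OME-TIFF stores its OME-XML metadata in the ImageDescription tag of the first image, so the file contains an `<OME` root element as plain text. The sketch below only searches for that marker. It is a heuristic, not the check Galaxy's datatype sniffer performs, and `looks_like_ome_tiff` is a hypothetical helper name:

```shell
# Heuristic sketch: does a TIFF file appear to embed OME-XML metadata?
# looks_like_ome_tiff is a hypothetical helper, not part of the tool.
looks_like_ome_tiff() {
    # -a: treat the binary file as text; -q: report via exit status only.
    # Assumption: the OME root element is written without a namespace prefix.
    grep -a -q '<OME' "$1"
}

# Usage on the downloaded tree:
#   find ftp.ebi.ac.uk -type f -name '*.tif' | while read -r f; do
#       if looks_like_ome_tiff "$f"; then
#           echo "probably OME-TIFF: $f"
#       else
#           echo "probably plain TIFF: $f"
#       fi
#   done
```

A file that passes this check can still be an invalid OME-TIFF, so a positive result is a hint rather than a validation.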

</collection>
<data format="txt" name="ftplinks" from_work_dir="ftpLink.txt" label="${tool.name} on ${on_string}: FTP Links">
<filter>ftplink_output</filter>
</data>
</outputs>
<tests>
<test expect_num_outputs='1'>
<param name="mode" value="fire" />
<param name="path" value="S-BIAD/961/S-BIAD961" />
<param name="accession" value="S-BIAD961" />
<param name="ftplink_output" value="False" />
<output name="images">
<discovered_dataset designation="Study_Component-4_mznanog_mCherry-AAT" ftype="tif">
<output_collection name="images">
<element name="Study_Component-4_mznanog_mCherry-AAT">
<assert_contents><has_size value="14092624" /></assert_contents>
</discovered_dataset>
</output>
</element>
</output_collection>
</test>
<test expect_num_outputs='2'>
<param name="mode" value="fire" />
<param name="path" value="S-BIAD/961/S-BIAD961" />
<param name="accession" value="S-JCBD-201309038" />
<param name="ftplink_output" value="True" />
<output name="images">
<discovered_dataset designation="Study_Component-4_mznanog_mCherry-AAT" ftype="tif">
<assert_contents><has_size value="14092624" /></assert_contents>
</discovered_dataset>
</output>
<output name="ftplinks" ftype="txt" file="ftpLink.txt" lines_diff="0" />
</test>
<output_collection name="images">
<element name="JCB_STIL_serial">
<assert_contents><has_size value="7446240" /></assert_contents>
</element>
<element name="Sir_JCB_STIL_serial">
<assert_contents><has_size value="7436060" /></assert_contents>
</element>
</output_collection>
<output name="ftplinks" ftype="txt" file="ftpLink.txt" lines_diff="0" />
</test>
</tests>
<help>
<![CDATA[
10 changes: 5 additions & 5 deletions tools/image_processing/bia-ftplinks/macros.xml
@@ -1,18 +1,18 @@
<macros>
<token name="@VERSION@">0.1.0</token>
<token name="@VERSION@">0.2.0</token>
<xml name="requirements">
<requirements>
<requirement type="package" version="1.20.3">wget</requirement>
<requirement type="package" version="8.4.0">curl</requirement>
<requirement type="package" version="1.6">jq</requirement>
<requirement type="package" version="1.21.4">wget</requirement>
<requirement type="package" version="8.12.1">curl</requirement>
<requirement type="package" version="1.7.1">jq</requirement>
<yield />
</requirements>
</xml>
<xml name="citations">
<citations>
<citation type="bibtex">
@misc{bia,,
title = "BioImage Archive Downloading via ftp",
title = "BioImage Archive Downloading via ftp",
note = "https://www.ebi.ac.uk/bioimage-archive/help-download/",
url = "https://www.ebi.ac.uk/bioimage-archive/help-download/"}</citation>
</citations>
2 changes: 1 addition & 1 deletion tools/image_processing/bia-ftplinks/test-data/ftpLink.txt
@@ -1 +1 @@
ftp://ftp.ebi.ac.uk/biostudies/fire/S-BIAD/961/S-BIAD961
ftp://ftp.ebi.ac.uk/biostudies/fire/S-JCBD-/S-JCBD-xxx038/S-JCBD-201309038