Cami2-assembly-tutorial #3293


Open

PlushZ wants to merge 19 commits into galaxyproject:main from PlushZ:cami2-assembly

Contributor

PlushZ commented Mar 27, 2022 (edited)

This PR adds a tutorial based on my Master's project, "Reproducing Critical Assessment of Metagenome Interpretation assembly challenge on marine dataset with Galaxy", to training.galaxyproject

PlushZ requested a review from a team as a code owner

March 27, 2022 09:28


          Add new contributor

348b65f

hexylena requested changes

Member

hexylena left a comment

Welcome @PlushZ! I've made a number of comments on the formatting of the tutorial to help it conform to GTN standards for tutorials.

This is super cool to see! I always thought the assembly challenges were a good fit for Galaxy and for reproducing results there.

CONTRIBUTORS.yaml (1 comment, outdated, resolved)

topics/metagenomics/tutorials/cami2-assembly/tutorial.bib (2 comments, outdated, resolved)

topics/metagenomics/tutorials/cami2-assembly/tutorial.md (7 comments, outdated, resolved)

PlushZ added 2 commits

April 7, 2022 15:00


          fix according to comments and suggestions

ff4e5eb


          fix lint error, add Rule Builder hands-on

8468cea

hexylena added the new tutorial label

hexylena requested a review from bebatut

July 4, 2022 08:11

hexylena added the work-in-progress label


          Update topics/metagenomics/tutorials/cami2-assembly/tutorial.md

e4756ea

Co-authored-by: Helena <[email protected]>

github-actions bot added metagenomics template-and-tools labels

PlushZ and others added 5 commits

November 30, 2022 15:42


          Update topics/metagenomics/tutorials/cami2-assembly/tutorial.md

ff9532a

Co-authored-by: Helena <[email protected]>


          Update topics/metagenomics/tutorials/cami2-assembly/tutorial.md

a7e4942

Co-authored-by: Helena <[email protected]>


          fix according to comments for assembly tutorial


          change location of images

b86dcba


          Merge branch 'galaxyproject:main' into cami2-assembly

2a8bad2

github-actions bot removed the template-and-tools label

bebatut force-pushed the cami2-assembly branch 2 times, most recently from 410b355 to bb3608b

December 5, 2022 15:19


          Edit the tutorial, add citations

c0e352f

bebatut force-pushed the cami2-assembly branch from bb3608b to c0e352f

December 6, 2022 13:16

bebatut reviewed

topics/metagenomics/tutorials/cami2-assembly/tutorial.md

+              >
+              >     ```text
+              >     SampleID	URL
+              >     Long read sample 0	https://frl.publisso.de/data/frl:6425521/marine/long_read/marmgCAMI2_sample_0_reads.tar.gz

Member

bebatut Dec 6, 2022

These links do not work: each one points to a tar.gz archive with subfolders inside. We will put the data in the shared data library.
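To illustrate the problem with archive links, here is a minimal Python sketch that lists the regular files nested inside a tar.gz archive. The helper name `list_archive_files` is hypothetical, and the sketch only assumes the archive is a gzip-compressed tarball; it does not reflect the actual layout of the CAMI files.

```python
import tarfile
from typing import BinaryIO, List


def list_archive_files(fileobj: BinaryIO) -> List[str]:
    """Return the paths of all regular files inside a gzip-compressed tar archive.

    Directories and other non-file members are skipped, so the result shows
    which nested paths would have to be extracted before a per-file import.
    """
    with tarfile.open(fileobj=fileobj, mode="r:gz") as archive:
        return [member.name for member in archive.getmembers() if member.isfile()]
```

A URL that resolves to such an archive yields one bundle rather than one dataset per row, which is why a per-sample URL table cannot point at it directly.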

topics/metagenomics/tutorials/cami2-assembly/tutorial.md


		Based on {% cite meyer2022critical %}, {% cite Meyer2021 %} and {% cite Meyer2021_tutorial %}, we can compare tools on a set of metrics to select the one to use for an analysis but also here to run the challenge:

		Tool \| Genome fraction (%) \| Mismatches per 100 kbp \| Misassemblies \| NGA50 \| Strain recall \| Strain precision

Member

bebatut Dec 6, 2022

It would be good to add a short explanation of the different columns before the table.

topics/metagenomics/tutorials/cami2-assembly/tutorial.md Outdated

+              ABySS ({% cite jackman2017abyss %}) | Very accurate mean fraction (<1% divergence) |  | The fewest misassemblies | 100% precision. The highest strain precision (100%) for the unique genome
+              Ray ({% cite Boisvert2012 %}) |  |  |  |  |  | 100% precision. The highest strain precision (100%) for the unique genome
+              A-STAR | A-STAR excelled in terms of genome fraction on marine and strain madness data sets. A-STAR improved the genome fraction to 44.1% on the marine dataset. On marine common genomes, A-STAR (26.7%) achieved the highest genome fractions. For unique genome, A-STAR provided the most complete assemblies (55.3% genome fraction). A-STAR partially recovered 102 (78%) of 131 16S gold standard sequences. | More mismatches than others: 773/100 kb | More misassemblies than others | | 2nd highest: 7.5% recall  | 2nd highest: 69.4% precision
+              OPERA-MS [25] | There were selected 50 unique, public genomes present as a single contig in the gold standard and with annotated 16S sequences. The hybrid assembler OPERA-MS recovered one of the most complete 16S sequences (mean recovered gene fraction 47.1%). For the unique genome, OPERA-MS has an exceptional average NGA50 (187,083, 75% of the gold standard NGA50). | | The most contiguous assemblies were provided by the hybrid assembler OPERA-MS for the marine data, with an average NGA50 of 28,244 across genomes. |

Member

bebatut Dec 6, 2022

Please add the correct citation for the tool

topics/metagenomics/tutorials/cami2-assembly/tutorial.md

+              Gold Standard Assembly (GSA) | 76.9 | 0 | 0 | 682,777 | 54.9% (upper bound) | 100
+              ABySS ({% cite jackman2017abyss %}) | Very accurate mean fraction (<1% divergence) |  | The fewest misassemblies | 100% precision. The highest strain precision (100%) for the unique genome
+              Ray ({% cite Boisvert2012 %}) |  |  |  |  |  | 100% precision. The highest strain precision (100%) for the unique genome
+              A-STAR | A-STAR excelled in terms of genome fraction on marine and strain madness data sets. A-STAR improved the genome fraction to 44.1% on the marine dataset. On marine common genomes, A-STAR (26.7%) achieved the highest genome fractions. For unique genome, A-STAR provided the most complete assemblies (55.3% genome fraction). A-STAR partially recovered 102 (78%) of 131 16S gold standard sequences. | More mismatches than others: 773/100 kb | More misassemblies than others | | 2nd highest: 7.5% recall  | 2nd highest: 69.4% precision

Member

bebatut Dec 6, 2022

Could you simplify the content of the cells in this table and of the ones in the details box?

Contributor Author

PlushZ Dec 6, 2022

merge?

topics/metagenomics/tutorials/cami2-assembly/tutorial.md Outdated


		Using the described metrics, the different tools were evaluated in the CAMI paper and aggregated in tables (Supplementary Tables 3-7) from {%cite tutorialMeyer2021%}.

		In these tables there are also ranking scores of the tools shown for every statistic as well as overall ranking scores. Overall, ranking scores for every dataset are computed as a sum of all ranking scores across metrics. The average ranking score of both datasets are calculated as weighted average sum of ranking for both datasets. We created [a table showing all ranking results from previous tables](https://docs.google.com/spreadsheets/d/e/2PACX-1vQgJr3J-IyVy9IkXS9W-RZcV83Tr6f7RusG_97QwgpW2dFdCXUMroROIhy8gKjPcUgISFXW9NQwOzzK/pubhtml?gid=455354696)

Member

bebatut Dec 6, 2022

Could you add the table directly in the tutorial? Thanks
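The aggregation described in the quoted tutorial text (per-dataset score as the sum of ranking scores across metrics, then a weighted average over datasets) can be sketched as below. This is only an illustration: the metric names, rank values, and equal weights are hypothetical, and the weighting actually used for the CAMI tables is not stated in this thread.

```python
from typing import Dict


def overall_rank_score(metric_ranks: Dict[str, int]) -> int:
    """Sum one tool's per-metric ranking scores for a single dataset."""
    return sum(metric_ranks.values())


def weighted_average(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-dataset scores; weights need not sum to 1."""
    total_weight = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total_weight


# Hypothetical ranks for one tool (lower rank = better); not CAMI results.
marine = {"genome_fraction": 1, "mismatches": 3, "misassemblies": 2}
strain_madness = {"genome_fraction": 2, "mismatches": 1, "misassemblies": 4}

per_dataset = {
    "marine": overall_rank_score(marine),
    "strain_madness": overall_rank_score(strain_madness),
}
weights = {"marine": 1.0, "strain_madness": 1.0}  # assumed equal weights
combined = weighted_average(per_dataset, weights)
```

With equal weights this reduces to a plain mean of the two per-dataset sums; different weights would shift the combined score toward one dataset.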

topics/metagenomics/tutorials/cami2-assembly/tutorial.md Outdated

Comment on lines 639 to 677

+              **Marine** dataset
+              _With tool versions_
+. HipMer
+. metaSPAdes_v3.13.1
+. metaSPAdes_v3.13.0
+. ABySS
+. Ray-Meta
+. Megahit_v1.1.2
+. SPAdes_v3.14-dev
+              _Without tool versions_
+. HipMer
+. **metaSPAdes**
+. **ABySS**
+. **Ray-Meta**
+. **Megahit**
+              **Strain madness** dataset
+              _With tool versions_
+. HipMer
+. Megahit_v1.1.2
+. SPAdes_v3.14-dev
+. OPERA-MS
+. Megahit_V1.2.7
+              _Without tool versions_
+. HipMer
+. **Megahit**
+. **SPAdes**
+. **OPERA**
+              **Plant-associated** dataset
+              There are no certain ranking tables among Supplementary tables for plant-associated dataset. However,  in {%cite Meyer2021%} there is information related to tools performance on plant-associated dataset. We created the priority list of tools for the plant-associated dataset.
+. (Meta)HipMer
+. **(meta)Flye**
+. **(meta)SPAdes**

Member

bebatut Dec 6, 2022

It would be nice to show this as a table, perhaps

topics/metagenomics/tutorials/cami2-assembly/tutorial.md Outdated

+. **(meta)Flye**
+. **(meta)SPAdes**
+              Since in this tutorial we have decided to focus on marine dataset it would be reasonable to reproduce CAMI2 assembly challenge using HipMer, metaSPAdes, ABySS, Ray-Meta, Megahit assemblers which performed better. As we know from our [comparison Galaxy and CAMI2 analysis](https://docs.google.com/spreadsheets/d/e/2PACX-1vQgJr3J-IyVy9IkXS9W-RZcV83Tr6f7RusG_97QwgpW2dFdCXUMroROIhy8gKjPcUgISFXW9NQwOzzK/pubhtml), metaSPAdes, ABySS, Megahit tools are available in Galaxy while Ray-Meta and HipMer are not.

Member

bebatut Dec 6, 2022

Reference to a table above

PlushZ added 2 commits

December 6, 2022 19:54


          Merge branch 'galaxyproject:main' into cami2-assembly

c07c488


          work on some comments

2df976d

github-actions bot added the template-and-tools label


          Continue the tutorial edition

2e0b90c

bebatut force-pushed the cami2-assembly branch from 79daae4 to 2e0b90c

December 8, 2022 15:01

PlushZ and others added 3 commits

December 8, 2022 23:42


          add table for assessment tools

c38ce11


          add table with galaxy histories

b907de1


          Add hybrid assembly

99ffc63

bebatut added 3 commits

December 14, 2022 15:39


          Add hybrid assembly

41c29a8


          Merge branch 'main' of github.com:galaxyproject/training-material into cami2-assembly

9487c46


          Edit through the end

f7a5d3f

hexylena mentioned this pull request

Exempt specific workflows from automated testing #4171

Open


Labels

metagenomics new tutorial template-and-tools work-in-progress