wrote draft integration of BLAST_MAKEBLASTDB and NCBIREFSEQDOWNLOAD into functional_annotation subworkflow. #50
base: dev
Changes from 22 commits
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,9 @@ | ||
| Jul-09 09:36:15.278 [main] INFO com.askimed.nf.test.App - nf-test 0.9.2 | ||
| Jul-09 09:36:15.294 [main] INFO com.askimed.nf.test.App - Arguments: [test, subworkflows/local/diamond/tests/main.nf.tests] | ||
| Jul-09 09:36:16.153 [main] INFO com.askimed.nf.test.App - Nextflow Version: 24.10.6 | ||
| Jul-09 09:36:16.155 [main] INFO com.askimed.nf.test.commands.RunTestsCommand - Load config from file /home/trace/projects/proteinannotator/nf-test.config... | ||
| Jul-09 09:36:16.663 [main] WARN com.askimed.nf.test.nextflow.NextflowScript - Module /home/trace/projects/proteinannotator/subworkflows/local/functional_annotation/main.nf: Dependency '/home/trace/projects/proteinannotator/subworkflows/local/functional_annotation/../../../modules/nf-core/blast/makeblastdb/main.nf' not found. | ||
| Jul-09 09:36:16.728 [main] INFO com.askimed.nf.test.lang.dependencies.DependencyResolver - Loaded 21 files from directory /home/trace/projects/proteinannotator in 0.081 sec | ||
| Jul-09 09:36:16.730 [main] INFO com.askimed.nf.test.lang.dependencies.DependencyResolver - Found 0 files containing tests. | ||
| Jul-09 09:36:16.730 [main] DEBUG com.askimed.nf.test.lang.dependencies.DependencyResolver - Found files: [] | ||
| Jul-09 09:36:16.732 [main] INFO com.askimed.nf.test.commands.RunTestsCommand - Found 0 tests to execute. |
olgabot marked this conversation as resolved.
|
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,3 +1,6 @@ | ||
| { | ||
| "markdown.styles": ["public/vscode_markdown.css"] | ||
| "markdown.styles": [ | ||
| "public/vscode_markdown.css" | ||
| ], | ||
| "nextflow.telemetry.enabled": true | ||
| } |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -14,6 +14,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d | |
|
|
||
| - [Functional Annotation](#functional-annotation) Annotate proteins with functional domains | ||
| - [InterProScan](#Interproscan) - Search the InterPro database for functional domains | ||
| - [Diamond](#diamond) - Provide ‘hits’ of potential homologous protein matches between species | ||
tracelail marked this conversation as resolved.
Outdated
|
||
| - [MultiQC](#multiqc) - Aggregate report describing results and QC from the whole pipeline | ||
| - [SeqKit stats](#seqkit_stats) - Simple statistics for protein FASTA files | ||
| - [Pipeline information](#pipeline-information) - Report metrics generated during the workflow execution | ||
|
|
@@ -75,7 +76,7 @@ AKRLERIETINREIIDMAGGAGSSNGTGGMLTKIKAATIATESGVPVYICS | |
|
|
||
| </details> | ||
|
|
||
| #### JavaScript Object Notation (JSON) Output | ||
| ##### JavaScript Object Notation (JSON) Output | ||
tracelail marked this conversation as resolved.
|
||
|
|
||
| JSON representation of the matches - an alternative to XML format. As new releases are made public, the changes to the expected JSON format are documented in [Change log for InterProScan JSON output format](https://interproscan-docs.readthedocs.io/en/v5/JSONOutputFormatHistory.html#change-log-for-interproscan-json-output-format). | ||
|
|
||
|
|
@@ -268,6 +269,115 @@ The XML Schema Definition (XSD) is available [here](http://ftp.ebi.ac.uk/pub/sof | |
|
|
||
| </details> | ||
|
|
||
| #### Diamond | ||
|
|
||
| <details markdown="1"> | ||
| <summary>Output files</summary> | ||
|
|
||
| - `functional_annotation/diamond` | ||
| - `*.blast`: BLAST (Basic Local Alignment Search Tool) pairwise format | ||
| - `*.xml`: BLAST Extensible Markup Language (XML) format | ||
| - `*.txt`: BLAST tabular format (default). This format can be customized: the format code `6` may be followed by a space-separated list of `blast_columns` keywords, each specifying one field of the output. | ||
| - `*.daa`: DIAMOND alignment archive (DAA). The DAA format is a proprietary binary format that can subsequently be used to generate other output formats using the view command. It is also supported by MEGAN and allows a quick import of results. | ||
| - `*.sam`: SAM format. | ||
| - `*.tsv`: Taxonomic classification. This format will not print alignments but only a taxonomic classification for each query using the LCA algorithm. | ||
| - `*.paf`: PAF format. The custom fields in the format are AS (bit score), ZR (raw score) and ZE (e-value) | ||
|
|
||
| </details> | ||
|
|
||
| [Diamond](https://github.com/bbuchfink/diamond) provides sensitive protein sequence alignment. The process reports ‘hits’: potential homologous protein matches between species, which indicate an evolutionary relationship inferred from protein sequence similarity. | ||
|
|
||
| ##### Pairwise Alignment Format (.blast) Output | ||
|
|
||
| The pairwise BLAST format is human-readable and useful for visual inspection when full details of individual alignments are needed. | ||
|
|
||
| <details markdown="1"> | ||
| <summary>Example Pairwise Alignment Format output</summary> | ||
|
|
||
| ``` | ||
|
|
||
|
Collaborator
I'm guessing these example outputs will be filled in, correct?
Author
Correct. My thought process was to provide the output from running the test once I have it working. |
||
| ``` | ||
|
|
||
| </details> | ||
|
|
||
| ##### BLAST Extensible Markup Language (XML) Output | ||
|
|
||
| The XML (Extensible Markup Language) file contains the same information as the pairwise file, but its structured layout makes it easier for bioinformatics software and scripts to parse (machine-readable). | ||
|
|
||
| <details markdown="1"> | ||
| <summary>Example Extensible Markup Language (XML) output</summary> | ||
|
|
||
| ``` | ||
|
|
||
| ``` | ||
|
|
||
| </details> | ||
|
|
||
| ##### Text File (TXT) Output (default) | ||
|
|
||
| The BLAST tabular format is the default output, and its columns can be modified depending on analysis needs. It is much smaller than the other BLAST formats, compatible with most downstream processing, and easy to filter and analyze. | ||
|
|
||
| <details markdown="1"> | ||
| <summary>Example Text File (TXT) output</summary> | ||
|
|
||
| ``` | ||
|
|
||
| ``` | ||
|
|
||
| </details> | ||
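For readers who want to filter the tabular output described above, a minimal Python sketch follows. It assumes the file uses the twelve default BLAST tabular columns; if the columns were customized, the `columns` argument would need to match. The function name `read_hits` and the sample rows are hypothetical and only for illustration.

```python
import csv
import io

# Default column names of BLAST tabular output (format 6).
# Assumption: the pipeline was run without customizing the columns.
DEFAULT_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def read_hits(handle, columns=DEFAULT_COLUMNS, max_evalue=1e-5):
    """Yield one dict per hit whose e-value is at most max_evalue."""
    reader = csv.reader(handle, delimiter="\t")
    for row in reader:
        # Skip empty lines and comment lines.
        if not row or row[0].startswith("#"):
            continue
        hit = dict(zip(columns, row))
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        if hit["evalue"] <= max_evalue:
            yield hit


# Hypothetical rows, invented for illustration only.
SAMPLE = (
    "query1\tsubjectA\t98.5\t200\t3\t0\t1\t200\t5\t204\t1e-100\t380.0\n"
    "query1\tsubjectB\t35.0\t80\t50\t2\t10\t89\t20\t99\t0.5\t30.1\n"
)

hits = list(read_hits(io.StringIO(SAMPLE)))
for hit in hits:
    print(hit["qseqid"], hit["sseqid"], hit["evalue"], hit["bitscore"])
```

With a real result file, `open("results.txt")` would replace the `io.StringIO` handle; the file name is a placeholder.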
|
|
||
| ##### DIAMOND Alignment Archive (DAA) Output | ||
|
|
||
| DIAMOND alignment archive (DAA) is a compressed proprietary binary format that can be converted to any of the other output formats (.blast, .xml, .txt, .sam, .tsv, .paf) with the DIAMOND view command without rerunning the pipeline. It can also be used in some metagenomic analysis software. | ||
|
|
||
| <details markdown="1"> | ||
| <summary>Example DIAMOND Alignment Archive (DAA) output</summary> | ||
|
|
||
| ``` | ||
|
|
||
| ``` | ||
|
|
||
| </details> | ||
|
|
||
| ##### Sequence Alignment/Map (SAM) Output | ||
|
|
||
| The SAM (Sequence Alignment/Map) file presents the DIAMOND protein alignments in the same layout used for genomic alignments. This allows for easy integration into SAM/BAM pipelines and visualization of protein alignments with the IGV browser. | ||
|
|
||
| <details markdown="1"> | ||
| <summary>Example Sequence Alignment/Map (SAM) output</summary> | ||
|
|
||
| ``` | ||
|
|
||
| ``` | ||
|
|
||
| </details> | ||
|
|
||
| ##### Tab-Separated Values (TSV) Output | ||
|
|
||
| The taxonomic classification (.tsv) output provides taxonomic composition and is useful for biological interpretation rather than alignment comparison. | ||
|
|
||
| <details markdown="1"> | ||
| <summary>Example Tab-Separated Values (TSV) output</summary> | ||
|
|
||
| ``` | ||
|
|
||
| ``` | ||
|
|
||
| </details> | ||
|
|
||
| ##### Pairwise Mapping Format (PAF) | ||
|
|
||
| PAF (Pairwise mApping Format) was originally used for long-read sequencing. DIAMOND adds three custom fields, AS (bit score), ZR (raw alignment score), and ZE (E-value), to provide statistical evidence for each protein alignment. This format is useful when positional information and statistical significance are needed. | ||
|
|
||
| <details markdown="1"> | ||
| <summary>Example Pairwise Mapping Format (PAF) output</summary> | ||
|
|
||
| ``` | ||
|
|
||
| ``` | ||
|
|
||
| </details> | ||
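As a rough illustration of how the AS, ZR and ZE fields described above could be read, here is a small Python sketch. It assumes the standard PAF layout of twelve mandatory tab-separated columns followed by SAM-style `TAG:TYPE:VALUE` fields. The type codes used in the sample line (`i` for AS and ZR, `f` for ZE) and the sample values are assumptions made for the example, not taken from real DIAMOND output.

```python
# Names for the twelve mandatory PAF columns (names chosen for this sketch).
PAF_FIELDS = [
    "qname", "qlen", "qstart", "qend", "strand",
    "tname", "tlen", "tstart", "tend",
    "matches", "block_len", "mapq",
]


def parse_paf_line(line):
    """Split one PAF line into its mandatory columns and optional tags."""
    parts = line.rstrip("\n").split("\t")
    record = dict(zip(PAF_FIELDS, parts[:12]))
    tags = {}
    for field in parts[12:]:
        # Optional fields follow the SAM convention TAG:TYPE:VALUE.
        tag, type_code, value = field.split(":", 2)
        if type_code == "i":
            value = int(value)
        elif type_code == "f":
            value = float(value)
        tags[tag] = value
    record["tags"] = tags
    return record


# Hypothetical line, invented for illustration only.
SAMPLE_LINE = (
    "q1\t120\t0\t118\t+\tt1\t300\t10\t128\t100\t118\t255"
    "\tAS:i:210\tZR:i:534\tZE:f:1e-60\n"
)

record = parse_paf_line(SAMPLE_LINE)
print(record["qname"], record["tname"], record["tags"])
```

Keeping the tags in their own dictionary means a line without any of the three custom fields still parses, just with an empty `tags` entry.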
|
|
||
| ### MultiQC | ||
|
|
||
| <details markdown="1"> | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -5,24 +5,30 @@ | |
| "https://github.com/nf-core/modules.git": { | ||
| "modules": { | ||
| "nf-core": { | ||
| "mmseqs/search": { | ||
| "diamond/blastp": { | ||
|
Collaborator
Hmmm this is a bit concerning because ALL OF |
||
| "branch": "master", | ||
| "git_sha": "81880787133db07d9b4c1febd152c090eb8325dc", | ||
| "installed_by": ["modules"] | ||
| "git_sha": "05954dab2ff481bcb999f24455da29a5828af08d", | ||
| "installed_by": [ | ||
| "modules" | ||
| ] | ||
| }, | ||
| "mtmalign/align": { | ||
| "diamond/makedb": { | ||
| "branch": "master", | ||
| "git_sha": "c7cfb9446fb3098e525089198ff232d795c20ef2", | ||
| "installed_by": ["modules"] | ||
| "git_sha": "05954dab2ff481bcb999f24455da29a5828af08d", | ||
| "installed_by": [ | ||
| "modules" | ||
| ] | ||
| }, | ||
| "multiqc": { | ||
| "branch": "master", | ||
| "git_sha": "f0719ae309075ae4a291533883847c3f7c441dad", | ||
| "installed_by": ["modules"] | ||
| "installed_by": [ | ||
| "modules" | ||
| ] | ||
| }, | ||
| "seqkit/stats": { | ||
| "branch": "master", | ||
| "git_sha": "81880787133db07d9b4c1febd152c090eb8325dc", | ||
| "git_sha": "81880787133db07d9b4c1febd152c090eb8325dc", | ||
tracelail marked this conversation as resolved.
Outdated
|
||
| "installed_by": ["modules"] | ||
| }, | ||
| "untar": { | ||
|
|
@@ -37,20 +43,26 @@ | |
| "utils_nextflow_pipeline": { | ||
| "branch": "master", | ||
| "git_sha": "c2b22d85f30a706a3073387f30380704fcae013b", | ||
| "installed_by": ["subworkflows"] | ||
| "installed_by": [ | ||
| "subworkflows" | ||
| ] | ||
| }, | ||
| "utils_nfcore_pipeline": { | ||
| "branch": "master", | ||
| "git_sha": "51ae5406a030d4da1e49e4dab49756844fdd6c7a", | ||
| "installed_by": ["subworkflows"] | ||
| "installed_by": [ | ||
| "subworkflows" | ||
| ] | ||
| }, | ||
| "utils_nfschema_plugin": { | ||
| "branch": "master", | ||
| "git_sha": "2fd2cd6d0e7b273747f32e465fdc6bcc3ae0814e", | ||
| "installed_by": ["subworkflows"] | ||
| "installed_by": [ | ||
| "subworkflows" | ||
| ] | ||
| } | ||
| } | ||
| } | ||
| } | ||
| } | ||
| } | ||
| } | ||
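Since this diff swaps module entries in `modules.json`, a quick way to see which modules were dropped, added, or re-pinned between two versions of the file is sketched below in Python. It assumes the usual nf-core layout with a top-level `repos` key, which is not visible in the hunk above. The helper names are hypothetical, and the two small dictionaries only reuse module names and SHAs that appear in this diff.

```python
def installed_modules(modules_json):
    """Map 'org/module' to its pinned git_sha.

    Assumption: repositories are listed under a top-level "repos" key.
    """
    found = {}
    for repo in modules_json.get("repos", {}).values():
        for org, modules in repo.get("modules", {}).items():
            for name, info in modules.items():
                found[f"{org}/{name}"] = info.get("git_sha")
    return found


def compare(old, new):
    """Report modules removed, added, or pinned to a different SHA."""
    before, after = installed_modules(old), installed_modules(new)
    shared = set(before) & set(after)
    return {
        "removed": sorted(set(before) - set(after)),
        "added": sorted(set(after) - set(before)),
        "changed": sorted(k for k in shared if before[k] != after[k]),
    }


REPO = "https://github.com/nf-core/modules.git"
SEQKIT = {"git_sha": "81880787133db07d9b4c1febd152c090eb8325dc"}

# Trimmed-down stand-ins for the file before and after this change.
old = {"repos": {REPO: {"modules": {"nf-core": {
    "mmseqs/search": {"git_sha": "81880787133db07d9b4c1febd152c090eb8325dc"},
    "seqkit/stats": SEQKIT,
}}}}}
new = {"repos": {REPO: {"modules": {"nf-core": {
    "diamond/blastp": {"git_sha": "05954dab2ff481bcb999f24455da29a5828af08d"},
    "seqkit/stats": SEQKIT,
}}}}}

result = compare(old, new)
print(result)
```

On real files, `json.load` on the `dev` and branch copies of `modules.json` would supply `old` and `new`.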
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,10 @@ | ||
| --- | ||
| # yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/modules/environment-schema.json | ||
| channels: | ||
| - conda-forge | ||
| - bioconda | ||
| dependencies: | ||
| # TODO nf-core: List required Conda package(s). | ||
| # Software MUST be pinned to channel (i.e. "bioconda"), version (i.e. "1.10"). | ||
| # For Conda, the build (i.e. "h9402c20_2") must be EXCLUDED to support installation on different operating systems. | ||
| - "YOUR-TOOL-HERE" | ||
tracelail marked this conversation as resolved.
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,57 @@ | ||
| process DIAMONDPREPARETAXA { | ||
|
|
||
| // tag "${taxondmp_zip.baseName}" | ||
| label 'process_low' | ||
|
|
||
| conda "${moduleDir}/environment.yml" | ||
| container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? | ||
| 'https://depot.galaxyproject.org/singularity/YOUR-TOOL-HERE': | ||
olgabot marked this conversation as resolved.
Outdated
tracelail marked this conversation as resolved.
Outdated
|
||
| 'biocontainers/YOUR-TOOL-HERE' }" | ||
|
|
||
| // write the output files to a user specified directory via an input parameter | ||
| // publishDir "${params.outdir}/ncbi_refseq/", mode: 'copy' | ||
|
|
||
| input: | ||
| val taxondmp_zip // Add default of ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz | ||
|
|
||
| output: | ||
| path("taxa/nodes.dmp"), emit: taxonnodes | ||
| path("taxa/names.dmp"), emit: taxonnames | ||
| path "versions.yml" , emit: versions | ||
|
|
||
| when: | ||
| task.ext.when == null || task.ext.when | ||
|
|
||
| script: | ||
| def args = task.ext.args ?: '' | ||
| // def prefix = task.ext.prefix ?: "${meta.id}" | ||
| // Omitting from script portion for now | ||
| // # $args \\ | ||
| // # -@ $task.cpus \\ | ||
| // # -o ${prefix}.bam \\ | ||
|
|
||
| """ | ||
| mkdir -p taxa/ | ||
| wget -q ${taxondmp_zip} | ||
| tar -xzf taxdump.tar.gz -C taxa | ||
|
|
||
| cat <<-END_VERSIONS > versions.yml | ||
| "${task.process}": | ||
| diamondpreparetaxa: \$(diamondpreparetaxa --version) | ||
| END_VERSIONS | ||
| """ | ||
|
|
||
| stub: | ||
| // def args = task.ext.args ?: '' | ||
| // def prefix = task.ext.prefix ?: "${meta.id}" | ||
| """ | ||
|
|
||
tracelail marked this conversation as resolved.
Outdated
|
||
| mkdir -p taxa/ | ||
| touch taxa/nodes.dmp | ||
| touch taxa/names.dmp | ||
|
|
||
| cat <<-END_VERSIONS > versions.yml | ||
| "${task.process}": | ||
| diamondpreparetaxa: \$(diamondpreparetaxa --version) | ||
| END_VERSIONS | ||
| """ | ||
| } | ||
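For context on what the emitted `nodes.dmp` contains, a minimal Python sketch of reading it is shown below. It relies on the NCBI taxdump convention that fields are separated by tab, pipe, tab and each line ends with tab, pipe, and that the first two fields of `nodes.dmp` are the taxon ID and its parent ID. The function names and the three sample rows are hypothetical and invented for illustration.

```python
def parse_dmp_line(line):
    """Split one taxdump line into its fields."""
    line = line.rstrip("\n")
    # Each line ends with a trailing tab + pipe terminator.
    if line.endswith("\t|"):
        line = line[:-2]
    return line.split("\t|\t")


def load_parents(lines):
    """Map each tax_id to its parent tax_id (first two nodes.dmp fields)."""
    parents = {}
    for line in lines:
        fields = parse_dmp_line(line)
        parents[int(fields[0])] = int(fields[1])
    return parents


def lineage(tax_id, parents):
    """Walk from tax_id up to the root, which is its own parent."""
    path = [tax_id]
    while parents[path[-1]] != path[-1]:
        path.append(parents[path[-1]])
    return path


# Invented rows in nodes.dmp layout; real files carry more columns.
SAMPLE_NODES = [
    "1\t|\t1\t|\tno rank\t|\n",
    "10\t|\t1\t|\tgenus\t|\n",
    "100\t|\t10\t|\tspecies\t|\n",
]

parents = load_parents(SAMPLE_NODES)
print(lineage(100, parents))
```

With the process output, `open("taxa/nodes.dmp")` could be passed to `load_parents` in place of the sample list.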
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,68 @@ | ||
| --- | ||
| # yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/modules/meta-schema.json | ||
| name: "diamondpreparetaxa" | ||
| ## TODO nf-core: Add a description of the module and list keywords | ||
| description: write your description here | ||
| keywords: | ||
| - sort | ||
| - example | ||
| - genomics | ||
| tools: | ||
| - "diamondpreparetaxa": | ||
| ## TODO nf-core: Add a description and other details for the software below | ||
| description: "" | ||
| homepage: "" | ||
| documentation: "" | ||
| tool_dev_url: "" | ||
| doi: "" | ||
| licence: | ||
| identifier: | ||
|
|
||
| ## TODO nf-core: Add a description of all of the variables used as input | ||
| input: | ||
| # Only when we have meta | ||
| - - meta: | ||
| type: map | ||
| description: | | ||
| Groovy Map containing sample information | ||
| e.g. `[ id:'sample1' ]` | ||
|
|
||
| ## TODO nf-core: Delete / customise this example input | ||
| - bam: | ||
| type: file | ||
| description: Sorted BAM/CRAM/SAM file | ||
| pattern: "*.{bam,cram,sam}" | ||
| ontologies: | ||
| - edam: "http://edamontology.org/format_2572" # BAM | ||
| - edam: "http://edamontology.org/format_2573" # CRAM | ||
| - edam: "http://edamontology.org/format_3462" # SAM | ||
|
|
||
| ## TODO nf-core: Add a description of all of the variables used as output | ||
| output: | ||
| - bam: | ||
| #Only when we have meta | ||
| - meta: | ||
| type: map | ||
| description: | | ||
| Groovy Map containing sample information | ||
| e.g. `[ id:'sample1' ]` | ||
| ## TODO nf-core: Delete / customise this example output | ||
| - "*.bam": | ||
| type: file | ||
| description: Sorted BAM/CRAM/SAM file | ||
| pattern: "*.{bam,cram,sam}" | ||
| ontologies: | ||
| - edam: "http://edamontology.org/format_2572" # BAM | ||
| - edam: "http://edamontology.org/format_2573" # CRAM | ||
| - edam: "http://edamontology.org/format_3462" # SAM | ||
|
|
||
| - versions: | ||
| - "versions.yml": | ||
| type: file | ||
| description: File containing software versions | ||
| pattern: "versions.yml" | ||
|
|
||
| authors: | ||
| - "@tracelail" | ||
| maintainers: | ||
| - "@tracelail" |