namlab-mapper

A small workflow that downloads and maps multiple RNA sequencing files from the NCBI SRA, as well as any local FASTQ files, to a common reference using kallisto. Because it is written in Nextflow, it can automatically parallelize steps across CPUs, or across nodes if you are running it on a cluster (see this page for more details). It is also built to be economical with disk space: large intermediate files are removed as soon as they are no longer needed. The output is a combined table of abundance quantifications, plus a FastQC report for each of the sequence files.

Prerequisites

rnaseq-mapper will try to load the following modules: sratoolkit, kallisto, R, fastqc. If your system doesn't use modules, make sure the executables are available in your PATH. If you have not used sratoolkit before, you will also need to configure it by running vdb-config --interactive. I recommend turning off "enable local file caching" in the Cache settings. Otherwise, downloaded sequence files may stay on your disk even after rnaseq-mapper deletes them, and the disk can fill up if you process a lot of sequences.
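
On a system without modules, a quick shell check can confirm the PATH requirement before you start a run. This is only a minimal sketch: check_tools is a hypothetical helper, and the executable names are assumptions (fasterq-dump stands in for sratoolkit and Rscript for R), so adjust them to whatever your installation provides.

```shell
# Report, for each name given, whether an executable of that name is on PATH.
# Returns 1 if at least one is missing, 0 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Assumed executable names; not taken from the pipeline itself.
check_tools fasterq-dump kallisto Rscript fastqc \
    || echo "Some tools are missing from PATH."
```

If you use the Singularity container described below, this check is not needed, since the container ships the software.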

Usage

1. Set up Nextflow (if not installed already):

   curl -s https://get.nextflow.io | bash

2. Create a file called nextflow.config (exactly this name) by using the example_nextflow.config from this directory as a template and adapting it to your use case.
3. Create an input file with the sequences you want to map in the format of example_input.csv and make sure it is referred to in your config file.
4. If desired, place any FASTQ files in the directories referenced in your nextflow.config (if you don't have any, make sure the folders still exist and just leave them empty).
5. Run the pipeline:

   ./nextflow run NAMlab/rnaseq-mapper

Singularity Container

If you prefer, you can use the Singularity container that packages all the required software (sratoolkit, kallisto, R, fastqc). This requires Singularity or Apptainer to be installed on your system. You can then run the pipeline (step 5 above; the other steps stay the same) with

./nextflow run NAMlab/rnaseq-mapper -with-singularity library://merlin/default/rnaseq-mapper:latest

or

./nextflow run NAMlab/rnaseq-mapper -with-apptainer library://merlin/default/rnaseq-mapper:latest

respectively.

Output

The pipeline produces a TSV file with the combined kallisto outputs for all your sequence files (by default in the work/out folder). It looks like this:

target_id	length	SRR1805735_eff_length	SRR1805737_eff_length	SRR6512869_eff_length	SRR6512869_est_counts	SRR6512869_tpm
Solyc00g005280.1.1	411	252.224	241.253	212	0	0
Solyc00g005285.2.1	216	68.6464	63.7937	31.5146	0	0
Solyc00g006483.2.1	390	231.296	220.691	191	0	0
Solyc00g006487.2.1	276	120.525	114.108	77.4659	2	22.2662
Solyc00g006560.2.1	1317	1158	1145.76	1118	0	0
Solyc00g006890.2.1	300	143.123	135.795	101.044	0	0
Solyc00g006900.2.1	576	416.999	404.931	377	0	0
Solyc00g007225.2.1	1275	1116	1103.76	1076	0	0
Solyc00g007330.1.1	516	356.999	345.082	317	0	0
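
Because every sample contributes its own <accession>_tpm column, a single sample can be pulled out of the combined table with awk. The following is a hedged sketch, not part of the pipeline: expressed_targets is a hypothetical helper that assumes a tab-separated table with a header row like the one above, and prints the targets with a TPM above zero for one accession.

```shell
# Print target_id and TPM (tab-separated) for every row in which the
# given accession has a TPM greater than zero.
# Arguments: <accession> <combined table>
expressed_targets() {
    accession=$1
    table=$2
    awk -F '\t' -v col="${accession}_tpm" '
        NR == 1 {
            # Locate the column whose header is <accession>_tpm.
            for (i = 1; i <= NF; i++) if ($i == col) idx = i
            if (!idx) { print "column not found: " col > "/dev/stderr"; exit 1 }
            next
        }
        $idx + 0 > 0 { print $1 "\t" $idx }
    ' "$table"
}

# Example call (path assumes the default output folder):
# expressed_targets SRR6512869 work/out/combined_abundance.tsv
```

For the excerpt above, the only row reported for SRR6512869 would be Solyc00g006487.2.1 with a TPM of 22.2662.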

You will also get a FastQC report for each of the sequence files in the same folder.

Misc

rnaseq-mapper will retry downloading and mapping a sequence from NCBI up to 4 times. If that still fails (e.g. because the sequence entry is not available or something is wrong with it), it will skip that sequence, and your output will not contain any abundance information for it. I therefore recommend checking which sequences were actually mapped when working with the combined abundance file, instead of assuming it contains all accessions from your input.
If you map a lot of sequences (> 2000), the combineAll step, which merges all the produced abundance files into one combined_abundance.tsv, can get very slow. To speed it up, make sure the data.table package is available in the R environment so that rnaseq-mapper can use the faster fread and Reduce functions.
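
One way to check which accessions were skipped is to compare your accession list against the column headers of the combined table. The sketch below makes assumptions: missing_accessions is a hypothetical helper, the accession list is a plain text file with one accession per line (derive it from your input CSV as appropriate), and the column names follow the <accession>_eff_length, <accession>_est_counts and <accession>_tpm pattern shown in the Output section.

```shell
# Print every accession from the list that has no column in the
# combined abundance table.
# Arguments: <accession list, one per line> <combined table>
missing_accessions() {
    accessions_file=$1
    table=$2
    mapped=$(mktemp) || return 1
    # Collect the accessions found in the header by stripping the
    # per-sample column suffixes.
    head -n 1 "$table" | tr '\t' '\n' \
        | sed -n -E 's/_(eff_length|est_counts|tpm)$//p' \
        | sort -u > "$mapped"
    # Lines present only in the accession list were not mapped.
    sort -u "$accessions_file" | comm -23 - "$mapped"
    rm -f "$mapped"
}

# Example call (file names are placeholders):
# missing_accessions accessions.txt work/out/combined_abundance.tsv
```

An empty result means every listed accession appears in the table.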
