egaffo/ccp2_nf

---
title: "CCP2_NF"
subtitle: "A Nextflow wrapper for CirComPara2"
output:
  html_document:
    toc: true
    number_sections: false
---

Quick usage

Copy the config/nextflow.config file into your project directory.

Modify the nextflow.config file to add custom volumes to the Docker run line:

runOptions = '-u $(id -u):$(id -g) -v /blackhole:/blackhole -v /sharedfs01:/sharedfs01 -v /sharedfs00:/sharedfs00'

Run with Nextflow:

nextflow run /path/to/ccp2_nf/scripts/main.nf --metafile=meta.csv --varsfile=/path/to/ccp2_project_dir/vars.py

N.B.: the meta.csv file must be declared in the vars.py as follows:

META = 'meta.csv'

Run the analysis.

Use the combine_ccp2_runs() function from the ccp2tools package (https://github.com/egaffo/ccp2tools) to merge the samples' output.

Extended "How to use"

1. Prepare your project directory structure

## N.B.: better to use an absolute path for the $PRJ_DIR env variable
PRJ_DIR=/sharedfs02/user/projectName

TOOLS_DIR=$PRJ_DIR/tools
CCP2_DIR=$PRJ_DIR/ccp2

mkdir $PRJ_DIR
mkdir $TOOLS_DIR
mkdir $CCP2_DIR
mkdir $PRJ_DIR/R_$(basename $PRJ_DIR)

2. Install the required software

2.1 Install Nextflow

Here, we install a local copy of Nextflow in the project's tools directory with a basic procedure.

However, you may want to use a Nextflow instance already installed on your system. For more detailed instructions on how to install Nextflow, please visit https://www.nextflow.io/docs/latest/install.html.

cd $TOOLS_DIR

mkdir nf
cd nf
curl -s https://get.nextflow.io | bash
chmod +x nextflow

cd $PRJ_DIR

2.2 CCP2_NF

Clone the ccp2_nf git repository:

cd $TOOLS_DIR
git clone /sharedfs01/enrico/ccp2_nf

Now all the Nextflow scripts are in the $TOOLS_DIR/ccp2_nf directory.

3. Set configuration files

3.1 Nextflow configuration

You need to set specific parameters in the Nextflow configuration file to match your environment.

First, get the nextflow.config template into your working directory:

cp $TOOLS_DIR/ccp2_nf/config/nextflow.config $CCP2_DIR/nextflow.config

Now, set custom parameters according to your running environment. For instance, to make the source data and the project directories available as Docker volumes, edit the runOptions line in the docker section:

## nextflow.config 
...
runOptions = '-u $(id -u):$(id -g) -v /sourcedatadir:/sourcedatadir -v /prjdir:/prjdir'
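
The runOptions string follows a regular pattern: each directory is mounted at the same path inside the container. As a minimal sketch, the line can be generated from a list of directories. The helper name build_run_options is hypothetical and not part of ccp2_nf; it only assembles the text to paste into nextflow.config.

```python
def build_run_options(volumes, user="$(id -u):$(id -g)"):
    """Return a Docker runOptions string that mounts each directory at the same path.

    Hypothetical helper, not part of ccp2_nf. The default `user` is the
    shell expression used in the nextflow.config template.
    """
    for directory in volumes:
        if not directory.startswith("/"):
            raise ValueError(f"volume must be an absolute path: {directory}")
    mounts = " ".join(f"-v {d}:{d}" for d in volumes)
    return f"-u {user} {mounts}".strip()


if __name__ == "__main__":
    options = build_run_options(["/sourcedatadir", "/prjdir"])
    # Print the line to paste into the docker section of nextflow.config
    print("runOptions = '" + options + "'")
```

Mounting every directory at an identical path keeps the paths declared in vars.py and meta.csv valid both on the host and inside the container.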

Moreover, you can tune the number of CPUs each CirComPara2 instance can use by modifying the cpus parameter in the runCCP2 process section:

cpus = 8

Note that the cpus parameter must be twice the number of CPUs declared in the vars.py file.

For instance, if CPUS=8 in the vars.py, then you have to set cpus = 16 in the nextflow.config. This is required because the CirComPara2 Docker container runs two tasks in parallel; each task can use up to the CPUS number declared in the vars.py.
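
The doubling rule is easy to get wrong when either file is edited later. The following is a small sketch that reads the CPUS value from the text of a vars.py and returns the matching cpus value for nextflow.config. The function name required_nextflow_cpus is an assumption for illustration, and the regular expression only handles a plain `CPUS = '8'` or `CPUS = 8` assignment.

```python
import re


def required_nextflow_cpus(vars_py_text):
    """Return the `cpus` value nextflow.config needs for a given vars.py text.

    Hypothetical helper: it applies the rule cpus = 2 * CPUS stated above.
    """
    match = re.search(
        r"^\s*CPUS\s*=\s*['\"]?(\d+)['\"]?",
        vars_py_text,
        flags=re.MULTILINE,
    )
    if match is None:
        raise ValueError("no CPUS assignment found in vars.py")
    return 2 * int(match.group(1))


if __name__ == "__main__":
    example = "META = 'meta.csv'\nCPUS = '8'\n"
    print(f"cpus = {required_nextflow_cpus(example)}")
```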

Another essential setting is process.executor. It defaults to SLURM; change it to match your computing environment and scheduler.

3.2 CCP2 configuration

Create the vars.py file according to your settings. Please refer to the CirComPara2 manual to learn how to set the vars.py properly. Remember to set the CPUS parameter consistently with the nextflow.config file.

The content of a typical vars.py will look like:

META = 'meta.csv'
CPUS = '8'

GENOME_FASTA    = '/sourcedatadir/annotation/Homo_sapiens.GRCh38.dna.primary_assembly.fa'
ANNOTATION      = '/sourcedatadir/annotation/Homo_sapiens.GRCh38.108.gtf'
GENEPRED        = '/sourcedatadir/annotation/Homo_sapiens.GRCh38.108.genePred.wgn'

PREPROCESSOR    = 'trimmomatic'
PREPROCESSOR_PARAMS = 'MAXINFO:40:0.5 LEADING:20 TRAILING:20 SLIDINGWINDOW:4:30 MINLEN:50 AVGQUAL:30'

GENOME_INDEX    = '/sourcedatadir/indexes/hisat2/Homo_sapiens.GRCh38.dna.primary_assembly'
SEGEMEHL_INDEX  = '/sourcedatadir/indexes/segemehl/Homo_sapiens.GRCh38.dna.primary_assembly.idx'
BWA_INDEX       = '/sourcedatadir/indexes/bwa/Homo_sapiens.GRCh38.dna.primary_assembly'
BOWTIE_INDEX    = '/sourcedatadir/indexes/bowtie/Homo_sapiens.GRCh38.dna.primary_assembly'
BOWTIE2_INDEX   = '/sourcedatadir/indexes/bowtie2/Homo_sapiens.GRCh38.dna.primary_assembly'
STAR_INDEX      = '/sourcedatadir/indexes/star/Homo_sapiens.GRCh38.dna.primary_assembly'

HISAT2_EXTRA_PARAMS = '--rna-strandness RF'
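
Because CirComPara2 runs inside a Docker container, a host path is visible to it only if it lies under one of the directories mounted through runOptions. As a hedged sketch, the check below lists the absolute paths in a vars.py that fall outside the mounted volumes. The function names are hypothetical, and the parser only considers top-level assignments of string literals, as in the example above.

```python
import ast


def string_assignments(vars_py_text):
    """Return {NAME: value} for top-level `NAME = 'string'` assignments."""
    result = {}
    for node in ast.parse(vars_py_text).body:
        if (
            isinstance(node, ast.Assign)
            and len(node.targets) == 1
            and isinstance(node.targets[0], ast.Name)
            and isinstance(node.value, ast.Constant)
            and isinstance(node.value.value, str)
        ):
            result[node.targets[0].id] = node.value.value
    return result


def unmounted_paths(vars_py_text, volumes):
    """List (name, path) pairs whose absolute path is outside every volume.

    Values that do not start with '/' (parameters, relative names such as
    'meta.csv') are skipped.
    """
    prefixes = [v.rstrip("/") + "/" for v in volumes]
    missing = []
    for name, value in string_assignments(vars_py_text).items():
        if not value.startswith("/"):
            continue
        if not any(value.startswith(p) or value + "/" == p for p in prefixes):
            missing.append((name, value))
    return missing


if __name__ == "__main__":
    text = (
        "META = 'meta.csv'\n"
        "GENOME_FASTA = '/sourcedatadir/annotation/genome.fa'\n"
        "ANNOTATION = '/elsewhere/annotation.gtf'\n"
    )
    for name, path in unmounted_paths(text, ["/sourcedatadir", "/prjdir"]):
        print(f"{name} points outside the mounted volumes: {path}")
```

Paths that refer to files embedded in the Docker image would also be reported by this check, so treat its output as a list of candidates to review rather than as errors.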

Make the meta.csv file; it will look like:

file,sample,adapter
/sourcedatadir/dataset/S7_R1.fq.gz,S07,/circompara2/tools/Trimmomatic-0.39/adapters/TruSeq3-PE-2.fa
/sourcedatadir/dataset/S7_R2.fq.gz,S07,/circompara2/tools/Trimmomatic-0.39/adapters/TruSeq3-PE-2.fa
/sourcedatadir/dataset/S8_R1.fq.gz,S08,/circompara2/tools/Trimmomatic-0.39/adapters/TruSeq3-PE-2.fa
/sourcedatadir/dataset/S8_R2.fq.gz,S08,/circompara2/tools/Trimmomatic-0.39/adapters/TruSeq3-PE-2.fa

N.B.: The adapter column is optional; Trimmomatic uses it during the preprocessing step of the CirComPara2 pipeline. You can use the adapter file path as in the example above: it refers to the file embedded in the Docker container.
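
A quick sanity check of the meta.csv before launching a run can catch missing columns or samples with an unexpected number of read files. The sketch below groups the files by sample. It assumes that the file and sample columns are mandatory (the text above only states that adapter is optional), and the function name files_per_sample is hypothetical.

```python
import csv
import io
from collections import defaultdict


def files_per_sample(meta_csv_text):
    """Group the `file` entries of a meta.csv by their `sample` name.

    Assumption: `file` and `sample` are required columns; `adapter` is optional.
    """
    reader = csv.DictReader(io.StringIO(meta_csv_text))
    missing = {"file", "sample"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"meta.csv lacks required columns: {sorted(missing)}")
    samples = defaultdict(list)
    for row in reader:
        samples[row["sample"]].append(row["file"])
    return dict(samples)


if __name__ == "__main__":
    example = (
        "file,sample,adapter\n"
        "/sourcedatadir/dataset/S7_R1.fq.gz,S07,adapters.fa\n"
        "/sourcedatadir/dataset/S7_R2.fq.gz,S07,adapters.fa\n"
    )
    for sample, files in files_per_sample(example).items():
        print(sample, len(files), "file(s)")
```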

Copy the command script and customise it:

cp $TOOLS_DIR/ccp2_nf/scripts/run_ccp2nf.sh $PRJ_DIR

Open your copy of the run_ccp2nf.sh file with a text editor such as nano or Vim, and set the NFX_HOME and CCP2NF_HOME variables to your paths.

Make the script executable:

chmod +x run_ccp2nf.sh

4. Run the analysis

./run_ccp2nf.sh

5. Collect and analyse the results

Use the combine_ccp2_runs() function from the ccp2tools package (https://github.com/egaffo/ccp2tools) to merge the samples' output.
