GitHub - egaffo/ccp2_nf: Run CirComPara2 through Nextflow

title: "CCP2_NF"
subtitle: "A Nextflow wrapper for CirComPara2"
output:
  html_document:
    toc: true
    number_sections: false

Quick usage

Copy the config/nextflow.config file into your project directory.

Modify the nextflow.config file to add custom volumes to the Docker run line:

runOptions = '-u $(id -u):$(id -g) -v /blackhole:/blackhole -v /sharedfs01:/sharedfs01 -v /sharedfs00:/sharedfs00'

Run with Nextflow:

nextflow run /path/to/ccp2_nf/scripts/main.nf --metafile=meta.csv --varsfile=/path/to/ccp2_project_dir/vars.py

N.B.: the meta.csv file must be declared in the vars.py as follows:

META = 'meta.csv'

Run the analysis.

Use the combine_ccp2_runs() function from the ccp2tools package (https://github.com/egaffo/ccp2tools) to merge the samples' output.

Extended "How to use"

1. Prepare your project directory structure

## N.B.: better to use an absolute path for the $PRJ_DIR env variable
PRJ_DIR=/sharedfs02/user/projectName

TOOLS_DIR=$PRJ_DIR/tools
CCP2_DIR=$PRJ_DIR/ccp2

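## -p creates missing parent directories and does not fail if a directory already exists
mkdir -p "$PRJ_DIR" "$TOOLS_DIR" "$CCP2_DIR"
## $PRJ_DIR is an absolute path, so only its last component is used in the directory name
## (assuming the R directory is meant to be named after the project)
mkdir -p "$PRJ_DIR/R_$(basename "$PRJ_DIR")"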

2. Install the required software

2.1 Install Nextflow

Here, we install a local copy of Nextflow using the basic procedure.

However, you may want to use a Nextflow instance already installed on your system. For more detailed instructions on how to install Nextflow, please visit https://www.nextflow.io/docs/latest/install.html.

cd $TOOLS_DIR

mkdir nf
cd nf
curl -s https://get.nextflow.io | bash
chmod +x nextflow

cd $PRJ_DIR

2.2 CCP2_NF

Clone the ccp2_nf git repository:

cd $TOOLS_DIR
git clone https://github.com/egaffo/ccp2_nf

Now all the Nextflow scripts are in the $TOOLS_DIR/ccp2_nf directory.

3. Set configuration files

3.1 Nextflow configuration

You need to set specific parameters in the Nextflow configuration file to match your environment.

First, get the nextflow.config template into your working directory:

cp $TOOLS_DIR/ccp2_nf/config/nextflow.config $CCP2_DIR/nextflow.config

Now, set custom parameters according to your running environment. For instance, to include the source data and the project directories as Docker volumes, you will change the runOptions line in the docker section.

## nextflow.config 
...
runOptions = '-u $(id -u):$(id -g) -v /sourcedatadir:/sourcedatadir -v /prjdir:/prjdir'
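The runOptions value follows a regular pattern: the user mapping, then one -v flag per directory, each mounted at the same path inside the container. As a minimal sketch, the string can be generated from a list of directories; the helper name build_run_options is hypothetical and not part of ccp2_nf.

```python
def build_run_options(volumes):
    """Return a Docker runOptions string mounting each directory at the same path."""
    # Run as the calling user, as in the template's runOptions line
    parts = ["-u $(id -u):$(id -g)"]
    # Mount every host directory at the identical path inside the container
    parts += [f"-v {directory}:{directory}" for directory in volumes]
    return " ".join(parts)


# Print the line to paste into the docker section of nextflow.config
print(f"runOptions = '{build_run_options(['/sourcedatadir', '/prjdir'])}'")
```

Mounting each directory at the same path keeps the absolute paths written in vars.py and meta.csv valid inside the container.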

Moreover, you can tune the number of CPUs each CirComPara2 instance can use by modifying the cpus parameter in the runCCP2 process section:

cpus = 8

Mind that the cpus parameter must be twice the number of CPUs declared in the vars.py file.

For instance, if CPUS = '8' in vars.py, then you must set cpus = 16 in nextflow.config. This is required because the CirComPara2 Docker container runs two tasks in parallel, and each task can use up to the number of CPUs declared as CPUS in vars.py.
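The doubling rule is simple arithmetic; the sketch below only illustrates it. The function name nextflow_cpus is hypothetical, and the string handling assumes CPUS may be written as a quoted value in vars.py.

```python
def nextflow_cpus(vars_cpus):
    """Return the cpus value for nextflow.config given CPUS from vars.py."""
    # vars.py may store CPUS as a string (e.g. '8'), so convert it first
    return 2 * int(vars_cpus)


print(nextflow_cpus('8'))  # prints 16
```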

Another essential setting is process.executor. By default, it is set to SLURM; change it to match your computing environment and scheduler.

3.2 CCP2 configuration

Write the vars.py file according to your settings; refer to the CirComPara2 manual for how to set it properly. Remember to keep the CPUS parameter consistent with nextflow.config: the cpus value there must be twice the CPUS value in vars.py.

The content of a typical vars.py will look like:

META = 'meta.csv'
CPUS = '8'

GENOME_FASTA    = '/sourcedatadir/annotation/Homo_sapiens.GRCh38.dna.primary_assembly.fa'
ANNOTATION      = '/sourcedatadir/annotation/Homo_sapiens.GRCh38.108.gtf'
GENEPRED        = '/sourcedatadir/annotation/Homo_sapiens.GRCh38.108.genePred.wgn'

PREPROCESSOR    = 'trimmomatic'
PREPROCESSOR_PARAMS = 'MAXINFO:40:0.5 LEADING:20 TRAILING:20 SLIDINGWINDOW:4:30 MINLEN:50 AVGQUAL:30'

GENOME_INDEX    = '/sourcedatadir/indexes/hisat2/Homo_sapiens.GRCh38.dna.primary_assembly'
SEGEMEHL_INDEX  = '/sourcedatadir/indexes/segemehl/Homo_sapiens.GRCh38.dna.primary_assembly.idx'
BWA_INDEX       = '/sourcedatadir/indexes/bwa/Homo_sapiens.GRCh38.dna.primary_assembly'
BOWTIE_INDEX    = '/sourcedatadir/indexes/bowtie/Homo_sapiens.GRCh38.dna.primary_assembly'
BOWTIE2_INDEX   = '/sourcedatadir/indexes/bowtie2/Homo_sapiens.GRCh38.dna.primary_assembly'
STAR_INDEX      = '/sourcedatadir/indexes/star/Homo_sapiens.GRCh38.dna.primary_assembly'

HISAT2_EXTRA_PARAMS = '--rna-strandness RF'
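Before launching a run, it may help to check that vars.py declares the settings this wrapper relies on. The following is a minimal sketch, not part of ccp2_nf or CirComPara2: read_vars and check_vars are hypothetical helpers, and the sketch assumes vars.py contains only simple NAME = value assignments like the example above.

```python
import ast


def read_vars(path):
    """Collect top-level NAME = <literal> assignments from a vars.py file."""
    with open(path) as handle:
        tree = ast.parse(handle.read())
    settings = {}
    for node in tree.body:
        is_simple = (
            isinstance(node, ast.Assign)
            and len(node.targets) == 1
            and isinstance(node.targets[0], ast.Name)
        )
        if not is_simple:
            continue
        try:
            settings[node.targets[0].id] = ast.literal_eval(node.value)
        except ValueError:
            pass  # skip values that are not plain literals
    return settings


def check_vars(settings, metafile="meta.csv"):
    """Return a list of problems found; the list is empty if none are found."""
    problems = []
    if settings.get("META") != metafile:
        problems.append(f"META should be '{metafile}'")
    if "CPUS" not in settings:
        problems.append("CPUS is not set")
    return problems
```

A possible use is check_vars(read_vars("vars.py")), run from the directory that holds vars.py. The file is parsed without being executed, so the check has no side effects.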

Make the meta.csv file; it will look like:

file,sample,adapter
/sourcedatadir/dataset/S7_R1.fq.gz,S07,/circompara2/tools/Trimmomatic-0.39/adapters/TruSeq3-PE-2.fa
/sourcedatadir/dataset/S7_R2.fq.gz,S07,/circompara2/tools/Trimmomatic-0.39/adapters/TruSeq3-PE-2.fa
/sourcedatadir/dataset/S8_R1.fq.gz,S08,/circompara2/tools/Trimmomatic-0.39/adapters/TruSeq3-PE-2.fa
/sourcedatadir/dataset/S8_R2.fq.gz,S08,/circompara2/tools/Trimmomatic-0.39/adapters/TruSeq3-PE-2.fa

N.B.: The adapter column is optional; Trimmomatic uses it during the preprocessing step of the CirComPara2 pipeline. You can use the adapter file path shown in the example above, which refers to the file embedded in the Docker container.
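For projects with many samples, the meta.csv file can be generated rather than typed by hand. This is a sketch under stated assumptions: write_meta is a hypothetical helper, the mapping from sample name to read files is supplied by you, and the same adapter file is applied to every row.

```python
import csv


def write_meta(samples, path, adapter=None):
    """Write a meta.csv with one row per FASTQ file.

    samples maps a sample name to the list of its read files;
    the adapter column is written only when an adapter path is given.
    """
    header = ["file", "sample"] + (["adapter"] if adapter else [])
    with open(path, "w", newline="") as handle:
        # Use plain "\n" line endings instead of the csv module's default "\r\n"
        writer = csv.writer(handle, lineterminator="\n")
        writer.writerow(header)
        for sample, files in samples.items():
            for fastq in files:
                writer.writerow([fastq, sample] + ([adapter] if adapter else []))
```

For instance, calling write_meta({"S07": ["/sourcedatadir/dataset/S7_R1.fq.gz", "/sourcedatadir/dataset/S7_R2.fq.gz"]}, "meta.csv", adapter="/circompara2/tools/Trimmomatic-0.39/adapters/TruSeq3-PE-2.fa") would write the header and the first two rows of the example above.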

Copy the command script and customise it:

cp $TOOLS_DIR/ccp2_nf/scripts/run_ccp2nf.sh $PRJ_DIR

Open your copy of run_ccp2nf.sh with a text editor such as nano or Vim and set the NFX_HOME and CCP2NF_HOME variables to your paths.

Make the script executable:

chmod +x run_ccp2nf.sh

4. Run the analysis

./run_ccp2nf.sh

5. Collect and analyse the results

Use the combine_ccp2_runs() function from the ccp2tools package (https://github.com/egaffo/ccp2tools) to merge the samples' output.
