This work builds on the study of Becnel et al. (2016), who shared cancer genomic data. We are interested in mutations that could induce pancreatic cancer. Here, we work on data from patient ID TCRBOA7. To reduce computation time, we only use data from chromosome 16.
The pipeline runs in bash. Some packages are required to run commands such as fastqc, trimmomatic, bwa and varscan.
sudo apt install fastqc # For using fastqc
conda install -c bioconda trimmomatic # For using trimmomatic
sudo apt install bwa # For using the bwa library
sudo apt install varscan # For using varscan somatic
A machine with at least 8 GB of free RAM is also required (to build the index and map the reads to chromosome 16 of the reference genome).
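The requirements above can be checked before launching the pipeline. The following is a minimal sketch, not part of the repository: the function names are hypothetical, the executable names (`fastqc`, `trimmomatic`, `bwa`, `varscan`) are assumed to match the package names, and the memory check reads `MemAvailable` from `/proc/meminfo`, so it only works on Linux.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helpers (not shipped with the repository).

# Print the name of every given command that is not found on PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Print the memory currently available, in kB (Linux only).
available_ram_kb() {
    awk '/^MemAvailable:/ {print $2}' /proc/meminfo
}

# Succeed if the given amount of memory (in kB) is at least 8 GB.
has_enough_ram() {
    local available_kb=$1
    local required_kb=$((8 * 1024 * 1024))
    [ "$available_kb" -ge "$required_kb" ]
}
```

For example, `missing_tools fastqc trimmomatic bwa varscan` prints nothing when all four tools are installed, and `has_enough_ram "$(available_ram_kb)" || echo "Less than 8 GB of free RAM"` warns when the machine is likely too small for the indexing and mapping steps.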
The pipeline detects variants in whole exome sequencing (WES) data from paired-end files: two for the tumor tissue and two for the adjacent normal tissue. Here, it is applied to a patient suffering from pancreatic cancer. Two files are generated at the end of the exome sequencing pipeline: one with the single nucleotide polymorphisms (SNPs) and one with the insertions/deletions (indels). These files are available in the directory Data/Variants. The steps for generating them are as follows.
- Clone the GitHub repository to your machine
git clone https://github.com/Theo-Roncalli/WES-pancreatic-cancer.git
cd WES-pancreatic-cancer
- Import the reads and the reference genome
bash install.sh
- Create the variant files, which contain the SNPs and indels
bash variants.sh
To clean the repository (i.e. delete the Data and Figures folders), run:
bash clean.sh
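Once the pipeline has produced the SNP and indel files in Data/Variants, a quick summary can help confirm that the run worked. The sketch below is an assumption-laden example rather than part of the repository: it assumes the files are in VarScan's native tab-separated format with a header row containing a `somatic_status` column (it does not apply if the pipeline writes VCF), and `count_by_status` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count variants per value of the `somatic_status`
# column in a tab-separated file with a header row.
# Assumes VarScan's native (non-VCF) somatic output.
count_by_status() {
    awk -F'\t' '
        NR == 1 {
            # Locate the column by name instead of hard-coding its position.
            for (i = 1; i <= NF; i++) if ($i == "somatic_status") col = i
            if (!col) { print "no somatic_status column found" > "/dev/stderr"; exit 1 }
            next
        }
        { counts[$col]++ }
        END { for (s in counts) print s "\t" counts[s] }
    ' "$1" | sort
}
```

Calling `count_by_status` on one of the files in Data/Variants prints one line per status with its count, which makes it easy to see how many calls VarScan labelled as somatic rather than germline.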